linuteresa/compendia

Compendia

Overview

Compendia is a web application designed to combat "doom scrolling" and information overload. Instead of mindlessly consuming content, users build a Personal Curriculum around a specific interest. You enter a topic (for example machine learning, ceramics, or gardening), and Compendia builds a clear multi-week curriculum around it.

Each plan is structured, intentional, and distraction-free. It draws on high-quality open resources and filters out shorts, reactions, and noise, so learning feels calm, focused, and meaningful.

How to Set Up

Prerequisites

  • Node.js (includes npm)
  • Python 3.10+
  • MongoDB (local or hosted)

1) Clone and enter the repo

git clone https://github.com/linuteresa/compendia.git
cd compendia

2) Configure environment variables

Create a .env file at the repo root:

MONGODB_URI=<your-mongodb-connection-string>
PORT=5001
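As a rough illustration of how these two values might be consumed, the sketch below reads them from a mapping. It assumes the .env values have already been loaded into the process environment (for example by a dotenv loader); `load_settings` is a hypothetical helper, not a function from this repository.

```python
import os


def load_settings(env=None):
    """Read MONGODB_URI and PORT from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    uri = env.get("MONGODB_URI")
    if not uri:
        raise RuntimeError("MONGODB_URI is not set; add it to the .env file")
    # Assumption: PORT is optional and falls back to 5001, the value used in this README.
    return {"mongodb_uri": uri, "port": int(env.get("PORT", "5001"))}
```

Failing fast on a missing connection string surfaces a misconfigured .env at startup rather than on the first database call.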

3) Install backend dependencies

python -m venv .venv
.\.venv\Scripts\activate
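pip install -r backend\requirements.txt

These commands use Windows path and activation syntax. On macOS or Linux, activate with source .venv/bin/activate and install from backend/requirements.txt.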

4) Run the backend API

cd backend
python -m uvicorn main:app --reload --port 5001

The API will be available at http://localhost:5001.

5) Install frontend dependencies

cd ..\Client
npm install

6) Run the frontend

npm run dev

Open http://localhost:5173 in your browser.

Core Features

Topic-Driven Curriculum

  • Users provide a free-text topic and optional context, which is normalized and expanded into core concepts using open and verifiable sources.
  • Curriculum depth is defined by Bloom’s Taxonomy, with adjacent cognitive levels combined into a single progression stage to control rigor.
  • The total number of weeks directly shapes scope and pacing, producing a structured syllabus grounded in the user’s input.

Intelligent Video Curation

  • Every video is a direct YouTube watch link that plays immediately.
  • Titles are compared for similarity to avoid near-duplicate introductions and repetitive content.
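The title comparison can be sketched with a bag-of-words cosine similarity. This is a minimal illustration, not the repository's code: the tokenization, the helper names, and the 0.8 threshold are all assumptions.

```python
import math
import re
from collections import Counter


def title_vector(title):
    """Bag-of-words term counts for a lowercased title."""
    return Counter(re.findall(r"[a-z0-9]+", title.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def drop_near_duplicates(titles, threshold=0.8):
    """Keep a title only if it scores below the threshold against every kept title."""
    kept, vectors = [], []
    for title in titles:
        vec = title_vector(title)
        if all(cosine(vec, other) < threshold for other in vectors):
            kept.append(title)
            vectors.append(vec)
    return kept
```

With this sketch, "Introduction to Machine Learning" and "Machine Learning Introduction" share three of four terms, so the second would be dropped while an unrelated title is kept.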

Deep Reading Retrieval

  • Pulls readings from open-web, high-authority domains such as .edu sites, nih.gov, mit.edu, and britannica.com.
  • Explicitly excludes paywalled or gated academic aggregators like scribd, researchgate, and coursehero so every link is immediately accessible.
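A domain filter of this kind might look like the sketch below. The lists contain only the examples named above (with full domain names assumed for the blocked sites), and the function names are hypothetical; the project's actual lists are likely longer.

```python
from urllib.parse import urlparse

# Assumption: only the domains mentioned in this README, not the full lists.
ALLOWED_DOMAINS = (".edu", "nih.gov", "mit.edu", "britannica.com")
BLOCKED_DOMAINS = ("scribd.com", "researchgate.net", "coursehero.com")


def host_of(url):
    """Lowercased hostname of a URL, or an empty string if it has none."""
    return (urlparse(url).hostname or "").lower()


def matches(host, domain):
    """True if host equals the domain or is a subdomain of it."""
    domain = domain.lstrip(".")
    return host == domain or host.endswith("." + domain)


def is_allowed_reading(url):
    """Blocklist wins over allowlist; anything on neither list is rejected."""
    host = host_of(url)
    if any(matches(host, d) for d in BLOCKED_DOMAINS):
        return False
    return any(matches(host, d) for d in ALLOWED_DOMAINS)
```

Matching on whole domain labels rather than raw substrings avoids accepting a host such as notmit.edu.example.com.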

Pedagogical Scaling

Depth Levels (1-3): The curriculum adapts its complexity based on user selection.

Bloom's Taxonomy Integration:

  • Level 1: Focuses on "Remember & Understand"
  • Level 2: Focuses on "Apply & Analyze"
  • Level 3: Focuses on "Evaluate & Create"
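The level-to-focus mapping above is small enough to express as a lookup table. The names below are hypothetical and only mirror the three levels listed.

```python
# Each depth level groups two adjacent Bloom's Taxonomy levels.
BLOOM_BY_DEPTH = {
    1: ("Remember", "Understand"),
    2: ("Apply", "Analyze"),
    3: ("Evaluate", "Create"),
}


def bloom_focus(depth):
    """Return the Bloom focus label for a depth level of 1, 2, or 3."""
    if depth not in BLOOM_BY_DEPTH:
        raise ValueError("depth must be 1, 2, or 3")
    return " & ".join(BLOOM_BY_DEPTH[depth])
```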

Backend Pipeline

Phase 1 Topic Intake and Seed Discovery

  • Input is the raw user topic string and the number of weeks.
  • The topic is normalized and tokenized to extract keywords.
  • MediaWiki API is queried to select a Seed Title, the most relevant Wikipedia page for the topic.
  • Related Wikipedia pages are added to a pool using targeted searches like topic overview and topic history.
  • The Seed Title is parsed for section headers to use as anchors for week naming.
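The intake steps above could be sketched as follows. The code only builds request URLs for the public MediaWiki Action API (`list=search` and `action=parse` with `prop=sections`) and does not send them; the stopword list, the English Wikipedia endpoint, and all function names are assumptions for illustration.

```python
import re
from urllib.parse import urlencode

WIKI_API = "https://en.wikipedia.org/w/api.php"  # assumption: English Wikipedia
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "on"}


def normalize_topic(raw):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", raw.lower())
    return " ".join(cleaned.split())


def extract_keywords(raw):
    """Tokens of the normalized topic with common stopwords removed."""
    return [w for w in normalize_topic(raw).split() if w not in STOPWORDS]


def seed_search_url(topic, limit=5):
    """Full-text search URL; the top hit could serve as the Seed Title."""
    params = {"action": "query", "list": "search",
              "srsearch": normalize_topic(topic), "srlimit": limit, "format": "json"}
    return WIKI_API + "?" + urlencode(params)


def related_queries(topic):
    """Targeted searches used to widen the pool of related pages."""
    base = normalize_topic(topic)
    return [f"{base} overview", f"{base} history"]


def sections_url(seed_title):
    """URL that returns the section headers of the Seed Title page."""
    params = {"action": "parse", "page": seed_title, "prop": "sections", "format": "json"}
    return WIKI_API + "?" + urlencode(params)
```

The section headers returned by the last request would then be available as anchors for week names.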

Phase 2 Resource Retrieval and Filtering

  • The YouTube fetcher queries the YouTube Data API when a key is available, and falls back to parsing YouTube search HTML when the API returns HTTP 403 or no key is configured.
  • Every video is stored as an exact watch URL and is deduped globally across the full curriculum.
  • Video titles are filtered with a cosine similarity threshold to avoid near duplicates.
  • Results containing junk terms such as shorts, reaction, and memes are filtered out, and results from channels like MIT OpenCourseWare and StatQuest are boosted.
  • Open web readings are pulled from Wikipedia externallinks on the Seed Title and related pages.
  • External links are filtered through an allowlist of high-authority domains and a blocklist for paywalls and academic aggregators, plus file type filters like pdf and ppt.
  • Each week includes one Wikipedia reading first, then open web links when available, with domain diversity enforced per week.
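The video filtering and boosting described in this phase might be sketched as below. The dictionary keys (`title`, `channel`, `url`, `score`), the boost value, and the substring matching are assumptions; only the junk terms and channel names come from the text above.

```python
JUNK_TERMS = ("shorts", "reaction", "memes")
BOOSTED_CHANNELS = ("MIT OpenCourseWare", "StatQuest")


def is_junk(title):
    """True if the title contains any junk term (case-insensitive substring)."""
    lowered = title.lower()
    return any(term in lowered for term in JUNK_TERMS)


def rank_videos(videos, boost=1.0):
    """Drop junk titles, dedupe by exact watch URL, and rank boosted channels higher."""
    seen, scored = set(), []
    for video in videos:
        if is_junk(video["title"]) or video["url"] in seen:
            continue
        seen.add(video["url"])
        score = video.get("score", 0.0)
        channel = video["channel"].lower()
        if any(name.lower() in channel for name in BOOSTED_CHANNELS):
            score += boost
        scored.append((score, video))
    # sort() is stable, so videos with equal scores keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [video for _, video in scored]
```

Passing the full curriculum's candidate list through one call keeps the URL dedupe global rather than per week.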

Phase 3 Curriculum Assembly and Output

  • Weeks are generated from the requested week count, with the depth level (1, 2, or 3) controlling the number of items per week.
  • Weekly themes rotate through fixed phases like orientation, history, science, methods, practice, risks, policy, ethics, future, synthesis.
  • Bloom focus is set by depth level, with adjacent levels grouped: 1 is Remember and Understand, 2 is Apply and Analyze, 3 is Evaluate and Create.
  • Output is a single JSON object with meta, a course summary paragraph, and a weeks array in which each week contains a title, a 50-word summary, videos, and reading links.
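To make the output shape concrete, here is a minimal sketch of the assembly step with empty placeholders for summaries and resources. The field names and the week title format are assumptions; only the phase list and the overall meta/summary/weeks structure come from the description above.

```python
import json

PHASES = ["orientation", "history", "science", "methods", "practice",
          "risks", "policy", "ethics", "future", "synthesis"]


def build_curriculum(topic, weeks, depth):
    """Build an empty curriculum skeleton whose weekly themes rotate through PHASES."""
    if weeks < 1 or depth not in (1, 2, 3):
        raise ValueError("weeks must be >= 1 and depth must be 1, 2, or 3")
    week_list = []
    for i in range(weeks):
        phase = PHASES[i % len(PHASES)]  # wraps around after ten weeks
        week_list.append({
            "week": i + 1,
            "title": f"{topic}: {phase}",
            "summary": "",            # placeholder for the 50-word summary
            "items_per_week": depth,  # depth level controls items per week
            "videos": [],
            "readings": [],
        })
    return {
        "meta": {"topic": topic, "weeks": weeks, "depth": depth},
        "summary": "",                # placeholder for the course summary paragraph
        "weeks": week_list,
    }


if __name__ == "__main__":
    print(json.dumps(build_curriculum("ceramics", 3, 2), indent=2))
```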

Hackathon Scope

Primary Tech Stack: Python, Google Gemini, YouTube Data API.

Original Work: All curriculum logic and filtering algorithms were built during the hackathon period.

Goal: To demonstrate how AI can curate safe, educational pathways on the open web.

Demonstration Video

Watch the full walkthrough: Demo video

About

A custom app to set a study curriculum.
