The Modular Automation Engine for Academic Assessments, Powered by Learning Science.
Edmate Lab_QA is a headless, open-source service platform designed to transform unstructured educational materials (PDF, Excel, Docx) into high-fidelity, curriculum-aligned Q&A, explanations, and 3D flashcards.
Built on a "Plug & Play" architecture, it empowers teachers, publishers, and developers to serve external platforms using their own AI logic and API keys.
Edmate is a Content Factory Infrastructure. Its mission ends where the learner's experience begins.
- IN-SCOPE: Source ingestion, AI generation (Q&A/Explanations), human-in-the-loop review, and DB/File persistence.
- OUT-OF-SCOPE: Learner test-taking UI, live grading, student progress tracking, or proctoring.
- Economic Kill-Switch: Real-time token tracking with automatic pipeline halts when daily USD budgets are reached.
- Intelligence-Blind & BYOK: LLM-agnostic routing via LiteLLM, with support for 100+ providers. External platforms can bring their own key (BYOK) to control their own model selection and billing.
- Adapter-Driven Persistence: Swap between Postgres, Vector DBs, or JSON exports with zero changes to core logic.
- MCP Ready: Plug Edmate directly into Agentic IDEs (Cursor/Windsurf) as a native tool for instant content generation.
- Automation Hub: A sleek, dark-mode dashboard for managing drafts, review workflows, and cost analytics.
- High-Integrity (HIA) First: Specialized engine for generating AI-resilient assessments (AI Critique, Isomorphic Variants, Viva Prompts) that combat AI cheating.
- Output adapters (teacher exports): From the Automation Hub, download any processed draft as JSON, CSV, compact Markdown (`.md`), Markdown + images (`.zip` with `questions.md` + `images/`), or Word (`.docx`), with no database required.
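
To make the Economic Kill-Switch idea concrete, here is a minimal sketch of a daily budget guard. The class name, method name, and per-1k-token prices are illustrative assumptions, not Edmate's actual budgeting code.

```python
from dataclasses import dataclass, field
from datetime import date


class BudgetExceeded(RuntimeError):
    """Raised when the daily spend limit has been reached."""


@dataclass
class DailyBudget:
    """Hypothetical tracker: accumulates USD spend and halts at the limit."""
    limit_usd: float
    spent_usd: float = 0.0
    day: date = field(default_factory=date.today)

    def record(self, prompt_tokens: int, completion_tokens: int,
               usd_per_1k_prompt: float, usd_per_1k_completion: float) -> float:
        """Add the cost of one LLM call; raise once the limit is reached."""
        today = date.today()
        if today != self.day:  # a new day resets the counter
            self.day, self.spent_usd = today, 0.0
        cost = ((prompt_tokens / 1000) * usd_per_1k_prompt
                + (completion_tokens / 1000) * usd_per_1k_completion)
        self.spent_usd += cost
        if self.spent_usd >= self.limit_usd:
            raise BudgetExceeded(f"daily budget of ${self.limit_usd:.2f} reached")
        return cost
```

A pipeline would call `record()` after every model response and stop scheduling new work when `BudgetExceeded` is raised.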
Get Edmate running locally in seconds.
```bash
# 1. Clone & Install
git clone https://github.com/shmukit/Edmate.git
cd Edmate
pip install -r content_gen/requirements.txt

# 2. Configure (Set your keys)
cp content_gen/.env.example content_gen/.env

# 3. Workspace + routing (copy template, then edit)
cp edmate_config.yaml.example edmate_config.yaml

# 4. (Optional) PDF-Extract-Kit: only if extraction_settings.engine is pdf_extract_kit
./scripts/setup_pdf_extract_kit.sh
# Otherwise set extraction_settings.engine to vision or pymupdf in edmate_config.yaml
```

By default, large PDF files are ignored by git (`.gitignore` excludes `sample.pdf`) to prevent repository bloat. Keep heavy exam papers local to your machine.
Edmate is designed to be highly adaptable. Before processing your first PDF, define curriculums, target database tables, and optional defaults (workspace.default_subject, workspace.default_curriculum) in edmate_config.yaml or edmate_config.json.
Edmate is database-agnostic. While it includes a production-ready Postgres/Supabase adapter by default, you can use any database (MySQL, MongoDB, Firebase, etc.).
- Using Postgres/Supabase? Set `DATABASE_URL` in `content_gen/.env`.
- Using a different database? Edmate uses the Adapter Pattern, so you can swap the persistence layer by implementing a new Storage Adapter in `content_gen/adapters/`.
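
As a rough illustration of the Adapter Pattern described above, the sketch below defines a storage interface and a file-based implementation. The `StorageAdapter` protocol, the `save_questions` method, and `JsonFileAdapter` are hypothetical names chosen for this example; check the existing adapters in `content_gen/adapters/` for the real base class and method signatures.

```python
import json
from pathlib import Path
from typing import Protocol


class StorageAdapter(Protocol):
    """Hypothetical interface; the real one in content_gen/adapters/ may differ."""

    def save_questions(self, table: str, questions: list[dict]) -> int: ...


class JsonFileAdapter:
    """Writes each target table to <root>/<table>.json instead of a database."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def save_questions(self, table: str, questions: list[dict]) -> int:
        self.root.mkdir(parents=True, exist_ok=True)
        path = self.root / f"{table}.json"
        # Append to whatever was saved earlier for this table.
        existing = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
        existing.extend(questions)
        path.write_text(json.dumps(existing, indent=2), encoding="utf-8")
        return len(questions)


def publish(adapter: StorageAdapter, table: str, questions: list[dict]) -> int:
    """Core logic depends only on the interface, never on a concrete backend."""
    return adapter.save_questions(table, questions)
```

Because `publish()` only sees the interface, a MySQL or MongoDB adapter could be passed in without touching the calling code.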
Navigate to the workspace section in edmate_config.yaml or edmate_config.json to tell Edmate which tables exist in your database:
```yaml
workspace:
  default_subject: "General"
  default_curriculum: "General"
  curriculums:
    - "Your Custom Curriculum"
    - "Standard Level"
  target_tables:
    - id: "questions"        # Must match your actual DB table name
      label: "Main Hub"      # How it appears in the UI
    - id: "biology_vault"    # Add as many as you need
      label: "Biology Bank"
```

Caution (Database Schema Consistency): Edmate's `database_service.py` currently expects tables to have specific columns (e.g., `title`, `options`, `correct_options`). If your database uses different column names, you must update the SQL queries in `content_gen/scripts/processing/database_service.py` to match your schema.
The Automation Hub provides several "Admin" settings to handle diverse document formats:
- Extraction Guardrails: Adjust "Detection Mode" to Strict for standard papers or Open for noisy documents.
- Model Routing: Strategies to balance cost and quality. Edmate can use cheaper models (like Gemini Flash) for extraction and switch to high-precision models (like GPT-4o) for final content generation.
- Pedagogy Profiles: Choose profiles like `exam_prep` or `beginner` to change how the AI writes explanations and scaffolds content.
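
The Model Routing setting can be pictured as a lookup from pipeline stage to model. The sketch below is only an illustration: the profile names (`balanced`, `economy`), the stage names, and the LiteLLM-style model strings are assumptions, and the real routing options live in `edmate_config.yaml`.

```python
# Hypothetical routing table: a cheaper model for extraction,
# a higher-precision model for final content generation.
ROUTING = {
    "balanced": {"extraction": "gemini/gemini-1.5-flash", "generation": "gpt-4o"},
    "economy": {"extraction": "gemini/gemini-1.5-flash", "generation": "gpt-4o-mini"},
}


def pick_model(stage: str, profile: str = "balanced") -> str:
    """Return the model configured for a pipeline stage under a routing profile."""
    try:
        return ROUTING[profile][stage]
    except KeyError:
        raise ValueError(f"unknown profile/stage: {profile}/{stage}") from None
```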
Start the FastAPI backend to access the drag-and-drop dashboard:
```bash
uvicorn qc_viewer.main:app --host 0.0.0.0 --port 8000
```

Navigate to http://localhost:8000/automate in your browser.
Process a PDF headlessly from the terminal. `--subject` is optional; it defaults to `workspace.default_subject` in `edmate_config.yaml` (falling back to `General`). Batch mode still requires `--input-dir`.
```bash
python3 content_gen/scripts/pipeline/pipeline_orchestrator.py \
  --input-dir content_gen/data/inputs \
  --output-dir content_gen/data/extracted \
  --single-pdf path/to/your_paper.pdf

# Optional explicit subject label:
python3 content_gen/scripts/pipeline/pipeline_orchestrator.py \
  --input-dir content_gen/data/inputs \
  --output-dir content_gen/data/extracted \
  --single-pdf path/to/your_paper.pdf \
  --subject "Chemistry"
```

If you are a developer looking to integrate Edmate into your own platform using the Bring Your Own Key (BYOK) architecture, you can call the API directly.
- Start the API Server: `uvicorn qc_viewer.main:app --host 0.0.0.0 --port 8000`
- Interactive API Docs: Navigate to `http://localhost:8000/docs` to view the auto-generated Swagger UI, which interactively documents all available endpoints.
- Python Example: See the runnable example at `examples/client_request.py`. It demonstrates how to call the `/api/v1/extract` endpoint, pass a provider-agnostic BYOK key via HTTP headers, and poll the job status until completion.
If a partner platform wants end-users to provide their own API key in the platform UI, use this pattern:
- User enters key in partner UI.
- Partner backend stores key securely (encrypted at rest / secret manager).
- Partner backend sends requests to Edmate with `X-API-Key`.
- Edmate processes the file and returns job status/results.

Accepted key headers:

- Preferred: `X-API-Key`
- Backward-compatible: `X-Gemini-Key`, `X-OpenAI-Key`

Fields to expose in the partner UI:

- Minimum (required for secure BYOK operation):
  - API key input mapped to `X-API-Key`
- Recommended (materially changes output quality/behavior):
  - `curriculum`
  - `ls_profile`
  - `hia_mode`
  - `question_detection_mode`
  - `min_question_number`
  - `max_question_number`
- Optional advanced:
  - `X-LLM-Provider` (provider-family preference)
  - `X-Model-ID` (exact model pinning)
- Not recommended to expose yet (currently preview-only in local UI):
  - `target_language`
  - `routing_profile`
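
The header and field choices above can be sketched as two small helpers that a partner backend might use before calling Edmate. The function names are hypothetical, and the `min <= max` check is an assumption about sensible input rather than a documented Edmate rule.

```python
from typing import Optional


def build_byok_headers(api_key: str,
                       provider: Optional[str] = None,
                       model_id: Optional[str] = None) -> dict[str, str]:
    """Build request headers: X-API-Key is required, the other two are optional."""
    if not api_key or not api_key.strip():
        raise ValueError("an API key is required for BYOK requests")
    headers = {"X-API-Key": api_key.strip()}
    if provider:
        headers["X-LLM-Provider"] = provider  # provider-family preference
    if model_id:
        headers["X-Model-ID"] = model_id      # exact model pinning
    return headers


def build_form_fields(curriculum: str, ls_profile: str, hia_mode: str,
                      question_detection_mode: str,
                      min_question_number: int,
                      max_question_number: int) -> dict[str, str]:
    """Collect the recommended form fields as strings for a multipart request."""
    if min_question_number > max_question_number:  # assumed sanity check
        raise ValueError("min_question_number must not exceed max_question_number")
    return {
        "curriculum": curriculum,
        "ls_profile": ls_profile,
        "hia_mode": hia_mode,
        "question_detection_mode": question_detection_mode,
        "min_question_number": str(min_question_number),
        "max_question_number": str(max_question_number),
    }
```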
- Direct server-to-server (recommended): Partner backend calls Edmate and injects `X-API-Key` per request.
- Manual API call (no partner UI): Integrator sends multipart form + `X-API-Key` directly from their backend/script.
```bash
curl -X POST "http://localhost:8000/api/v1/extract" \
  -H "X-API-Key: $LITELLM_API_KEY" \
  -F "file=@/path/to/paper.pdf" \
  -F "curriculum=Cambridge O/Level" \
  -F "subject=Biology"
```

```bash
curl -X POST "http://localhost:8000/api/automate/draft" \
  -H "X-API-Key: $LITELLM_API_KEY" \
  -H "X-LLM-Provider: openai" \
  -H "X-Model-ID: gpt-4o-mini" \
  -F "file=@/path/to/paper.pdf" \
  -F "subject=Biology" \
  -F "paper_code=questions" \
  -F "curriculum=Cambridge O/Level" \
  -F "ls_profile=exam_prep" \
  -F "hia_mode=High" \
  -F "question_detection_mode=balanced" \
  -F "min_question_number=1" \
  -F "max_question_number=120"
```

- Keep API keys on the server side (do not expose raw keys in browser logs or frontend bundles).
- Mask key input in UI and never return full keys in API responses.
- Avoid persisting keys in plain text; use encryption/secret vault where possible.
- Do not write keys to app logs, job metadata, or analytics events.
- CORS & BYOK: When calling Edmate from a different origin (e.g., your own dashboard), ensure the CORS configuration allows the custom headers (`X-API-Key`, etc.). Edmate's default configuration is permissive for local development but must be restricted in production.
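
The guidance above on masking keys and keeping them out of logs could look like the following on a partner backend. This is a minimal sketch; `mask_key` and `redact_headers` are hypothetical helpers, not part of Edmate.

```python
# Header names that carry secrets, compared case-insensitively.
SENSITIVE_HEADERS = {"x-api-key", "x-gemini-key", "x-openai-key"}


def mask_key(key: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters of a key."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


def redact_headers(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of the headers that is safe to log or return to a UI."""
    return {
        name: mask_key(value) if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }
```

Logging `redact_headers(request_headers)` instead of the raw mapping keeps full keys out of app logs and job metadata.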
After a draft finishes processing, you can download it without publishing to Postgres.
- Where: Automation Hub at `/automate` (e.g. `http://localhost:8000/automate`). Each draft card has an Export menu, and the Review overlay has the same Export menu.
- Endpoint: `GET /api/automate/draft/{draft_id}/export?format=...`
- Formats:
  - `json`: full `metadata.json` (includes diagram data URIs).
  - `csv`: one row per question; diagrams as a `diagram_data_uri` column.
  - `markdown` or `md`: readable Markdown; diagrams are not inlined as base64 (see blockquote notes); use `mdzip` or JSON/DOCX for images.
  - `mdzip`: ZIP containing `questions.md`, `README.txt`, and `images/Q{n}.png|jpg` for PNG/JPEG diagrams.
  - `docx`: Microsoft Word document with embedded PNG/JPEG diagrams.
Example:

```bash
curl -f -o export.zip "http://localhost:8000/api/automate/draft/draft_abc12345/export?format=mdzip"
```

Implementation: `qc_viewer/services/draft_export.py` and route in `qc_viewer/routers/automation.py`.
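
For scripted exports, a small helper can validate the format before building the URL. The sketch below assumes only the endpoint path and format names listed above; `export_url` itself is a hypothetical helper.

```python
from urllib.parse import quote, urlencode

# Format names accepted by the export endpoint, as listed in this README.
EXPORT_FORMATS = {"json", "csv", "markdown", "md", "mdzip", "docx"}


def export_url(base_url: str, draft_id: str, fmt: str) -> str:
    """Build the export URL for a draft, rejecting unknown formats early."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported export format: {fmt!r}")
    path = f"/api/automate/draft/{quote(draft_id, safe='')}/export"
    return f"{base_url.rstrip('/')}{path}?{urlencode({'format': fmt})}"
```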
If you are integrating Edmate into a custom frontend (like a React or Vite app) and see CORS errors in your browser console:
Browsers reject `Access-Control-Allow-Origin: *` when a request includes credentials (cookies or HTTP authentication). Edmate handles this by reflecting the requesting origin automatically.
If you send custom headers like `X-Gemini-Key`, the browser sends an `OPTIONS` preflight request first.
- Problem: In many frameworks, if the CORS middleware is registered after the routes, the router returns a `405 Method Not Allowed` for the `OPTIONS` request, which surfaces as a CORS error.
- Solution: Edmate's `app_factory.py` ensures the CORS middleware is at the top of the stack. If you modify the codebase, never move the CORS middleware below the routers.
If your environment is restrictive, make sure your fetch/axios client actually sends these headers and that any proxy or gateway in between lets them through:
`X-API-Key`, `X-LLM-Provider`, `X-Model-ID`, `X-Gemini-Key`, `X-OpenAI-Key`.
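
To show what a restricted production setup should answer to a preflight, here is a framework-free sketch of origin reflection against an allow-list. It is not Edmate's `app_factory.py`; the allowed origin and the function name are assumptions for illustration.

```python
# Hypothetical allow-list; replace with the origins of your own dashboards.
ALLOWED_ORIGINS = {"https://dashboard.example.com"}
ALLOWED_HEADERS = ["X-API-Key", "X-LLM-Provider", "X-Model-ID",
                   "X-Gemini-Key", "X-OpenAI-Key"]


def preflight_headers(origin: str) -> dict[str, str]:
    """Return CORS response headers for an OPTIONS preflight.

    An empty dict means the origin is not allowed, so the browser
    will block the follow-up request.
    """
    if origin not in ALLOWED_ORIGINS:
        return {}
    return {
        "Access-Control-Allow-Origin": origin,  # reflect, never "*"
        "Vary": "Origin",                       # keep caches per-origin
        "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
        "Access-Control-Allow-Headers": ", ".join(ALLOWED_HEADERS),
    }
```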
Edmate is built for extreme extensibility. It uses the Adapter Pattern to remain decoupled across all layers of the platform, from data ingestion to database schemas.
```mermaid
graph TD
    %% 1. Ingestion
    subgraph Input ["1. Multi-Modal Ingestion"]
        A[PDF / Docx / Excel]
        O[Pipeline Orchestrator]
        A --> O
    end

    %% 2. Extraction Adapters
    subgraph Extraction ["2. Extraction Layer (Adapters)"]
        O -->|vision| V[VisionExtractionAdapter]
        O -->|kit| K[KitExtractionAdapter]
        O -->|lightweight| P[PyMuPDFAdapter]
        V -->|Vision AI| S1[Structured Data]
        K -->|Local ML| S1
        P -->|Regex| S1
    end

    %% 3. Intelligence
    subgraph Intelligence ["3. Generation Layer"]
        R{LLM Router: BYOK}
        S1 --> R
        R --> G[ContentGenerator]
        G -->|Explainers| O2[Enriched Output]
        G -->|Flashcards| O2
        G -->|HIA Logic| O2
    end

    %% 4. Storage
    subgraph Persistence ["4. Persistence Layer"]
        SA{Storage Adapters}
        O2 --> SA
        SA -->|Postgres| DB1[(Production DB)]
        SA -->|JSON| DB3[Drafts]
    end

    style O fill:#fbbf24,stroke:#111827,color:#111827
    style R fill:#fbbf24,stroke:#111827,color:#111827
    style SA fill:#1e1b4b,stroke:#fbbf24,color:#fff
```

- Multi-Modal Ingestion (Input): Accepts unstructured PDFs, Docx, and Excel/CSV files.
- Pluggable Extraction Engines:
  - Vision (High-Fidelity): Multimodal LLMs "see" the page to capture complex layouts and diagrams.
  - Kit (Local ML): Uses YOLO-based layout detection for local, GPU-accelerated extraction.
  - Lightweight: Regex-based extraction for fast, CPU-only processing.
- Pedagogical Engine: Applies Learning Science techniques (like our HIA engine) dynamically during the generation stage.
- Curriculum Agnostic: Plug and play your specific curriculum format (e.g., GCSE A/O level, or any National Curriculum).
- Model Router (BYOK): Bring Your Own Key. Route tasks to any LLM supported by your configured provider/router.
- Multi-Tier Output Generation: Extracts simple raw content (Q/A, Diagrams, Tables as-is) alongside enriched metadata (rationales for right/wrong answers, concept gaps, and 3D flashcards).
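
The pluggable extraction layer can be pictured as a registry keyed by `extraction_settings.engine`. In the sketch below the class names come from the architecture diagram, but their bodies are empty stand-ins and `select_adapter` is a hypothetical helper; the real orchestrator may wire engines differently.

```python
class VisionExtractionAdapter:
    """Stand-in: the real adapter sends page images to a multimodal LLM."""


class KitExtractionAdapter:
    """Stand-in: the real adapter runs local layout-detection models."""


class PyMuPDFAdapter:
    """Stand-in: the real adapter does lightweight, CPU-only text extraction."""


# Engine names as used for extraction_settings.engine in edmate_config.yaml.
ENGINES = {
    "vision": VisionExtractionAdapter,
    "pdf_extract_kit": KitExtractionAdapter,
    "pymupdf": PyMuPDFAdapter,
}


def select_adapter(config: dict):
    """Instantiate the adapter named in the config, or fail with a clear error."""
    engine = config.get("extraction_settings", {}).get("engine")
    try:
        return ENGINES[engine]()
    except KeyError:
        raise ValueError(f"unknown extraction engine: {engine!r}") from None
```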
| Path | Description |
|---|---|
| `content_gen/` | The Brain: Core AI pipeline for ingestion, extraction, and content generation. |
| `qc_viewer/` | The Heart: FastAPI backend and Automation Hub dashboard for review. |
| `docs/` | Comprehensive documentation on system design, pedagogy, and brand. |
| `examples/` | Client integration examples (BYOK usage, API polling). |
| `credentials/` | Secure storage for optional cloud credentials (when needed). |
| `edmate_config.yaml` / `.json` | Global project configuration (models, budgets, engines). |

Inside `content_gen/`:

| Path | Description |
|---|---|
| `adapters/` | Pluggable connectors for storage (Postgres, JSON) and extraction. |
| `core/` | Internal logic: LLM routing, daily budgeting, and data schemas. |
| `data/` | Pipeline workspace: `inputs/`, `extracted/` text, and `outputs/`. |
| `scripts/` | CLI entry points for orchestrating the full pipeline. |
| `tools/` | Utility toolbox for PDF manipulation and image handling. |
| `tests/` | Unit and integration tests for the generation engine. |

Inside `qc_viewer/`:

| Path | Description |
|---|---|
| `static/` | Frontend assets (Vanilla HTML/JS/CSS) for the dashboard. |
| `drafts/` | Persistence for human-in-the-loop content review tasks. |
| `jobs/` | Real-time tracking for asynchronous background generation tasks. |
| `main.py` | Entry point for the FastAPI server. |

Edmate is committed to keeping its core engine free and open-source forever. We follow an Open Core model where the essential tools are free, while advanced institutional features are part of our Studio/Enterprise offerings.
| Feature | Community (Free) | Studio / Enterprise |
|---|---|---|
| Core AI Pipeline | β | β |
| PDF/Excel Ingestion | β | β |
| Standard Assessment (MCQ/TF) | β | β |
| High-Integrity Assessments (HIA) | β (Basic) | β (Advanced) |
| Custom Prompts | β | β |
| Collaboration & Teams | β | β |
| Advanced Institutional Analytics | β | β |
| Managed Cloud Hosting | β | β |
| SSO & RBAC | β | β |
In the era of Generative AI, traditional "recall-based" homework is becoming obsolete. Edmate's mission is to help teachers and platforms move toward Authentic Assessment: content designed to ensure students "lift the weights" of their own education.
Edmate's HIA engine generates:
- AI Critique Exercises: Students must find errors in deliberately flawed AI answers.
- Isomorphic Variants: Unique numerical/contextual versions of the same concept per student.
- Viva Defense Prompts: Structured probing questions for verbal reasoning verification.
- Scaffolded Sequences: Breaking single tasks into mandatory intellectual process steps.
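
As one illustration of Isomorphic Variants, the sketch below derives a per-student version of a single physics item from a deterministic seed. It is a toy stand-in, not the HIA engine: the function name, the question template, and the value ranges are all assumptions.

```python
import random


def isomorphic_variant(student_id: str, question_id: str) -> dict:
    """Generate the same concept with student-specific numbers.

    Seeding with the question and student IDs makes the variant
    reproducible, so a grader can regenerate the expected answer later.
    """
    rng = random.Random(f"{question_id}:{student_id}")
    mass_kg = rng.randint(2, 20)
    accel = rng.randint(2, 9)
    return {
        "prompt": (f"A {mass_kg} kg cart accelerates at {accel} m/s^2. "
                   "What net force acts on it?"),
        "answer_newtons": mass_kg * accel,  # F = m * a
    }
```

Calling the function twice with the same IDs returns the same item, while different students will usually receive different numbers.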
We welcome contributions of all kinds, whether it's a new Storage Adapter, an extraction prompt, or a bug fix.
- Product Roadmap: Where we're going and how to help get there.
- Use Cases: How different users (Platforms vs. Teachers) adopt Edmate.
- Contributing Guide: How to get started.
- Code of Conduct: Our community standards.
- Modular Architecture Guide: Deep dive for developers.
- Pedagogy & Learning Science: The "How It Works" behind our content generation.
Licensed under the MIT License. See LICENSE.
Built with ❤️ for an accessible, AI-powered education system.
