Local-first, AI-powered meeting summarization platform.
Built for privacy, speed, and modularity using FastAPI, Next.js, MLX (Whisper), vLLM, PostgreSQL, and MongoDB.
The system follows a Polyglot Persistence architecture, decoupling metadata management from high-volume text data.
-
Meeting Capture
- Frontend: A Next.js dashboard connects to the Backend.
- Bot Service: Uses MeetingBaas API to spawn a bot that joins Google Meet/Zoom/Teams calls.
- Audio Stream: The bot streams raw audio (16kHz PCM) via WebSocket to the FastAPI Gateway.
-
Audio Processing Pipeline
- Gateway: Buffers audio into 30-second chunks.
- Transcription Worker:
- Consumes audio chunks.
- Runs Whisper Large v3 Turbo locally via MLX.
- Pushes raw text to MongoDB (
transcriptscollection) and notifies the frontend via SSE.
-
Intelligence Layer
- Intelligence Worker:
- Monitors new transcripts for active sessions.
- Aggregates context (sliding window).
- Queries a local Qwen 2.5 (1.5B) LLM via vLLM for summarization and action item extraction.
- Saves insights to MongoDB (
summariescollection) and notifies the frontend via SSE.
- Intelligence Worker:
-
Data Persistence Layer
- PostgreSQL: Stores relational metadata (Sessions, Bot IDs, Meeting Status).
- MongoDB: Stores unstructured, high-volume data (Transcripts, Append-only Summaries).
-
Real-Time UI
- The Dashboard listens to Server-Sent Events (SSE).
- Updates are filtered by
session_idto show only relevant live data.
- Frontend: Next.js 14, TailwindCSS, Framer Motion, SSE.
- Backend API: FastAPI, WebSockets (
/stream), SSE (/events). - AI/ML:
- ASR:
mlx-whisper(Optimized for Apple Silicon). - LLM:
vLLMservingQwen/Qwen2.5-1.5B-Instruct.
- ASR:
- Database:
- PostgreSQL: Session Management.
- MongoDB: Content Storage.
- Python 3.11+
- Node.js 18+
- Docker/Podman (for DBs) or local installs of Postgres & Mongo.
Ensure PostgreSQL and MongoDB are running.
# Example Docker command
docker run -d -p 5432:5432 -e POSTGRES_PASSWORD=password postgres
docker run -d -p 27017:27017 mongocd app
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Start vLLM Server (Separate Terminal)
python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen2.5-1.5B-Instruct --port 8000
# Start Gateway
uvicorn main:app --reload --port 8008cd frontend
npm install
npm run devVisit http://localhost:3000 to start recording.
- Polyglot Persistence: chosen to optimize for relational integrity (sessions) vs. rapid write throughput (transcriptions).
- Event-Driven: Workers communicate via internal queues, decoupled from the API loop to prevent blocking.
- Local-First: All inference (ASR & LLM) runs on-device for privacy and zero latency.