Full-stack RAG (Retrieval-Augmented Generation) chat application for Vanorak winter road maintenance equipment support. Built with Next.js (frontend) and FastAPI (backend), streaming responses over the Vercel AI SDK Data Stream Protocol.
| File | Audience |
|---|---|
| docs/overview.md | Client / management: what was built, results, next steps |
| docs/documentation.md | Technical: full requirements spec, design decisions, evaluation |
| docs/system_architecture.md | Technical: component diagram and data flow |
| docs/technical_notes.md | Developer: thresholds, design decisions, known limitations |

- Two query modes: General (direct LLM) and Technical (full RAG pipeline)
- Technical mode retrieves context via hybrid search — dense vector (Qdrant) + keyword (BM25) fused with RRF
- Streams assistant output token-by-token
- Returns source citations and a confidence score (0.0–1.0) with every Technical response
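
The hybrid retrieval step can be illustrated with a minimal Reciprocal Rank Fusion sketch. This is not the code in api/utils/retrieval.py: the function name `rrf_fuse`, the chunk IDs, and the constant k=60 (a commonly used default) are assumptions made for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. k=60 is an assumed, commonly used constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical chunk IDs returned by the two retrievers.
dense_hits = ["chunk-7", "chunk-2", "chunk-9"]
bm25_hits = ["chunk-2", "chunk-4", "chunk-7"]
fused = rrf_fuse([dense_hits, bm25_hits])
```

A chunk that ranks well in both lists (here chunk-2) rises above one that ranks first in only a single list, which is the usual motivation for fusing dense and keyword results by rank rather than by raw score.
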
- Frontend: Next.js 16, React 19, TypeScript, Vercel AI SDK, Jotai
- Backend: FastAPI, Uvicorn, OpenAI SDK (gpt-4o-mini + text-embedding-3-small)
- Document DB: PostgreSQL (stores uploaded txt, pdf, and csv files plus extracted text)
- Vector DB: Qdrant (Docker, local)
- Keyword search: rank_bm25 (BM25 Okapi, persisted as .pkl)
- PDF parsing: PyMuPDF
- Streaming: Server-Sent Events, Data Stream Protocol

Project layout:

    src/app/                     Next.js App Router entry
    src/components/chat/         Chat UI, composer, mode toggle
    src/store/chat.ts            Jotai atom for chat mode state
    api/index.py                 FastAPI route: mode routing (General / Technical)
    api/routes/documents.py      Document CRUD routes (PostgreSQL)
    api/services/vector_sync.py  Sync uploaded docs into Qdrant + BM25
    api/db.py                    SQLAlchemy engine/session/bootstrap
    api/utils/rag.py             RAG pipeline orchestrator
    api/utils/bm25.py            BM25 index load/save/query
    api/utils/embedder.py        text-embedding-3-small query embedder
    api/utils/retrieval.py       Hybrid search (dense + BM25 + RRF)
    api/utils/reranker.py        Cross-encoder reranking
    api/utils/confidence.py      Confidence scoring (RRF score + LLM self-eval)
    api/utils/generator.py       Context formatting for the LLM prompt
    api/utils/stream.py          SSE stream formatting
    generation/                  Synthetic corpus generators and outputs
    scripts/bulk_upload.py       Uploads all corpus files to the CMS API
    docs/                        Documentation (architecture, implementation notes, full spec)

- Node.js 20+
- pnpm
- Python 3.10+
- Docker (for Qdrant)

Create .env in the project root with the following values:

    OPENAI_API_KEY=your_openai_api_key
    DATABASE_URL=postgresql+psycopg://postgres:postgres@localhost:5432/rag_documents

.env is gitignored. You can start from .env.example.
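
A small pre-flight check can catch a missing key before the backend starts. The sketch below parses .env-style text using only the standard library. `parse_env` and `missing_keys` are hypothetical helpers, not part of the project, and the parser handles only simple KEY=VALUE lines.

```python
REQUIRED_KEYS = ("OPENAI_API_KEY", "DATABASE_URL")


def parse_env(text):
    """Parse simple KEY=VALUE lines; skip blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


sample = """
OPENAI_API_KEY=your_openai_api_key
DATABASE_URL=postgresql+psycopg://postgres:postgres@localhost:5432/rag_documents
"""
env = parse_env(sample)
print(missing_keys(env))  # []
```

An empty list means both variables are present; any names returned are the ones to add to .env.
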

Install the frontend dependencies:

    pnpm install

Create a Python virtual environment and install the backend dependencies:

    python3 -m venv venv
    source venv/bin/activate  # Windows: venv\Scripts\activate
    pip install -r requirements.txt

Start Qdrant with Docker:

    docker run -d \
      --name qdrant \
      -p 6333:6333 \
      -v $(pwd)/qdrant_storage:/qdrant/storage \
      qdrant/qdrant

This persists the vector database to qdrant_storage/ (gitignored).
Verify it's running: http://localhost:6333/dashboard
If you do not already have PostgreSQL running locally, you can start it with Docker:

    docker run -d \
      --name rag-postgres \
      -e POSTGRES_DB=rag_documents \
      -e POSTGRES_USER=postgres \
      -e POSTGRES_PASSWORD=postgres \
      -p 5432:5432 \
      postgres:17

The FastAPI app creates the documents table automatically on startup.

Start the app:

    pnpm dev

With the app running, upload all corpus documents in one command:

    python scripts/bulk_upload.py

This uploads every file under generation/outputs/ to POST /api/documents. The backend chunks, embeds, and indexes each file automatically. Skip this step if the Qdrant collection and BM25 snapshot are already populated.
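
The file-discovery half of a bulk upload like this could look like the sketch below. It is not the contents of scripts/bulk_upload.py: the helper name and the extension filter (taken from the txt, pdf, and csv types the API accepts) are assumptions, and the HTTP upload itself is left out.

```python
from pathlib import Path

ALLOWED_SUFFIXES = {".txt", ".pdf", ".csv"}  # assumed from the supported file types


def find_corpus_files(root):
    """Return all uploadable files under root, sorted for a stable order."""
    root = Path(root)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in ALLOWED_SUFFIXES
    )


if __name__ == "__main__":
    for path in find_corpus_files("generation/outputs"):
        print(path)
```

Sorting the paths keeps the upload order reproducible between runs, which makes a partially failed upload easier to resume.
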

- App: http://localhost:3000
- FastAPI docs: http://127.0.0.1:8000/docs
- Qdrant dashboard: http://localhost:6333/dashboard

The backend now exposes PostgreSQL-backed CRUD endpoints for txt, pdf, and csv documents.
Upload a single document using multipart form data:

    curl -X POST "http://127.0.0.1:8000/api/documents" \
      -F "files=@generation/outputs/call_log.csv"

The API stores:
- original file bytes
- extracted text content
- filename, type, checksum, source path, and upload timestamp
- synchronized Qdrant vectors and BM25 snapshot entries for Technical mode retrieval
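
As an illustration of the stored metadata, the sketch below builds such a record for one upload. The field names and the choice of SHA-256 for the checksum are assumptions; the project's actual schema and hash algorithm may differ.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import PurePath


def describe_upload(filename, data):
    """Build a metadata record for an uploaded file (hypothetical field names)."""
    return {
        "filename": filename,
        "file_type": PurePath(filename).suffix.lstrip(".").lower(),
        "checksum": hashlib.sha256(data).hexdigest(),  # assumed algorithm
        "size_bytes": len(data),
        "uploaded_at": datetime.now(timezone.utc).isoformat(),
    }


# Hand-written sample bytes, not taken from the real corpus.
record = describe_upload("call_log.csv", b"id,issue\n1,E-32\n")
```

A content checksum of this kind is what lets a backend detect that the same file has been uploaded twice, independent of its filename.
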
List documents with optional filters:

    curl "http://127.0.0.1:8000/api/documents?file_type=pdf&search=hydraulic&limit=25"

Fetch one document and its extracted text:

    curl "http://127.0.0.1:8000/api/documents/<document_id>"

Download the original uploaded file:

    curl -OJ "http://127.0.0.1:8000/api/documents/<document_id>/download"

Delete a document from PostgreSQL:

    curl -X DELETE "http://127.0.0.1:8000/api/documents/<document_id>"

Use the frontend Documents page to upload the corpus. Each uploaded file is parsed, stored in PostgreSQL, embedded, and indexed into Qdrant/BM25 automatically.

    # 1. Start Qdrant (if not already running)
    docker start qdrant
    # 2. Start Postgres (if not already running)
    docker start rag-postgres
    # 3. Start Next.js + FastAPI
    pnpm dev

App: http://localhost:3000, FastAPI docs: http://127.0.0.1:8000/docs

| Command | What it runs |
|---|---|
| pnpm dev | Next.js + FastAPI together |
| pnpm next-dev | Next.js only |
| pnpm fastapi-dev | FastAPI only (installs Python deps first) |

Request body (POST /api/chat):

    {
      "messages": [
        { "role": "user", "content": "What does error E-32 mean?" }
      ],
      "mode": "technical"
    }

mode is "general" or "technical" (defaults to "general" if omitted).
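
A client can build and validate this body before sending it. The sketch below is one way to do that; `build_chat_payload` is a hypothetical helper, and it mirrors only the two fields and the default described here.

```python
import json

VALID_MODES = ("general", "technical")


def build_chat_payload(question, mode="general"):
    """Build the JSON body for the chat endpoint; mode defaults to general."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return {
        "messages": [{"role": "user", "content": question}],
        "mode": mode,
    }


body = json.dumps(build_chat_payload("What does error E-32 mean?", mode="technical"))
```

Rejecting unknown modes on the client side avoids sending a request that the server would have to either reject or silently treat as General.
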

Response: Content-Type: text/event-stream, carrying Data Stream Protocol events in this order:
start, text-start, text-delta, text-end, finish, [DONE]
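
To make the event sequence concrete, here is a minimal sketch that reassembles the assistant text from such a stream. It assumes each event arrives as an SSE `data:` line holding a JSON object with a `type` field, that text-delta events carry their text in a `delta` field, and that the stream ends with a literal `[DONE]`. These field names should be verified against the Vercel AI SDK version in use.

```python
import json


def collect_text(lines):
    """Concatenate text-delta payloads from Data Stream Protocol SSE lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        event = json.loads(payload)
        if event.get("type") == "text-delta":
            parts.append(event.get("delta", ""))  # "delta" is an assumed field name
    return "".join(parts)


# Hand-written sample stream, not captured from the real backend.
sample = [
    'data: {"type": "start"}',
    'data: {"type": "text-start", "id": "t1"}',
    'data: {"type": "text-delta", "id": "t1", "delta": "Hello, "}',
    'data: {"type": "text-delta", "id": "t1", "delta": "world"}',
    'data: {"type": "text-end", "id": "t1"}',
    'data: {"type": "finish"}',
    "data: [DONE]",
]
answer = collect_text(sample)
```

Only text-delta events contribute to the answer; the other event types mark boundaries and can be ignored by a client that just wants the final text.
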
Technical mode responses also include response headers:

- x-confidence-score: float 0.0–1.0
- x-citations: list of { source, reference } objects
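
Reading these headers on the client could look like the sketch below. It assumes x-citations is sent as a JSON-encoded array, which the text above does not state explicitly; the helper name and the sample values are hypothetical.

```python
import json


def parse_technical_headers(headers):
    """Extract the confidence score and citations from response headers.

    Assumes x-citations is a JSON-encoded array. Lookup is case-insensitive
    because HTTP header names are case-insensitive.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    score = lowered.get("x-confidence-score")
    citations = lowered.get("x-citations")
    return {
        "confidence": float(score) if score is not None else None,
        "citations": json.loads(citations) if citations else [],
    }


# Hypothetical header values for illustration only.
parsed = parse_technical_headers({
    "X-Confidence-Score": "0.82",
    "X-Citations": '[{"source": "manual.pdf", "reference": "p. 12"}]',
})
```

General mode responses carry neither header, so the helper returns `None` and an empty list in that case rather than raising.
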

Example request:

    curl -N -X POST "http://localhost:3000/api/chat?protocol=data" \
      -H "Content-Type: application/json" \
      -d '{"messages":[{"role":"user","content":"What does error E-32 mean?"}],"mode":"technical"}'

If you update any documents in generation/outputs/, upload the new files through the Documents UI or POST /api/documents. The backend keeps PostgreSQL, Qdrant, and the BM25 snapshot aligned during upload and delete operations.

OPENAI_API_KEY is required
Ensure .env exists in the project root with a valid key.
Connection refused on port 6333
Qdrant is not running. Start it with the docker run command from the Qdrant setup step, or with docker start qdrant if the container already exists.
Check with: docker ps | grep qdrant
FileNotFoundError: bm25_index.pkl
The BM25 snapshot is created automatically the first time documents are uploaded. If it is missing in an existing setup, upload or re-upload a document to regenerate it.
Collection 'rag_poc' not found
Qdrant is running but no documents have been uploaded yet. Upload a document from the frontend or POST /api/documents to initialize the collection.
Port 3000 or 8000 already in use
Stop the existing process and rerun pnpm dev.
Backend not reachable from Next.js in dev
Confirm Uvicorn is running on 127.0.0.1:8000.
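
Several of the issues above come down to whether a service is listening on its port. The sketch below checks each port with a plain TCP connect; the port numbers are the ones used throughout this README, while the helper name and the timeout are assumptions.

```python
import socket


def port_is_open(host, port, timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


SERVICES = {"Next.js": 3000, "FastAPI": 8000, "Qdrant": 6333, "PostgreSQL": 5432}

if __name__ == "__main__":
    for name, port in SERVICES.items():
        state = "listening" if port_is_open("127.0.0.1", port) else "not reachable"
        print(f"{name:<11} port {port}: {state}")
```

A port reported as listening before pnpm dev is started points to the "already in use" case; a port reported as not reachable afterwards points to a service that failed to start.
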