OVSoftware RAG — Next.js + FastAPI

Full-stack RAG (Retrieval-Augmented Generation) chat application for Vanorak winter road maintenance equipment support. Built with Next.js (frontend) and FastAPI (backend), streaming responses over the Vercel AI SDK Data Stream Protocol.

Documentation

File                           Audience
docs/overview.md               Client / management: what was built, results, next steps
docs/documentation.md          Technical: full requirements spec, design decisions, evaluation
docs/system_architecture.md    Technical: component diagram and data flow
docs/technical_notes.md        Developer: thresholds, design decisions, known limitations

What It Does

  • Two query modes: General (direct LLM) and Technical (full RAG pipeline)
  • Technical mode retrieves context via hybrid search — dense vector (Qdrant) + keyword (BM25) fused with RRF
  • Streams assistant output token-by-token
  • Returns source citations and a confidence score (0.0–1.0) with every Technical response
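
The fusion step in hybrid search can be illustrated with a minimal sketch of Reciprocal Rank Fusion (RRF). This is not the code in api/utils/retrieval.py; the function name, the chunk IDs, and the constant k=60 (a commonly used default) are assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) per list it appears in, with rank
    starting at 1. k=60 is a common default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists from the dense (Qdrant) and keyword (BM25) searches.
dense_hits = ["chunk-a", "chunk-b", "chunk-c"]
bm25_hits = ["chunk-b", "chunk-d", "chunk-a"]

fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

Chunks that rank well in both lists (here chunk-b and chunk-a) rise to the top, which is why RRF works without having to normalize dense and BM25 scores against each other.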

Tech Stack

  • Frontend: Next.js 16, React 19, TypeScript, Vercel AI SDK, Jotai
  • Backend: FastAPI, Uvicorn, OpenAI SDK (gpt-4o-mini + text-embedding-3-small)
  • Document DB: PostgreSQL (stores uploaded txt, pdf, and csv files plus extracted text)
  • Vector DB: Qdrant (Docker, local)
  • Keyword search: rank_bm25 (BM25 Okapi, persisted as .pkl)
  • PDF parsing: PyMuPDF
  • Streaming: Server-Sent Events, Data Stream Protocol

Architecture

src/app/                        Next.js App Router entry
src/components/chat/            Chat UI, composer, mode toggle
src/store/chat.ts               Jotai atom — chat mode state
api/index.py                    FastAPI route — mode routing (General / Technical)
api/routes/documents.py         Document CRUD routes (PostgreSQL)
api/services/vector_sync.py     Sync uploaded docs into Qdrant + BM25
api/db.py                       SQLAlchemy engine/session/bootstrap
api/utils/rag.py                RAG pipeline orchestrator
api/utils/bm25.py               BM25 index load/save/query
api/utils/embedder.py           text-embedding-3-small query embedder
api/utils/retrieval.py          Hybrid search (dense + BM25 + RRF)
api/utils/reranker.py           Cross-encoder reranking
api/utils/confidence.py         Confidence scoring (RRF score + LLM self-eval)
api/utils/generator.py          Context formatting for the LLM prompt
api/utils/stream.py             SSE stream formatting
generation/                     Synthetic corpus generators and outputs
scripts/bulk_upload.py          Uploads all corpus files to the CMS API
docs/                           Documentation (architecture, implementation notes, full spec)

Prerequisites

  • Node.js 20+
  • pnpm
  • Python 3.10+
  • Docker (for Qdrant)

Environment Variables

Create .env in the project root with the following values:

OPENAI_API_KEY=your_openai_api_key
DATABASE_URL=postgresql+psycopg://postgres:postgres@localhost:5432/rag_documents

.env is gitignored. You can start from .env.example.
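
A startup check along the following lines can surface a missing variable early. It is a sketch only: load_settings and REQUIRED_VARS are hypothetical names, and the backend may validate its configuration differently.

```python
import os

# The two variables listed above; treating both as required is an assumption.
REQUIRED_VARS = ("OPENAI_API_KEY", "DATABASE_URL")


def load_settings(env=None):
    """Return the required settings, or raise if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


if __name__ == "__main__":
    try:
        settings = load_settings()
        print("Configuration looks complete:", ", ".join(settings))
    except RuntimeError as exc:
        print(exc)
```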

Local Setup

1. Install JavaScript dependencies

pnpm install

2. Set up Python environment

python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

3. Start Qdrant via Docker

docker run -d \
  --name qdrant \
  -p 6333:6333 \
  -v $(pwd)/qdrant_storage:/qdrant/storage \
  qdrant/qdrant

This persists the vector database to qdrant_storage/ (gitignored).

Verify it's running: http://localhost:6333/dashboard

Optional: Start PostgreSQL for the document API

If you do not already have PostgreSQL running locally, you can start it with Docker:

docker run -d \
  --name rag-postgres \
  -e POSTGRES_DB=rag_documents \
  -e POSTGRES_USER=postgres \
  -e POSTGRES_PASSWORD=postgres \
  -p 5432:5432 \
  postgres:17

The FastAPI app creates the documents table automatically on startup.

4. Start the app

pnpm dev

5. Load the corpus

With the app running, upload all corpus documents in one command:

python scripts/bulk_upload.py

This uploads every file under generation/outputs/ to POST /api/documents. The backend chunks, embeds, and indexes each file automatically. Skip this step if the Qdrant collection and BM25 snapshot are already populated.
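
To preview which files such an upload would pick up, a sketch like the one below walks a directory and keeps the three supported file types. It is not the contents of scripts/bulk_upload.py; find_corpus_files is a hypothetical helper, and recursing into subdirectories is an assumption.

```python
from pathlib import Path

# File types the document API accepts, per this README.
SUPPORTED_SUFFIXES = {".txt", ".pdf", ".csv"}


def find_corpus_files(root):
    """Return supported files under root (searched recursively), sorted by path."""
    root = Path(root)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES
    )
```

Calling find_corpus_files("generation/outputs") from the project root lists the candidates without uploading anything.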

Once everything is running, the services are available at:

  • App: http://localhost:3000
  • FastAPI docs: http://127.0.0.1:8000/docs
  • Qdrant dashboard: http://localhost:6333/dashboard

Document API

The backend exposes PostgreSQL-backed CRUD endpoints for txt, pdf, and csv documents.

POST /api/documents

Upload a single document using multipart form data:

curl -X POST "http://127.0.0.1:8000/api/documents" \
  -F "files=@generation/outputs/call_log.csv"

The API stores:

  • original file bytes
  • extracted text content
  • filename, type, checksum, source path, and upload timestamp
  • synchronized Qdrant vectors and BM25 snapshot entries for Technical mode retrieval
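
The stored metadata can be pictured with the sketch below. The field names and the choice of SHA-256 for the checksum are assumptions made for illustration; the actual schema lives in the backend and may differ.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import PurePath


def build_document_record(source_path, data, extracted_text):
    """Assemble a hypothetical metadata record for one uploaded file."""
    path = PurePath(source_path)
    return {
        "filename": path.name,
        "file_type": path.suffix.lstrip(".").lower(),
        # SHA-256 is assumed; the README does not name the checksum algorithm.
        "checksum": hashlib.sha256(data).hexdigest(),
        "source_path": str(path),
        "content": data,
        "extracted_text": extracted_text,
        "uploaded_at": datetime.now(timezone.utc),
    }


record = build_document_record(
    "generation/outputs/call_log.csv", b"a,b\n1,2\n", "a,b\n1,2\n"
)
print(record["filename"], record["file_type"], record["checksum"][:12])
```

A content checksum like this is one way to detect re-uploads of an identical file.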

GET /api/documents

List documents with optional filters:

curl "http://127.0.0.1:8000/api/documents?file_type=pdf&search=hydraulic&limit=25"
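
The same query can be built from Python. The sketch below only constructs the URL, using the three filter names from the curl example above; documents_url is a hypothetical helper, and any further filters the endpoint may support are not covered.

```python
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:8000"


def documents_url(file_type=None, search=None, limit=None):
    """Build the list-documents URL, leaving out filters that are not set."""
    params = {"file_type": file_type, "search": search, "limit": limit}
    query = urlencode({key: value for key, value in params.items() if value is not None})
    url = f"{BASE_URL}/api/documents"
    return f"{url}?{query}" if query else url


print(documents_url(file_type="pdf", search="hydraulic", limit=25))
```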

GET /api/documents/{document_id}

Fetch one document and its extracted text:

curl "http://127.0.0.1:8000/api/documents/<document_id>"

GET /api/documents/{document_id}/download

Download the original uploaded file:

curl -OJ "http://127.0.0.1:8000/api/documents/<document_id>/download"

DELETE /api/documents/{document_id}

Delete a document from PostgreSQL (the backend also removes it from Qdrant and the BM25 snapshot):

curl -X DELETE "http://127.0.0.1:8000/api/documents/<document_id>"

The frontend Documents page offers the same upload flow. Whichever route you use, each uploaded file is parsed, stored in PostgreSQL, embedded, and indexed into Qdrant/BM25 automatically.

Running the App (after initial setup)

# 1. Start Qdrant (if not already running)
docker start qdrant

# 2. Start Postgres (if not already running)
docker start rag-postgres

# 3. Start Next.js + FastAPI
pnpm dev

App: http://localhost:3000 — FastAPI docs: http://127.0.0.1:8000/docs

Run Modes

Command             What it runs
pnpm dev            Next.js + FastAPI together
pnpm next-dev       Next.js only
pnpm fastapi-dev    FastAPI only (installs Python deps first)

API Contract

POST /api/chat?protocol=data

Request body:

{
  "messages": [
    { "role": "user", "content": "What does error E-32 mean?" }
  ],
  "mode": "technical"
}

mode is "general" or "technical" (defaults to "general" if omitted).

Response: Content-Type: text/event-stream, carrying these Data Stream Protocol events:

  • start, text-start, text-delta, text-end, finish, [DONE]

Technical mode responses also include response headers:

  • x-confidence-score — float 0.0–1.0
  • x-citations — list of { source, reference } objects
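
A client can rebuild the assistant's text from the stream roughly as sketched below. The sketch assumes each event arrives as an SSE "data:" line holding a JSON object with a "type" field, and that text-delta events carry their text in a "delta" field. That payload shape is an assumption based on the event names above, so check it against the actual stream before relying on it.

```python
import json


def collect_text(lines):
    """Concatenate the text carried by text-delta events in an SSE stream."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        event = json.loads(payload)
        if event.get("type") == "text-delta":
            parts.append(event.get("delta", ""))
    return "".join(parts)


# Hypothetical stream contents, for illustration only.
sample = [
    'data: {"type": "start"}',
    'data: {"type": "text-start", "id": "t1"}',
    'data: {"type": "text-delta", "id": "t1", "delta": "Hello"}',
    "",
    'data: {"type": "text-delta", "id": "t1", "delta": ", world"}',
    'data: {"type": "text-end", "id": "t1"}',
    'data: {"type": "finish"}',
    "data: [DONE]",
]
print(collect_text(sample))
```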

Smoke test (Technical mode)

curl -N -X POST "http://localhost:3000/api/chat?protocol=data" \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"What does error E-32 mean?"}],"mode":"technical"}'

Updating the Corpus

If you update any documents in generation/outputs/, upload the new files through the Documents UI or POST /api/documents. The backend keeps PostgreSQL, Qdrant, and the BM25 snapshot aligned during upload and delete operations.

Troubleshooting

OPENAI_API_KEY is required
Ensure .env exists in the project root with a valid key.

Connection refused on port 6333
Qdrant is not running. Start it with the Docker command in step 3. Check with: docker ps | grep qdrant

FileNotFoundError: bm25_index.pkl
The BM25 snapshot is created automatically the first time documents are uploaded. If it is missing in an existing setup, upload or re-upload a document to regenerate it.

Collection 'rag_poc' not found
Qdrant is running but no documents have been uploaded yet. Upload a document from the frontend or POST /api/documents to initialize the collection.

Port 3000 or 8000 already in use
Stop the existing process and rerun pnpm dev.

Backend not reachable from Next.js in dev
Confirm Uvicorn is running on 127.0.0.1:8000.
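
Several of these issues come down to whether a service is listening on its port. The sketch below checks the four ports used in this README with the standard library. The helper names are hypothetical, and a successful TCP connection only shows that something is listening, not that it is the expected service.

```python
import socket

# Default local ports used throughout this README.
SERVICES = {"Next.js": 3000, "FastAPI": 8000, "Qdrant": 6333, "PostgreSQL": 5432}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def service_status(host="127.0.0.1"):
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, up in service_status().items():
        print(f"{name}: {'listening' if up else 'not reachable'}")
```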
