OVSoftware RAG — Next.js + FastAPI

Full-stack RAG (Retrieval-Augmented Generation) chat application for Vanorak winter road maintenance equipment support. Built with Next.js (frontend) and FastAPI (backend), streaming responses over the Vercel AI SDK Data Stream Protocol.

Documentation

File	Audience
`docs/overview.md`	Client / management — what was built, results, next steps
`docs/documentation.md`	Technical — full requirements spec, design decisions, evaluation
`docs/system_architecture.md`	Technical — component diagram and data flow
`docs/technical_notes.md`	Developer — thresholds, design decisions, known limitations

What It Does

Two query modes: General (direct LLM) and Technical (full RAG pipeline)
Technical mode retrieves context via hybrid search — dense vector (Qdrant) + keyword (BM25) fused with RRF
Streams assistant output token-by-token
Returns source citations and a confidence score (0.0–1.0) with every Technical response
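The fusion step in Technical mode can be illustrated with a minimal Reciprocal Rank Fusion sketch. This is not the code in api/utils/retrieval.py: the function name `rrf_fuse`, the chunk IDs, and the constant k=60 (a commonly used default) are assumptions for illustration only.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in.
    k=60 is an assumed default, not a value taken from this project.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense_hits = ["chunk-a", "chunk-b", "chunk-c"]  # hypothetical dense (Qdrant) order
bm25_hits = ["chunk-b", "chunk-d", "chunk-a"]   # hypothetical BM25 order

fused = rrf_fuse([dense_hits, bm25_hits])
print([doc_id for doc_id, _ in fused])
# prints ['chunk-b', 'chunk-a', 'chunk-d', 'chunk-c']
```

Chunks that appear in both lists rise to the top because their reciprocal-rank contributions add up, which is why RRF needs no score normalization between the dense and keyword retrievers.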

Tech Stack

Frontend: Next.js 16, React 19, TypeScript, Vercel AI SDK, Jotai
Backend: FastAPI, Uvicorn, OpenAI SDK (gpt-4o-mini + text-embedding-3-small)
Document DB: PostgreSQL (stores uploaded txt, pdf, and csv files plus extracted text)
Vector DB: Qdrant (Docker, local)
Keyword search: rank_bm25 (BM25 Okapi, persisted as .pkl)
PDF parsing: PyMuPDF
Streaming: Server-Sent Events, Data Stream Protocol

Architecture

src/app/                        Next.js App Router entry
src/components/chat/            Chat UI, composer, mode toggle
src/store/chat.ts               Jotai atom — chat mode state
api/index.py                    FastAPI route — mode routing (General / Technical)
api/routes/documents.py         Document CRUD routes (PostgreSQL)
api/services/vector_sync.py     Sync uploaded docs into Qdrant + BM25
api/db.py                       SQLAlchemy engine/session/bootstrap
api/utils/rag.py                RAG pipeline orchestrator
api/utils/bm25.py               BM25 index load/save/query
api/utils/embedder.py           text-embedding-3-small query embedder
api/utils/retrieval.py          Hybrid search (dense + BM25 + RRF)
api/utils/reranker.py           Cross-encoder reranking
api/utils/confidence.py         Confidence scoring (RRF score + LLM self-eval)
api/utils/generator.py          Context formatting for the LLM prompt
api/utils/stream.py             SSE stream formatting
generation/                     Synthetic corpus generators and outputs
scripts/bulk_upload.py          Uploads all corpus files to the document API (POST /api/documents)
docs/                           Documentation (architecture, implementation notes, full spec)

Prerequisites

Node.js 20+
pnpm
Python 3.10+
Docker (for Qdrant)

Environment Variables

Create .env in the project root with the following values:

OPENAI_API_KEY=your_openai_api_key
DATABASE_URL=postgresql+psycopg://postgres:postgres@localhost:5432/rag_documents

.env is gitignored. You can start from .env.example.
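To catch a missing variable before starting the app, a small check along these lines may help. It is a sketch only: the helper name `missing_env_vars` is hypothetical, and it assumes the two variables listed above are the only required ones.

```python
import os

# Assumption: these are the only variables the backend requires.
REQUIRED_VARS = ("OPENAI_API_KEY", "DATABASE_URL")


def missing_env_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Note that this reads the process environment, so values that exist only in .env must be exported or loaded first.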

Local Setup

1. Install JavaScript dependencies

pnpm install

2. Set up Python environment

python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

3. Start Qdrant via Docker

docker run -d \
  --name qdrant \
  -p 6333:6333 \
  -v $(pwd)/qdrant_storage:/qdrant/storage \
  qdrant/qdrant

This persists the vector database to qdrant_storage/ (gitignored).

Verify it's running: http://localhost:6333/dashboard

Optional: Start PostgreSQL for the document API

If you do not already have PostgreSQL running locally, you can start it with Docker:

docker run -d \
  --name rag-postgres \
  -e POSTGRES_DB=rag_documents \
  -e POSTGRES_USER=postgres \
  -e POSTGRES_PASSWORD=postgres \
  -p 5432:5432 \
  postgres:17

The FastAPI app creates the documents table automatically on startup.

4. Start the app

pnpm dev

5. Load the corpus

With the app running, upload all corpus documents in one command:

python scripts/bulk_upload.py

This uploads every file under generation/outputs/ to POST /api/documents. The backend chunks, embeds, and indexes each file automatically. Skip this step if the Qdrant collection and BM25 snapshot are already populated.
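The file-discovery half of such a bulk upload could look roughly like the sketch below. It is not the contents of scripts/bulk_upload.py: the helper name `collect_corpus_files` is hypothetical, and filtering to txt, pdf, and csv is an assumption based on the file types the document API accepts.

```python
from pathlib import Path

# Assumption: only the file types the document API accepts are uploaded.
SUPPORTED_SUFFIXES = {".txt", ".pdf", ".csv"}


def collect_corpus_files(root):
    """Return all supported files under root, sorted for a stable upload order."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES
    )


if __name__ == "__main__":
    for path in collect_corpus_files("generation/outputs"):
        print(path)
```

Each path returned here would then be sent to POST /api/documents as multipart form data, as in the curl example in the Document API section.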

App: http://localhost:3000
FastAPI docs: http://127.0.0.1:8000/docs
Qdrant dashboard: http://localhost:6333/dashboard

Document API

The backend exposes PostgreSQL-backed CRUD endpoints for txt, pdf, and csv documents.

`POST /api/documents`

Upload a single document using multipart form data:

curl -X POST "http://127.0.0.1:8000/api/documents" \
  -F "files=@generation/outputs/call_log.csv"

The API stores:

original file bytes
extracted text content
filename, type, checksum, source path, and upload timestamp
synchronized Qdrant vectors and BM25 snapshot entries for Technical mode retrieval
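As a rough illustration of the metadata listed above, the sketch below derives a record from an uploaded file. The function name `build_document_record`, the dictionary keys, and the use of SHA-256 for the checksum are all assumptions; the README does not state which hash or column names the backend uses.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def build_document_record(source_path, data):
    """Build a metadata dict for one uploaded file (hypothetical field names)."""
    path = Path(source_path)
    return {
        "filename": path.name,
        "file_type": path.suffix.lower().lstrip("."),
        # Assumption: SHA-256; the actual checksum algorithm is not documented here.
        "checksum": hashlib.sha256(data).hexdigest(),
        "source_path": path.as_posix(),
        "uploaded_at": datetime.now(timezone.utc).isoformat(),
    }


record = build_document_record("generation/outputs/call_log.csv", b"id,note\n1,test\n")
print(record["filename"], record["file_type"], record["checksum"][:12])
```

A content checksum like this makes it cheap to detect that the same file has been uploaded twice, whatever its filename.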

`GET /api/documents`

List documents with optional filters:

curl "http://127.0.0.1:8000/api/documents?file_type=pdf&search=hydraulic&limit=25"
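When calling the list endpoint from a script, the query string can be built rather than hand-written. This is a small sketch; the helper name `documents_url` is hypothetical, and only the three filters shown in the curl example (file_type, search, limit) are taken from this README.

```python
from urllib.parse import urlencode


def documents_url(base="http://127.0.0.1:8000", **filters):
    """Build the list URL, dropping filters whose value is None."""
    query = urlencode({key: value for key, value in filters.items() if value is not None})
    url = f"{base}/api/documents"
    return f"{url}?{query}" if query else url


print(documents_url(file_type="pdf", search="hydraulic", limit=25))
# prints http://127.0.0.1:8000/api/documents?file_type=pdf&search=hydraulic&limit=25
```

Using `urlencode` also escapes search terms that contain spaces or special characters.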

`GET /api/documents/{document_id}`

Fetch one document and its extracted text:

curl "http://127.0.0.1:8000/api/documents/<document_id>"

`GET /api/documents/{document_id}/download`

Download the original uploaded file:

curl -OJ "http://127.0.0.1:8000/api/documents/<document_id>/download"

`DELETE /api/documents/{document_id}`

Delete a document from PostgreSQL. The backend also removes its entries from Qdrant and the BM25 snapshot so retrieval stays aligned:

curl -X DELETE "http://127.0.0.1:8000/api/documents/<document_id>"

Use the frontend Documents page to upload the corpus. Each uploaded file is parsed, stored in PostgreSQL, embedded, and indexed into Qdrant/BM25 automatically.

Running the App (after initial setup)

# 1. Start Qdrant (if not already running)
docker start qdrant

# 2. Start Postgres (if not already running)
docker start rag-postgres

# 3. Start Next.js + FastAPI
pnpm dev

App: http://localhost:3000 — FastAPI docs: http://127.0.0.1:8000/docs

Run Modes

Command	What it runs
`pnpm dev`	Next.js + FastAPI together
`pnpm next-dev`	Next.js only
`pnpm fastapi-dev`	FastAPI only (installs Python deps first)

API Contract

`POST /api/chat?protocol=data`

Request body:

{
  "messages": [
    { "role": "user", "content": "What does error E-32 mean?" }
  ],
  "mode": "technical"
}

mode is "general" or "technical" (defaults to "general" if omitted).
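A client can build and validate this request body before sending it. The sketch below is illustrative only; the helper name `build_chat_payload` is hypothetical, and client-side validation of mode is an added convenience rather than documented server behavior.

```python
import json

VALID_MODES = {"general", "technical"}


def build_chat_payload(question, mode="general"):
    """Return the request body for POST /api/chat as a dict."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    return {
        "messages": [{"role": "user", "content": question}],
        "mode": mode,
    }


payload = build_chat_payload("What does error E-32 mean?", mode="technical")
print(json.dumps(payload, indent=2))
```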

Response: Content-Type: text/event-stream — Data Stream Protocol events:

start, text-start, text-delta, text-end, finish, [DONE]
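To reassemble the assistant's answer from the stream, a client concatenates the text-delta events until it sees [DONE]. The sketch below assumes each event arrives as a `data:` line holding a JSON object with a `type` field and, for text-delta, a `delta` field. That payload shape is an assumption here (this README lists only the event names), and the sample lines are invented.

```python
import json


def collect_text(lines):
    """Concatenate text-delta payloads from SSE lines until [DONE]."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        # Assumption: text chunks are carried in a "delta" field.
        if event.get("type") == "text-delta":
            parts.append(event.get("delta", ""))
    return "".join(parts)


# Hypothetical stream, shaped after the event names listed above.
sample_stream = [
    'data: {"type": "start"}',
    'data: {"type": "text-start", "id": "t1"}',
    'data: {"type": "text-delta", "id": "t1", "delta": "Hello"}',
    "",
    'data: {"type": "text-delta", "id": "t1", "delta": ", world"}',
    'data: {"type": "text-end", "id": "t1"}',
    'data: {"type": "finish"}',
    "data: [DONE]",
]
print(collect_text(sample_stream))
# prints Hello, world
```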

Technical mode responses also include response headers:

x-confidence-score — float 0.0–1.0
x-citations — list of { source, reference } objects
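Reading these headers on the client could look like the sketch below. It assumes x-citations is serialized as a JSON array, which this README does not confirm; the helper name `parse_technical_headers` and the sample values are hypothetical.

```python
import json


def parse_technical_headers(headers):
    """Extract the confidence score and citations from response headers."""
    # Header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    score = lowered.get("x-confidence-score")
    citations = lowered.get("x-citations")
    return {
        "confidence": float(score) if score is not None else None,
        # Assumption: the citations header is a JSON-encoded list.
        "citations": json.loads(citations) if citations else [],
    }


# Hypothetical header values for illustration.
example = parse_technical_headers({
    "X-Confidence-Score": "0.82",
    "X-Citations": '[{"source": "manual.pdf", "reference": "section 4"}]',
})
print(example["confidence"], example["citations"][0]["source"])
# prints 0.82 manual.pdf
```

General mode responses would yield a confidence of None and an empty citation list, since the headers are only sent in Technical mode.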

Smoke test (Technical mode)

curl -N -X POST "http://localhost:3000/api/chat?protocol=data" \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"What does error E-32 mean?"}],"mode":"technical"}'

Updating the Corpus

If you update any documents in generation/outputs/, upload the new files through the Documents UI or POST /api/documents. The backend keeps PostgreSQL, Qdrant, and the BM25 snapshot aligned during upload and delete operations.

Troubleshooting

OPENAI_API_KEY is required: Ensure .env exists in the project root with a valid key.

Connection refused on port 6333: Qdrant is not running. Start it with the Docker command in step 3. Check with: docker ps | grep qdrant

FileNotFoundError: bm25_index.pkl: The BM25 snapshot is created automatically the first time documents are uploaded. If it is missing in an existing setup, upload or re-upload a document to regenerate it.

Collection 'rag_poc' not found: Qdrant is running but no documents have been uploaded yet. Upload a document from the frontend or POST /api/documents to initialize the collection.

Port 3000 or 8000 already in use: Stop the existing process and rerun pnpm dev.

Backend not reachable from Next.js in dev: Confirm Uvicorn is running on 127.0.0.1:8000.
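Several of the issues above come down to a service not listening on its port. A quick check can be sketched as follows; the helper name `port_open` and the `SERVICES` mapping are hypothetical, and the ports are the defaults used elsewhere in this README.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Default ports from this README; adjust if your setup differs.
SERVICES = {
    "Next.js": ("localhost", 3000),
    "FastAPI": ("127.0.0.1", 8000),
    "Qdrant": ("localhost", 6333),
    "PostgreSQL": ("localhost", 5432),
}

if __name__ == "__main__":
    for name, (host, port) in SERVICES.items():
        status = "up" if port_open(host, port) else "not reachable"
        print(f"{name:<10} {host}:{port}  {status}")
```

An open port only shows that something is listening there; it does not confirm that the right service is healthy.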
