contextual-qa-engine

A production-style RAG (Retrieval-Augmented Generation) backend built with FastAPI. Upload documents, ask questions, get grounded answers with citations.

The pipeline combines hybrid search (semantic + lexical) with cross-encoder reranking for high-quality retrieval, and persists conversations in Postgres so follow-up questions can use prior context.

Features

Document ingestion — PDF and text files, chunked with metadata (source, page, chunk index)
Hybrid retrieval — pgvector (HNSW) + Postgres FTS (GIN) fused with Reciprocal Rank Fusion
Cross-encoder reranking — Cohere rerank-english-v3.0 re-orders the candidate pool
Conversational /ask — sessions, sliding-window history, numbered citations [1] [2]
Pluggable retrieval mode — switch between semantic | lexical | hybrid via env var
Benchmark harness — golden set + Recall@K / MRR / nDCG@10 metrics
LangSmith tracing — optional, for cost/latency observability

Architecture

upload ─▶ chunker ─▶ embeddings ─▶ pgvector (vector + tsvector columns)
                                          │
ask ─▶ query rewrite ─▶ hybrid retrieve (RRF) ─▶ Cohere rerank ─▶ LLM ─▶ answer + citations
                                                                    │
                                                              session history (Postgres)

Stack: FastAPI · SQLAlchemy (async) · Alembic · pgvector · OpenAI (gpt-4o-mini, text-embedding-3-small) · Cohere · LangChain (loaders + chunkers only).

Quick start

# 1. Configure secrets
cp .env.example .env
# edit .env: set OPENAI_API_KEY and (optional) COHERE_API_KEY

# 2. Start the stack (FastAPI + Postgres with pgvector)
make up

# 3. Apply migrations
make migration-up

# 4. Sanity check
curl http://localhost:8000/health

The API is now at http://localhost:8000. OpenAPI docs at /docs.

API endpoints

Method	Path	Purpose
`POST`	`/upload`	Upload a PDF or text file, ingest into the vector store
`GET`	`/documents`	List ingested documents
`DELETE`	`/documents/{source}`	Remove a document and its chunks
`POST`	`/search`	Raw retrieval — returns top chunks (no LLM call)
`POST`	`/ask`	Ask a question. Accepts optional `session_id` for follow-ups
`POST`	`/sessions`	Create a new conversation session
`GET`	`/sessions`	List sessions
`GET`	`/sessions/{id}/history`	Get full message history for a session
`DELETE`	`/sessions/{id}`	Delete a session

Configuration

All settings are env vars (see .env.example). The important ones:

Variable	Default	Purpose
`OPENAI_API_KEY`	—	Required, for LLM + embeddings
`COHERE_API_KEY`	—	Required for reranking
`DATABASE_URL`	`postgresql+asyncpg://postgres:postgres@db:5432/contextual_qa`	Postgres connection (asyncpg driver)
`RETRIEVAL_MODE`	`hybrid`	`semantic` / `lexical` / `hybrid`
`RERANKER_MODEL`	`rerank-english-v3.0`	Cohere reranker model
`RERANKER_CANDIDATE_K`	`20`	Candidate pool size before reranking
`MAX_HISTORY_MESSAGES`	`20`	Sliding window for conversational context
`LANGSMITH_TRACING`	`false`	Set `true` to send traces to LangSmith

Development

make up               # start app + postgres
make down             # stop
make logs             # tail container logs
make shell            # exec into the app container
make migration-up     # apply Alembic migrations
make migration-create m="add foo"   # generate new migration

For local (non-Docker) script runs, point at the host-mapped Postgres port:

source .venv/bin/activate
DATABASE_URL=postgresql+asyncpg://postgres:postgres@localhost:5433/contextual_qa \
  python scripts/eval_retrieval.py --top-k 5 --mode hybrid --rerank

Benchmarking

The retrieval pipeline is benchmarked against a 25-question golden set in benchmark/golden_set.json. Current best configuration:

Configuration	recall@5	MRR	nDCG@10
semantic only (pgvector)	0.60	0.42	0.7334
lexical only (FTS)	0.16	0.12	0.1659
hybrid (RRF)	0.60	0.43	0.7499
hybrid + Cohere rerank	0.68	0.55	0.8678

Full methodology, formulas, and per-run history are in benchmark/BENCHMARK.md.

Project structure

app/
  core/         # config, database engine
  models/       # SQLAlchemy models (Document, Chunk, Session, Message)
  routers/      # FastAPI endpoints
  services/     # chunker, embedding, pg_vector_store, reranker, llm, prompt, ...
alembic/        # database migrations
benchmark/      # golden set, metrics, results
scripts/        # eval_retrieval.py and other CLI tools
data/           # uploaded files + cached chunk dumps (gitignored)

Documentation

PLAN.md — phased development roadmap, what's done and what's next
benchmark/BENCHMARK.md — retrieval evaluation methodology and results
RESOURCES.md — design notes, sources for test corpora, LangSmith usage
CODING_STYLE.md — code conventions used in this repo

Name		Name	Last commit message	Last commit date
Latest commit History 42 Commits
alembic		alembic
app		app
scripts		scripts
.env.example		.env.example
.gitignore		.gitignore
CODING_STYLE.md		CODING_STYLE.md
Dockerfile		Dockerfile
Makefile		Makefile
PLAN.md		PLAN.md
README.md		README.md
RESOURCES.md		RESOURCES.md
alembic.ini		alembic.ini
docker-compose.yml		docker-compose.yml
entrypoint.sh		entrypoint.sh
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

contextual-qa-engine

Features

Architecture

Quick start

API endpoints

Configuration

Development

Benchmarking

Project structure

Documentation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

contextual-qa-engine

Features

Architecture

Quick start

API endpoints

Configuration

Development

Benchmarking

Project structure

Documentation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages