FastAPI RAG Backend

A production-ready FastAPI backend with JWT authentication, role- and permission-based authorization, and a provider-agnostic RAG (Retrieval-Augmented Generation) stack. Ingest documents (JSON or file uploads: PDF, DOCX, TXT), chunk and embed them, store vectors in Qdrant, and answer questions with optional hybrid search (dense + BM25), cross-encoder reranking, and LLM-based query expansion. Embedding and LLM backends (Hugging Face, Gemini, OpenAI, Anthropic) are switched via environment variables.

Features

JWT authentication & refresh: Token-based auth with permission checks on protected routes
Users / roles / permissions: SQLAlchemy models and middleware-driven access control
RAG ingestion: Create documents from JSON text or multipart file upload
Hybrid retrieval: Dense vectors + sparse BM25-style vectors with RRF-style fusion in Qdrant
Reranking: Cross-encoder rescoring (configurable model)
Query expansion: Optional LLM-generated query variants before retrieval
Swappable providers: Embeddings and LLMs via EMBEDDING_PROVIDER / LLM_PROVIDER (Hugging Face, Gemini, OpenAI, Anthropic)
Vector store: Qdrant with health-checked Docker service
MySQL: App metadata (users, documents); tables created on startup via SQLAlchemy
Ops: SlowAPI rate limiting, CORS, structured logging, /health, Gunicorn + Uvicorn workers in Docker
Dev UX: The embedder and vector store are pre-warmed at startup, so the first request avoids a cold model load when possible
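
The "RRF-style fusion" named under Hybrid retrieval can be illustrated with plain reciprocal rank fusion. This is a minimal sketch, not the project's code: in this stack the fusion happens inside Qdrant. The function name rrf_fuse, the constant k=60 (a commonly used value), and the chunk IDs are assumptions for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of IDs with reciprocal rank fusion.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. A higher fused score is better.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense_hits = ["chunk-a", "chunk-b", "chunk-c"]   # hypothetical dense ranking
sparse_hits = ["chunk-b", "chunk-d", "chunk-a"]  # hypothetical BM25-style ranking
fused = rrf_fuse([dense_hits, sparse_hits])
```

Chunks that appear in both rankings accumulate two terms, so they tend to rise above chunks found by only one retriever.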

Technologies

Backend: FastAPI, Uvicorn, Gunicorn (Docker)
Database: MySQL 8 (SQLAlchemy 2.x; sync sessions for routes)
Vector DB: Qdrant (qdrant-client)
Embeddings: sentence-transformers / Gemini / OpenAI (configurable)
LLM: Hugging Face Transformers (local), Gemini, OpenAI, Anthropic (configurable)
RAG utilities: LangChain text splitters, custom chunking, sparse vectors for hybrid search
Security: python-jose (JWT), bcrypt/passlib, middleware for authZ
Containerization: Docker, Docker Compose (MySQL + Qdrant + backend)
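
The "sparse vectors for hybrid search" entry can be pictured as hashed term counts. The sketch below is an assumption-laden illustration, not the code in rag/indexing: the tokenizer, the MD5-based hashing, the dimension, and the name sparse_vector are all hypothetical, and a BM25-style scheme would normally add inverse-document-frequency and length weighting on top of raw counts.

```python
import hashlib
import re
from collections import Counter


def sparse_vector(text: str, dim: int = 2**20) -> tuple[list[int], list[float]]:
    """Turn text into (indices, values) using hashed term frequencies."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts: Counter = Counter()
    for token in tokens:
        # A stable hash is required: Python's built-in hash() of strings
        # changes between processes, which would break stored vectors.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % dim
        counts[index] += 1
    indices = sorted(counts)
    values = [float(counts[i]) for i in indices]
    return indices, values


indices, values = sparse_vector("Refund policy: refund within 30 days")
```

The (indices, values) pair is the general shape a sparse vector takes in a vector store; only non-zero positions are kept.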

RAG pipeline

End-to-end flow for POST /documents/query:

Query expansion (optional): LLM generates alternative phrasings (use_query_expansion, n_expansions).
Retrieval: For each query variant, hybrid search (dense + sparse) or dense-only against Qdrant; optional metadata filters (category, title, source).
Deduplication: Hits from all query variants are merged and duplicates are removed.
Reranking (optional): Cross-encoder rescores candidates; top-k passed to the LLM.
Generation: LLM answers with a fixed system prompt that restricts answers to retrieved context.
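
The deduplication and reranking steps above can be sketched as follows. This is a hedged illustration, not the project's retriever: the hit shape, the function names, and the "keep the best score per ID" rule are assumptions, and overlap_score is a toy stand-in for a real cross-encoder.

```python
from typing import Callable

Hit = dict  # hypothetical hit shape: {"id": str, "text": str, "score": float}


def dedupe_hits(hit_lists: list[list[Hit]]) -> list[Hit]:
    """Merge hits from several query variants, keeping the best score per ID."""
    best: dict[str, Hit] = {}
    for hits in hit_lists:
        for hit in hits:
            current = best.get(hit["id"])
            if current is None or hit["score"] > current["score"]:
                best[hit["id"]] = hit
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)


def rerank(query: str, hits: list[Hit],
           score_fn: Callable[[str, str], float], k: int) -> list[Hit]:
    """Rescore each (query, text) pair with score_fn and return the top k."""
    rescored = [{**hit, "score": score_fn(query, hit["text"])} for hit in hits]
    rescored.sort(key=lambda h: h["score"], reverse=True)
    return rescored[:k]


def overlap_score(query: str, text: str) -> float:
    """Toy scorer: fraction of query words that occur in the text."""
    words = set(query.lower().split())
    return len(words & set(text.lower().split())) / len(words) if words else 0.0


variant_a = [
    {"id": "c1", "text": "Refunds are issued within 30 days", "score": 0.71},
    {"id": "c2", "text": "Shipping takes five days", "score": 0.64},
]
variant_b = [
    {"id": "c2", "text": "Shipping takes five days", "score": 0.80},
    {"id": "c3", "text": "The refund policy covers damaged items", "score": 0.55},
]
merged = dedupe_hits([variant_a, variant_b])
top = rerank("refund policy", merged, overlap_score, k=2)
```

Passing the scorer in as a function keeps the sketch runnable without model downloads; a cross-encoder would slot in at the same place.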

Search without generation: GET /documents/search returns ranked chunks only.

Key fields of the request body for POST /documents/query:
{
  "query": "What is the refund policy?",
  "k": 5,
  "use_hybrid": true,
  "use_reranking": true,
  "use_query_expansion": false,
  "n_expansions": 3,
  "category": null,
  "title": null,
  "source": null
}
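
On the client side, the same body can be assembled with a small typed helper. This sketch uses only the standard library; the class name QueryRequest and the validation rules are assumptions, and the defaults simply mirror the example values above rather than confirmed server defaults.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class QueryRequest:
    """Client-side mirror of the query body fields listed above."""

    query: str
    k: int = 5
    use_hybrid: bool = True
    use_reranking: bool = True
    use_query_expansion: bool = False
    n_expansions: int = 3
    category: Optional[str] = None
    title: Optional[str] = None
    source: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumed sanity checks; the server may enforce different rules.
        if not self.query.strip():
            raise ValueError("query must not be empty")
        if self.k < 1:
            raise ValueError("k must be at least 1")

    def to_json(self) -> str:
        """Serialize to the JSON body; unset filters become null."""
        return json.dumps(asdict(self))


# "billing" is a hypothetical category value.
body = QueryRequest(query="What is the refund policy?", category="billing").to_json()
```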

Quick Start

You can run everything with Docker Compose, or run the API locally while MySQL/Qdrant run in containers.

Option 1: Full Docker Compose (recommended)

MySQL, Qdrant, and the API start together; the backend waits until MySQL and Qdrant are healthy.

Clone the repository

git clone <repository-url>
cd fastApi-rag

Create a .env file in the project root (do not commit secrets). Compose merges this with the service environment block. At minimum, set API keys for the providers you use (e.g. GEMINI_API_KEY, OPENAI_API_KEY, ANTHROPIC_API_KEY). With the local Hugging Face embedding/LLM defaults, large models may download on first start, so make sure there is enough disk space and RAM.

# Example — add keys only for providers you enable
GEMINI_API_KEY=your_key_here
JWT_SECRET_KEY=your-super-secret-jwt-key
JWT_REFRESH_SECRET_KEY=your-super-secret-refresh-key

Start the stack

docker compose up -d --build

Monitor logs

docker compose logs -f backend
docker compose logs -f mysql
docker compose logs -f qdrant

Access the application

Service	URL
API	http://localhost:8000
OpenAPI (Swagger)	http://localhost:8000/docs
ReDoc	http://localhost:8000/redoc
Qdrant REST	http://localhost:6333
Qdrant gRPC	localhost:6334
MySQL	localhost:3306 (user `root`, password set in `docker-compose.yml` / `.env`)

Stop

docker compose down

Data persists in volumes rag-mysql-data and rag-qdrant. Use docker compose down -v to remove volumes and start clean.

Option 2: Local API + Docker for MySQL and Qdrant

Install Python 3.12+ and create a virtualenv:

python3 -m venv venv
source venv/bin/activate
pip install -e .

Start only infrastructure:

docker compose up -d mysql qdrant

.env for local process (example):

DB_HOST=localhost
DB_PORT=3306
DB_USER=root
DB_PASS=Password_2547422
DB_NAME=rag
QDRANT_URL=http://localhost:6333
JWT_SECRET_KEY=your-super-secret-jwt-key
JWT_REFRESH_SECRET_KEY=your-super-secret-refresh-key
EMBEDDING_PROVIDER=huggingface
LLM_PROVIDER=huggingface

Align DB_PASS with the MySQL container’s MYSQL_ROOT_PASSWORD if you use the same Compose file.
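
For reference, this is roughly how the DB_* values combine into a SQLAlchemy connection URL. It is a sketch under stated assumptions: the helper name build_mysql_url and the mysql+pymysql driver prefix are hypothetical (check database.py for the driver actually used), and the fallback values follow the defaults listed under Environment Variables.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote_plus


def build_mysql_url(env: Optional[Mapping[str, str]] = None,
                    driver: str = "mysql+pymysql") -> str:
    """Build a SQLAlchemy-style URL from DB_* settings.

    The user and password are percent-encoded so characters such as
    '@' or '/' in a password do not corrupt the URL.
    """
    env = os.environ if env is None else env
    host = env.get("DB_HOST", "localhost")
    port = env.get("DB_PORT", "3306")
    user = quote_plus(env.get("DB_USER", "root"))
    password = quote_plus(env.get("DB_PASS", ""))
    name = env.get("DB_NAME", "rag")
    return f"{driver}://{user}:{password}@{host}:{port}/{name}"


# Hypothetical password containing characters that need encoding.
url = build_mysql_url({"DB_PASS": "p@ss/word"})
```

If the connection fails with an access-denied error, comparing the decoded password in this URL with MYSQL_ROOT_PASSWORD is a quick first check.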

Run the API

python main.py
# or
uvicorn main:app --reload --host 0.0.0.0 --port 8000

Docker Setup

Services

mysql: MySQL 8 with database rag, persistent volume, healthcheck
qdrant: Latest Qdrant image, ports 6333/6334, healthcheck
backend: Builds from Dockerfile, depends on healthy MySQL and Qdrant, runs Gunicorn with Uvicorn workers

Commands

docker compose up -d --build
docker compose logs -f backend
docker compose ps
docker compose down
docker compose down -v

Build image only

docker build -t fastapi-rag .
docker run --env-file .env -p 8000:8000 fastapi-rag

API Documentation

Base URL: http://localhost:8000

Health

Method	Endpoint	Description	Auth
`GET`	`/health`	Liveness / status	Open
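
Scripts that start the stack and then call the API can poll /health before sending requests. The sketch below treats any HTTP 200 as healthy and does not inspect the response body, since its shape is not documented here; the function names and the retry settings are assumptions.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str) -> bool:
    """Return True if a GET on url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and urllib's URLError /
        # HTTPError, which are all OSError subclasses.
        return False


def wait_until_healthy(url: str,
                       check: Callable[[str], bool] = http_ok,
                       attempts: int = 30,
                       delay: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll check(url) up to `attempts` times, sleeping between tries."""
    for attempt in range(attempts):
        if check(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

A typical call would be wait_until_healthy("http://localhost:8000/health"). The check and sleep parameters exist so the loop can be tested without a running server.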

Authentication

Method	Endpoint	Description	Auth
`POST`	`/auth/register`	Register user (with role payload per schema)	Open
`POST`	`/auth/token`	Login → access token	Open
`POST`	`/auth/refresh`	Refresh token	Protected (Bearer)
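
A client can decide when to call /auth/refresh by reading the expiry time from the access token. This sketch assumes the tokens carry the standard exp claim (seconds since the epoch), which this README does not confirm; the function names and the 30-second leeway are also assumptions. It decodes the payload only and does not verify the signature, so it must never be used for authorization decisions.

```python
import base64
import json
import time
from typing import Optional


def jwt_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying the signature."""
    payload = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def needs_refresh(token: str, now: Optional[float] = None, leeway: float = 30.0) -> bool:
    """Return True if the token expires within `leeway` seconds."""
    exp = jwt_claims(token).get("exp")
    if exp is None:
        return False  # no expiry claim, so nothing to act on
    current = time.time() if now is None else now
    return current >= exp - leeway
```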

Users

Method	Endpoint	Description	Auth
`GET`	`/users/`	List users (`skip`, `limit`)	Protected
`POST`	`/users/`	Create user	Protected

Documents & RAG

Method	Endpoint	Description	Auth
`POST`	`/documents/`	Ingest JSON body (`content`, optional `title`, `source`, `category`)	Protected
`POST`	`/documents/upload`	Multipart file upload (optional form fields `title`, `source`, `category`)	Protected
`POST`	`/documents/query`	Full RAG: retrieve (+ optional expansion/rerank) + LLM answer	Protected
`GET`	`/documents/search`	Hybrid or dense search; query params: `query`, `k`, `use_hybrid`, `use_reranking`, optional filters	Protected
`GET`	`/documents/`	List documents (`skip`, `limit`)	Protected
`GET`	`/documents/{doc_id}`	Get one document	Protected
`DELETE`	`/documents/{doc_id}`	Delete document and associated vectors	Protected

Interactive docs: http://localhost:8000/docs

Authentication & open routes

Routes not listed as open require a valid JWT: Authorization: Bearer <access_token>.

Default open paths (see configs.py — override with OPEN_END_POINTS):

/auth/token, /auth/register
/docs, /openapi.json, /redoc, /health
/news (placeholder; adjust if unused)

All /users/* and /documents/* routes are protected unless you extend OPEN_END_POINTS.
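
The open-path check can be pictured as parsing the comma-separated OPEN_END_POINTS value and testing the request path against it. This is a sketch only: the function names are hypothetical, and exact matching with trailing-slash normalization is an assumption (the middleware may match differently, for example by prefix).

```python
# Defaults as listed above; the authoritative list lives in configs.py.
DEFAULT_OPEN = "/auth/token,/auth/register,/docs,/openapi.json,/redoc,/health,/news"


def parse_open_endpoints(raw: str) -> set[str]:
    """Split a comma-separated list of paths, ignoring blanks and trailing slashes."""
    return {p.strip().rstrip("/") or "/" for p in raw.split(",") if p.strip()}


def is_open_path(path: str, open_paths: set[str]) -> bool:
    """Return True if path (ignoring a trailing slash) is an open endpoint."""
    normalized = path.rstrip("/") or "/"
    return normalized in open_paths


open_paths = parse_open_endpoints(DEFAULT_OPEN)
```

Under these assumptions, is_open_path("/health", open_paths) is true and is_open_path("/documents/query", open_paths) is false, matching the rule that document routes require a token.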

Example: login and call RAG

TOKEN=$(curl -s -X POST http://localhost:8000/auth/token \
  -H "Content-Type: application/json" \
  -d '{"email":"user@example.com","password":"yourpassword"}' | jq -r '.access_token')

curl -X POST http://localhost:8000/documents/query \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query":"Summarize the main points","k":5,"use_hybrid":true,"use_reranking":true}'
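
The same two calls can be made from Python with the standard library. This is a sketch mirroring the curl example above, with the same base URL and JSON bodies; the helper names are hypothetical, and building the requests is kept separate from sending them so the construction can be inspected without a running server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_login_request(email: str, password: str) -> urllib.request.Request:
    """POST /auth/token with a JSON body, as in the curl example."""
    data = json.dumps({"email": email, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/auth/token",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_query_request(token: str, query: str, k: int = 5) -> urllib.request.Request:
    """POST /documents/query with a Bearer token."""
    payload = {"query": query, "k": k, "use_hybrid": True, "use_reranking": True}
    return urllib.request.Request(
        f"{BASE_URL}/documents/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_json(request: urllib.request.Request) -> dict:
    """Send the request and parse the JSON response (needs a running server)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Usage against a running stack:
# token = send_json(build_login_request("user@example.com", "yourpassword"))["access_token"]
# answer = send_json(build_query_request(token, "Summarize the main points"))
```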

Project Structure

fastApi-rag/
├── main.py                 # FastAPI app, lifespan (DB init, RAG pre-warm), middleware
├── configs.py              # Env-based settings (JWT, DB, RAG providers)
├── database.py             # SQLAlchemy engines, sessions, table creation
├── custom_logger.py
├── error_handling.py
├── ddl_mysql.sql           # Reference DDL (optional; app also creates tables)
├── Dockerfile
├── docker-compose.yml      # mysql, qdrant, backend
├── pyproject.toml
├── middleware/
│   ├── authentication_middleware.py
│   └── authorization_middleware.py
├── models/                 # User, Role, Permission, Document, junction tables
├── routers/
│   ├── auth_router.py
│   ├── user_router.py
│   └── document_router.py
├── schemas/
├── services/
└── rag/
    ├── config.py           # get_embedder, get_llm, get_vector_store, get_reranker
    ├── embeddings/         # huggingface, gemini, openai
    ├── llm/                # huggingface, gemini, openai, anthropic
    ├── vectorstore/        # qdrant
    ├── reranker/           # cross-encoder
    ├── retriever/          # retrieve, query expansion, LLM orchestration
    └── indexing/           # loaders, splitter, sparse vectors

Environment Variables

Primary variables are read in configs.py and rag/config.py. Common entries:

Variable	Description	Typical default
`DB_HOST`, `DB_PORT`, `DB_USER`, `DB_PASS`, `DB_NAME`	MySQL connection	`localhost`, `3306`, `root`, empty, `rag`
`JWT_SECRET_KEY`, `JWT_REFRESH_SECRET_KEY`	JWT signing	Dev defaults in code — set in production
`CORS_ORIGINS`	Comma-separated origins	Local dev ports + `*` if unset
`OPEN_END_POINTS`	Comma-separated paths without auth	See defaults in `configs.py`
`QDRANT_URL`	Qdrant HTTP URL	`http://localhost:6333`
`QDRANT_COLLECTION`	Collection name	`rag_documents`
`EMBEDDING_PROVIDER`	`huggingface` \| `gemini` \| `openai`	`huggingface`
`EMBEDDING_MODEL`	Model id override	Provider-specific defaults in `rag/config.py`
`LLM_PROVIDER`	`huggingface` \| `gemini` \| `openai` \| `anthropic`	`huggingface`
`LLM_MODEL`	Model id override	Provider-specific defaults
`RERANKER_MODEL`	Cross-encoder id	`cross-encoder/ms-marco-MiniLM-L-6-v2`
`GEMINI_API_KEY`, `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`	Cloud APIs	Empty if unused

Docker Compose sets DB_* and QDRANT_URL for the backend service; your .env supplies secrets and overrides.
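
Switching providers through EMBEDDING_PROVIDER / LLM_PROVIDER amounts to a lookup from the variable's value to a factory. The sketch below shows that pattern under stated assumptions: it is not the code in rag/config.py, the name select_provider is hypothetical, and the factories return placeholder strings instead of real client objects.

```python
from typing import Callable, Mapping

# Placeholder factories: each returns a label instead of a real embedder.
EMBEDDERS: dict[str, Callable[[], str]] = {
    "huggingface": lambda: "local sentence-transformers embedder",
    "gemini": lambda: "Gemini embedder",
    "openai": lambda: "OpenAI embedder",
}


def select_provider(env: Mapping[str, str],
                    var: str,
                    registry: Mapping[str, Callable[[], str]],
                    default: str = "huggingface") -> str:
    """Look up the provider named by env[var] and build it.

    Unknown names raise a ValueError listing the supported values, so a
    typo in .env fails at startup instead of on the first request.
    """
    name = env.get(var, default).strip().lower()
    try:
        factory = registry[name]
    except KeyError:
        allowed = ", ".join(sorted(registry))
        raise ValueError(f"{var}={name!r} is not supported; choose one of: {allowed}") from None
    return factory()


embedder = select_provider({"EMBEDDING_PROVIDER": "openai"}, "EMBEDDING_PROVIDER", EMBEDDERS)
```

In a real process the first argument would be os.environ; an explicit mapping is used here so the example does not depend on the caller's environment.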

Development Commands

# Install (editable)
pip install -e .

# Run dev server
python main.py
uvicorn main:app --reload --port 8000

# Docker
docker compose up -d --build
docker compose logs -f backend

Troubleshooting

Database connection errors
Ensure MySQL is running and DB_* match the server (especially password vs MYSQL_ROOT_PASSWORD in Compose). Check docker compose logs mysql.
Qdrant unreachable
Confirm QDRANT_URL (use http://qdrant:6333 inside Docker network, http://localhost:6333 from host). Check docker compose logs qdrant.
First request very slow / OOM
Hugging Face models download and load on first use; the app pre-warms embedder and vector store at startup when possible. Reduce model sizes via EMBEDDING_MODEL / LLM_MODEL or switch to API providers.
401 on document routes
Send Authorization: Bearer <token> from /auth/token. Register via /auth/register first if no users exist.
Rate limits
SlowAPI is configured on the app, so repeated requests (including failed ones) may hit the limits. See main.py and, if you have added per-route limits, the route decorators.

Contributing

Fork the repository
Create a branch: git checkout -b feature/your-feature
Commit with clear messages
Open a Pull Request

Follow existing patterns: thin routers, services for business logic, env-driven RAG providers.

License

This project is intended for use under the MIT License (see repository license file when added).

Built with FastAPI, Qdrant, MySQL, and Docker
