FastAPI RAG Backend

A production-ready FastAPI backend with JWT authentication, role- and permission-based authorization, and a provider-agnostic RAG (Retrieval-Augmented Generation) stack. Ingest documents (JSON or file uploads: PDF, DOCX, TXT), chunk and embed them, store vectors in Qdrant, and answer questions with optional hybrid search (dense + BM25), cross-encoder reranking, and LLM-based query expansion. Embedding and LLM backends (Hugging Face, Gemini, OpenAI, Anthropic) are switched via environment variables.

Features

  • JWT authentication & refresh: Token-based auth with permission checks on protected routes
  • Users / roles / permissions: SQLAlchemy models and middleware-driven access control
  • RAG ingestion: Create documents from JSON text or multipart file upload
  • Hybrid retrieval: Dense vectors + sparse BM25-style vectors with RRF-style fusion in Qdrant
  • Reranking: Cross-encoder rescoring (configurable model)
  • Query expansion: Optional LLM-generated query variants before retrieval
  • Swappable providers: Embeddings and LLMs via EMBEDDING_PROVIDER / LLM_PROVIDER (Hugging Face, Gemini, OpenAI, Anthropic)
  • Vector store: Qdrant with health-checked Docker service
  • MySQL: App metadata (users, documents); tables created on startup via SQLAlchemy
  • Ops: SlowAPI rate limiting, CORS, structured logging, /health, Gunicorn + Uvicorn workers in Docker
  • Dev UX: The embedder and vector store are pre-warmed on startup, so the first request avoids a cold model load when possible

Technologies

  • Backend: FastAPI, Uvicorn, Gunicorn (Docker)
  • Database: MySQL 8 (SQLAlchemy 2.x; sync sessions for routes)
  • Vector DB: Qdrant (qdrant-client)
  • Embeddings: sentence-transformers / Gemini / OpenAI (configurable)
  • LLM: Hugging Face Transformers (local), Gemini, OpenAI, Anthropic (configurable)
  • RAG utilities: LangChain text splitters, custom chunking, sparse vectors for hybrid search
  • Security: python-jose (JWT), bcrypt/passlib, middleware for authZ
  • Containerization: Docker, Docker Compose (MySQL + Qdrant + backend)
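The chunking step mentioned under RAG utilities can be illustrated with a fixed-size window that overlaps its neighbour. This is only a sketch of the general idea; the function name, the character-based sizes, and the defaults are assumptions and do not describe the splitter this repository actually uses.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character windows of chunk_size that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of storing some text twice.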

RAG pipeline

End-to-end flow for POST /documents/query:

  1. Query expansion (optional): LLM generates alternative phrasings (use_query_expansion, n_expansions).
  2. Retrieval: For each query variant, hybrid search (dense + sparse) or dense-only against Qdrant; optional metadata filters (category, title, source).
  3. Deduplication: Hits from all query variants are merged, and chunks returned by more than one variant are kept only once.
  4. Reranking (optional): Cross-encoder rescores candidates; top-k passed to the LLM.
  5. Generation: LLM answers with a fixed system prompt that restricts answers to retrieved context.
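Steps 2 and 3 can be pictured with a small reciprocal rank fusion (RRF) routine that merges several ranked lists and keeps each chunk once. According to the feature list, the project performs fusion inside Qdrant, so the code below is only an illustration of the technique; the function name, the chunk ids, and the constant k=60 are assumptions.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60, top_k: int = 5) -> list[tuple[str, float]]:
    """Fuse ranked lists of chunk ids with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) for every id it contains, so ids
    that rank well in several lists rise to the top, and every id appears
    once in the result.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        seen: set[str] = set()
        for rank, chunk_id in enumerate(ranking, start=1):
            if chunk_id in seen:  # ignore repeats inside a single list
                continue
            seen.add(chunk_id)
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by id so the order is deterministic.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:top_k]


dense_hits = ["a", "b", "c"]   # hypothetical ids from dense search
sparse_hits = ["b", "d", "a"]  # hypothetical ids from sparse search
fused = rrf_fuse([dense_hits, sparse_hits])
```

The same routine works for merging the result lists of several expanded queries, since it only needs ranked ids.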

Search without generation: GET /documents/search returns ranked chunks only.

Query body (POST /documents/query), key fields:
{
  "query": "What is the refund policy?",
  "k": 5,
  "use_hybrid": true,
  "use_reranking": true,
  "use_query_expansion": false,
  "n_expansions": 3,
  "category": null,
  "title": null,
  "source": null
}
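A client can build this body from a small typed object instead of a hand-written dictionary. The sketch below is a hypothetical client-side helper: the class name is invented, and the defaults simply mirror the example values above, which may differ from the server's actual schema defaults.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class QueryRequest:
    """Client-side mirror of the POST /documents/query body (assumed defaults)."""

    query: str
    k: int = 5
    use_hybrid: bool = True
    use_reranking: bool = True
    use_query_expansion: bool = False
    n_expansions: int = 3
    category: Optional[str] = None
    title: Optional[str] = None
    source: Optional[str] = None

    def __post_init__(self) -> None:
        # Catch obviously invalid input before a request is sent.
        if not self.query.strip():
            raise ValueError("query must not be empty")
        if self.k < 1:
            raise ValueError("k must be at least 1")
        if self.n_expansions < 1:
            raise ValueError("n_expansions must be at least 1")

    def to_json(self) -> str:
        """Serialize to the JSON body; None becomes null."""
        return json.dumps(asdict(self))


body = QueryRequest(query="What is the refund policy?").to_json()
```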

Quick Start

You can run everything with Docker Compose, or run the API locally while MySQL/Qdrant run in containers.

Option 1: Full Docker Compose (recommended)

MySQL, Qdrant, and the API start together; the backend waits until MySQL and Qdrant are healthy.

  1. Clone the repository
git clone <repository-url>
cd fastApi-rag
  2. Create a .env file in the project root (do not commit secrets). Compose merges this with the service environment block. At minimum, set API keys for the providers you use (e.g. GEMINI_API_KEY, OPENAI_API_KEY, ANTHROPIC_API_KEY). With the local Hugging Face embedding/LLM defaults, large models may download on first start, so make sure there is enough disk space and RAM.
# Example: add keys only for providers you enable
GEMINI_API_KEY=your_key_here
JWT_SECRET_KEY=your-super-secret-jwt-key
JWT_REFRESH_SECRET_KEY=your-super-secret-refresh-key
  3. Start the stack
docker compose up -d --build
  4. Monitor logs
docker compose logs -f backend
docker compose logs -f mysql
docker compose logs -f qdrant
  5. Access the application

| Service | URL |
| --- | --- |
| API | http://localhost:8000 |
| OpenAPI (Swagger) | http://localhost:8000/docs |
| ReDoc | http://localhost:8000/redoc |
| Qdrant REST | http://localhost:6333 |
| Qdrant gRPC | localhost:6334 |
| MySQL | localhost:3306 (user root, password set in docker-compose.yml / .env) |

  6. Stop
docker compose down

Data persists in volumes rag-mysql-data and rag-qdrant. Use docker compose down -v to remove volumes and start clean.
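After the stack starts, a script can poll the API before sending requests. The following is a minimal sketch, assuming GET /health answers with HTTP 200 once the app is ready; the helper names and the retry settings are hypothetical.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status for url, or 0 if the server cannot be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # server answered, but with an error status
    except OSError:
        return 0  # connection refused, DNS failure, timeout, ...


def wait_until_healthy(
    url: str = "http://localhost:8000/health",
    attempts: int = 30,
    delay: float = 2.0,
    probe: Callable[[str], int] = http_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll url until it returns 200 or the attempts are used up."""
    for attempt in range(attempts):
        if probe(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Passing `probe` and `sleep` as parameters keeps the loop testable without a running server.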

Option 2: Local API + Docker for MySQL and Qdrant

  1. Install Python 3.12+ and create a virtualenv:
python3 -m venv venv
source venv/bin/activate
pip install -e .
  2. Start only the infrastructure:
docker compose up -d mysql qdrant
  3. Create a .env for the local process (example):
DB_HOST=localhost
DB_PORT=3306
DB_USER=root
DB_PASS=Password_2547422
DB_NAME=rag
QDRANT_URL=http://localhost:6333
JWT_SECRET_KEY=your-super-secret-jwt-key
JWT_REFRESH_SECRET_KEY=your-super-secret-refresh-key
EMBEDDING_PROVIDER=huggingface
LLM_PROVIDER=huggingface

Align DB_PASS with the MySQL container’s MYSQL_ROOT_PASSWORD if you use the same Compose file.

  4. Run the API
python main.py
# or
uvicorn main:app --reload --host 0.0.0.0 --port 8000
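For reference, the DB_* values from the local .env can be combined into a SQLAlchemy connection URL roughly as follows. This is a sketch, not the code in database.py: the function name, the pymysql driver, and the fallback values are assumptions. The point it illustrates is that the password must be URL-encoded when it contains characters such as @ or /.

```python
import os
from collections.abc import Mapping
from typing import Optional
from urllib.parse import quote_plus


def build_mysql_url(env: Optional[Mapping[str, str]] = None, driver: str = "pymysql") -> str:
    """Assemble a SQLAlchemy-style MySQL URL from DB_* settings."""
    env = os.environ if env is None else env
    host = env.get("DB_HOST", "localhost")
    port = env.get("DB_PORT", "3306")
    user = env.get("DB_USER", "root")
    password = env.get("DB_PASS", "")
    name = env.get("DB_NAME", "rag")

    # Encode credentials so characters like '@' or '/' cannot break the URL.
    credentials = quote_plus(user)
    if password:
        credentials += ":" + quote_plus(password)
    return f"mysql+{driver}://{credentials}@{host}:{port}/{name}"
```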

Docker Setup

Services

  • mysql: MySQL 8 with database rag, persistent volume, healthcheck
  • qdrant: Latest Qdrant image, ports 6333/6334, healthcheck
  • backend: Builds from Dockerfile, depends on healthy MySQL and Qdrant, runs Gunicorn with Uvicorn workers

Commands

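docker compose up -d --build      # build images and start all services
docker compose logs -f backend    # follow backend logs
docker compose ps                 # list services and their status
docker compose down               # stop and remove containers (volumes are kept)
docker compose down -v            # also remove volumes (deletes MySQL and Qdrant data)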

Build image only

docker build -t fastapi-rag .
docker run --env-file .env -p 8000:8000 fastapi-rag

API Documentation

Base URL: http://localhost:8000

Health

| Method | Endpoint | Description | Auth |
| --- | --- | --- | --- |
| GET | /health | Liveness / status | Open |

Authentication

| Method | Endpoint | Description | Auth |
| --- | --- | --- | --- |
| POST | /auth/register | Register user (with role payload per schema) | Open |
| POST | /auth/token | Login → access token | Open |
| POST | /auth/refresh | Refresh token | Protected (Bearer) |

Users

| Method | Endpoint | Description | Auth |
| --- | --- | --- | --- |
| GET | /users/ | List users (skip, limit) | Protected |
| POST | /users/ | Create user | Protected |

Documents & RAG

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /documents/ | Ingest JSON body (content, optional title, source, category) |
| POST | /documents/upload | Multipart file upload (optional form fields title, source, category) |
| POST | /documents/query | Full RAG: retrieve (+ optional expansion/rerank) + LLM answer |
| GET | /documents/search | Hybrid or dense search; query params: query, k, use_hybrid, use_reranking, optional filters |
| GET | /documents/ | List documents (skip, limit) |
| GET | /documents/{doc_id} | Get one document |
| DELETE | /documents/{doc_id} | Delete document and associated vectors |

Interactive docs: http://localhost:8000/docs
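Because GET /documents/search takes its options as query parameters, the URL has to be encoded correctly. A minimal sketch follows; the helper name is hypothetical, and it assumes the optional filters are the same category, title, and source fields used by the query endpoint.

```python
from typing import Optional
from urllib.parse import urlencode


def build_search_url(
    query: str,
    k: int = 5,
    use_hybrid: bool = True,
    use_reranking: bool = False,
    category: Optional[str] = None,
    title: Optional[str] = None,
    source: Optional[str] = None,
    base_url: str = "http://localhost:8000",
) -> str:
    """Build the URL for GET /documents/search with encoded query parameters."""
    params: dict[str, object] = {
        "query": query,
        "k": k,
        # Send booleans as lowercase strings rather than Python's "True"/"False".
        "use_hybrid": str(use_hybrid).lower(),
        "use_reranking": str(use_reranking).lower(),
    }
    # Include only the filters that were actually set.
    for name, value in (("category", category), ("title", title), ("source", source)):
        if value is not None:
            params[name] = value
    return f"{base_url}/documents/search?{urlencode(params)}"


url = build_search_url("refund policy", k=3, category="billing")
```

The request still needs the Authorization header, since /documents/* routes are protected.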

Authentication & open routes

Routes not listed as open require a valid JWT: Authorization: Bearer <access_token>.

Default open paths (see configs.py — override with OPEN_END_POINTS):

  • /auth/token, /auth/register
  • /docs, /openapi.json, /redoc, /health
  • /news (placeholder; adjust if unused)

All /users/* and /documents/* routes are protected unless you extend OPEN_END_POINTS.
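The open-path rule can be sketched as a parse of OPEN_END_POINTS followed by a membership test. This is an illustration only: the function names are hypothetical, and exact matching is an assumption, since the actual middleware may match paths differently (for example by prefix).

```python
from typing import Optional

# Defaults copied from the list above.
DEFAULT_OPEN_PATHS = (
    "/auth/token", "/auth/register",
    "/docs", "/openapi.json", "/redoc", "/health",
    "/news",
)


def parse_open_paths(raw: Optional[str]) -> set[str]:
    """Parse a comma-separated OPEN_END_POINTS value; fall back to the defaults."""
    if raw is None or not raw.strip():
        return set(DEFAULT_OPEN_PATHS)
    return {part.strip() for part in raw.split(",") if part.strip()}


def requires_auth(path: str, open_paths: set[str]) -> bool:
    """Return True when a request path needs a valid JWT (exact match assumed)."""
    normalized = path.rstrip("/") or "/"  # treat "/docs/" like "/docs"
    return normalized not in open_paths
```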

Example: login and call RAG

TOKEN=$(curl -s -X POST http://localhost:8000/auth/token \
  -H "Content-Type: application/json" \
  -d '{"email":"user@example.com","password":"yourpassword"}' | jq -r '.access_token')

curl -X POST http://localhost:8000/documents/query \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query":"Summarize the main points","k":5,"use_hybrid":true,"use_reranking":true}'
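The same flow can be written in Python with only the standard library. This is a sketch that mirrors the curl commands above; the helper names are hypothetical, and it assumes the login response contains an access_token field, as the jq filter does.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_login_request(email: str, password: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare POST /auth/token with a JSON body."""
    body = json.dumps({"email": email, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/auth/token",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_query_request(token: str, query: str, k: int = 5, base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare POST /documents/query with the bearer token attached."""
    body = json.dumps(
        {"query": query, "k": k, "use_hybrid": True, "use_reranking": True}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/documents/query",
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def send_json(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API running, `token = send_json(build_login_request("user@example.com", "yourpassword"))["access_token"]` followed by `send_json(build_query_request(token, "Summarize the main points"))` reproduces the two curl calls.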

Project Structure

fastApi-rag/
├── main.py                 # FastAPI app, lifespan (DB init, RAG pre-warm), middleware
├── configs.py              # Env-based settings (JWT, DB, RAG providers)
├── database.py             # SQLAlchemy engines, sessions, table creation
├── custom_logger.py
├── error_handling.py
├── ddl_mysql.sql           # Reference DDL (optional; app also creates tables)
├── Dockerfile
├── docker-compose.yml      # mysql, qdrant, backend
├── pyproject.toml
├── middleware/
│   ├── authentication_middleware.py
│   └── authorization_middleware.py
├── models/                 # User, Role, Permission, Document, junction tables
├── routers/
│   ├── auth_router.py
│   ├── user_router.py
│   └── document_router.py
├── schemas/
├── services/
└── rag/
    ├── config.py           # get_embedder, get_llm, get_vector_store, get_reranker
    ├── embeddings/         # huggingface, gemini, openai
    ├── llm/                # huggingface, gemini, openai, anthropic
    ├── vectorstore/        # qdrant
    ├── reranker/           # cross-encoder
    ├── retriever/          # retrieve, query expansion, LLM orchestration
    └── indexing/           # loaders, splitter, sparse vectors

Environment Variables

Primary variables are read in configs.py and rag/config.py. Common entries:

| Variable | Description | Typical default |
| --- | --- | --- |
| DB_HOST, DB_PORT, DB_USER, DB_PASS, DB_NAME | MySQL connection | localhost, 3306, root, empty/rag |
| JWT_SECRET_KEY, JWT_REFRESH_SECRET_KEY | JWT signing | Dev defaults in code; set in production |
| CORS_ORIGINS | Comma-separated origins | Local dev ports + * if unset |
| OPEN_END_POINTS | Comma-separated paths without auth | See defaults in configs.py |
| QDRANT_URL | Qdrant HTTP URL | http://localhost:6333 |
| QDRANT_COLLECTION | Collection name | rag_documents |
| EMBEDDING_PROVIDER | huggingface, gemini, or openai | huggingface |
| EMBEDDING_MODEL | Model id override | Provider-specific defaults in rag/config.py |
| LLM_PROVIDER | huggingface, gemini, openai, or anthropic | huggingface |
| LLM_MODEL | Model id override | Provider-specific defaults |
| RERANKER_MODEL | Cross-encoder id | cross-encoder/ms-marco-MiniLM-L-6-v2 |
| GEMINI_API_KEY, OPENAI_API_KEY, ANTHROPIC_API_KEY | Cloud APIs | Empty if unused |

Docker Compose sets DB_* and QDRANT_URL for the backend service; your .env supplies secrets and overrides.
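Provider selection from these variables can be validated early so that a typo or a missing API key fails at startup instead of on the first request. The sketch below is not the logic in rag/config.py; the function name, the returned dictionary, and the rule that each cloud provider needs its own key are assumptions.

```python
import os
from collections.abc import Mapping
from typing import Optional

EMBEDDING_PROVIDERS = {"huggingface", "gemini", "openai"}
LLM_PROVIDERS = {"huggingface", "gemini", "openai", "anthropic"}
# Assumed mapping from cloud provider to the key it needs.
API_KEY_VARS = {
    "gemini": "GEMINI_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def load_rag_settings(env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Read and validate provider settings from environment-style variables."""
    env = os.environ if env is None else env
    embedding = env.get("EMBEDDING_PROVIDER", "huggingface").strip().lower()
    llm = env.get("LLM_PROVIDER", "huggingface").strip().lower()

    if embedding not in EMBEDDING_PROVIDERS:
        raise ValueError(f"unsupported EMBEDDING_PROVIDER: {embedding!r}")
    if llm not in LLM_PROVIDERS:
        raise ValueError(f"unsupported LLM_PROVIDER: {llm!r}")

    # Collect the keys required by the selected cloud providers but not set.
    missing = sorted(
        {API_KEY_VARS[p] for p in (embedding, llm)
         if p in API_KEY_VARS and not env.get(API_KEY_VARS[p])}
    )
    if missing:
        raise ValueError("missing API keys: " + ", ".join(missing))

    return {
        "embedding_provider": embedding,
        "llm_provider": llm,
        "qdrant_url": env.get("QDRANT_URL", "http://localhost:6333"),
        "qdrant_collection": env.get("QDRANT_COLLECTION", "rag_documents"),
    }
```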

Development Commands

# Install (editable)
pip install -e .

# Run dev server
python main.py
# or, with auto-reload
uvicorn main:app --reload --port 8000

# Docker
docker compose up -d --build
docker compose logs -f backend

Troubleshooting

  1. Database connection errors
    Ensure MySQL is running and DB_* match the server (especially password vs MYSQL_ROOT_PASSWORD in Compose). Check docker compose logs mysql.

  2. Qdrant unreachable
    Confirm QDRANT_URL (use http://qdrant:6333 inside Docker network, http://localhost:6333 from host). Check docker compose logs qdrant.

  3. First request very slow / OOM
    Hugging Face models download and load on first use; the app pre-warms embedder and vector store at startup when possible. Reduce model sizes via EMBEDDING_MODEL / LLM_MODEL or switch to API providers.

  4. 401 on document routes
    Send Authorization: Bearer <token> from /auth/token. Register via /auth/register first if no users exist.

  5. Rate limits
    SlowAPI rate limiting is configured on the app, so repeated failed requests may run into a limit. See main.py, and the route decorators if you add per-route limits.

Contributing

  1. Fork the repository
  2. Create a branch: git checkout -b feature/your-feature
  3. Commit with clear messages
  4. Open a Pull Request

Follow existing patterns: thin routers, services for business logic, env-driven RAG providers.

License

This project is intended for use under the MIT License (see repository license file when added).


Built with FastAPI, Qdrant, MySQL, and Docker
