A production-ready FastAPI backend with JWT authentication, role- and permission-based authorization, and a provider-agnostic RAG (Retrieval-Augmented Generation) stack. Ingest documents (JSON or file uploads: PDF, DOCX, TXT), chunk and embed them, store vectors in Qdrant, and answer questions with optional hybrid search (dense + BM25), cross-encoder reranking, and LLM-based query expansion. Embedding and LLM backends (Hugging Face, Gemini, OpenAI, Anthropic) are switched via environment variables.
- JWT authentication & refresh: Token-based auth with permission checks on protected routes
- Users / roles / permissions: SQLAlchemy models and middleware-driven access control
- RAG ingestion: Create documents from JSON text or multipart file upload
- Hybrid retrieval: Dense vectors + sparse BM25-style vectors with RRF-style fusion in Qdrant
- Reranking: Cross-encoder rescoring (configurable model)
- Query expansion: Optional LLM-generated query variants before retrieval
- Swappable providers: Embeddings and LLMs via `EMBEDDING_PROVIDER` / `LLM_PROVIDER` (Hugging Face, Gemini, OpenAI, Anthropic)
- Vector store: Qdrant with health-checked Docker service
- MySQL: App metadata (users, documents); tables created on startup via SQLAlchemy
- Ops: SlowAPI rate limiting, CORS, structured logging, `/health`, Gunicorn + Uvicorn workers in Docker
- Dev UX: Pre-warm of embedder and vector store on startup (first request avoids cold model load when possible)
- Technologies
- RAG pipeline
- Quick Start
- Docker Setup
- API Documentation
- Authentication & open routes
- Project Structure
- Environment Variables
- Development Commands
- Troubleshooting
- Contributing
- Backend: FastAPI, Uvicorn, Gunicorn (Docker)
- Database: MySQL 8 (SQLAlchemy 2.x; sync sessions for routes)
- Vector DB: Qdrant (`qdrant-client`)
- Embeddings: sentence-transformers / Gemini / OpenAI (configurable)
- LLM: Hugging Face Transformers (local), Gemini, OpenAI, Anthropic (configurable)
- RAG utilities: LangChain text splitters, custom chunking, sparse vectors for hybrid search
- Security: python-jose (JWT), bcrypt/passlib, middleware for authZ
- Containerization: Docker, Docker Compose (MySQL + Qdrant + backend)
End-to-end flow for POST /documents/query:
- Query expansion (optional): LLM generates alternative phrasings (`use_query_expansion`, `n_expansions`).
- Retrieval: For each query variant, hybrid search (dense + sparse) or dense-only against Qdrant; optional metadata filters (`category`, `title`, `source`).
- Deduplication: Merged hits across expanded queries.
- Reranking (optional): Cross-encoder rescores candidates; top-`k` passed to the LLM.
- Generation: LLM answers with a fixed system prompt that restricts answers to retrieved context.
Search without generation: GET /documents/search returns ranked chunks only.
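
When `use_hybrid` is enabled, the ranked list is the result of fusing dense and sparse results, which the feature list describes as RRF-style fusion in Qdrant. The fusion itself happens on the Qdrant side, so the following is only an illustration of reciprocal rank fusion in plain Python; `rrf_fuse` and `rank_constant` are hypothetical names, and 60 is the constant commonly used for RRF, not a value taken from this project.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], rank_constant: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of chunk ids with reciprocal rank fusion.

    Each list contributes 1 / (rank_constant + rank) for every id it contains,
    where rank starts at 1 for the best hit.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


dense_ranking = ["a", "b", "c"]   # ids ordered by dense similarity
sparse_ranking = ["b", "d", "a"]  # ids ordered by sparse (BM25-style) score
fused = rrf_fuse([dense_ranking, sparse_ranking])
```

Because RRF uses only ranks, it does not need the dense and sparse scores to be on the same scale, which is why it is a common choice for hybrid search.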
Query body for POST /documents/query (key fields):
{
"query": "What is the refund policy?",
"k": 5,
"use_hybrid": true,
"use_reranking": true,
"use_query_expansion": false,
"n_expansions": 3,
"category": null,
"title": null,
"source": null
}

You can run everything with Docker Compose, or run the API locally while MySQL/Qdrant run in containers.
MySQL, Qdrant, and the API start together; the backend waits until MySQL and Qdrant are healthy.
- Clone the repository
git clone <repository-url>
cd fastApi-rag

- Create a `.env` file in the project root (do not commit secrets). Compose merges this with the service `environment` block. At minimum, set API keys for the providers you use (e.g. `GEMINI_API_KEY`, `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`). For local Hugging Face embedding/LLM defaults, large models may download on first start, so ensure enough disk space and RAM.
# Example — add keys only for providers you enable
GEMINI_API_KEY=your_key_here
JWT_SECRET_KEY=your-super-secret-jwt-key
JWT_REFRESH_SECRET_KEY=your-super-secret-refresh-key

- Start the stack

docker compose up -d --build

- Monitor logs
docker compose logs -f backend
docker compose logs -f mysql
docker compose logs -f qdrant

- Access the application
| Service | URL |
|---|---|
| API | http://localhost:8000 |
| OpenAPI (Swagger) | http://localhost:8000/docs |
| ReDoc | http://localhost:8000/redoc |
| Qdrant REST | http://localhost:6333 |
| Qdrant gRPC | localhost:6334 |
| MySQL | localhost:3306 (user root, password set in docker-compose.yml / .env) |
- Stop
docker compose down

Data persists in volumes `rag-mysql-data` and `rag-qdrant`. Use `docker compose down -v` to remove volumes and start clean.
- Install Python 3.12+ and create a virtualenv:
python3 -m venv venv
source venv/bin/activate
pip install -e .

- Start only infrastructure:

docker compose up -d mysql qdrant

- `.env` for local process (example):
DB_HOST=localhost
DB_PORT=3306
DB_USER=root
DB_PASS=Password_2547422
DB_NAME=rag
QDRANT_URL=http://localhost:6333
JWT_SECRET_KEY=your-super-secret-jwt-key
JWT_REFRESH_SECRET_KEY=your-super-secret-refresh-key
EMBEDDING_PROVIDER=huggingface
LLM_PROVIDER=huggingface

Align `DB_PASS` with the MySQL container's `MYSQL_ROOT_PASSWORD` if you use the same Compose file.
- Run the API
python main.py
# or
uvicorn main:app --reload --host 0.0.0.0 --port 8000

- mysql: MySQL 8 with database `rag`, persistent volume, healthcheck
- qdrant: Latest Qdrant image, ports 6333/6334, healthcheck
- backend: Builds from `Dockerfile`, depends on healthy MySQL and Qdrant, runs Gunicorn with Uvicorn workers
docker compose up -d --build
docker compose logs -f backend
docker compose ps
docker compose down
docker compose down -v

docker build -t fastapi-rag .
docker run --env-file .env -p 8000:8000 fastapi-rag

Base URL: http://localhost:8000
| Method | Endpoint | Description | Auth |
|---|---|---|---|
| GET | `/health` | Liveness / status | Open |
| Method | Endpoint | Description | Auth |
|---|---|---|---|
| POST | `/auth/register` | Register user (with role payload per schema) | Open |
| POST | `/auth/token` | Login → access token | Open |
| POST | `/auth/refresh` | Refresh token | Protected (Bearer) |
| Method | Endpoint | Description | Auth |
|---|---|---|---|
| GET | `/users/` | List users (`skip`, `limit`) | Protected |
| POST | `/users/` | Create user | Protected |
| Method | Endpoint | Description |
|---|---|---|
| POST | `/documents/` | Ingest JSON body (`content`, optional `title`, `source`, `category`) |
| POST | `/documents/upload` | Multipart file upload (optional form fields `title`, `source`, `category`) |
| POST | `/documents/query` | Full RAG: retrieve (+ optional expansion/rerank) + LLM answer |
| GET | `/documents/search` | Hybrid or dense search; query params: `query`, `k`, `use_hybrid`, `use_reranking`, optional filters |
| GET | `/documents/` | List documents (`skip`, `limit`) |
| GET | `/documents/{doc_id}` | Get one document |
| DELETE | `/documents/{doc_id}` | Delete document and associated vectors |
Interactive docs: http://localhost:8000/docs
Routes not listed as open require a valid JWT: Authorization: Bearer <access_token>.
Default open paths (see `configs.py`; override with `OPEN_END_POINTS`):
- `/auth/token`, `/auth/register`
- `/docs`, `/openapi.json`, `/redoc`, `/health`
- `/news` (placeholder; adjust if unused)
All /users/* and /documents/* routes are protected unless you extend OPEN_END_POINTS.
TOKEN=$(curl -s -X POST http://localhost:8000/auth/token \
-H "Content-Type: application/json" \
-d '{"email":"user@example.com","password":"yourpassword"}' | jq -r '.access_token')
curl -X POST http://localhost:8000/documents/query \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
  -d '{"query":"Summarize the main points","k":5,"use_hybrid":true,"use_reranking":true}'

fastApi-rag/
├── main.py # FastAPI app, lifespan (DB init, RAG pre-warm), middleware
├── configs.py # Env-based settings (JWT, DB, RAG providers)
├── database.py # SQLAlchemy engines, sessions, table creation
├── custom_logger.py
├── error_handling.py
├── ddl_mysql.sql # Reference DDL (optional; app also creates tables)
├── Dockerfile
├── docker-compose.yml # mysql, qdrant, backend
├── pyproject.toml
├── middleware/
│ ├── authentication_middleware.py
│ └── authorization_middleware.py
├── models/ # User, Role, Permission, Document, junction tables
├── routers/
│ ├── auth_router.py
│ ├── user_router.py
│ └── document_router.py
├── schemas/
├── services/
└── rag/
├── config.py # get_embedder, get_llm, get_vector_store, get_reranker
├── embeddings/ # huggingface, gemini, openai
├── llm/ # huggingface, gemini, openai, anthropic
├── vectorstore/ # qdrant
├── reranker/ # cross-encoder
├── retriever/ # retrieve, query expansion, LLM orchestration
└── indexing/ # loaders, splitter, sparse vectors
Primary variables are read in configs.py and rag/config.py. Common entries:
| Variable | Description | Typical default |
|---|---|---|
| `DB_HOST`, `DB_PORT`, `DB_USER`, `DB_PASS`, `DB_NAME` | MySQL connection | localhost, 3306, root, empty/rag |
| `JWT_SECRET_KEY`, `JWT_REFRESH_SECRET_KEY` | JWT signing | Dev defaults in code; set explicitly in production |
| `CORS_ORIGINS` | Comma-separated origins | Local dev ports + `*` if unset |
| `OPEN_END_POINTS` | Comma-separated paths without auth | See defaults in `configs.py` |
| `QDRANT_URL` | Qdrant HTTP URL | http://localhost:6333 |
| `QDRANT_COLLECTION` | Collection name | rag_documents |
| `EMBEDDING_PROVIDER` | `huggingface`, `gemini`, or `openai` | huggingface |
| `EMBEDDING_MODEL` | Model id override | Provider-specific defaults in `rag/config.py` |
| `LLM_PROVIDER` | `huggingface`, `gemini`, `openai`, or `anthropic` | huggingface |
| `LLM_MODEL` | Model id override | Provider-specific defaults |
| `RERANKER_MODEL` | Cross-encoder id | cross-encoder/ms-marco-MiniLM-L-6-v2 |
| `GEMINI_API_KEY`, `OPENAI_API_KEY`, `ANTHROPIC_API_KEY` | Cloud APIs | Empty if unused |
Docker Compose sets DB_* and QDRANT_URL for the backend service; your .env supplies secrets and overrides.
# Install (editable)
pip install -e .
# Run dev server
python main.py
uvicorn main:app --reload --port 8000
# Docker
docker compose up -d --build
docker compose logs -f backend

- Database connection errors: Ensure MySQL is running and `DB_*` match the server (especially `DB_PASS` vs `MYSQL_ROOT_PASSWORD` in Compose). Check `docker compose logs mysql`.
- Qdrant unreachable: Confirm `QDRANT_URL` (use `http://qdrant:6333` inside the Docker network, `http://localhost:6333` from the host). Check `docker compose logs qdrant`.
- First request very slow / OOM: Hugging Face models download and load on first use; the app pre-warms the embedder and vector store at startup when possible. Reduce model sizes via `EMBEDDING_MODEL` / `LLM_MODEL` or switch to API providers.
- 401 on document routes: Send `Authorization: Bearer <token>` with a token obtained from `/auth/token`. Register via `/auth/register` first if no users exist.
- Rate limits: SlowAPI is configured on the app, so repeated failed requests may hit limits. See `main.py` and any route decorators you have added.
- Fork the repository
- Create a branch: `git checkout -b feature/your-feature`
- Commit with clear messages
- Open a Pull Request
Follow existing patterns: thin routers, services for business logic, env-driven RAG providers.
This project is intended for use under the MIT License (see repository license file when added).
Built with FastAPI, Qdrant, MySQL, and Docker