An autonomous AI analyst that answers complex business questions by navigating both unstructured documents (PDFs/Reports) and structured relational databases (SQL), while tracking the cost of every request throughout the process.
You are an AI Data Engineer at a financial tech startup. Non-technical executives need to ask questions like:
"What was Apple's total hardware revenue last year, and what did their Q3 report say about supply chain risks?"
Answering this requires an agent that can dynamically write SQL to calculate the revenue and, at the same time, query a vector database to read the Q3 report. A standard LLM cannot answer this on its own, hence the need for the RAG Data Analyst.
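As a rough illustration of the idea above, the sketch below answers a question of that shape by calling two tools and recording a cost for each call. Everything in it is hypothetical: the `financials` table, the `ExampleCo` figures, the keyword-matching `search_reports` function (a toy stand-in for a real vector search), and the per-call prices are placeholders, not this project's schema, data, or pricing.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class CostTracker:
    """Accumulates a (hypothetical) cost for every tool call in one request."""
    entries: list = field(default_factory=list)

    def record(self, step: str, cost_usd: float) -> None:
        self.entries.append((step, cost_usd))

    @property
    def total(self) -> float:
        return sum(cost for _, cost in self.entries)


def run_sql(conn: sqlite3.Connection, sql: str, params: tuple, tracker: CostTracker) -> list:
    """Structured path: run a query against the relational database."""
    tracker.record("sql", 0.0)  # local query, assumed to be free
    return conn.execute(sql, params).fetchall()


def search_reports(chunks: list, keyword: str, tracker: CostTracker) -> list:
    """Unstructured path: toy keyword match standing in for a vector search."""
    tracker.record("retrieval", 0.001)  # placeholder price, not a real rate
    return [c for c in chunks if keyword.lower() in c.lower()]


# Toy data, invented purely for this example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE financials (company TEXT, segment TEXT, year INTEGER, revenue REAL)"
)
conn.executemany(
    "INSERT INTO financials VALUES (?, ?, ?, ?)",
    [
        ("ExampleCo", "hardware", 2023, 100.0),
        ("ExampleCo", "hardware", 2023, 50.0),
        ("ExampleCo", "services", 2023, 30.0),
    ],
)
chunks = [
    "Q3 report: supply chain risks remain elevated.",
    "Q3 report: services grew.",
]

tracker = CostTracker()
revenue = run_sql(
    conn,
    "SELECT SUM(revenue) FROM financials WHERE company = ? AND segment = ? AND year = ?",
    ("ExampleCo", "hardware", 2023),
    tracker,
)[0][0]
risks = search_reports(chunks, "supply chain", tracker)
print(revenue, risks, tracker.total)
```

In the real system an LLM would write the SQL and pick the tools; here both calls are hard-coded so the cost bookkeeping is easy to follow.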
- Docker and Docker Compose installed
- Docker Desktop (for local Qdrant)
- Python 3.10+ (for local dependency management and any script inspection)
- A .env file created from .env.example
- Your Google Gemini API key from Google AI Studio
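Before starting the services, it can help to confirm that the key is actually set. The following is a minimal sketch, assuming the .env file uses plain `KEY=VALUE` lines; the helper names are hypothetical and this simplified parser is not how the project itself loads its configuration.

```python
import os


def parse_env_file(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict, required=("GOOGLE_API_KEY",)) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


def check_env(path: str = ".env") -> list:
    """Read the .env file if present, overlay the process environment, report gaps."""
    values = {}
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            values = parse_env_file(fh.read())
    values.update({k: v for k, v in os.environ.items() if v})
    return missing_keys(values)
```

Calling `check_env()` from the project root returns an empty list when `GOOGLE_API_KEY` is available, and `["GOOGLE_API_KEY"]` otherwise.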
- Clone this repository
- Create and activate a virtual environment:
  python -m venv venv
  source venv/bin/activate
- Install dependencies:
  pip install -r requirements.txt
- Create a .env at the project root from .env.example and add your GOOGLE_API_KEY
- Run Qdrant (local development)
  - Start Qdrant via docker-compose (provided):
    docker compose up --build
  - If you prefer a managed Qdrant, set QDRANT_URL to the HTTP endpoint.
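To check that Qdrant is reachable before running the ETL steps, something like the sketch below could be used. It assumes the local container listens on Qdrant's default HTTP port 6333 (the provided compose file may map a different port) and calls Qdrant's `GET /collections` REST endpoint. The function names are hypothetical, and a managed instance would usually also need an API key header, which is not handled here.

```python
import json
import os
import urllib.request

# Qdrant's default HTTP port; assumed to match the provided compose file.
DEFAULT_QDRANT_URL = "http://localhost:6333"


def qdrant_base_url(env=None) -> str:
    """Use QDRANT_URL when set, otherwise fall back to the local default."""
    env = os.environ if env is None else env
    return (env.get("QDRANT_URL") or DEFAULT_QDRANT_URL).rstrip("/")


def list_collections(base_url: str, timeout: float = 5.0) -> list:
    """Call GET /collections and return the collection names."""
    with urllib.request.urlopen(f"{base_url}/collections", timeout=timeout) as resp:
        payload = json.load(resp)
    return [c["name"] for c in payload["result"]["collections"]]
```

With the container running, `list_collections(qdrant_base_url())` should return a list (empty on a fresh instance) rather than raising a connection error.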
- Prepare documents for the vector embedding
  - Run the scripts to extract text from the PDFs and clean it:
    python etl/extract_pdfs.py
    python etl/clean.py
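For readers wondering what "cleaning" extracted PDF text typically involves, here is a small sketch of common steps: turning page-break form feeds into newlines, rejoining words hyphenated across line breaks, and normalizing whitespace. It is a hypothetical illustration and is not taken from etl/clean.py, whose actual rules may differ.

```python
import re


def clean_text(raw: str) -> str:
    """Apply a few generic clean-up rules to text extracted from a PDF."""
    text = raw.replace("\x0c", "\n")              # form feeds often mark page breaks
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)  # rejoin words hyphenated across lines
    text = re.sub(r"[ \t]+", " ", text)           # collapse runs of spaces and tabs
    text = re.sub(r" ?\n ?", "\n", text)          # drop spaces around newlines
    text = re.sub(r"\n{3,}", "\n\n", text)        # keep at most one blank line
    return text.strip()
```

Note that the hyphen rule is a heuristic: it also joins genuine compounds that happen to break at a line end, so it trades a little accuracy for cleaner chunks.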
- Prepare vectors & indexes
  - Run the ETL pipeline to create the collection, payload indexes, and upload vectors:
    python etl/pipeline.py
    The pipeline creates the collection and ensures the company_name and document_year payload indexes exist.
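The collection and index setup described above can be sketched against Qdrant's REST API. This is an illustration only: the pipeline itself may well use a client library instead, and the collection name, vector size, distance metric, and the index types (`keyword` for company_name, `integer` for document_year) are assumptions rather than values taken from etl/pipeline.py.

```python
import json
import urllib.request


def collection_request(name: str, vector_size: int, distance: str = "Cosine") -> tuple:
    """Build the request for PUT /collections/{name}."""
    body = {"vectors": {"size": vector_size, "distance": distance}}
    return "PUT", f"/collections/{name}", body


def payload_index_request(name: str, field_name: str, schema: str) -> tuple:
    """Build the request for PUT /collections/{name}/index."""
    body = {"field_name": field_name, "field_schema": schema}
    return "PUT", f"/collections/{name}/index", body


def setup_requests(name: str, vector_size: int) -> list:
    """Collection first, then one payload index per filterable field."""
    return [
        collection_request(name, vector_size),
        payload_index_request(name, "company_name", "keyword"),   # assumed type
        payload_index_request(name, "document_year", "integer"),  # assumed type
    ]


def send(base_url: str, method: str, path: str, body: dict, timeout: float = 10.0) -> dict:
    """Send one JSON request to a running Qdrant instance."""
    req = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Payload indexes matter here because filtering a search by company or year is much cheaper when those fields are indexed than when Qdrant has to scan every point's payload.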
- Create the SQLite DB:
  python etl/extract_financials.py
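Once the script has run, the resulting database can be inspected with Python's built-in `sqlite3` module. The sketch below lists each table and its columns without assuming any particular schema; the function name is hypothetical, and the database file path is left to the caller because this README does not state it.

```python
import sqlite3


def describe_database(conn: sqlite3.Connection) -> dict:
    """Map each user table in the database to its list of column names."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    # Skip SQLite's internal bookkeeping tables.
    tables = [name for (name,) in rows if not name.startswith("sqlite_")]
    schema = {}
    for table in tables:
        quoted = '"' + table.replace('"', '""') + '"'  # quote the identifier safely
        columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        schema[table] = [col[1] for col in columns]    # index 1 is the column name
    return schema
```

For example, `describe_database(sqlite3.connect(db_path))`, with `db_path` pointing at the file the script created, shows which tables and columns the agent's SQL can reference.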
- Run the API locally
  - Start the FastAPI app with Uvicorn:
    source venv/bin/activate
    uvicorn app.main:app --host 0.0.0.0 --port 8080
  - Open http://localhost:8080/docs to try the /query endpoint.
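The endpoint can also be called from a script instead of the interactive docs page. The sketch below assumes that /query accepts a POST with a JSON body containing a `question` field; both the method and the field name are guesses, so check the schema shown at /docs and adjust accordingly.

```python
import json
import urllib.request


def build_query_request(
    question: str, base_url: str = "http://localhost:8080"
) -> urllib.request.Request:
    """Build a POST request for the /query endpoint."""
    body = json.dumps({"question": question}).encode("utf-8")  # assumed field name
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, base_url: str = "http://localhost:8080", timeout: float = 60.0) -> dict:
    """Send the question to a running API instance and return the parsed JSON reply."""
    with urllib.request.urlopen(build_query_request(question, base_url), timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, `ask("What did the Q3 report say about supply chain risks?")` would return whatever JSON the API produces for that question.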
See REPORT.md for the system architecture diagram, RAGAS evaluation, and cost analysis.