Overview

An autonomous AI analyst that answers complex business questions by navigating both unstructured documents (PDF reports) and structured relational databases (SQL), while tracking the cost of every request throughout the process.
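Per-request cost tracking can be as simple as recording token counts for each LLM call and multiplying them by per-token prices. The sketch below is a minimal illustration, not the project's implementation. The `CostTracker` class, the step names, and the prices in `PRICES_PER_MILLION` are hypothetical placeholders, so substitute the current rates for the Gemini model you actually use.

```python
from dataclasses import dataclass, field

# Hypothetical USD prices per one million tokens; replace with real rates.
PRICES_PER_MILLION = {"input": 0.10, "output": 0.40}


@dataclass
class CostTracker:
    """Accumulates token usage and cost for every LLM call made while answering one request."""

    calls: list[dict] = field(default_factory=list)

    def record(self, step: str, input_tokens: int, output_tokens: int) -> float:
        # Cost of a single call = tokens x price per token, for input and output separately.
        cost = (
            input_tokens * PRICES_PER_MILLION["input"]
            + output_tokens * PRICES_PER_MILLION["output"]
        ) / 1_000_000
        self.calls.append(
            {"step": step, "input": input_tokens, "output": output_tokens, "cost": cost}
        )
        return cost

    @property
    def total_cost(self) -> float:
        # Sum over every recorded call in this request.
        return sum(call["cost"] for call in self.calls)


tracker = CostTracker()
tracker.record("sql_generation", input_tokens=1_200, output_tokens=150)
tracker.record("document_answer", input_tokens=3_000, output_tokens=400)
print(f"Total cost: ${tracker.total_cost:.6f}")
```

Keeping one record per step, rather than a single running total, makes it possible to see which part of the pipeline (SQL generation or document answering) dominates the cost.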

The Problem

You are an AI Data Engineer at a financial tech startup. Non-technical executives need to ask questions like:

"What was Apple's total hardware revenue last year, and what did their Q3 report say about supply chain risks?"

Answering it requires an agent that can dynamically write SQL to calculate the revenue and, in the same request, query a vector database to read the relevant parts of the Q3 report. A standard LLM cannot answer this on its own, hence the need for the RAG Data Analyst.
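The two halves of that question map onto two tools. The sketch below illustrates the pattern with the standard library only: an in-memory SQLite table stands in for the financial database, and a naive keyword-overlap ranker stands in for vector search. The table `financials`, its columns, the company `Acme`, and all figures are hypothetical dummy values. In the real agent, the LLM would generate the SQL, and an embedding model plus Qdrant would handle retrieval.

```python
import sqlite3

# Hypothetical schema with dummy figures (not real financial data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE financials (company TEXT, fiscal_year INTEGER, segment TEXT, revenue_musd REAL)"
)
conn.executemany(
    "INSERT INTO financials VALUES (?, ?, ?, ?)",
    [
        ("Acme", 2023, "hardware", 100.0),
        ("Acme", 2023, "services", 40.0),
        ("Acme", 2023, "hardware", 25.0),
    ],
)


def sql_tool(company: str, year: int, segment: str) -> float:
    """Structured tool: aggregate revenue (in the real agent the LLM writes this SQL)."""
    row = conn.execute(
        "SELECT COALESCE(SUM(revenue_musd), 0) FROM financials "
        "WHERE company = ? AND fiscal_year = ? AND segment = ?",
        (company, year, segment),
    ).fetchone()
    return row[0]


# Unstructured tool: keyword overlap stands in for embedding similarity.
CHUNKS = [
    "Q3 report: supply chain risks include component shortages.",
    "Q3 report: marketing spend increased.",
]


def retrieval_tool(query: str, k: int = 1) -> list[str]:
    terms = set(query.lower().split())
    ranked = sorted(CHUNKS, key=lambda c: len(terms & set(c.lower().split())), reverse=True)
    return ranked[:k]


# The agent merges both results into the context it hands to the LLM.
revenue = sql_tool("Acme", 2023, "hardware")
passages = retrieval_tool("supply chain risks")
context = f"Hardware revenue: {revenue} M USD\nReport excerpts: {passages}"
print(context)
```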

Prerequisites

Docker and Docker Compose installed
Docker Desktop (for running Qdrant locally)
Python 3.10+ (for local dependency management and running the scripts)
A .env file created from .env.example
A Google Gemini API key from Google AI Studio
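Before running anything, it can help to confirm that the .env file supplies the API key listed above. The following is a minimal, standard-library sketch of such a check. It is a hypothetical helper, not part of the repository, and the project itself may load .env differently (for example with python-dotenv). This parser only handles simple `KEY=value` lines and does not support escapes or variable expansion.

```python
from pathlib import Path

# Keys this check requires; GOOGLE_API_KEY is the one named in this README.
REQUIRED_KEYS = ("GOOGLE_API_KEY",)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove surrounding quote characters; no escapes or ${VAR} expansion.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


def check_env_file(path: Path = Path(".env")) -> list[str]:
    """Return missing keys for a .env file; a missing file means every key is missing."""
    if not path.exists():
        return list(REQUIRED_KEYS)
    return missing_keys(parse_env(path.read_text()))
```

From the project root, `check_env_file()` returning an empty list means the required key is present.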

To Run

Clone this repository

Create and activate a virtual environment:

```
python -m venv venv
source venv/bin/activate
```

Install dependencies:
```
pip install -r requirements.txt
```
Create a .env file at the project root from .env.example and add your GOOGLE_API_KEY.

Run Qdrant (local development)

Start Qdrant using the provided docker-compose.yml:
```
docker compose up --build
```
If you prefer a managed Qdrant instance, set QDRANT_URL to its HTTP endpoint instead.
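To check which Qdrant instance you will hit, the sketch below resolves the URL from QDRANT_URL and otherwise falls back to `http://localhost:6333`. Port 6333 is Qdrant's default HTTP port, but the assumption that docker-compose.yml exposes it unchanged is unverified. `list_collections` calls Qdrant's REST endpoint `GET /collections`. Both helper names are hypothetical. Managed deployments usually also require an API key header, which this sketch does not send.

```python
import json
import os
import urllib.request
from typing import Mapping, Optional

# Qdrant's default HTTP port; assumed to match the docker-compose.yml mapping.
DEFAULT_QDRANT_URL = "http://localhost:6333"


def resolve_qdrant_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Prefer QDRANT_URL (e.g. a managed instance); otherwise use the local default."""
    env = os.environ if env is None else env
    return (env.get("QDRANT_URL") or DEFAULT_QDRANT_URL).rstrip("/")


def list_collections(base_url: str, timeout: float = 5.0) -> list[str]:
    """Return collection names via GET /collections (requires a running server)."""
    with urllib.request.urlopen(f"{base_url}/collections", timeout=timeout) as resp:
        payload = json.load(resp)
    return [c["name"] for c in payload["result"]["collections"]]
```

With the container running, `list_collections(resolve_qdrant_url())` should return an empty list before the ETL pipeline has run.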

Prepare documents for embedding

Run the scripts to extract text from the PDFs, then clean it:
```
python etl/extract_pdfs.py
python etl/clean.py
```
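The contents of etl/clean.py are not shown here. To illustrate what a cleaning pass over PDF-extracted text commonly does, the sketch below rejoins words hyphenated across line breaks, drops lines that contain only a page number, and collapses runs of whitespace. The function name `clean_pdf_text` and these specific rules are assumptions, not the repository's actual logic.

```python
import re


def clean_pdf_text(raw: str) -> str:
    """Normalize text extracted from a PDF page before chunking and embedding."""
    # Rejoin words split across lines with a hyphen, e.g. "revenu-\nes" -> "revenues".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Drop lines that consist solely of a page number.
    lines = [line for line in text.splitlines() if not re.fullmatch(r"\s*\d+\s*", line)]
    # Collapse all remaining whitespace (including line breaks) into single spaces.
    return re.sub(r"\s+", " ", " ".join(lines)).strip()
```

One trade-off of the hyphen rule is that it also merges genuine compounds broken at a line end (for example "supply-\nchain" becomes "supplychain"), so a real pipeline might check the joined word against a vocabulary first.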

Prepare vectors & indexes

Run the ETL pipeline to create the collection, payload indexes, and upload vectors:
```
python etl/pipeline.py
```
The pipeline creates the collection and ensures that payload indexes exist on company_name and document_year.
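Payload indexes let Qdrant restrict a similarity search to points whose payload matches a filter, such as one company and one year, and then rank the remaining points by vector similarity. The pure-Python sketch below mimics that behaviour on a few in-memory points so it runs without a Qdrant server. The toy 3-dimensional vectors, the payload values, and the `filtered_search` helper are hypothetical. With the qdrant-client library, the equivalent filter is built from `models.Filter`, `models.FieldCondition`, and `models.MatchValue`.

```python
import math

# Hypothetical points: toy vectors plus the payload fields the pipeline indexes.
POINTS = [
    {"id": 1, "vector": [0.9, 0.1, 0.0], "payload": {"company_name": "Acme", "document_year": 2023}},
    {"id": 2, "vector": [0.8, 0.2, 0.1], "payload": {"company_name": "Acme", "document_year": 2022}},
    {"id": 3, "vector": [0.9, 0.0, 0.1], "payload": {"company_name": "Globex", "document_year": 2023}},
]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def filtered_search(query: list[float], must: dict, limit: int = 3) -> list[int]:
    """Keep points whose payload matches every key in `must`, then rank by cosine similarity."""
    candidates = [p for p in POINTS if all(p["payload"].get(k) == v for k, v in must.items())]
    candidates.sort(key=lambda p: cosine(query, p["vector"]), reverse=True)
    return [p["id"] for p in candidates[:limit]]


print(filtered_search([1.0, 0.0, 0.0], {"company_name": "Acme", "document_year": 2023}))
```

On a real collection, indexing these payload fields speeds up this kind of filtered search, which matters when a question names a specific company and year.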

Create the SQLite database

Run the financial-data extraction script to build the SQLite database:
```
python etl/extract_financials.py
```

Run the API locally

Start the FastAPI app with Uvicorn:

```
source venv/bin/activate
uvicorn app.main:app --host 0.0.0.0 --port 8080
```
Open http://localhost:8080/docs to try the /query endpoint.
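Besides the interactive /docs page, the endpoint can be called from a script. This standard-library sketch assumes the endpoint accepts a POST with a JSON body containing a `question` field. That body shape and the helper names are assumptions, so check the request schema shown at /docs for the actual field names.

```python
import json
import urllib.request

API_URL = "http://localhost:8080/query"


def build_query_request(question: str, url: str = API_URL) -> urllib.request.Request:
    """Build a POST request; the {"question": ...} body is an assumed schema."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def ask(question: str, timeout: float = 120.0) -> dict:
    """Send the question to the running API and decode the JSON response."""
    with urllib.request.urlopen(build_query_request(question), timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, `print(ask("What was Apple's total hardware revenue last year?"))` sends one query. The generous timeout allows for the agent making several LLM and tool calls before it responds.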

See REPORT.md for the system architecture diagram, the RAGAS evaluation, and the cost analysis.
