A production-style RAG (Retrieval-Augmented Generation) application that answers natural language questions about Indian Large Cap mutual funds using real AMFI data, semantic search, and LLaMA 3 via Groq.
Ask questions like "Which large cap fund gave the best 3-year returns?" or "Compare SBI and HDFC large cap funds" — and get structured, data-backed answers instantly.
```
AMFI API ───────────────┐
(Daily NAV)             ▼
                 fetch_schemes.py ──► scheme_list.csv

mfapi.in ───────────────┐
(5Y Historical NAV)     ▼
                 fetch_nav.py ──────► nav_history.parquet
                        │
                        ▼
                 compute_metrics.py ──► fund_metrics.csv
                 (CAGR · Sharpe · Sortino)
                        │
                        ▼
                 build_chunks.py ─────► chunks.jsonl
                 (1 text doc per fund)
                        │
                        ▼
                 embed_chunks.py ─────► ChromaDB (local)
                 (HuggingFace all-MiniLM-L6-v2)
                        │
        User Query ─────┤
                        ▼
                 retriever.py ────────► Top-5 similar funds
                        │
                        ▼
                 llm.py ─────────────► Structured Answer
                 (LLaMA 3.3 via Groq)
                        │
                        ▼
                 app.py (Streamlit UI)
```
- Real financial data — pulls live NAV from AMFI and 5 years of history from mfapi.in
- Quantitative metrics — computes CAGR (1Y/3Y/5Y), Sharpe Ratio, and Sortino Ratio per fund
- Semantic search — HuggingFace embeddings find relevant funds even for vague queries
- Structured answers — LLM outputs comparison tables, not paragraphs
- Fast inference — Groq free tier delivers answers in ~2 seconds
- Fully local vector DB — ChromaDB stores embeddings on disk, no external service needed
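The quantitative metrics above can be sketched in a few lines of NumPy. This is a minimal illustration of the formulas, not the project's actual `compute_metrics.py`; the function names are hypothetical, and the 6.5% annualised risk-free rate follows the assumption stated under the pipeline description.

```python
import numpy as np

RISK_FREE_DAILY = 0.065 / 252  # 6.5% annualised risk-free rate, per trading day

def cagr(start_nav: float, end_nav: float, years: float) -> float:
    """Compound annual growth rate from two NAV observations."""
    return (end_nav / start_nav) ** (1 / years) - 1

def sharpe(daily_returns: np.ndarray) -> float:
    """Annualised Sharpe ratio of daily excess returns."""
    excess = daily_returns - RISK_FREE_DAILY
    return np.sqrt(252) * excess.mean() / excess.std()

def sortino(daily_returns: np.ndarray) -> float:
    """Annualised Sortino ratio: penalises downside deviation only."""
    excess = daily_returns - RISK_FREE_DAILY
    downside = excess[excess < 0]
    return np.sqrt(252) * excess.mean() / downside.std()
```

For example, `cagr(100.0, 200.0, 5)` gives roughly 0.1487, i.e. a fund that doubles in five years compounds at about 14.87% per year.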
| Layer | Technology |
|---|---|
| Data ingestion | Python · Requests · Pandas |
| Metrics computation | NumPy · Pandas |
| Embeddings | HuggingFace all-MiniLM-L6-v2 |
| Vector store | ChromaDB (local, persistent) |
| LLM | LLaMA 3.3 70B via Groq API (free) |
| UI | Streamlit |
| Storage format | Parquet · JSONL · CSV |
```
mf-assistant/
├── data/
│   ├── fetch_schemes.py      # Downloads AMFI fund list
│   ├── fetch_nav.py          # Downloads 5Y NAV history from mfapi.in
│   ├── scheme_list.csv       # ~36 Large Cap funds
│   ├── nav_history.parquet   # ~32,000 rows of daily NAV
│   ├── fund_metrics.csv      # Computed CAGR, Sharpe, Sortino
│   ├── chunks.jsonl          # One text document per fund
│   └── chroma_db/            # Local vector store (auto-generated)
├── processing/
│   ├── compute_metrics.py    # Financial metric calculations
│   └── build_chunks.py       # Converts metrics to RAG-ready text
├── rag/
│   ├── embed_chunks.py       # Embeds chunks → stores in ChromaDB
│   ├── retriever.py          # Semantic search over ChromaDB
│   └── llm.py                # Groq LLM call + prompt engineering
├── app.py                    # Streamlit UI
├── requirements.txt
├── .env.example
└── .gitignore
```
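`build_chunks.py` turns each fund's metrics row into the natural-language document the LLM reads. A hypothetical sketch of what one chunk might look like; the field names and wording here are illustrative, not the project's actual schema:

```python
def build_chunk(fund: dict) -> str:
    """Render one fund's metrics as a retrieval-friendly text document."""
    return (
        f"{fund['name']} is a Large Cap mutual fund. "
        f"1-year return: {fund['cagr_1y']:.2%}. "
        f"3-year CAGR: {fund['cagr_3y']:.2%}. "
        f"5-year CAGR: {fund['cagr_5y']:.2%}. "
        f"Sharpe ratio: {fund['sharpe']:.3f}. "
        f"Sortino ratio: {fund['sortino']:.3f}."
    )

doc = build_chunk({
    "name": "DSP Large Cap Fund",
    "cagr_1y": 0.034, "cagr_3y": 0.1591, "cagr_5y": 0.137,
    "sharpe": 0.821, "sortino": 1.130,
})
```

Writing metrics out as prose like this is what lets a vague query ("safest large cap fund") match on semantics rather than exact column names.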
```bash
# 1. Clone the repository
git clone https://github.com/YOUR_USERNAME/mf-assistant.git
cd mf-assistant

# 2. Create and activate a virtual environment
python -m venv venv
source venv/bin/activate     # Mac/Linux
venv\Scripts\activate        # Windows

# 3. Install dependencies
pip install -r requirements.txt

# 4. Configure your Groq API key
cp .env.example .env
# Edit .env and add your key
# Get a free key at https://console.groq.com

# 5. Run the data pipeline
python data/fetch_schemes.py            # ~5 seconds
python data/fetch_nav.py                # ~3-5 minutes
python processing/compute_metrics.py
python processing/build_chunks.py
python rag/embed_chunks.py              # Downloads model on first run (~90MB)

# 6. Launch the app
streamlit run app.py
```

Open http://localhost:8501 in your browser.
Query: "Which large cap fund gave the best 3 year returns?"
| Fund Name | 1Y Return | 3Y Return | 5Y Return | Sharpe | Sortino | Assessment |
|---|---|---|---|---|---|---|
| BANDHAN Large Cap Fund | 2.1% | 16.28% | 14.9% | 0.767 | 1.023 | Strong performer |
| DSP Large Cap Fund | 3.4% | 15.91% | 13.7% | 0.821 | 1.130 | Strong performer |
| Kotak Large Cap Fund | 1.8% | 14.21% | 12.8% | 0.743 | 0.991 | Good performer |
Key Takeaway: DSP Large Cap Fund edges out on risk-adjusted returns (higher Sharpe + Sortino), while BANDHAN leads on raw 3Y CAGR. For risk-conscious investors, DSP is the stronger pick.
⚠️ Past performance does not guarantee future returns. This is not financial advice.
1. **Data collection** — AMFI provides the master list of all mutual fund schemes; mfapi.in provides 5 years of historical daily NAV prices.
2. **Metric computation** — For each fund, CAGR is computed as `(End NAV / Start NAV)^(1/years) - 1`. Sharpe and Sortino ratios are computed from daily returns against a 6.5% annualised risk-free rate (an approximation of the Indian 10Y government bond yield).
3. **Chunk building** — Each fund's metrics are converted into a structured natural-language document. This is what the LLM reads — not raw numbers.
4. **Embedding** — Each document is embedded with `sentence-transformers/all-MiniLM-L6-v2`, a 384-dimensional model optimised for semantic similarity, and stored in ChromaDB with cosine similarity.
5. **Retrieval** — The user query is embedded with the same model, and the top-5 most similar fund documents are retrieved from ChromaDB.
6. **Generation** — The retrieved chunks plus the user query are sent to LLaMA 3.3 70B (via Groq) with a strict prompt that enforces table-based structured output.
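At their core, the embedding and retrieval steps reduce to cosine similarity between the query vector and each fund document's vector. A toy NumPy illustration of the idea (in the real pipeline, encoding is done by the MiniLM model and the search by ChromaDB, not by this hypothetical helper):

```python
import numpy as np

def top_k(query: np.ndarray, docs: np.ndarray, k: int = 5) -> np.ndarray:
    """Indices of the k documents most cosine-similar to the query."""
    q = query / np.linalg.norm(query)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    sims = d @ q                   # cosine similarity per document
    return np.argsort(-sims)[:k]   # highest similarity first
```

Because cosine similarity ignores vector magnitude, a query embedding that points in the same direction as a fund document's embedding ranks first regardless of scale.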
```bash
# .env.example
GROQ_API_KEY=your_groq_api_key_here
```

- Currently covers Large Cap equity funds only (~36 funds)
- Data freshness depends on when you last ran `fetch_schemes.py` and `fetch_nav.py`
- Hindi/multilingual queries work, but retrieval quality is lower (the embedding model is English-first)
- Not financial advice — for educational and portfolio demonstration purposes only
- Add Mid Cap and Small Cap fund categories
- Add fund AUM and expense ratio to chunks
- Add benchmark comparison (Nifty 50 vs fund returns)
- Deploy on Streamlit Cloud
- Add date-aware queries ("best fund of 2023")
MIT License — free to use, modify, and distribute.
