YashRM27/NavIQ


📈 NavIQ — Mutual Fund RAG Chatbot

A production-style RAG (Retrieval-Augmented Generation) application that answers natural-language questions about Indian Large Cap mutual funds using real AMFI data, semantic search, and LLaMA 3.3 70B via Groq.

Ask questions like "Which large cap fund gave the best 3-year returns?" or "Compare SBI and HDFC large cap funds" and get structured, data-backed answers in about 2 seconds.


🖥️ Demo

[Image: MF Assistant Demo]


🏗️ Architecture

AMFI API ──────────────┐
(Daily NAV)            ▼
                  fetch_schemes.py ──► scheme_list.csv
mfapi.in ─────────────┐
(5Y Historical NAV)    ▼
                  fetch_nav.py ──────► nav_history.parquet
                       │
                       ▼
              compute_metrics.py ───► fund_metrics.csv
              (CAGR · Sharpe · Sortino)
                       │
                       ▼
              build_chunks.py ──────► chunks.jsonl
              (1 text doc per fund)
                       │
                       ▼
              embed_chunks.py ──────► ChromaDB (local)
              (HuggingFace all-MiniLM-L6-v2)
                       │
                 User Query
                       │
                       ▼
              retriever.py ─────────► Top-5 similar funds
                       │
                       ▼
              llm.py ──────────────► Structured Answer
              (LLaMA 3.3 via Groq)
                       │
                       ▼
              app.py (Streamlit UI)
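
The retriever.py stage in the diagram above ("Top-5 similar funds") amounts to ranking stored document vectors by their similarity to the query vector. ChromaDB does this internally; the sketch below only illustrates the idea in plain Python with toy 3-dimensional vectors. The names `cosine`, `top_k`, and the `fund_a`/`fund_b`/`fund_c` ids are hypothetical and are not taken from the repository.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, docs, k=5):
    """Return the ids of the k documents most similar to the query."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in docs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]

# Toy stand-ins for the 384-dimension embeddings (illustrative only).
docs = {
    "fund_a": [1.0, 0.0, 0.0],
    "fund_b": [0.9, 0.1, 0.0],
    "fund_c": [0.0, 1.0, 0.0],
}
print(top_k([0.2, 1.0, 0.0], docs, k=2))
```

In the real pipeline the vectors would come from all-MiniLM-L6-v2 and the ranking would be delegated to the vector store rather than computed by hand.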

✨ Features

  • Real financial data — pulls live NAV from AMFI and 5 years of history from mfapi.in
  • Quantitative metrics — computes CAGR (1Y/3Y/5Y), Sharpe Ratio, and Sortino Ratio per fund
  • Semantic search — HuggingFace embeddings find relevant funds even for vague queries
  • Structured answers — LLM outputs comparison tables, not paragraphs
  • Fast inference — Groq free tier delivers answers in ~2 seconds
  • Fully local vector DB — ChromaDB stores embeddings on disk, no external service needed

🛠️ Tech Stack

| Layer | Technology |
| --- | --- |
| Data ingestion | Python · Requests · Pandas |
| Metrics computation | NumPy · Pandas |
| Embeddings | HuggingFace all-MiniLM-L6-v2 |
| Vector store | ChromaDB (local, persistent) |
| LLM | LLaMA 3.3 70B via Groq API (free) |
| UI | Streamlit |
| Storage format | Parquet · JSONL · CSV |

📁 Project Structure

mf-assistant/
├── data/
│   ├── fetch_schemes.py       # Downloads AMFI fund list
│   ├── fetch_nav.py           # Downloads 5Y NAV history from mfapi.in
│   ├── scheme_list.csv        # ~36 Large Cap funds
│   ├── nav_history.parquet    # ~32,000 rows of daily NAV
│   ├── fund_metrics.csv       # Computed CAGR, Sharpe, Sortino
│   ├── chunks.jsonl           # One text document per fund
│   └── chroma_db/             # Local vector store (auto-generated)
├── processing/
│   ├── compute_metrics.py     # Financial metric calculations
│   └── build_chunks.py        # Converts metrics to RAG-ready text
├── rag/
│   ├── embed_chunks.py        # Embeds chunks → stores in ChromaDB
│   ├── retriever.py           # Semantic search over ChromaDB
│   └── llm.py                 # Groq LLM call + prompt engineering
├── app.py                     # Streamlit UI
├── requirements.txt
├── .env.example
└── .gitignore
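
To make "one text document per fund" concrete, here is a minimal sketch of what build_chunks.py might produce for a single row of fund_metrics.csv. The field names (`scheme_code`, `scheme_name`, `cagr_1y`, and so on), the sentence template, and the sample fund are all assumptions for illustration; the repository's actual schema and wording may differ.

```python
import json

def build_chunk(metrics):
    """Render one fund's metrics as a short natural-language document."""
    text = (
        f"{metrics['scheme_name']} is a Large Cap mutual fund. "
        f"Its CAGR is {metrics['cagr_1y']:.2%} over 1 year, "
        f"{metrics['cagr_3y']:.2%} over 3 years and "
        f"{metrics['cagr_5y']:.2%} over 5 years. "
        f"Its Sharpe ratio is {metrics['sharpe']:.3f} and "
        f"its Sortino ratio is {metrics['sortino']:.3f}."
    )
    return {"id": str(metrics["scheme_code"]), "text": text}

# Made-up values for a made-up fund (not real data).
sample = {
    "scheme_code": 0,
    "scheme_name": "Example Large Cap Fund",
    "cagr_1y": 0.05,
    "cagr_3y": 0.125,
    "cagr_5y": 0.11,
    "sharpe": 0.75,
    "sortino": 1.0,
}
chunk = build_chunk(sample)
print(json.dumps(chunk))  # one JSON object per line is the JSONL convention
```

Writing each chunk as a single JSON line keeps the file appendable and easy to stream into the embedding step.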

🚀 Setup & Run

1. Clone the repo

git clone https://github.com/YOUR_USERNAME/mf-assistant.git
cd mf-assistant

2. Create virtual environment

python -m venv venv
source venv/bin/activate        # Mac/Linux
venv\Scripts\activate           # Windows

3. Install dependencies

pip install -r requirements.txt

4. Add your Groq API key

cp .env.example .env
# Edit .env and add your key
# Get a free key at https://console.groq.com

5. Run the data pipeline (one time)

python data/fetch_schemes.py       # ~5 seconds
python data/fetch_nav.py           # ~3-5 minutes
python processing/compute_metrics.py
python processing/build_chunks.py
python rag/embed_chunks.py         # Downloads model on first run (~90MB)

6. Launch the app

streamlit run app.py

Open http://localhost:8501 in your browser.


📊 Sample Output

Query: "Which large cap fund gave the best 3 year returns?"

| Fund Name | 1Y Return | 3Y Return | 5Y Return | Sharpe | Sortino | Assessment |
| --- | --- | --- | --- | --- | --- | --- |
| BANDHAN Large Cap Fund | 2.1% | 16.28% | 14.9% | 0.767 | 1.023 | Strong performer |
| DSP Large Cap Fund | 3.4% | 15.91% | 13.7% | 0.821 | 1.130 | Strong performer |
| Kotak Large Cap Fund | 1.8% | 14.21% | 12.8% | 0.743 | 0.991 | Good performer |

Key Takeaway: DSP Large Cap Fund edges out on risk-adjusted returns (higher Sharpe + Sortino), while BANDHAN leads on raw 3Y CAGR. For risk-conscious investors, DSP is the stronger pick.

⚠️ Past performance does not guarantee future returns. This is not financial advice.


⚙️ How the RAG Pipeline Works

  1. Data collection: AMFI provides the master list of all mutual fund schemes, and mfapi.in provides historical daily NAV prices going back 5 years.

  2. Metric computation: For each fund, CAGR is computed as (End NAV / Start NAV)^(1/years) - 1 over 1-, 3-, and 5-year windows. Sharpe and Sortino ratios are computed from daily returns measured against a 6.5% annualised risk-free rate (an approximation of the Indian 10-year government bond yield).

  3. Chunk building: Each fund's metrics are converted into a structured natural-language document. The LLM reads this text rather than raw numbers.

  4. Embedding: Each document is embedded using sentence-transformers/all-MiniLM-L6-v2, a 384-dimension model optimised for semantic similarity, and stored in ChromaDB, which is configured to use cosine similarity.

  5. Retrieval: The user query is embedded with the same model, and the 5 most similar fund documents are retrieved from ChromaDB.

  6. Generation: The retrieved chunks and the user query are sent to LLaMA 3.3 70B (via Groq) with a strict prompt that enforces table-based structured output.
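
The formulas in step 2 can be sketched in plain Python as follows. This is an illustration, not the code in compute_metrics.py: the 252 trading days per year, the simple division of the annual risk-free rate into a daily rate, and the function names are all assumptions.

```python
import math

RISK_FREE_ANNUAL = 0.065  # 6.5% annualised, as stated in step 2
TRADING_DAYS = 252        # assumption: trading days per year

def cagr(start_nav, end_nav, years):
    """Compound annual growth rate: (End NAV / Start NAV)^(1/years) - 1."""
    return (end_nav / start_nav) ** (1 / years) - 1

def excess_daily_returns(navs):
    """Daily simple returns minus the daily risk-free rate."""
    rf_daily = RISK_FREE_ANNUAL / TRADING_DAYS  # assumption: simple scaling
    return [b / a - 1 - rf_daily for a, b in zip(navs, navs[1:])]

def sharpe(navs):
    """Annualised mean excess return divided by its standard deviation."""
    excess = excess_daily_returns(navs)
    mean = sum(excess) / len(excess)
    var = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    if var == 0:
        return float("nan")
    return mean / math.sqrt(var) * math.sqrt(TRADING_DAYS)

def sortino(navs):
    """Like Sharpe, but only negative excess returns count as risk."""
    excess = excess_daily_returns(navs)
    mean = sum(excess) / len(excess)
    downside = math.sqrt(sum(min(x, 0.0) ** 2 for x in excess) / len(excess))
    if downside == 0:
        return float("nan")
    return mean / downside * math.sqrt(TRADING_DAYS)

# Made-up NAV series for illustration (not real fund data).
navs = [100.0, 101.0, 100.5, 102.0, 101.2, 103.0]
print(cagr(100.0, 121.0, 2), sharpe(navs), sortino(navs))
```

Sortino differs from Sharpe only in the denominator: upside volatility is ignored, so a fund with large gains and small losses is not penalised for the gains.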


🔑 Environment Variables

# .env.example
GROQ_API_KEY=your_groq_api_key_here
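
A small startup check can catch a missing or placeholder key before the first LLM call. The sketch below is an assumption about how such a check could look, not code from the repository; `get_groq_api_key` is a hypothetical helper, and it reads the process environment only, so the values in .env must already have been loaded into the environment by whatever mechanism the app uses.

```python
import os

PLACEHOLDER = "your_groq_api_key_here"  # value shipped in .env.example

def get_groq_api_key(env=os.environ):
    """Return GROQ_API_KEY, failing early if it is unset or a placeholder."""
    key = env.get("GROQ_API_KEY", "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(
            "GROQ_API_KEY is not set: copy .env.example to .env and add your key."
        )
    return key
```

Passing the mapping as a parameter keeps the helper easy to test with a plain dictionary instead of the real environment.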

📌 Current Scope & Limitations

  • Currently covers Large Cap equity funds only (~36 funds)
  • Data freshness depends on when you last ran fetch_schemes.py and fetch_nav.py
  • Hindi/multilingual queries work but retrieval quality is lower (model is English-first)
  • Not financial advice — for educational and portfolio demonstration purposes only

🗺️ Roadmap

  • Add Mid Cap and Small Cap fund categories
  • Add fund AUM and expense ratio to chunks
  • Add benchmark comparison (Nifty 50 vs fund returns)
  • Deploy on Streamlit Cloud
  • Add date-aware queries ("best fund of 2023")

👤 Author

Yash Mavare
GitHub · LinkedIn


📄 License

MIT License — free to use, modify, and distribute.

About

RAG-based mutual fund analysis tool for Indian investors. Fetches real AMFI data, computes CAGR · Sharpe · Sortino, and answers natural language queries with structured fund comparisons using LLaMA 3.3 + ChromaDB + Streamlit.
