Skip to content

SaiSreenivasReddy/QuantMinds

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

QuantMinds Financial Assistant

QuantMinds is a financial question-answering system that combines Retrieval-Augmented Generation (RAG), agentic orchestration, human-in-the-loop review, and Signal messaging integration.

It ingests PDFs, extracts text page by page, chunks and embeds content, indexes vectors with FAISS, serves a modern Gradio chatbot UI with source citations, and offers an advanced agentic mode with 6 specialized agents for hybrid internal/external research and visualization.

Key Features

  • PDF ingestion with incremental sync (10-50x faster updates via MD5 hash tracking)
  • Hybrid retrieval (FAISS dense + BM25 lexical with Reciprocal Rank Fusion)
  • Classic RAG mode for fast, grounded answers with 83% factual accuracy
  • Agentic mode with 6 specialist agents: Router, Internal, External, Synthesizer, Visualizer, Orchestrator
  • Human-in-the-loop review (approve/rewrite/research actions)
  • Live source panel and session history in UI
  • Signal bot integration for messaging
  • Real-time analytics dashboard and cost tracking
  • Full execution tracing and report generation

Project Structure

QuantMinds/
	app/
		app.py                # Gradio UI
		evaluate.py           # Evaluation runner
	data/
		pdfs/                 # Input PDFs
		corpus.json           # Extracted pages
		chunks.json           # Chunk metadata used by retrieval
		my_index.faiss        # Vector index
		pipeline_state.json   # Change-detection state (auto-generated)
	scripts/
		extract.py            # PDF to corpus extractor
		rag_pipeline.py       # Main smart-sync pipeline entrypoint
		rag/
			chunking.py
			embedding.py
			indexing.py
			retrieval.py
			generation.py
			pipeline.py         # Incremental sync + full build logic

Requirements

Install dependencies from requirements.txt:

pip install -r requirements.txt

Required environment variable:

OPENAI_API_KEY=your_api_key_here

You can place this in a .env file at project root.

How The Pipeline Works

Smart Sync Behavior

The pipeline uses change detection on data/pdfs/.

  • If PDFs are added, removed, or modified: rebuild extraction + chunks + embeddings + index
  • If pipeline config changes: rebuild
  • If nothing changed: skip rebuild and reuse existing artifacts

This is tracked in data/pipeline_state.json.

Main Pipeline Command

Recommended command:

python scripts/rag_pipeline.py

This runs smart sync automatically and then performs a retrieval sanity check.

Alternative Explicit Commands

Only sync/rebuild logic (no retrieval sanity check):

python scripts/rag/pipeline.py --sync

Force rebuild:

python scripts/rag/pipeline.py --sync --force

Running The App

Start chatbot UI:

python app/app.py

On startup, app.py runs smart pipeline sync first.

  • PDFs changed -> rebuild pipeline
  • No changes -> skip rebuild

Then the UI opens.

UI Features

  • Two-pane interface:
    • Left: chat interaction
    • Right: source PDF pages for latest answer
  • Query result caching in memory for repeated questions during the same run
  • Index/chunks are loaded once per server process and reused

Evaluation

Run evaluation suite:

python app/evaluate.py

Categories covered:

  • factual
  • cross-reference
  • out-of-scope
  • ambiguous
  • no-answer
  • prompt-injection

Prompt Guardrails

The answer generation prompt is tuned to:

  • use only provided context
  • consider all provided sources
  • stay concise (2-3 sentences)
  • refuse when context is insufficient
  • ignore malicious instructions in user input/context
  • cite sources

Cost Notes

  • Major cost comes from embeddings and generation calls
  • Rebuilds are skipped when PDFs/config are unchanged
  • Repeated identical chat queries in one app session are served from in-memory cache

Troubleshooting

ModuleNotFoundError: No module named 'scripts'

  • Run commands from project root (QuantMinds/)
  • Use the provided entrypoints as documented

Missing index error

  • Ensure data/pdfs/ contains PDFs
  • Run:
python scripts/rag_pipeline.py

API key issues

  • Ensure OPENAI_API_KEY is set in environment or .env

Notes

  • Python standard library imports (such as os, sys, json, argparse) are not listed in requirements.txt
  • Only third-party packages belong in requirements.txt

Releases

No releases published

Packages

 
 
 

Contributors