A production-grade Retrieval-Augmented Generation (RAG) application that lets you chat with your documents using AI. Upload PDFs, Markdown, or text files and ask questions; the system searches your documents and answers from the most relevant passages, with source citations.
This is an intelligent document Q&A system that:
- Ingests your documents - Upload PDFs, Markdown (.md), or text files
- Smart search - Uses hybrid retrieval combining keyword (BM25) and semantic (vector) search
- Accurate answers - Reranks results for precision and generates answers using AI
- Source citations - Every answer includes references to specific document chunks
- Chat interface - Clean, modern UI with conversation history
FREE APIs (no credit card required!):
- Groq API - Ultra-fast LLM inference with Llama 3.3 70B (free tier with rate limits)
- Sentence-Transformers - Local embeddings, runs on your machine (no API needed!)
- Cohere - Cross-encoder reranking (has free tier)
Framework & Libraries:
- LangChain - RAG pipeline orchestration
- FastAPI - Backend REST API
- Streamlit - Interactive web UI
- ChromaDB - Vector database for semantic search
- BM25 - Keyword search algorithm
- Python 3.13 - Core language
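BM25, the keyword half of the hybrid search, ranks chunks by how often the query terms occur in them, discounted by how common each term is across all chunks and normalized by chunk length. The sketch below is a self-contained Okapi BM25 scorer with a Lucene-style smoothed IDF and the commonly used defaults `k1 = 1.5`, `b = 0.75`; the tokenizer, parameters, and class name are assumptions for illustration, and the project's own `bm25_index.pkl` may be built differently.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase whitespace split (an assumption; real tokenizers usually
    # also strip punctuation and may stem words).
    return text.lower().split()


class BM25:
    """Minimal Okapi BM25 scorer over an in-memory list of text chunks."""

    def __init__(self, corpus: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1 = k1  # term-frequency saturation
        self.b = b    # strength of document-length normalization
        self.term_freqs = [Counter(tokenize(doc)) for doc in corpus]
        self.doc_lens = [sum(tf.values()) for tf in self.term_freqs]
        self.avgdl = sum(self.doc_lens) / len(corpus)
        n = len(corpus)
        # Document frequency: how many chunks contain each term.
        df = Counter(term for tf in self.term_freqs for term in tf)
        # Smoothed IDF; the "+ 1" inside the log keeps every weight positive.
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query: str, index: int) -> float:
        """BM25 score of chunk `index` for the query."""
        tf = self.term_freqs[index]
        norm = 1 - self.b + self.b * self.doc_lens[index] / self.avgdl
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                total += self.idf[term] * f * (self.k1 + 1) / (f + self.k1 * norm)
        return total

    def rank(self, query: str) -> list[int]:
        """Chunk indices ordered from best to worst match."""
        return sorted(range(len(self.term_freqs)),
                      key=lambda i: self.score(query, i), reverse=True)


chunks = [
    "RAG combines retrieval with generation",
    "BM25 is a keyword ranking function",
    "vector search uses dense embeddings",
]
bm25 = BM25(chunks)
print(bm25.rank("keyword ranking")[0])  # → 1
```

Because BM25 only rewards exact term overlap, a paraphrased question can score zero here, which is why the project pairs it with vector search.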
```
       User Question
             │
             ▼
┌──────────────────────────┐
│     Hybrid Retrieval     │
│  ┌─────────┬───────────┐ │
│  │  BM25   │  Vector   │ │
│  │(keyword)│(semantic) │ │
│  └────┬────┴─────┬─────┘ │
│       └────┬─────┘       │
│     Reciprocal Rank      │
│       Fusion (RRF)       │
└────────────┬─────────────┘
             ▼
┌──────────────────────────┐
│  Cross-Encoder Reranker  │
│   (Cohere rerank-v3.0)   │
└────────────┬─────────────┘
             ▼
┌──────────────────────────┐
│   LLM Generation with    │
│   Citation Enforcement   │
│   (Groq Llama 3.3 70B)   │
└────────────┬─────────────┘
             ▼
Answer + [Source: file, Chunk N]
```
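Reciprocal Rank Fusion merges the BM25 and vector result lists without needing their raw scores to be comparable: each chunk earns `weight / (c + rank)` from every list it appears in, and chunks are re-sorted by the summed score. A minimal sketch, assuming `BM25_WEIGHT` and `VECTOR_WEIGHT` act as per-list weights and using the common constant `c = 60` (also the default in LangChain's `EnsembleRetriever`); the function name is hypothetical and the project's actual fusion code may differ.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists: list[list[str]], weights: list[float], c: int = 60) -> list[str]:
    """Fuse several ranked lists of chunk IDs with weighted Reciprocal Rank Fusion."""
    if len(ranked_lists) != len(weights):
        raise ValueError("need exactly one weight per ranked list")
    scores: dict[str, float] = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        # Ranks start at 1, so the top hit in a list contributes weight / (c + 1).
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += weight / (c + rank)
    # Highest fused score first; chunks found by both retrievers tend to win.
    return sorted(scores, key=scores.__getitem__, reverse=True)


bm25_hits = ["chunk_3", "chunk_1", "chunk_7"]
vector_hits = ["chunk_1", "chunk_9", "chunk_3"]
print(weighted_rrf([bm25_hits, vector_hits], weights=[0.5, 0.5]))
# → ['chunk_1', 'chunk_3', 'chunk_9', 'chunk_7']
```

The large constant `c` flattens the gap between neighbouring ranks, so a chunk that appears in both lists usually beats one that tops only a single list.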
- Document Ingestion - Your documents are split into chunks and indexed
- Hybrid Search - When you ask a question, both keyword and semantic search run in parallel
- Fusion - Results are merged using Reciprocal Rank Fusion for better relevance
- Reranking - A cross-encoder model reranks the top results for maximum precision
- Generation - The LLM generates an answer based on the most relevant chunks
- Citations - Every answer includes source references so you can verify the information
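The ingestion step is governed by `CHUNK_SIZE` and `CHUNK_OVERLAP`. As a simplified sketch, here is a fixed-size character window in which each chunk repeats the last `chunk_overlap` characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk when it fits within the overlap. LangChain splitters such as `RecursiveCharacterTextSplitter` additionally prefer paragraph and sentence boundaries, so real chunk edges will differ; `chunk_text` is a hypothetical helper.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


pieces = chunk_text("x" * 2500)
print([len(p) for p in pieces])  # → [1000, 1000, 900]
```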
- Hybrid Retrieval - Combines BM25 keyword search with dense vector search, so both exact-term matches and semantically similar passages are found
- Cross-Encoder Reranking - Uses Cohere's reranker to boost precision
- Citation Enforcement - Every answer includes traceable `[Source: file, Chunk N]` references
- Chat History - Save and load your conversations
- Modern UI - Clean Streamlit interface with expandable citation cards
- REST API - FastAPI backend with `/ask`, `/ingest`, and `/health` endpoints
- 100% FREE - Uses Groq's free tier and local embeddings (no API costs!)
- Easy Setup - One-click run with `RUN_PROJECT.bat`
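Because citations follow the fixed `[Source: file, Chunk N]` pattern, they can be checked mechanically. The sketch below extracts them with a regular expression and flags any that point at a chunk the LLM was never given. The regex, the helper names, and the idea of validating after generation are assumptions for illustration; the prompt-based enforcement in `app/prompts.py` may work differently.

```python
import re

# Matches e.g. "[Source: rag_overview.md, Chunk 3]"; the file name may not contain "," or "]".
CITATION_RE = re.compile(r"\[Source:\s*([^,\]]+?)\s*,\s*Chunk\s+(\d+)\]")


def extract_citations(answer: str) -> list[tuple[str, int]]:
    """Return (file, chunk_number) pairs in the order they appear in the answer."""
    return [(name, int(num)) for name, num in CITATION_RE.findall(answer)]


def unsupported_citations(answer: str, retrieved: set[tuple[str, int]]) -> list[tuple[str, int]]:
    """Citations that point at chunks the model was never shown (likely hallucinated)."""
    return [c for c in extract_citations(answer) if c not in retrieved]


answer = ("RAG grounds answers in retrieved text [Source: rag_overview.md, Chunk 2]. "
          "BM25 is a keyword ranker [Source: notes.txt, Chunk 9].")
retrieved = {("rag_overview.md", 2), ("langchain_guide.md", 0)}
print(extract_citations(answer))      # → [('rag_overview.md', 2), ('notes.txt', 9)]
print(unsupported_citations(answer, retrieved))  # → [('notes.txt', 9)]
```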
- Python 3.13 (or 3.10+)
- Git (optional, for cloning)
```
git clone https://github.com/2024yuva/AskMyDocs.git
cd AskMyDocs
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
```

This will install all necessary packages, including:
- LangChain and Groq integration
- Sentence-transformers for embeddings
- FastAPI and Streamlit
- ChromaDB and other dependencies
Create a `.env` file in the project root (copy `.env.example`) and add your keys:

```
# Get your FREE Groq API key at: https://console.groq.com/keys
GROQ_API_KEY=your_groq_api_key_here
# Optional: Get Cohere API key at: https://dashboard.cohere.com/api-keys
COHERE_API_KEY=your_cohere_api_key_here
```

Note: Groq offers a free tier. Just sign up and create a key.
Simply double-click `RUN_PROJECT.bat`, or run it from a terminal:

```
RUN_PROJECT.bat
```

This will:
- Stop any running servers
- Start the FastAPI backend on port 8000
- Start the Streamlit UI on port 8501
- Open two terminal windows (one for the API, one for the UI)

Once both servers are running:
- Open your browser to http://localhost:8501
- Upload documents using the sidebar (PDF, Markdown, or text files)
- Click "Ingest Documents" to process them
- Start asking questions in the chat!

To stop the application, press any key in the main terminal window, or close both terminal windows.
```
AskMyDocs/
├── RUN_PROJECT.bat          # Main entry point - run this!
├── README.md                # This file
├── .env                     # Your API keys (create from .env.example)
├── .env.example             # Template for environment variables
├── requirements.txt         # Python dependencies
│
├── app/                     # Main application code
│   ├── config.py            # Configuration and environment variables
│   ├── ingest.py            # Document loading and chunking
│   ├── retriever.py         # Hybrid retrieval (BM25 + Vector) + reranking
│   ├── chain.py             # RAG pipeline orchestration
│   ├── prompts.py           # Prompt templates with citation enforcement
│   ├── chat_history.py      # Chat history management
│   │
│   ├── api/                 # FastAPI backend
│   │   ├── main.py          # API endpoints
│   │   └── schemas.py       # Request/response models
│   │
│   └── ui/                  # Streamlit frontend
│       └── app.py           # Web interface
│
├── docs/                    # Put your documents here!
│   ├── rag_overview.md      # Sample documents
│   ├── langchain_guide.md
│   └── evaluation_metrics.md
│
├── tests/                   # Test suite
│   ├── test_ingest.py
│   ├── test_retriever.py
│   ├── test_chain.py
│   └── test_api.py
│
├── eval/                    # Evaluation pipeline
│   ├── golden_qa.json       # Test Q&A dataset
│   └── evaluate.py          # Ragas evaluation
│
├── chroma_db/               # Vector database (auto-generated)
├── chat_history/            # Saved conversations (auto-generated)
└── bm25_index.pkl           # BM25 index (auto-generated)
```
To add documents:
- Place your documents in the `docs/` folder
  - Supported formats: PDF (.pdf), Markdown (.md), Text (.txt)
  - You can organize them in subfolders
- Open the Streamlit UI (http://localhost:8501)
- Use the sidebar to upload files, or click "Ingest Documents"
- Wait for ingestion to complete (you'll see a success message)

To ask questions:
- Type your question in the chat input at the bottom
- The system will:
  - Search through your documents
  - Find the most relevant chunks
  - Generate an answer with citations
- Click "Citations" to see which documents were used
- Click "Source Documents" to see the actual text chunks

To save and reload conversations:
- Click "Save Conversation" in the sidebar
- Your chat history is saved to the `chat_history/` folder
- Click "View History" to see past conversations
- Load a previous conversation by clicking "Load"
You can also use the REST API directly (these examples use Unix-style quoting, so run them from Git Bash, WSL, or another POSIX shell):

```bash
# Ask a question
curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "What is RAG?"}'

# Trigger ingestion
curl -X POST http://localhost:8000/ingest

# Health check
curl http://localhost:8000/health
```

Interactive API documentation is available at: http://localhost:8000/docs
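The same `/ask` call can be made from Python with only the standard library. This sketch assumes the endpoint accepts `{"question": ...}` as in the curl example and returns JSON; it does not assume any particular response fields, since those are defined in `app/api/schemas.py`. `build_ask_request` and `ask` are hypothetical helper names.

```python
import json
import urllib.request

API_URL = "http://localhost:8000"  # FastAPI port started by RUN_PROJECT.bat


def build_ask_request(question: str, base_url: str = API_URL) -> urllib.request.Request:
    """Prepare a POST /ask request with a JSON body, mirroring the curl example."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, base_url: str = API_URL, timeout: float = 120.0) -> dict:
    """Send the question and return the decoded JSON response (the API server must be running)."""
    with urllib.request.urlopen(build_ask_request(question, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(ask("What is RAG?"))
```

The generous timeout leaves room for retrieval, reranking, and LLM generation on a single request.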
All settings can be customized via environment variables in `.env`:

| Variable | Default | Description |
|---|---|---|
| `GROQ_API_KEY` | (none) | Groq API key (FREE at console.groq.com) |
| `COHERE_API_KEY` | (none) | Cohere API key for reranking (optional) |
| `LLM_MODEL` | `llama-3.3-70b-versatile` | Groq chat model |
| `EMBEDDING_MODEL` | `sentence-transformers/all-MiniLM-L6-v2` | Local embedding model |
| `CHUNK_SIZE` | `1000` | Chunk size in characters |
| `CHUNK_OVERLAP` | `200` | Overlap between consecutive chunks, in characters |
| `RETRIEVER_K` | `10` | Number of documents (chunks) to retrieve |
| `RERANK_TOP_N` | `5` | Number of documents kept after reranking |
| `BM25_WEIGHT` | `0.5` | Weight for BM25 in hybrid search |
| `VECTOR_WEIGHT` | `0.5` | Weight for vector search in hybrid search |
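Run the test suite:

```
pytest tests/ -v
```

Run the evaluation pipeline:

```
python eval/evaluate.py
```

This evaluates the RAG pipeline using Ragas metrics:
- Faithfulness (is the answer supported by the retrieved context?)
- Answer relevancy (does the answer address the question?)
- Context precision (are the relevant chunks ranked near the top?)
- Context recall (does the retrieved context cover what the reference answer needs?)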
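Ragas uses an LLM to judge each retrieved chunk or reference statement, but once those yes/no verdicts exist, the two retrieval metrics reduce to short formulas: context precision averages precision@k over the ranks k that hold a relevant chunk (rewarding relevant chunks ranked early), and context recall is the share of reference-answer statements the retrieved context supports. This sketch covers only that final arithmetic, assuming the verdicts are already available as booleans; the LLM-judging step is omitted and the helper names are hypothetical.

```python
def context_precision(relevance: list[bool]) -> float:
    """Mean precision@k over the ranks k that hold a relevant chunk (1.0 = all relevant chunks first)."""
    hits = 0
    total = 0.0
    for k, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / k  # precision@k at a relevant position
    return total / hits if hits else 0.0


def context_recall(statement_supported: list[bool]) -> float:
    """Share of reference-answer statements that the retrieved context supports."""
    if not statement_supported:
        return 0.0
    return sum(statement_supported) / len(statement_supported)


print(context_precision([True, False, True]))    # ≈ 0.833
print(context_recall([True, True, False, True]))  # → 0.75
```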
If the API server won't start:
- Check if port 8000 is already in use
- Verify the virtual environment is activated: `venv\Scripts\activate`
- Check for errors in the API terminal window

If ingestion fails:
- Ensure your documents are in the `docs/` folder
- Verify your Groq API key is valid in `.env`
- Check that sentence-transformers is installed: `pip install sentence-transformers`

If the chat returns no answer or an error:
- Make sure you've ingested documents first (click "Ingest Documents")
- Verify both the API server and the UI are running
- Check that your Groq API key is correct

If you run into memory issues:
- Reduce `CHUNK_SIZE` in `.env`
- Reduce `RETRIEVER_K` to retrieve fewer documents
- Process fewer documents at once
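Whether port 8000 (or the UI's port 8501) is already taken can be checked from Python. A small standard-library sketch: it tries to connect to the port and reports whether something is already listening there; `port_in_use` is a hypothetical helper, not part of the project.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a server is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising on a refused connection.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for port in (8000, 8501):  # FastAPI backend, Streamlit UI
        print(port, "in use" if port_in_use(port) else "free")
```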
To get a Groq API key (required):
- Go to https://console.groq.com/keys
- Sign up for a free account
- Create an API key
- Copy it into `.env` as `GROQ_API_KEY`

To get a Cohere API key (optional, for reranking):
- Go to https://dashboard.cohere.com/api-keys
- Sign up for a free account
- Create an API key
- Copy it into `.env` as `COHERE_API_KEY`
Contributions are welcome! Feel free to:
- Report bugs
- Suggest features
- Submit pull requests
License: MIT
Built with:
- LangChain - RAG framework
- Groq - Ultra-fast LLM inference
- Cohere - Cross-encoder reranking
- Sentence-Transformers - Local embeddings
- FastAPI - Backend framework
- Streamlit - UI framework
- ChromaDB - Vector database
Made with ❤️ by 2024yuva
Repository: https://github.com/2024yuva/AskMyDocs