A Streamlit web app that enables Retrieval-Augmented Generation (RAG) on PDF documents using:
- PDF text extraction
- Sentence embeddings with SentenceTransformers
- Large Language Model (LLM) for answer generation (HuggingFace transformers)
- Vector similarity search on MongoDB Atlas
- Chunking of long documents for efficient retrieval
Features:
- Upload one or more PDF files and index their contents by splitting them into chunks and storing the embeddings in MongoDB.
- Perform semantic search over PDF content using vector similarity search.
- Ask natural language questions and receive context-aware answers generated by a pretrained LLM.
- Delete all indexed PDFs and embeddings from the database.
- View already indexed PDF files in the app.
- Supports GPU acceleration (if available) for faster model inference.
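To make the chunking step concrete, here is a minimal, dependency-free sketch of splitting text into overlapping character windows. It is only an illustration: the requirements list langchain_text_splitters, so the app itself presumably relies on that library, and the function name `chunk_text` and the default sizes below are assumptions, not values taken from the project.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=1):
        print(piece)
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing some text twice.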
flowchart TD
A[Upload PDF] --> B[Extract Text]
B --> C[Chunk Text]
C --> D[Create Embeddings]
D --> E[Store in MongoDB Vector Search]
F[Ask Question] --> G[Embed Query]
G --> H[Vector Search]
H --> I[Top-k Chunks]
I --> J[LLM Answer]
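The retrieval half of the flowchart (Embed Query, Vector Search, Top-k Chunks) can be sketched in plain Python. This is an in-memory stand-in for what MongoDB Atlas does server-side, assuming cosine similarity as in the index definition; the function names `cosine_similarity` and `top_k_chunks` are hypothetical, and the tiny 2-dimensional vectors are made up for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    chunks: Sequence[str],
    k: int = 3,
) -> List[Tuple[float, str]]:
    """Return the k (score, chunk) pairs most similar to the query, best first."""
    scored = [
        (cosine_similarity(query_vec, vec), chunk)
        for vec, chunk in zip(chunk_vecs, chunks)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    texts = ["chunk a", "chunk b", "chunk c"]
    for score, text in top_k_chunks([1.0, 0.0], vectors, texts, k=2):
        print(f"{score:.3f}  {text}")
```

A brute-force scan like this is fine for a handful of chunks; the vector index exists so that the database can avoid comparing the query against every stored embedding.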
- Clone the repo:
git clone https://github.com/econexpert/LLMwithRAG.git
cd LLMwithRAG
- Add a vector search index:
  - Open the MongoDB Atlas console and navigate to your cluster.
  - Go to your database and then your collection (e.g., vectors).
  - Click the Indexes tab.
  - Click Create Index.
  - Choose JSON Editor mode.
  - Paste the JSON configuration shown below. The numDimensions value must match the output size of the embedding model you use.
  - Give the index a name, for example "vector_index".
  - Click Create to build the index.
{
"fields": [
{
"numDimensions": 1024,
"path": "embedding",
"similarity": "cosine",
"type": "vector"
}
]
}
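Once the index exists, it can be queried with a `$vectorSearch` aggregation stage. The sketch below only builds the pipeline as plain Python data, so it runs without a database. The index name "vector_index" and the "embedding" path come from the steps and JSON above; the `text` field, the helper name `build_vector_search_pipeline`, and the default `k` and `num_candidates` values are assumptions for illustration.

```python
from typing import Any, Dict, List, Sequence


def build_vector_search_pipeline(
    query_vector: Sequence[float],
    index_name: str = "vector_index",
    path: str = "embedding",
    k: int = 5,
    num_candidates: int = 100,
) -> List[Dict[str, Any]]:
    """Build an aggregation pipeline that returns the k nearest chunks."""
    if k <= 0:
        raise ValueError("k must be positive")
    if num_candidates < k:
        raise ValueError("num_candidates must be at least k")
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": k,
            }
        },
        {
            # "text" is a hypothetical field holding the chunk contents.
            "$project": {
                "_id": 0,
                "text": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


if __name__ == "__main__":
    # A real query vector needs as many entries as numDimensions in the index.
    pipeline = build_vector_search_pipeline([0.1, 0.2, 0.3], k=3)
    print(pipeline[0]["$vectorSearch"]["limit"])
```

With pymongo, such a pipeline would typically be passed to `collection.aggregate(pipeline)`, where `collection` is the collection that carries the index.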
- Create a Virtual Environment
It’s best practice to create a virtual environment to isolate your project dependencies.
On macOS/Linux:
python3 -m venv venv
source venv/bin/activate
On Windows (Command Prompt):
python -m venv venv
venv\Scripts\activate
On Windows (PowerShell):
python -m venv venv
.\venv\Scripts\Activate.ps1
- Install Dependencies
Make sure you have a requirements.txt file in your project folder listing the required packages. Then run:
pip install -r requirements.txt
- Run the Streamlit App
With the environment activated, start your app by running:
streamlit run app.py
This will launch the Streamlit server and open your app in a browser window (usually at http://localhost:8501).
Requirements:
- Python 3.8+
- MongoDB Atlas account (free tier works)
- CUDA-enabled GPU (optional, for faster inference)
- Python packages listed in requirements.txt, including: streamlit, pypdf, sentence-transformers, transformers, torch, pymongo, langchain_text_splitters, numpy
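Because the GPU is optional, model-loading code usually picks a device at startup. A minimal sketch of that check, assuming PyTorch's `torch.cuda.is_available()`; the helper name `pick_device` is hypothetical, and falling back to "cpu" when torch is not importable is a choice made here so the snippet runs anywhere.

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA-capable GPU is usable, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the check also works without torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Running inference on: {pick_device()}")
```

The returned string can be passed wherever a library accepts a device name, for example the `device` argument of `SentenceTransformer` or `model.to(device)` in PyTorch.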
flowchart TD
A[Start Installation] --> B[Install Python 3.9+]
B --> C[Create Virtual Environment]
C --> D[Activate Virtual Environment]
D --> E[Install Python Dependencies]
E --> F[Install PyTorch]
F --> G[Install Sentence-Transformers]
G --> H[Install Streamlit]
H --> I[Create MongoDB Atlas Account]
I --> J[Create MongoDB Cluster]
J --> K[Create Database and Collection]
K --> L[Create Vector Search Index<br/>Dimensions = embedding model output size]
L --> M[Generate MongoDB Connection String]
M --> N[Export MONGO_URI Environment Variable]
N --> O[First Run Downloads Embedding Model]
O --> P[First Run Downloads LLM Model]
P --> Q[Installation Complete]
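The flowchart has the connection string exported as the MONGO_URI environment variable. A small sketch of reading it and failing early with a clear message follows; the helper name `get_mongo_uri` and the exact error messages are assumptions, and the `env` parameter exists only to make the function easy to try out without touching the real environment.

```python
import os
from typing import Mapping, Optional


def get_mongo_uri(env: Optional[Mapping[str, str]] = None) -> str:
    """Read MONGO_URI from the environment and sanity-check its scheme."""
    source = os.environ if env is None else env
    uri = source.get("MONGO_URI", "").strip()
    if not uri:
        raise RuntimeError(
            "MONGO_URI is not set; export your Atlas connection string first."
        )
    if not uri.startswith(("mongodb://", "mongodb+srv://")):
        raise ValueError("MONGO_URI must start with mongodb:// or mongodb+srv://")
    return uri


if __name__ == "__main__":
    # Placeholder host and credentials, for demonstration only.
    demo_env = {"MONGO_URI": "mongodb+srv://user:password@cluster.example.net"}
    print(get_mongo_uri(demo_env))
```

In the app, the returned string would typically be handed to `pymongo.MongoClient(uri)`.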
google/flan-t5-base is a fine-tuned variant of the T5 (Text-to-Text Transfer Transformer) model developed by Google, part of the FLAN family designed for improved instruction-following capabilities.
- Text-to-Text Framework: All tasks are framed as text input to text output.
- Instruction Tuning: Fine-tuned on a diverse set of instructions to better understand and follow prompts.
- Base Model Size: Provides a balanced trade-off between speed and performance.
- Versatile Applications: Suitable for text generation, summarization, translation, question answering, and more.
- Model on Hugging Face: google/flan-t5-base
- Transformers Library: huggingface/transformers
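As an illustration of how retrieved chunks might be turned into an answer with google/flan-t5-base, the sketch below builds a prompt and generates text with the Transformers `AutoTokenizer` and `AutoModelForSeq2SeqLM` classes. The prompt wording, the function names, and the `max_new_tokens` and `max_length` values are assumptions, not taken from the app; the model is only downloaded when `generate_answer` is actually called.

```python
from typing import List


def build_prompt(question: str, chunks: List[str]) -> str:
    """Combine retrieved chunks and the user's question into one instruction."""
    context = "\n\n".join(c.strip() for c in chunks if c.strip())
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


def generate_answer(
    question: str,
    chunks: List[str],
    model_name: str = "google/flan-t5-base",
    max_new_tokens: int = 128,
) -> str:
    """Generate an answer; loads the model on every call, which a real app would cache."""
    # Imported lazily so build_prompt can be used without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    inputs = tokenizer(
        build_prompt(question, chunks),
        return_tensors="pt",
        truncation=True,  # inputs longer than max_length are cut off
        max_length=512,
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(build_prompt("What is RAG?", ["RAG retrieves relevant text first."]))
```

Because the input is truncated at a fixed token length, passing only the top few chunks keeps the question and instructions from being cut off.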

