This project is a Retrieval-Augmented Generation (RAG) orchestrator that routes user queries to the correct knowledge source before generating an answer. Instead of putting every document and dataset into one vector database, it keeps separate domain-specific vector stores and uses an intent router to decide which retrieval path should answer the question.
Many user queries belong to different knowledge domains. A single vector database can work for small demos, but mixing unrelated domains often reduces retrieval quality because unrelated chunks compete in the same semantic search space.
This project solves that by using an orchestrator:
- Classify the user's intent.
- Route the query to the correct RAG source.
- Retrieve the most relevant context.
- Generate an answer from the retrieved evidence.
- Build a simple and explainable RAG system.
- Use two separate knowledge bases:
- A database normalization PDF.
- A laptop pricing CSV dataset.
- Route user queries with zero-shot intent classification plus rule-based shortcuts.
- Use a Flask web UI for interaction.
- Retrieve top matching records/chunks and generate grounded responses with an LLM.
- Intent classification for general chat, database normalization, laptop pricing, web search, and HTTP/API errors.
- Qdrant vector database for PDF/database-normalization chunks.
- Supabase PostgreSQL with
pgvectorfor laptop CSV rows. - Sentence embeddings with
all-MiniLM-L6-v2. - Top-k retrieval before LLM answer generation.
- Flask UI showing the detected intent and generated answer.
- Safer ingestion path for laptop data through a staging table.
High-level workflow:
User query
-> Flask UI
-> /chat API
-> Intent router
-> Retrieval path
-> Qdrant for database normalization
-> Supabase pgvector for laptop pricing
-> DuckDuckGo for web search
-> Direct response for general chat
-> LLM generation
-> JSON response
-> Browser UI
| Component | Purpose |
|---|---|
| Flask frontend | Provides the browser interface for submitting queries. |
Flask /chat API |
Coordinates intent classification, retrieval, and answer generation. |
| Intent router | Uses rule-based routing first, then zero-shot classification when needed. |
| Embedding service | Converts text into 384-dimensional vectors using all-MiniLM-L6-v2. |
| Supabase search | Searches laptop rows using pgvector similarity and metadata filters. |
| Qdrant search | Searches database-normalization chunks from a local Qdrant store. |
| Web search | Uses DuckDuckGo search results for open web questions. |
| Generator | Sends retrieved context and the user question to a Hugging Face chat model. |
- Python
- Flask
- Transformers
- Sentence Transformers
- Hugging Face Inference API
- Qdrant
- Supabase PostgreSQL
pgvector- pandas
- DuckDuckGo search through
ddgs - python-dotenv
The original project documentation also describes a PDF ingestion design using unstructured, pytesseract, and Gemini-based image descriptions for multimodal PDF processing. In the current repository, the laptop CSV ingestion is implemented, while ingest_pdf.py and ingest_json.py are still placeholders.
The normalization knowledge base covers:
- Why normalization is needed.
- Insertion, update, and deletion anomalies.
- First Normal Form (1NF).
- Second Normal Form (2NF).
- Third Normal Form (3NF).
- Boyce-Codd Normal Form (BCNF).
- Functional dependencies and schema design concepts.
This data is stored in the local Qdrant vector store.
The laptop pricing dataset contains 238 rows and 13 columns. It includes multiple manufacturers, categories, hardware attributes, and prices.
CSV columns:
Unnamed: 0
Manufacturer
Category
Screen
GPU
OS
CPU_core
Screen_Size_cm
CPU_frequency
RAM_GB
Storage_GB_SSD
Weight_kg
Price
Each row is converted into a natural-language text representation, embedded, and stored in Supabase PostgreSQL with metadata.
The project documentation describes this intended pipeline:
- Load the PDF and parse it into elements.
- For image elements, run OCR using Tesseract.
- Send image/OCR context to an LLM to create clean descriptions.
- Inject those descriptions back into the text stream.
- Chunk the final text.
- Embed chunks with
all-MiniLM-L6-v2. - Store vectors and payload metadata in Qdrant.
The implemented CSV pipeline:
- Read
data_ingestion/raw_data/laptop_pricing_dataset.csv. - Convert each row into a sentence containing brand, screen, CPU, RAM, SSD, weight, and price.
- Generate a 384-dimensional embedding.
- Insert content, metadata, and embedding into a staging table.
- Swap staging into the production
laptopstable only after the full ingest succeeds.
This staging approach prevents failed ingestions from wiping out the existing production table.
The orchestrator predicts an intent, then runs the matching path.
| Intent | Route |
|---|---|
general_chat |
Reply directly without retrieval. |
db_normalization |
Retrieve context from Qdrant. |
laptop_pricing |
Retrieve laptop rows from Supabase pgvector. |
browse_web |
Retrieve snippets from DuckDuckGo. |
http_errors |
Placeholder for future HTTP/API error diagnosis. |
The router first checks obvious keywords such as laptop specs, normal forms, HTTP status codes, and greetings. If no rule matches, it falls back to zero-shot classification.
For RAG-backed intents, the system retrieves relevant context and passes it to the LLM with instructions to answer only from the provided evidence.
Retrieval paths:
- Database normalization: Qdrant similarity search.
- Laptop pricing: Supabase
pgvectorsearch plus metadata filters. - Web questions: DuckDuckGo search snippets.
The answer generator is configured by:
HF_MODEL=Qwen/Qwen2.5-7B-Instruct
You can change this model in .env as long as your Hugging Face provider supports it for chat completion.
RAG_Orchestration/
app.py
pyproject.toml
uv.lock
README.md
.env.example
templates/
index.html
services/
__init__.py
classifier.py
embedder.py
generator.py
search_qdrant.py
search_supabase.py
web_search.py
data_ingestion/
ingest_csv.py
ingest_json.py
ingest_pdf.py
raw_data/
db_normalization.pdf
laptop_pricing_dataset.csv
errors.json
vector_stores/
qdrant_db/
docs/
assets/
rag-doc-000.png
rag-doc-002.png
rag-doc-003.png
rag-doc-004.png
rag-doc-005.png
rag-doc-006.png
rag-doc-007.png
rag-doc-008.png
rag-doc-009.png
Create a local .env file from the example:
cp .env.example .envRequired variables:
POSTGRES_CONNECTION_STRING=postgresql://USER:PASSWORD@HOST:5432/postgres
HF_TOKEN=hf_your_hugging_face_token_here
HF_MODEL=Qwen/Qwen2.5-7B-Instruct
FLASK_DEBUG=0
Do not commit .env. It contains secrets.
Install dependencies:
uv syncActivate the environment:
source .venv/bin/activateRun the Flask app:
python app.pyOpen the UI:
http://127.0.0.1:5000
For local development debug mode:
FLASK_DEBUG=1 python app.pyMake sure Supabase PostgreSQL has the vector extension available. Then run:
python data_ingestion/ingest_csv.pyThe ingestion script creates a staging table, loads all rows into it, and only replaces the production laptops table after the full run succeeds.
Laptop pricing:
list dell laptops 16gb ram
show me 16GB RAM and 512GB SSD Dell laptop under 1000
I need a lightweight Windows laptop
Database normalization:
what is database normalization?
explain 1NF
what is transitive dependency?
Web search:
who was the winner of 2024 cricket world cup?
what is the latest Python version?
General chat:
hello
thanks








