Skip to content

HadeedTariq-475/RAG-Orchestration

Repository files navigation

RAG Orchestrator over Dual Vector Databases

This project is a Retrieval-Augmented Generation (RAG) orchestrator that routes user queries to the correct knowledge source before generating an answer. Instead of putting every document and dataset into one vector database, it keeps separate domain-specific vector stores and uses an intent router to decide which retrieval path should answer the question.

RAG Orchestrator UI

Problem Statement

Many user queries belong to different knowledge domains. A single vector database can work for small demos, but mixing unrelated domains often reduces retrieval quality because unrelated chunks compete in the same semantic search space.

This project solves that by using an orchestrator:

  1. Classify the user's intent.
  2. Route the query to the correct RAG source.
  3. Retrieve the most relevant context.
  4. Generate an answer from the retrieved evidence.

Goals

  • Build a simple and explainable RAG system.
  • Use two separate knowledge bases:
    • A database normalization PDF.
    • A laptop pricing CSV dataset.
  • Route user queries with zero-shot intent classification plus rule-based shortcuts.
  • Use a Flask web UI for interaction.
  • Retrieve top matching records/chunks and generate grounded responses with an LLM.

Key Features

  • Intent classification for general chat, database normalization, laptop pricing, web search, and HTTP/API errors.
  • Qdrant vector database for PDF/database-normalization chunks.
  • Supabase PostgreSQL with pgvector for laptop CSV rows.
  • Sentence embeddings with all-MiniLM-L6-v2.
  • Top-k retrieval before LLM answer generation.
  • Flask UI showing the detected intent and generated answer.
  • Safer ingestion path for laptop data through a staging table.

System Architecture

High-level workflow:

User query
  -> Flask UI
  -> /chat API
  -> Intent router
  -> Retrieval path
      -> Qdrant for database normalization
      -> Supabase pgvector for laptop pricing
      -> DuckDuckGo for web search
      -> Direct response for general chat
  -> LLM generation
  -> JSON response
  -> Browser UI

Main Components

Component Purpose
Flask frontend Provides the browser interface for submitting queries.
Flask /chat API Coordinates intent classification, retrieval, and answer generation.
Intent router Uses rule-based routing first, then zero-shot classification when needed.
Embedding service Converts text into 384-dimensional vectors using all-MiniLM-L6-v2.
Supabase search Searches laptop rows using pgvector similarity and metadata filters.
Qdrant search Searches database-normalization chunks from a local Qdrant store.
Web search Uses DuckDuckGo search results for open web questions.
Generator Sends retrieved context and the user question to a Hugging Face chat model.

Technology Stack

  • Python
  • Flask
  • Transformers
  • Sentence Transformers
  • Hugging Face Inference API
  • Qdrant
  • Supabase PostgreSQL
  • pgvector
  • pandas
  • DuckDuckGo search through ddgs
  • python-dotenv

The original project documentation also describes a PDF ingestion design using unstructured, pytesseract, and Gemini-based image descriptions for multimodal PDF processing. In the current repository, the laptop CSV ingestion is implemented, while ingest_pdf.py and ingest_json.py are still placeholders.

Data Sources

Database Normalization PDF

The normalization knowledge base covers:

  • Why normalization is needed.
  • Insertion, update, and deletion anomalies.
  • First Normal Form (1NF).
  • Second Normal Form (2NF).
  • Third Normal Form (3NF).
  • Boyce-Codd Normal Form (BCNF).
  • Functional dependencies and schema design concepts.

This data is stored in the local Qdrant vector store.

Laptop Pricing CSV Dataset

The laptop pricing dataset contains 238 rows and 13 columns. It includes multiple manufacturers, categories, hardware attributes, and prices.

CSV columns:

Unnamed: 0
Manufacturer
Category
Screen
GPU
OS
CPU_core
Screen_Size_cm
CPU_frequency
RAM_GB
Storage_GB_SSD
Weight_kg
Price

Each row is converted into a natural-language text representation, embedded, and stored in Supabase PostgreSQL with metadata.

Data Processing and Indexing

PDF Pipeline: Normalization Knowledge Base to Qdrant

The project documentation describes this intended pipeline:

  1. Load the PDF and parse it into elements.
  2. For image elements, run OCR using Tesseract.
  3. Send image/OCR context to an LLM to create clean descriptions.
  4. Inject those descriptions back into the text stream.
  5. Chunk the final text.
  6. Embed chunks with all-MiniLM-L6-v2.
  7. Store vectors and payload metadata in Qdrant.

CSV Pipeline: Laptop Dataset to Supabase pgvector

The implemented CSV pipeline:

  1. Read data_ingestion/raw_data/laptop_pricing_dataset.csv.
  2. Convert each row into a sentence containing brand, screen, CPU, RAM, SSD, weight, and price.
  3. Generate a 384-dimensional embedding.
  4. Insert content, metadata, and embedding into a staging table.
  5. Swap staging into the production laptops table only after the full ingest succeeds.

This staging approach prevents failed ingestions from wiping out the existing production table.

Orchestration

The orchestrator predicts an intent, then runs the matching path.

Intent Route
general_chat Reply directly without retrieval.
db_normalization Retrieve context from Qdrant.
laptop_pricing Retrieve laptop rows from Supabase pgvector.
browse_web Retrieve snippets from DuckDuckGo.
http_errors Placeholder for future HTTP/API error diagnosis.

The router first checks obvious keywords such as laptop specs, normal forms, HTTP status codes, and greetings. If no rule matches, it falls back to zero-shot classification.

Retrieval and Answer Generation

For RAG-backed intents, the system retrieves relevant context and passes it to the LLM with instructions to answer only from the provided evidence.

Retrieval paths:

  • Database normalization: Qdrant similarity search.
  • Laptop pricing: Supabase pgvector search plus metadata filters.
  • Web questions: DuckDuckGo search snippets.

The answer generator is configured by:

HF_MODEL=Qwen/Qwen2.5-7B-Instruct

You can change this model in .env as long as your Hugging Face provider supports it for chat completion.

Screenshots

Database Normalization Query

Database normalization query

Web Search Query

Web search query

General Chat Query

General chat query

Laptop Pricing Query

Laptop pricing query

Supabase pgvector Table

Supabase pgvector table

Qdrant Local Storage

Qdrant local storage

Qdrant local storage rows

Folder Structure from Documentation

Folder structure

Current Folder Structure

RAG_Orchestration/
  app.py
  pyproject.toml
  uv.lock
  README.md
  .env.example
  templates/
    index.html
  services/
    __init__.py
    classifier.py
    embedder.py
    generator.py
    search_qdrant.py
    search_supabase.py
    web_search.py
  data_ingestion/
    ingest_csv.py
    ingest_json.py
    ingest_pdf.py
    raw_data/
      db_normalization.pdf
      laptop_pricing_dataset.csv
      errors.json
  vector_stores/
    qdrant_db/
  docs/
    assets/
      rag-doc-000.png
      rag-doc-002.png
      rag-doc-003.png
      rag-doc-004.png
      rag-doc-005.png
      rag-doc-006.png
      rag-doc-007.png
      rag-doc-008.png
      rag-doc-009.png

Environment Variables

Create a local .env file from the example:

cp .env.example .env

Required variables:

POSTGRES_CONNECTION_STRING=postgresql://USER:PASSWORD@HOST:5432/postgres
HF_TOKEN=hf_your_hugging_face_token_here
HF_MODEL=Qwen/Qwen2.5-7B-Instruct
FLASK_DEBUG=0

Do not commit .env. It contains secrets.

Setup

Install dependencies:

uv sync

Activate the environment:

source .venv/bin/activate

Run the Flask app:

python app.py

Open the UI:

http://127.0.0.1:5000

For local development debug mode:

FLASK_DEBUG=1 python app.py

Ingest Laptop Data

Make sure Supabase PostgreSQL has the vector extension available. Then run:

python data_ingestion/ingest_csv.py

The ingestion script creates a staging table, loads all rows into it, and only replaces the production laptops table after the full run succeeds.

Example Queries

Laptop pricing:

list dell laptops 16gb ram
show me 16GB RAM and 512GB SSD Dell laptop under 1000
I need a lightweight Windows laptop

Database normalization:

what is database normalization?
explain 1NF
what is transitive dependency?

Web search:

who was the winner of 2024 cricket world cup?
what is the latest Python version?

General chat:

hello
thanks

About

This project is a Retrieval-Augmented Generation (RAG) orchestrator that routes user queries to the correct knowledge source before generating an answer. Instead of putting every document and dataset into one vector database, it keeps separate domain-specific vector stores and uses an intent router to decide which retrieval path should answer the q

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors