eparirishit/rag-experiment

🏗️ Inclusion Agent Assistant

A Retrieval-Augmented Generation (RAG) system for querying construction project documentation, subcontractor scopes, and project inclusions/exclusions.

📋 Features

  • ChromaDB - Local vector storage with hybrid search (semantic + BM25 keyword)
  • OpenAI - Embeddings (text-embedding-3-small) and LLM generation (GPT-4o)
  • LangSmith - Observability and tracing for debugging
  • ChatKit UI - Modern React-based chat interface with streaming
  • Source Citations - Shows relevant sources for each answer
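One common way to combine semantic and BM25 keyword results is reciprocal rank fusion (RRF). The sketch below is illustrative only: this README does not say which fusion method src/retrieval/store.py uses, and the function name, the document IDs, and the constant k=60 are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked highly by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the two retrievers.
semantic_hits = ["doc_railings", "doc_glazing", "doc_roofing"]
keyword_hits = ["doc_glazing", "doc_paving", "doc_railings"]

fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print(fused)
```

Documents found by both retrievers (doc_glazing, doc_railings) outrank those found by only one, which is the usual motivation for hybrid search.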

🚀 Quick Start

1. Installation

cd rag

# Install Python dependencies with uv
uv sync

# Install frontend dependencies (optional, for web UI)
cd frontend && npm install && cd ..

2. Configuration

Set your OpenAI API key:

# Option A: Environment variable
export OPENAI_API_KEY=sk-...

# Option B: .env file
echo "OPENAI_API_KEY=sk-..." > .env

(Optional) Enable LangSmith observability:

export LANGSMITH_API_KEY=lsv2_...
export LANGSMITH_PROJECT=inclusion-agent  # optional

3. Ingest Data

Place your Excel files in the data/ directory, then run:

uv run python main.py ingest

4. Run the Application

CLI Mode:

uv run python main.py chat

Web UI Mode:

# Terminal 1 - Backend
uv run uvicorn src.api.server:app --reload --port 8000

# Terminal 2 - Frontend
cd frontend && npm run dev

The web UI is then available at http://localhost:5173

📁 Project Structure

rag/
├── src/
│   ├── api/                     # FastAPI backend (ChatKit)
│   ├── config.py                # Settings
│   ├── retrieval/
│   │   ├── store.py             # ChromaDB + hybrid search
│   │   └── router.py            # Query routing
│   └── generation/
│       └── chain.py             # RAG chain (search + LLM)
├── frontend/                    # React ChatKit UI
├── main.py                      # CLI entry point
├── data/                        # Excel files to ingest
├── chroma_db/                   # Vector database
└── logs/                        # Application logs
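To illustrate the "search + LLM" step with source citations, here is a minimal sketch of how retrieved chunks might be turned into a prompt with numbered sources. It is an assumption, not the code in src/generation/chain.py: RetrievedChunk, build_prompt, and the example file name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RetrievedChunk:
    """A retrieved passage plus the document it came from (hypothetical type)."""
    text: str
    source: str


def build_prompt(question, chunks):
    """Number each chunk so the model can cite sources as [1], [2], ..."""
    context = "\n".join(
        f"[{i}] ({chunk.source}) {chunk.text}"
        for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )


chunks = [
    RetrievedChunk("Scope includes metal railings.", "scopes.xlsx"),
]
prompt = build_prompt("What subcontractors handle metal railings?", chunks)
print(prompt)
```

The resulting string would be sent to the chat model, and the numbered sources can be shown next to the answer.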

⚙️ Configuration

Edit src/config.py:

Setting                | Description         | Default
OPENAI_MODEL           | Chat model          | gpt-4o
OPENAI_EMBEDDING_MODEL | Embedding model     | text-embedding-3-small
COLLECTION_NAME        | ChromaDB collection | construction_docs
RETRIEVAL_K            | Documents per query | 10
Environment Variables:

Variable          | Description
OPENAI_API_KEY    | OpenAI API key (required)
LANGSMITH_API_KEY | LangSmith API key (optional)
LANGSMITH_PROJECT | LangSmith project name (optional)
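A startup check can catch a missing key before the first API call fails. The following is a minimal sketch under the assumption that the variables are read from the process environment. The check_environment helper is hypothetical and not part of the project.

```python
import os

REQUIRED_VARS = ("OPENAI_API_KEY",)
OPTIONAL_VARS = ("LANGSMITH_API_KEY", "LANGSMITH_PROJECT")


def check_environment(env=None):
    """Raise if a required variable is unset; report which optional ones are set."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required variables: " + ", ".join(missing))
    return {name: bool(env.get(name)) for name in OPTIONAL_VARS}


# Example with an explicit mapping instead of the real environment.
status = check_environment({"OPENAI_API_KEY": "sk-placeholder"})
print(status)
```

Calling check_environment() with no argument checks the real process environment.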

🔍 Example Queries

"What subcontractors handle metal railings?"
"Tell me about Atlantic Aluminum's scope"
"What are the inclusions for project 984673?"

📦 Dependencies

  • ChromaDB - Local vector storage
  • OpenAI - Embeddings + chat completions
  • LangChain - LLM orchestration
  • LangSmith - Observability
  • OpenAI ChatKit - Chat UI (React frontend)
  • FastAPI - Backend server
