A Retrieval-Augmented Generation (RAG) system for querying construction project documentation, subcontractor scopes, and project inclusions/exclusions.
- ChromaDB - Local vector storage with hybrid search (semantic + BM25 keyword)
- OpenAI - Embeddings (text-embedding-3-small) and LLM generation (GPT-4o)
- LangSmith - Observability and tracing for debugging
- ChatKit UI - Modern React-based chat interface with streaming
- Source Citations - Shows relevant sources for each answer
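The feature list mentions hybrid search (semantic + BM25) but does not say how the two result lists are merged. One common option is reciprocal rank fusion (RRF); the sketch below assumes that approach purely for illustration and is not taken from `src/retrieval/store.py`. The function name, the constant `k=60`, and the document IDs are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranked list.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists: one from vector search, one from BM25.
semantic_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print(fused)
```

Rank-based fusion avoids having to normalise cosine similarities against BM25 scores, which sit on different scales.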
cd rag
# Install Python dependencies with uv
uv sync
# Install frontend dependencies (optional, for web UI)
cd frontend && npm install && cd ..
Set your OpenAI API key:
# Option A: Environment variable
export OPENAI_API_KEY=sk-...
# Option B: .env file
echo "OPENAI_API_KEY=sk-..." > .env
(Optional) Enable LangSmith observability:
export LANGCHAIN_API_KEY=lsv2_...
export LANGCHAIN_PROJECT=inclusion-agent # optional
Place Excel files in the data/ directory, then run:
uv run python main.py ingest
CLI Mode:
uv run python main.py chat
Web UI Mode:
# Terminal 1 - Backend
uv run uvicorn src.api.server:app --reload --port 8000
# Terminal 2 - Frontend
cd frontend && npm run dev
Opens at http://localhost:5173
rag/
├── src/
│ ├── api/ # FastAPI backend (ChatKit)
│ ├── config.py # Settings
│ ├── retrieval/
│ │ ├── store.py # ChromaDB + hybrid search
│ │ └── router.py # Query routing
│ └── generation/
│ └── chain.py # RAG chain (search + LLM)
├── frontend/ # React ChatKit UI
├── main.py # CLI entry point
├── data/ # Excel files to ingest
├── chroma_db/ # Vector database
└── logs/ # Application logs
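The tree above lists `router.py` for query routing without describing its logic. As a rough sketch only, a router could check whether a question names a project number and pass that number to retrieval as a filter. The regular expression, the `route_query` name, and the returned keys are assumptions made for this example, not the project's actual API.

```python
import re

# Assumed pattern: the word "project" followed by a numeric ID.
PROJECT_RE = re.compile(r"\bproject\s+#?(\d+)\b", re.IGNORECASE)


def route_query(question):
    """Classify a question as project-specific or general.

    Returns a dict with a route label and, when one is found,
    the project ID as a string.
    """
    match = PROJECT_RE.search(question)
    if match:
        return {"route": "project", "project_id": match.group(1)}
    return {"route": "general", "project_id": None}


print(route_query("What are the inclusions for project 984673?"))
print(route_query("What subcontractors handle metal railings?"))
```

A real router would likely also need to recognise subcontractor names, which a fixed pattern like this cannot do.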
Edit src/config.py:
| Setting | Description | Default |
|---|---|---|
| `OPENAI_MODEL` | Chat model | `gpt-4o` |
| `OPENAI_EMBEDDING_MODEL` | Embedding model | `text-embedding-3-small` |
| `COLLECTION_NAME` | ChromaDB collection | `construction_docs` |
| `RETRIEVAL_K` | Documents per query | `10` |
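For orientation, the settings above could be held in a small immutable object like the one below. This is a sketch that assumes a dataclass layout; the real `src/config.py` may use module-level constants or another settings library. Only the four names and their defaults come from the table, and the `Settings` class and its validation are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container mirroring the documented defaults."""

    OPENAI_MODEL: str = "gpt-4o"
    OPENAI_EMBEDDING_MODEL: str = "text-embedding-3-small"
    COLLECTION_NAME: str = "construction_docs"
    RETRIEVAL_K: int = 10

    def __post_init__(self):
        # Retrieving zero or fewer documents would leave the LLM without context.
        if self.RETRIEVAL_K < 1:
            raise ValueError("RETRIEVAL_K must be at least 1")


settings = Settings()             # documented defaults
wide = Settings(RETRIEVAL_K=20)   # retrieve more documents per query

print(settings.OPENAI_MODEL, settings.RETRIEVAL_K)
print(wide.RETRIEVAL_K)
```

Raising `RETRIEVAL_K` gives the model more context per answer at the cost of longer prompts.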
Environment Variables:
| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | OpenAI API key (required) |
| `LANGSMITH_API_KEY` | LangSmith API key (optional) |
| `LANGSMITH_PROJECT` | LangSmith project name (optional) |
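A quick pre-flight check can catch a missing key before ingestion or chat starts. The sketch below uses only the variable names from the table; the helper name `missing_required` and the idea of running it at startup are assumptions, not something the project is known to do.

```python
import os

REQUIRED = ("OPENAI_API_KEY",)
OPTIONAL = ("LANGSMITH_API_KEY", "LANGSMITH_PROJECT")


def missing_required(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]


if __name__ == "__main__":
    missing = missing_required()
    if missing:
        print("Missing required variables:", ", ".join(missing))
    unset_optional = [name for name in OPTIONAL if not os.environ.get(name)]
    if unset_optional:
        print("Optional variables not set:", ", ".join(unset_optional))
```

Passing a plain dict as `env` makes the check easy to exercise without touching the real environment.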
"What subcontractors handle metal railings?"
"Tell me about Atlantic Aluminum's scope"
"What are the inclusions for project 984673?"
- ChromaDB - Local vector storage
- OpenAI - Embeddings + chat completions
- LangChain - LLM orchestration
- LangSmith - Observability
- OpenAI ChatKit - Chat UI (React frontend)
- FastAPI - Backend server