HECTOR is a high-precision "Hard-RAG" legal intelligence system for Indian Law. It specializes in mapping the transition from the Indian Penal Code (IPC) to the Bharatiya Nyaya Sanhita (BNS), providing authoritative citations from a curated library of Bare Acts and commentaries with zero hallucination.
git clone <repo-url> && cd Hector
cp .env.example .env # Add your API keys
docker compose --profile full up -d
# Frontend: http://localhost:3000
# API: http://localhost:8000
# Docs: http://localhost:8000/docs# Backend
python -m venv venv && venv\Scripts\activate # Windows
pip install -r requirements.txt
cp .env.example .env # Add your API keys
uvicorn api.app:app --reload --port 8000
# Frontend (separate terminal)
cd frontend
npm install && npm run devpip install -e .
hector status # Verify system
hector ingest # Index books
hector search "Section 302 IPC"┌─────────────────────────────────────────────────────────────────┐
│ USER LAYER │
│ ┌──────────┐ ┌──────────────┐ ┌─────────┐ ┌────────────┐ │
│ │ React UI │ │ REST API │ │ CLI │ │ Voice I/O │ │
│ │ (Vite) │ │ (FastAPI) │ │ (Typer) │ │ (Web API) │ │
│ └────┬─────┘ └──────┬───────┘ └────┬────┘ └─────┬──────┘ │
│ └────────────────┴───────────────┴─────────────┘ │
└───────────────────────────────┬─────────────────────────────────┘
│
┌───────────────────────────────▼─────────────────────────────────┐
│ CORE ENGINE │
│ │
│ ┌─────────┐ ┌──────────────┐ ┌────────────┐ │
│ │ Router │───▶│ Retriever │───▶│ Verifier │ │
│ │(Groq) │ │(Hybrid RAG) │ │(Chain-of- │ │
│ │ │ │ │ │Verification)│ │
│ └─────────┘ └──────┬───────┘ └─────┬──────┘ │
│ │ │ │
│ ┌─────────────────────▼──────────────────▼─────────────┐ │
│ │ RESPONSE GENERATOR │ │
│ │ (Citation grounding, IPC↔BNS comparison tables) │ │
│ └──────────────────────────────────────────────────────┘ │
└───────────────────────────────┬─────────────────────────────────┘
│
┌───────────────────────────────▼─────────────────────────────────┐
│ DATA LAYER │
│ ┌──────────────┐ ┌──────────────┐ ┌────────────────────┐ │
│ │ ChromaDB │ │ BM25 Index │ │ PDF Corpus │ │
│ │ (Semantic) │ │ (Keyword) │ │ (24 Bare Acts + │ │
│ │ │ │ │ │ 13 Commentaries) │ │
│ └──────────────┘ └──────────────┘ └────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
- Intent Routing -- Taxonomy agent classifies domain (Criminal/Civil/Procedural) to prevent data bleeding
- Hybrid Retrieval -- Semantic search (sentence-transformers) + BM25 keyword search, fused via Reciprocal Rank Fusion, reranked by cross-encoder
- Hierarchical Contextualization -- Sub-clauses automatically pull parent Section, Chapter, and Act titles
- Citation Grounding -- Validator checks response against source; unverified claims flagged, never guessed
- IPC to BNS Mapping -- 495 cross-reference mappings with temporal validation (IPC repealed July 1, 2024)
Copy .env.example to .env and configure:
| Variable | Required | Default | Description |
|---|---|---|---|
HECTOR_API_KEY |
Yes | -- | API authentication key |
HECTOR_JWT_SECRET |
Yes | -- | JWT signing secret (min 32 chars) |
HECTOR_JWT_EXPIRY_SECONDS |
No | 3600 |
Token lifetime |
GROQ_API_KEY |
Yes | -- | Groq API key for LLM routing |
GEMINI_API_KEY |
No | -- | Google Gemini API key |
NVIDIA_API_KEY |
No | -- | NVIDIA NIM API key |
NIM_API_KEY |
No | -- | NVIDIA NIM API key (alt) |
NIM_BASE_URL |
No | https://integrate.api.nvidia.com/v1 |
NIM endpoint |
HECTOR_ROUTER_MODEL |
No | llama-3.3-70b-versatile |
Groq model for routing |
HECTOR_BOOKS_DIR |
No | ./data/Books |
PDF corpus directory |
HECTOR_DB_PATH |
No | ./hector_db |
ChromaDB storage path |
HECTOR_TESSERACT_CMD |
No | tesseract |
Tesseract OCR binary path |
HECTOR_POPPLER_PATH |
No | -- | Poppler bin/ directory (for pdf2image) |
HECTOR_CORS_ORIGINS |
No | http://localhost:3000 |
Comma-separated CORS origins |
HECTOR_LOG_LEVEL |
No | INFO |
Logging level |
HECTOR_DEBUG |
No | false |
Debug mode |
Frontend (frontend/.env):
| Variable | Required | Default | Description |
|---|---|---|---|
VITE_API_URL |
No | http://localhost:8000 |
Backend API URL |
VITE_API_KEY |
No | -- | Pre-configured API key for UI |
| Method | Path | Auth | Description |
|---|---|---|---|
POST |
/search |
API Key / JWT | Hybrid legal search |
POST |
/compare |
API Key / JWT | IPC to BNS section comparison |
POST |
/route |
API Key / JWT | Intent classification |
POST |
/ingest |
API Key / JWT | PDF ingestion trigger |
GET |
/status |
API Key / JWT | System health + ChromaDB status |
GET |
/healthz |
None | Liveness probe (for orchestrators) |
GET |
/readyz |
None | Readiness probe (ChromaDB + disk) |
POST |
/auth/token |
API Key | Get JWT bearer token |
WS |
/ws/search |
Query param | Streaming search events |
Authenticate with:
X-API-Key: <your-key>header, orAuthorization: Bearer <jwt-token>header
| Layer | Technology |
|---|---|
| Backend | FastAPI, Python 3.11+ |
| Vector DB | ChromaDB |
| Embeddings | sentence-transformers (all-MiniLM-L6-v2) |
| Reranker | cross-encoder (ms-marco-MiniLM-L-6-v2) |
| LLM Router | Groq (llama-3.3-70b-versatile) |
| Frontend | Vite 5, React 18, Tailwind CSS 4 |
| OCR | Tesseract 5, Poppler, pdf2image |
| CLI | Typer |
| Containerization | Docker Compose |
Hector/
├── api/ # FastAPI application
│ ├── app.py # Main app, middleware, routes
│ ├── security.py # AuthManager, JWT, bcrypt
│ ├── rate_limit.py # Token bucket rate limiting
│ ├── schemas.py # Pydantic request/response models
│ └── services.py # Business logic layer
├── core/ # Core engine
│ ├── router.py # Intent classification (Groq LLM)
│ ├── orchestrator.py # Query pipeline coordinator
│ ├── hybrid_retriever.py # Semantic + BM25 + cross-encoder
│ ├── verifier.py # Chain-of-Verification
│ ├── response_generator.py # Citation-grounded responses
│ ├── voice.py # Voice I/O (Web Speech API)
│ ├── precedent.py # Precedent analysis
│ ├── enterprise/ # Enterprise user management
│ └── mapping.json # 495 IPC-BNS cross-references
├── data/Books/ # PDF corpus (24 bare acts + commentaries)
├── frontend/ # Vite + React frontend
│ ├── src/ # React components
│ ├── nginx.conf # Production nginx config
│ └── Dockerfile # Multi-stage build
├── tests/ # Test suite
├── utils/ # Ingestion pipeline
│ ├── enhanced_ingestor.py # PDF to ChromaDB pipeline
│ └── legal_structure_parser.py # Legal document parsing
├── docker-compose.yml # Container orchestration
├── requirements.txt # Python dependencies
└── main.py # CLI entry point
- Python 3.11+
- Node.js 18+ (for frontend)
- Tesseract OCR (for scanned PDFs):
winget install UB-Mannheim.TesseractOCR - Poppler (for
pdf2image): Download from github.com/oschwartz10612/poppler-windows - Docker (optional, for containerized deploy)
RuntimeError: HECTOR_API_KEY and HECTOR_JWT_SECRET must be set
Fix: Copy .env.example to .env and add your API keys. The server will not start without them.
TesseractNotFoundError: ...
Fix: Set HECTOR_TESSERACT_CMD in .env to the full path:
HECTOR_TESSERACT_CMD=C:\Program Files\Tesseract-OCR\tesseract.exe
Fix: Set HECTOR_POPPLER_PATH in .env to the Poppler bin/ directory:
HECTOR_POPPLER_PATH=C:\path\to\poppler-xx\Library\bin
Fix: Ensure HECTOR_CORS_ORIGINS in .env includes your frontend URL:
HECTOR_CORS_ORIGINS=http://localhost:3000,http://localhost:5173
Fix: Run ingestion first:
hector ingest # via CLI
# or
python main.py ingest # via main.pyThe API enforces rate limiting. Wait for the Retry-After period in the response header.
Fix: Ensure .env exists in the project root. Docker Compose reads it automatically:
cp .env.example .env
# Edit .env with your keys
docker compose --profile full up -dSee LICENSE for details.