H.E.C.T.O.R.

Hierarchical Evaluation of Civil-Criminal Textual's Orchestrator & Retrieval

HECTOR is a high-precision "Hard-RAG" legal intelligence system for Indian law. It specializes in mapping the transition from the Indian Penal Code (IPC) to the Bharatiya Nyaya Sanhita (BNS). Its answers come with authoritative citations from a curated library of Bare Acts and commentaries. Claims that cannot be verified against those sources are flagged rather than guessed.

Quick Start

Docker (Recommended)

git clone <repo-url> && cd Hector
cp .env.example .env          # Add your API keys
docker compose --profile full up -d
# Frontend: http://localhost:3000
# API:      http://localhost:8000
# Docs:     http://localhost:8000/docs

Local Development

# Backend
python -m venv venv
venv\Scripts\activate           # Windows
source venv/bin/activate        # macOS/Linux
pip install -r requirements.txt
cp .env.example .env            # Add your API keys
uvicorn api.app:app --reload --port 8000

# Frontend (separate terminal)
cd frontend
npm install && npm run dev

CLI

pip install -e .
hector status                   # Verify system
hector ingest                   # Index the PDF corpus into ChromaDB
hector search "Section 302 IPC"

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        USER LAYER                               │
│   ┌──────────┐  ┌──────────────┐  ┌─────────┐  ┌────────────┐   │
│   │ React UI │  │  REST API    │  │   CLI   │  │  Voice I/O │   │
│   │ (Vite)   │  │  (FastAPI)   │  │ (Typer) │  │  (Web API) │   │
│   └────┬─────┘  └──────┬───────┘  └────┬────┘  └─────┬──────┘   │
│        └───────────────┴──────┬────────┴─────────────┘          │
└───────────────────────────────┬─────────────────────────────────┘
                                │
┌───────────────────────────────▼─────────────────────────────────┐
│                      CORE ENGINE                                │
│                                                                 │
│  ┌─────────┐    ┌──────────────┐    ┌──────────────┐            │
│  │ Router  │───▶│  Retriever   │───▶│   Verifier   │            │
│  │ (Groq)  │    │ (Hybrid RAG) │    │ (Chain-of-   │            │
│  │         │    │              │    │ Verification)│            │
│  └─────────┘    └──────┬───────┘    └──────┬───────┘            │
│                        │                   │                    │
│  ┌─────────────────────▼───────────────────▼───────────────┐    │
│  │                   RESPONSE GENERATOR                    │    │
│  │     (Citation grounding, IPC↔BNS comparison tables)     │    │
│  └─────────────────────────────────────────────────────────┘    │
└───────────────────────────────┬─────────────────────────────────┘
                                │
┌───────────────────────────────▼─────────────────────────────────┐
│                     DATA LAYER                                  │
│  ┌──────────────┐   ┌──────────────┐   ┌────────────────────┐   │
│  │   ChromaDB   │   │  BM25 Index  │   │  PDF Corpus        │   │
│  │  (Semantic)  │   │  (Keyword)   │   │  (24 Bare Acts +   │   │
│  │              │   │              │   │   13 Commentaries) │   │
│  └──────────────┘   └──────────────┘   └────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘

Query Pipeline

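Intent Routing -- A taxonomy agent classifies each query's domain (Criminal/Civil/Procedural). This keeps retrieval scoped to the relevant sources and stops material from other domains from bleeding into the answer.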
Hybrid Retrieval -- Semantic search (sentence-transformers) + BM25 keyword search, fused via Reciprocal Rank Fusion, reranked by cross-encoder
Hierarchical Contextualization -- Sub-clauses automatically pull parent Section, Chapter, and Act titles
Citation Grounding -- Validator checks response against source; unverified claims flagged, never guessed
IPC to BNS Mapping -- 495 cross-reference mappings with temporal validation (IPC repealed July 1, 2024)
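The fusion step in Hybrid Retrieval can be sketched as follows. Reciprocal Rank Fusion gives each chunk a score of sum(1 / (k + rank)) across every ranked list it appears in. Chunks ranked well by both semantic and BM25 search therefore rise to the top, and the two retrievers' raw scores never have to be calibrated against each other. This is a minimal sketch and not the code in `core/hybrid_retriever.py`. The function name, `k = 60` (the constant used in the original RRF paper) and the chunk IDs are assumptions. The cross-encoder reranking that follows fusion is not shown.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first lists of chunk IDs with Reciprocal Rank Fusion.

    rankings: iterable of lists, e.g. one from semantic search and one from BM25.
    k: smoothing constant (assumed 60; HECTOR's actual value may differ).
    Returns chunk IDs sorted by fused score, highest first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based, so the top hit in each list contributes 1 / (k + 1).
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Usage with hypothetical chunk IDs:
# semantic = ["ipc_302", "bns_103", "ipc_304"]
# bm25 = ["bns_103", "bns_101", "ipc_302"]
# reciprocal_rank_fusion([semantic, bm25])
# -> ["bns_103", "ipc_302", "bns_101", "ipc_304"]
```

In this example `bns_103` wins because it sits near the top of both lists. `ipc_302` is first in one list but only third in the other.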

Environment Variables

Copy .env.example to .env and configure:

Variable	Required	Default	Description
`HECTOR_API_KEY`	Yes	--	API authentication key
`HECTOR_JWT_SECRET`	Yes	--	JWT signing secret (min 32 chars)
`HECTOR_JWT_EXPIRY_SECONDS`	No	`3600`	Token lifetime
`GROQ_API_KEY`	Yes	--	Groq API key for LLM routing
`GEMINI_API_KEY`	No	--	Google Gemini API key
`NVIDIA_API_KEY`	No	--	NVIDIA NIM API key
`NIM_API_KEY`	No	--	NVIDIA NIM API key (alternative to `NVIDIA_API_KEY`)
`NIM_BASE_URL`	No	`https://integrate.api.nvidia.com/v1`	NIM endpoint
`HECTOR_ROUTER_MODEL`	No	`llama-3.3-70b-versatile`	Groq model for routing
`HECTOR_BOOKS_DIR`	No	`./data/Books`	PDF corpus directory
`HECTOR_DB_PATH`	No	`./hector_db`	ChromaDB storage path
`HECTOR_TESSERACT_CMD`	No	`tesseract`	Tesseract OCR binary path
`HECTOR_POPPLER_PATH`	No	--	Poppler `bin/` directory (for `pdf2image`)
`HECTOR_CORS_ORIGINS`	No	`http://localhost:3000`	Comma-separated CORS origins
`HECTOR_LOG_LEVEL`	No	`INFO`	Logging level
`HECTOR_DEBUG`	No	`false`	Debug mode

Frontend (frontend/.env):

Variable	Required	Default	Description
`VITE_API_URL`	No	`http://localhost:8000`	Backend API URL
`VITE_API_KEY`	No	--	Pre-configured API key for UI

API Endpoints

Method	Path	Auth	Description
`POST`	`/search`	API Key / JWT	Hybrid legal search
`POST`	`/compare`	API Key / JWT	IPC to BNS section comparison
`POST`	`/route`	API Key / JWT	Intent classification
`POST`	`/ingest`	API Key / JWT	PDF ingestion trigger
`GET`	`/status`	API Key / JWT	System health + ChromaDB status
`GET`	`/healthz`	None	Liveness probe (for orchestrators)
`GET`	`/readyz`	None	Readiness probe (ChromaDB + disk)
`POST`	`/auth/token`	API Key	Get JWT bearer token
`WS`	`/ws/search`	Query param	Streaming search events

Authenticate with:

X-API-Key: <your-key> header, or
Authorization: Bearer <jwt-token> header
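Below is a minimal standard-library sketch of building authenticated requests for these endpoints. `build_request` is a hypothetical helper and is not part of HECTOR. It assumes the POST endpoints accept JSON bodies. The `query` field in the usage comment is a guess, so check the real request schemas in `api/schemas.py` or at http://localhost:8000/docs. A JWT obtained from `POST /auth/token` can be passed as `token=`.

```python
import json
import urllib.request


def build_request(base_url, path, payload=None, *, api_key=None, token=None):
    """Build (but do not send) a request to the HECTOR REST API.

    Pass api_key to send the X-API-Key header, or token (a JWT) to send
    Authorization: Bearer. Pass neither for the unauthenticated probes
    (/healthz, /readyz).
    """
    if api_key is not None and token is not None:
        raise ValueError("pass api_key or token, not both")

    headers = {"Accept": "application/json"}
    if api_key is not None:
        # urllib stores header names via str.capitalize() ("X-api-key");
        # HTTP header names are case-insensitive, so the server still sees it.
        headers["X-API-Key"] = api_key
    elif token is not None:
        headers["Authorization"] = f"Bearer {token}"

    data = None
    method = "GET"
    if payload is not None:
        # JSON body for the POST endpoints; field names are assumptions.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
        method = "POST"

    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, data=data, headers=headers, method=method)


# Usage (needs a running server; the "query" field name is a guess):
# req = build_request("http://localhost:8000", "/search",
#                     {"query": "Section 302 IPC"}, api_key="<your-key>")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```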

Tech Stack

Layer	Technology
Backend	FastAPI, Python 3.11+
Vector DB	ChromaDB
Embeddings	sentence-transformers (`all-MiniLM-L6-v2`)
Reranker	cross-encoder (`ms-marco-MiniLM-L-6-v2`)
LLM Router	Groq (`llama-3.3-70b-versatile`)
Frontend	Vite 5, React 18, Tailwind CSS 4
OCR	Tesseract 5, Poppler, pdf2image
CLI	Typer
Containerization	Docker Compose

Project Structure

Hector/
├── api/                    # FastAPI application
│   ├── app.py              # Main app, middleware, routes
│   ├── security.py         # AuthManager, JWT, bcrypt
│   ├── rate_limit.py       # Token bucket rate limiting
│   ├── schemas.py          # Pydantic request/response models
│   └── services.py         # Business logic layer
├── core/                   # Core engine
│   ├── router.py           # Intent classification (Groq LLM)
│   ├── orchestrator.py     # Query pipeline coordinator
│   ├── hybrid_retriever.py # Semantic + BM25 + cross-encoder
│   ├── verifier.py         # Chain-of-Verification
│   ├── response_generator.py # Citation-grounded responses
│   ├── voice.py            # Voice I/O (Web Speech API)
│   ├── precedent.py        # Precedent analysis
│   ├── enterprise/         # Enterprise user management
│   └── mapping.json        # 495 IPC-BNS cross-references
├── data/Books/             # PDF corpus (24 bare acts + commentaries)
├── frontend/               # Vite + React frontend
│   ├── src/                # React components
│   ├── nginx.conf          # Production nginx config
│   └── Dockerfile          # Multi-stage build
├── tests/                  # Test suite
├── utils/                  # Ingestion pipeline
│   ├── enhanced_ingestor.py # PDF to ChromaDB pipeline
│   └── legal_structure_parser.py # Legal document parsing
├── docker-compose.yml      # Container orchestration
├── requirements.txt        # Python dependencies
└── main.py                 # CLI entry point

Prerequisites

Python 3.11+
Node.js 18+ (for frontend)
Tesseract OCR (for scanned PDFs): winget install UB-Mannheim.TesseractOCR
Poppler (for pdf2image): Download from github.com/oschwartz10612/poppler-windows
Docker (optional, for containerized deploy)

Troubleshooting

Server refuses to start -- missing environment variables

RuntimeError: HECTOR_API_KEY and HECTOR_JWT_SECRET must be set

Fix: Copy .env.example to .env and set HECTOR_API_KEY and HECTOR_JWT_SECRET (the JWT secret must be at least 32 characters). The server will not start without them.

Tesseract not found

TesseractNotFoundError: ...

Fix: Set HECTOR_TESSERACT_CMD in .env to the full path:

HECTOR_TESSERACT_CMD=C:\Program Files\Tesseract-OCR\tesseract.exe

Poppler not found (PDF to image conversion fails)

Fix: Set HECTOR_POPPLER_PATH in .env to the Poppler bin/ directory:

HECTOR_POPPLER_PATH=C:\path\to\poppler-xx\Library\bin

CORS errors in browser

Fix: Ensure HECTOR_CORS_ORIGINS in .env includes your frontend URL:

HECTOR_CORS_ORIGINS=http://localhost:3000,http://localhost:5173

ChromaDB collection not found

Fix: Run ingestion first:

hector ingest           # via CLI
# or
python main.py ingest   # via main.py

Rate limited (429 responses)

The API enforces token-bucket rate limiting (see api/rate_limit.py). After a 429 response, wait for the period given in the Retry-After response header before retrying.
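A client can handle 429 responses automatically, as in the minimal sketch below. HTTP allows Retry-After to carry either a number of seconds or an HTTP-date, so the sketch accepts both. It does not assume which form HECTOR sends. `send` stands in for whatever HTTP call your client makes. Both helper names are hypothetical.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Convert a Retry-After value to a non-negative wait in seconds, or None."""
    if value is None:
        return None
    value = value.strip()
    if value.isascii() and value.isdigit():
        return float(value)  # delta-seconds form, e.g. "30"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None  # unparseable header
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def call_with_retry(send, max_attempts=3, sleep=time.sleep):
    """Call send() and retry on HTTP 429, honoring Retry-After.

    send is a hypothetical callable returning (status, headers, body), where
    headers supports .get("Retry-After").
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts:
            return status, body
        wait = retry_after_seconds(headers.get("Retry-After"))
        # Fall back to exponential backoff if the header is missing or malformed.
        sleep(wait if wait is not None else 2.0 ** attempt)
```

Injecting `sleep` lets tests run instantly. With `urllib`, the `headers` attribute of an `HTTPError` supports case-insensitive `.get()`, so it can be passed through directly.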

Docker build fails

Fix: Ensure .env exists in the project root. Docker Compose reads it automatically:

cp .env.example .env
# Edit .env with your keys
docker compose --profile full up -d

License

See LICENSE for details.
