This repository contains a sophisticated, multi-agent AI pipeline designed to intelligently predict Canadian Harmonized Tariff Schedule (HTS) codes from plain-text item descriptions. The system leverages LangGraph, LangChain, and Chroma vector databases to achieve high-accuracy classifications through a combination of live web research, official document retrieval, and self-consistency voting.
- Multi-Agent Orchestration: Powered by LangGraph for structured, stateful workflows.
- Search Agent: Real-time web browsing using DuckDuckGo and BeautifulSoup to gather material composition and industry context.
- RAG Agent: Contextual retrieval from the official 2024 Canadian Tariff Schedule (PDF) via ChromaDB.
- Self-Consistency Voting: Uses a 3-instance ensemble with prompt perturbations to calculate confidence scores for each digit of the HTS code.
- Human-in-the-Loop Escalation: Automatically triggers a clarification chat interface if the AI's confidence in any HTS element falls below 60%.
- Modern Web UI: A glassmorphic, responsive web interface built with FastAPI and Vanilla CSS/JS for real-time interaction and reasoning visualization.
- Model Agnostic: Supports Google Gemini (Cloud) and Ollama (Local) pipelines.
The classification workflow follows a directed acyclic graph (DAG):
- Search Node: Formulates optimized queries, scrapes results, and extracts relevant technical specifications.
- RAG Node: Vectorizes the item description to pull specific legal headings and subheadings from the local 1,500+ page tariff schedule.
- Decision Node:
- Synthesizes Web + PDF context.
- Runs 3 independent classification attempts.
- Performs element-wise voting to determine the final 10-digit code (
XXXX.XX.XX.XX). - Calculates a confidence score for each segment (Chapter, Heading, Subheading, etc.).
- Escalation Logic: If consensus is not reached, it prepares a targeted clarifying question for the user.
Ensure you have Python 3.11+ installed. We recommend using a virtual environment.
# Install dependencies
pip install -r requirements.txt
# Note: Ensure fastapi, uvicorn, and duckduckgo-search (ddgs) are also installed.Create a .env file in the root directory:
GEMINI_API_KEY=your_google_ai_studio_api_keyPopulate the Chroma vector store with extracts from the official tariff.pdf:
python data_ingestion.pyThis creates the chroma_db/ directory.
Start the interactive web interface to experience the human-escalation chatbot and visual reasoning chain.
python app.pyAccess the UI at: http://localhost:8001
Test individual descriptions and view full internal logs in the terminal.
- Gemini (Cloud):
python run_gemini.py - Ollama (Local):
python run_ollama.py
Benchmark the system against data.csv to calculate accuracy and generate performance reports.
python evaluate.pyResults are saved to evaluation_results.csv.
| File | Description |
|---|---|
app.py |
FastAPI application serving the Web UI and API endpoints. |
graph.py |
Core LangGraph logic, agent node definitions, and escalation chat. |
data_ingestion.py |
PDF parsing and vector database (RAG) initialization. |
evaluate.py |
Metric generation and batch testing script. |
run_*.py |
CLI entry points for specific LLM backends. |
static/ |
Frontend assets (HTML/CSS/JS) for the web interface. |
tariff.pdf |
The official reference document for Canadian HTS codes. |
requirements.txt |
Python dependencies. |
This project is intended for research and internal logistical optimization. Always verify final HTS codes with official customs documentation for legal compliance.