AI-powered compliance risk detection for IDB project documents.
TextGuard 2.0 uses a Hierarchical Attention Network (HAN) built on DistilBERT to analyze PDF documents and detect compliance risks at both the document and sentence level. It categorizes flagged sentences into business-friendly risk domains and produces annotated PDFs with highlighted risk areas.
- Document-Level Risk Classification — HAN-CE model with soft attention over text chunks
- Sentence-Level Risk Scoring — Flags individual sentences using fine-tuned DistilBERT weights
- Categorized Compliance Report — Risks grouped into:
  - 💰 Financial & Funding Risks
  - 🌳 Socio-Environmental Risks
  - ⚖️ Governance & Policy Risks
  - ⚙️ Operational & Strategic Risks
- Annotated PDF Generation — Highlighted risk sentences directly in the PDF
- Premium Dark UI — Glassmorphic design with ambient glow effects, Plus Jakarta Sans typography
- Serverless GPU Inference — Model runs on Modal with T4 GPU, scales to zero when idle
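As a rough illustration of the categorized report described in the list above, the following sketch groups already-labeled flagged sentences into the four risk domains. The record keys (`text`, `score`, `category`), the function name, and the 0.5 threshold are assumptions made for this example; the project's actual data shapes may differ.

```python
# The four report categories named in the feature list.
CATEGORIES = [
    "Financial & Funding Risks",
    "Socio-Environmental Risks",
    "Governance & Policy Risks",
    "Operational & Strategic Risks",
]


def group_flagged_sentences(flagged, threshold=0.5):
    """Group flagged sentences by category, highest risk score first.

    `flagged` is a list of dicts with hypothetical keys "text",
    "score" (0..1) and "category". The threshold is also an assumption.
    """
    report = {name: [] for name in CATEGORIES}
    for item in flagged:
        # Keep only sentences above the threshold with a known category.
        if item["score"] >= threshold and item["category"] in report:
            report[item["category"]].append(item)
    for items in report.values():
        items.sort(key=lambda it: it["score"], reverse=True)
    return report


if __name__ == "__main__":
    sample = [
        {"text": "Funding gap noted.", "score": 0.91,
         "category": "Financial & Funding Risks"},
        {"text": "Resettlement plan missing.", "score": 0.78,
         "category": "Socio-Environmental Risks"},
    ]
    for category, items in group_flagged_sentences(sample).items():
        print(category, [it["text"] for it in items])
```

Every category appears in the result, including empty ones, so a UI can render all four sections consistently.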
    ┌─────────────────────────────┐
    │  Next.js Frontend (Vercel)  │
    │  localhost:3000             │
    │                             │
    │  Upload PDF → /api/classify │
    └──────────────┬──────────────┘
                   │ POST (raw bytes)
                   ▼
    ┌─────────────────────────────┐
    │  Modal Serverless GPU       │
    │  serve.py → classify()      │
    │                             │
    │  1. Extract text (PyMuPDF)  │
    │  2. HAN document scoring    │
    │  3. Sentence-level scoring  │
    │  4. PDF annotation          │
    │  5. Return JSON + PDF b64   │
    └─────────────────────────────┘
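The request flow in the diagram (raw PDF bytes in, JSON plus a base64-encoded PDF out) can be exercised without the frontend. Below is a minimal standard-library sketch of such a client. The `Content-Type` header, the response key `annotated_pdf_b64`, and all function names are hypothetical; check `serve.py` for the fields the endpoint actually returns.

```python
import base64
import json
import urllib.request


def build_classify_request(endpoint_url, pdf_bytes):
    """Build a POST request carrying the PDF as the raw request body."""
    return urllib.request.Request(
        endpoint_url,
        data=pdf_bytes,
        method="POST",
        headers={"Content-Type": "application/pdf"},  # assumed header
    )


def parse_classify_response(body):
    """Split the JSON response into result fields and decoded PDF bytes.

    The key "annotated_pdf_b64" is an assumption for illustration.
    """
    payload = json.loads(body)
    pdf_b64 = payload.pop("annotated_pdf_b64", None)
    pdf_bytes = base64.b64decode(pdf_b64) if pdf_b64 else None
    return payload, pdf_bytes


def classify_pdf(endpoint_url, pdf_path, timeout=300):
    """Send a local PDF to the endpoint and return (fields, annotated_pdf)."""
    with open(pdf_path, "rb") as f:
        request = build_classify_request(endpoint_url, f.read())
    # A generous timeout allows for a cold start after scaling to zero.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_classify_response(response.read())
```

Calling `classify_pdf(url, "project.pdf")` with the deployed endpoint URL would return the parsed result fields and, if present, the annotated PDF bytes ready to write to disk.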
    ├── src/app/
    │   ├── page.tsx          # Main UI with categorized risk report
    │   ├── layout.tsx        # Root layout + Google Fonts
    │   ├── globals.css       # All styles (glassmorphism, animations)
    │   └── api/classify/
    │       └── route.ts      # API proxy: frontend → Modal endpoint
    ├── serve.py              # Modal deployment: GPU inference endpoint
    ├── model.py              # HAN-CE model architecture (PyTorch)
    ├── package.json          # Next.js dependencies
    └── README.md
- Node.js ≥ 18
- Python ≥ 3.9
- Modal account (modal.com) with CLI authenticated
- Model weights (han_ce_weights.pt) and tokenizer files uploaded to a Modal Volume named textguard-weights

    git clone https://github.com/ichandan2151/TextGuard-2.0.git
    cd TextGuard-2.0

    # Authenticate with Modal (one-time)
    python3 -m modal setup
    # Deploy the inference endpoint
    python3 -m modal deploy serve.py

After deployment, you'll see a URL like:

    https://<your-username>--textguard-classify.modal.run

Create a .env.local file in the project root:

    MODAL_ENDPOINT_URL="https://<your-username>--textguard-classify.modal.run"

Install the dependencies and start the development server:

    npm install
    npm run dev

Open http://localhost:3000 in your browser.

    npm install -g vercel
    vercel

When prompted, add the environment variable:

    MODAL_ENDPOINT_URL = https://<your-username>--textguard-classify.modal.run

Or set it in the Vercel dashboard: Project → Settings → Environment Variables.
| Component | Description |
|---|---|
| Chunk Encoder | distilbert-base-uncased fine-tuned on IDB compliance data |
| Document Attention | Soft attention layer aggregating chunk embeddings |
| Classifier | Linear layer (hidden_size → 2) for binary risk classification |
| Sentence Scorer | Reuses chunk encoder + classifier for per-sentence risk scoring |
| Performance | Macro-F1: 0.820, Risk Recall: 1.00 |
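
To make the table above concrete, here is a dependency-free sketch of soft attention pooling over chunk embeddings followed by a two-class linear layer. It is a simplification under stated assumptions: chunks are scored by a dot product with a single context vector, and the toy dimensions and weights are invented for illustration. The real `model.py` is a PyTorch module and may score chunks differently (for example with a learned projection).

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_pool(chunk_embeddings, context_vector):
    """Soft attention: weight each chunk by softmax(score), then sum.

    Scoring by dot product with `context_vector` is an assumption.
    """
    scores = [dot(chunk, context_vector) for chunk in chunk_embeddings]
    weights = softmax(scores)
    dim = len(chunk_embeddings[0])
    pooled = [
        sum(w * chunk[i] for w, chunk in zip(weights, chunk_embeddings))
        for i in range(dim)
    ]
    return pooled, weights


def classify(doc_embedding, weight_rows, biases):
    """Linear layer (hidden_size -> 2) followed by softmax."""
    logits = [dot(row, doc_embedding) + b for row, b in zip(weight_rows, biases)]
    return softmax(logits)


if __name__ == "__main__":
    chunks = [[0.2, 0.9], [0.8, 0.1], [0.5, 0.5]]  # toy 2-d embeddings
    pooled, weights = attention_pool(chunks, context_vector=[1.0, -1.0])
    probs = classify(pooled, weight_rows=[[1.0, 0.0], [0.0, 1.0]], biases=[0.0, 0.0])
    print("attention weights:", weights)
    print("class probabilities:", probs)
```

The attention weights also indicate which chunks contributed most to the document-level decision. Per the table, the sentence scorer reuses the encoder and classifier, which in this sketch would correspond to calling `classify` on a single sentence embedding without pooling.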
Upload your trained weights to a Modal Volume:

    python3 -m modal volume create textguard-weights
    python3 -m modal volume put textguard-weights han_ce_weights.pt /han_ce_weights.pt
    python3 -m modal volume put textguard-weights tokenizer/ /tokenizer/

| Layer | Technology |
|---|---|
| Frontend | Next.js 15, React, TypeScript |
| Styling | Vanilla CSS with glassmorphism, ambient glow effects |
| Typography | Plus Jakarta Sans (Google Fonts) |
| API Proxy | Next.js API Routes (server-side) |
| ML Inference | PyTorch, Transformers, PyMuPDF |
| GPU Hosting | Modal (serverless T4 GPU, scales to zero) |
| Deployment | Vercel (frontend) + Modal (backend) |
This project is for academic and research purposes.
Built with ❤️ for the Inter-American Development Bank compliance workflow.