TextGuard 2.0 🛡️

AI-powered compliance risk detection for IDB project documents.

TextGuard 2.0 uses a Hierarchical Attention Network (HAN) built on DistilBERT to analyze PDF documents and detect compliance risks at both the document and sentence level. It categorizes flagged sentences into business-friendly risk domains and produces annotated PDFs with highlighted risk areas.



✨ Features

  • Document-Level Risk Classification — HAN-CE model with soft attention over text chunks
  • Sentence-Level Risk Scoring — Flags individual sentences using fine-tuned DistilBERT weights
  • Categorized Compliance Report — Risks grouped into:
    • 💰 Financial & Funding Risks
    • 🌳 Socio-Environmental Risks
    • ⚖️ Governance & Policy Risks
    • ⚙️ Operational & Strategic Risks
  • Annotated PDF Generation — Highlighted risk sentences directly in the PDF
  • Premium Dark UI — Glassmorphic design with ambient glow effects, Plus Jakarta Sans typography
  • Serverless GPU Inference — Model runs on Modal with T4 GPU, scales to zero when idle
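As a rough illustration of the categorized report, the sketch below groups already-scored sentences under the four risk domains listed above. The ScoredSentence type, the build_report helper, and the 0.5 threshold are assumptions made for illustration; the project's actual scoring and category assignment happen in serve.py and may work differently.

```python
from dataclasses import dataclass

# The four risk domains named in the feature list.
CATEGORIES = (
    "Financial & Funding Risks",
    "Socio-Environmental Risks",
    "Governance & Policy Risks",
    "Operational & Strategic Risks",
)


@dataclass(frozen=True)
class ScoredSentence:
    """Hypothetical record: one sentence with a risk score and a domain."""
    text: str
    score: float      # assumed to be a risk probability in [0, 1]
    category: str     # assumed to be one of CATEGORIES


def build_report(sentences, threshold=0.5):
    """Group sentences scoring at or above `threshold` by risk domain.

    Each group is sorted from highest to lowest score. The threshold value
    is an assumption, not a documented setting of the project.
    """
    report = {category: [] for category in CATEGORIES}
    for sentence in sentences:
        if sentence.category not in report:
            raise ValueError(f"unknown category: {sentence.category!r}")
        if sentence.score >= threshold:
            report[sentence.category].append(sentence)
    for flagged in report.values():
        flagged.sort(key=lambda s: s.score, reverse=True)
    return report
```

Keeping every domain in the result, even when its list is empty, lets a UI render all four sections consistently.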

🏗️ Architecture

┌─────────────────────────────┐
│   Next.js Frontend (Vercel) │
│   localhost:3000            │
│                             │
│   Upload PDF → /api/classify│
└──────────┬──────────────────┘
           │ POST (raw bytes)
           ▼
┌─────────────────────────────┐
│   Modal Serverless GPU      │
│   serve.py → classify()     │
│                             │
│   1. Extract text (PyMuPDF) │
│   2. HAN document scoring   │
│   3. Sentence-level scoring │
│   4. PDF annotation         │
│   5. Return JSON + PDF b64  │
└─────────────────────────────┘
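The request flow in the diagram can be exercised without the frontend. The following is a minimal sketch of a standalone client that posts raw PDF bytes and decodes a JSON response carrying a base64-encoded PDF. The JSON key annotated_pdf_b64, the Content-Type header, and the helper names are assumptions; check serve.py and route.ts for the actual request and response shape.

```python
import base64
import json
import urllib.request


def build_request(endpoint_url: str, pdf_bytes: bytes) -> urllib.request.Request:
    """Build a POST request that carries the PDF as the raw request body."""
    return urllib.request.Request(
        endpoint_url,
        data=pdf_bytes,
        method="POST",
        headers={"Content-Type": "application/pdf"},  # assumed content type
    )


def decode_response(body: bytes, pdf_field: str = "annotated_pdf_b64"):
    """Split a JSON response into (metadata dict, annotated PDF bytes).

    `pdf_field` is a hypothetical key name; check serve.py for the real one.
    """
    payload = json.loads(body)
    pdf_b64 = payload.pop(pdf_field)
    return payload, base64.b64decode(pdf_b64)


def classify_pdf(endpoint_url: str, pdf_path: str, timeout: float = 300.0):
    """Send a PDF file to the endpoint and return the decoded response."""
    with open(pdf_path, "rb") as handle:
        request = build_request(endpoint_url, handle.read())
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return decode_response(response.read())
```

A generous timeout is used because a serverless GPU endpoint that has scaled to zero may need time for a cold start before it answers.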

📁 Project Structure

├── src/app/
│   ├── page.tsx                # Main UI with categorized risk report
│   ├── layout.tsx              # Root layout + Google Fonts
│   ├── globals.css             # All styles (glassmorphism, animations)
│   └── api/classify/
│       └── route.ts            # API proxy: frontend → Modal endpoint
├── serve.py                    # Modal deployment: GPU inference endpoint
├── model.py                    # HAN-CE model architecture (PyTorch)
├── package.json                # Next.js dependencies
└── README.md

🚀 Getting Started

Prerequisites

  • Node.js ≥ 18
  • Python ≥ 3.9
  • Modal account (modal.com) with CLI authenticated
  • Model weights (han_ce_weights.pt) and tokenizer files uploaded to a Modal Volume named textguard-weights (see Modal Volume Setup below)

1. Clone the repository

git clone https://github.com/ichandan2151/TextGuard-2.0.git
cd TextGuard-2.0

2. Deploy the ML backend to Modal

# Authenticate with Modal (one-time)
python3 -m modal setup

# Deploy the inference endpoint
python3 -m modal deploy serve.py

After deployment, you'll see a URL like:

https://<your-username>--textguard-classify.modal.run

3. Configure environment variables

Create a .env.local file in the project root:

MODAL_ENDPOINT_URL="https://<your-username>--textguard-classify.modal.run"

4. Install dependencies and run

npm install
npm run dev

Open http://localhost:3000 in your browser.


☁️ Deploying to Vercel

npm install -g vercel
vercel

Add the environment variable to the Vercel project, for example by running vercel env add MODAL_ENDPOINT_URL and entering the endpoint URL as its value:

https://<your-username>--textguard-classify.modal.run

Or set it in the Vercel dashboard: Project → Settings → Environment Variables.


🧠 Model Details

| Component          | Description                                                      |
|--------------------|------------------------------------------------------------------|
| Chunk Encoder      | distilbert-base-uncased fine-tuned on IDB compliance data        |
| Document Attention | Soft attention layer aggregating chunk embeddings                |
| Classifier         | Linear layer (hidden_size → 2) for binary risk classification    |
| Sentence Scorer    | Reuses chunk encoder + classifier for per-sentence risk scoring  |
| Performance        | Macro-F1: 0.820, Risk Recall: 1.00                               |
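To make the attention and classifier rows concrete, here is a dependency-free sketch of soft attention pooling followed by a two-class linear layer. It is not the code in model.py, which is written in PyTorch. Scoring each chunk with a single attention vector is an assumption (the actual layer may use a different scoring function), and all function and parameter names are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(chunk_embeddings, attention_vector):
    """Soft attention: weight each chunk by softmax(embedding . attention_vector).

    Returns (document_embedding, weights). Using one attention vector is an
    assumption; the real model may score chunks differently.
    """
    scores = [
        sum(x * w for x, w in zip(embedding, attention_vector))
        for embedding in chunk_embeddings
    ]
    weights = softmax(scores)
    hidden_size = len(attention_vector)
    document = [
        sum(weight * embedding[i] for weight, embedding in zip(weights, chunk_embeddings))
        for i in range(hidden_size)
    ]
    return document, weights


def classify(document, weight_rows, biases):
    """Linear layer (hidden_size -> 2) followed by softmax over two classes."""
    logits = [
        sum(x * w for x, w in zip(document, row)) + bias
        for row, bias in zip(weight_rows, biases)
    ]
    return softmax(logits)
```

Under this reading, the sentence scorer would skip the pooling step and pass a single sentence embedding straight to classify, which is consistent with the table's note that it reuses the chunk encoder and classifier.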

Modal Volume Setup

Upload your trained weights to a Modal Volume:

python3 -m modal volume create textguard-weights
python3 -m modal volume put textguard-weights han_ce_weights.pt /han_ce_weights.pt
python3 -m modal volume put textguard-weights tokenizer/ /tokenizer/

🛠️ Tech Stack

| Layer        | Technology                                           |
|--------------|------------------------------------------------------|
| Frontend     | Next.js 15, React, TypeScript                        |
| Styling      | Vanilla CSS with glassmorphism, ambient glow effects |
| Typography   | Plus Jakarta Sans (Google Fonts)                     |
| API Proxy    | Next.js API Routes (server-side)                     |
| ML Inference | PyTorch, Transformers, PyMuPDF                       |
| GPU Hosting  | Modal (serverless T4 GPU, scales to zero)            |
| Deployment   | Vercel (frontend) + Modal (backend)                  |

📝 License

This project is for academic and research purposes.


Built with ❤️ for the Inter-American Development Bank compliance workflow.
