TextGuard 2.0 🛡️

AI-powered compliance risk detection for Inter-American Development Bank (IDB) project documents.

TextGuard 2.0 uses a Hierarchical Attention Network (HAN) built on DistilBERT to analyze PDF documents and detect compliance risks at both the document and sentence level. It categorizes flagged sentences into business-friendly risk domains and produces annotated PDFs with highlighted risk areas.

✨ Features

- Document-Level Risk Classification: HAN-CE model with soft attention over text chunks
- Sentence-Level Risk Scoring: flags individual sentences using the fine-tuned DistilBERT weights
- Categorized Compliance Report, with risks grouped into:
  - 💰 Financial & Funding Risks
  - 🌳 Socio-Environmental Risks
  - ⚖️ Governance & Policy Risks
  - ⚙️ Operational & Strategic Risks
- Annotated PDF Generation: risk sentences highlighted directly in the PDF
- Premium Dark UI: glassmorphic design with ambient glow effects and Plus Jakarta Sans typography
- Serverless GPU Inference: model runs on Modal with a T4 GPU and scales to zero when idle
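
The categorized report can be pictured as a filter-and-group step over scored sentences. The sketch below is illustrative only: the README does not say how a sentence is assigned to a category or which cutoff is used, so the `(sentence, score, category)` tuple layout, the `build_report` name, and the 0.5 threshold are all assumptions.

```python
from collections import defaultdict

# The four report categories listed above.
CATEGORIES = (
    "Financial & Funding Risks",
    "Socio-Environmental Risks",
    "Governance & Policy Risks",
    "Operational & Strategic Risks",
)


def build_report(scored_sentences, threshold=0.5):
    """Group sentences whose risk score meets the threshold by category.

    scored_sentences: iterable of (sentence, risk_score, category) tuples.
    The tuple layout and the default threshold are assumptions.
    """
    report = defaultdict(list)
    for sentence, score, category in scored_sentences:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        if score >= threshold:
            report[category].append((score, sentence))
    # Highest-risk sentences first within each category.
    return {cat: sorted(items, reverse=True) for cat, items in report.items()}
```

Categories with no flagged sentences are simply absent from the returned dictionary, which keeps the report limited to domains that actually need attention.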

🏗️ Architecture

┌─────────────────────────────┐
│   Next.js Frontend (Vercel) │
│   localhost:3000            │
│                             │
│   Upload PDF → /api/classify│
└──────────┬──────────────────┘
           │ POST (raw bytes)
           ▼
┌─────────────────────────────┐
│   Modal Serverless GPU      │
│   serve.py → classify()     │
│                             │
│   1. Extract text (PyMuPDF) │
│   2. HAN document scoring   │
│   3. Sentence-level scoring │
│   4. PDF annotation         │
│   5. Return JSON + PDF b64  │
└─────────────────────────────┘
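
To make the request flow in the diagram concrete, here is a minimal Python client sketch that posts raw PDF bytes to the endpoint and decodes a base64 PDF from the JSON reply. It is not the project's `route.ts`. The `annotated_pdf_b64` key, the `application/pdf` content type, and the function names are assumptions, since the README does not document the response schema.

```python
import base64
import json
import urllib.request


def build_request(endpoint_url: str, pdf_bytes: bytes) -> urllib.request.Request:
    """Build a POST request carrying the raw PDF bytes as the body."""
    return urllib.request.Request(
        endpoint_url,
        data=pdf_bytes,
        method="POST",
        headers={"Content-Type": "application/pdf"},  # assumed content type
    )


def parse_response(body: bytes) -> tuple[dict, bytes]:
    """Split a JSON reply into its metadata and the decoded annotated PDF.

    "annotated_pdf_b64" is a hypothetical field name.
    """
    payload = json.loads(body)
    pdf_b64 = payload.pop("annotated_pdf_b64", "")
    return payload, base64.b64decode(pdf_b64)


def classify_pdf(endpoint_url: str, pdf_bytes: bytes, timeout: float = 120.0):
    """Send a PDF to the endpoint and return (metadata, annotated_pdf_bytes)."""
    request = build_request(endpoint_url, pdf_bytes)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return parse_response(resp.read())
```

The generous timeout is a guess that allows for a cold start, since the backend scales to zero when idle.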

📁 Project Structure

├── src/app/
│   ├── page.tsx                # Main UI with categorized risk report
│   ├── layout.tsx              # Root layout + Google Fonts
│   ├── globals.css             # All styles (glassmorphism, animations)
│   └── api/classify/
│       └── route.ts            # API proxy: frontend → Modal endpoint
├── serve.py                    # Modal deployment: GPU inference endpoint
├── model.py                    # HAN-CE model architecture (PyTorch)
├── package.json                # Next.js dependencies
└── README.md

🚀 Getting Started

Prerequisites

- Node.js ≥ 18
- Python ≥ 3.9
- Modal account (modal.com) with CLI authenticated
- Model weights (han_ce_weights.pt) and tokenizer files uploaded to a Modal Volume named textguard-weights

1. Clone the repository

git clone https://github.com/ichandan2151/TextGuard-2.0.git
cd TextGuard-2.0

2. Deploy the ML backend to Modal

# Authenticate with Modal (one-time)
python3 -m modal setup

# Deploy the inference endpoint
python3 -m modal deploy serve.py

After deployment, you'll see a URL like:

https://<your-username>--textguard-classify.modal.run

3. Configure environment variables

Create a .env.local file in the project root:

MODAL_ENDPOINT_URL="https://<your-username>--textguard-classify.modal.run"

4. Install dependencies and run

npm install
npm run dev

Open http://localhost:3000 in your browser.

☁️ Deploying to Vercel

npm install -g vercel
vercel

Then add the environment variable to the project, for example with vercel env add MODAL_ENDPOINT_URL:

MODAL_ENDPOINT_URL = https://<your-username>--textguard-classify.modal.run

Or set it in the Vercel dashboard: Project → Settings → Environment Variables.

🧠 Model Details

| Component | Description |
| --- | --- |
| Chunk Encoder | `distilbert-base-uncased` fine-tuned on IDB compliance data |
| Document Attention | Soft attention layer aggregating chunk embeddings |
| Classifier | Linear layer (hidden_size → 2) for binary risk classification |
| Sentence Scorer | Reuses chunk encoder + classifier for per-sentence risk scoring |
| Performance | Macro-F1: 0.820, Risk Recall: 1.00 |
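
The attention and classifier rows can be illustrated with a dependency-free sketch. This is not the code in `model.py`: it assumes the simplest form of soft attention (a dot product with a learned context vector, then a softmax), uses plain Python lists in place of PyTorch tensors, and all function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_pool(chunk_embeddings, context):
    """Soft attention: weight each chunk by softmax(embedding . context)."""
    weights = softmax([dot(e, context) for e in chunk_embeddings])
    dim = len(chunk_embeddings[0])
    doc = [
        sum(w * e[i] for w, e in zip(weights, chunk_embeddings))
        for i in range(dim)
    ]
    return doc, weights


def classify_document(doc_embedding, weight_rows, bias):
    """Linear layer (hidden_size -> 2) followed by softmax over both classes."""
    logits = [dot(row, doc_embedding) + b for row, b in zip(weight_rows, bias)]
    return softmax(logits)
```

Under this reading, the Sentence Scorer row amounts to calling `classify_document` on a single sentence embedding and skipping the pooling step.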

Modal Volume Setup

Upload your trained weights to a Modal Volume:

python3 -m modal volume create textguard-weights
python3 -m modal volume put textguard-weights han_ce_weights.pt /han_ce_weights.pt
python3 -m modal volume put textguard-weights tokenizer/ /tokenizer/
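
A startup check along these lines could confirm that the uploaded files are visible wherever the Volume is mounted. It is only a sketch: the README does not say where `serve.py` mounts the Volume, so the mount root is passed in by the caller, and `missing_artifacts` is a hypothetical helper.

```python
from pathlib import Path

# Entry names taken from the upload commands above.
REQUIRED = ("han_ce_weights.pt", "tokenizer")


def missing_artifacts(mount_root):
    """Return the required entries that are absent under mount_root."""
    root = Path(mount_root)
    missing = []
    for name in REQUIRED:
        path = root / name
        if name == "tokenizer":
            # The tokenizer is a directory and must contain at least one file.
            ok = path.is_dir() and any(path.iterdir())
        else:
            ok = path.is_file()
        if not ok:
            missing.append(name)
    return missing
```

Failing fast on a non-empty result gives a clearer error than a model-loading exception deep inside the inference call.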

🛠️ Tech Stack

| Layer | Technology |
| --- | --- |
| Frontend | Next.js 15, React, TypeScript |
| Styling | Vanilla CSS with glassmorphism, ambient glow effects |
| Typography | Plus Jakarta Sans (Google Fonts) |
| API Proxy | Next.js API Routes (server-side) |
| ML Inference | PyTorch, Transformers, PyMuPDF |
| GPU Hosting | Modal (serverless T4 GPU, scales to zero) |
| Deployment | Vercel (frontend) + Modal (backend) |

📝 License

This project is for academic and research purposes.

Built with ❤️ for the Inter-American Development Bank compliance workflow.
