RamvignesH-R/Privacy_Guard_AI_Chat_System

Privacy Guard AI Chat System

An AI-powered privacy protection platform that masks Personally Identifiable Information (PII) in user conversations and uploaded documents before the data is sent for cloud processing.

Live Demo

Frontend Application: https://privacy-guard-ai-chat-system-2tmzwgyo5-ramvignesh-rs-projects.vercel.app/

  • The BERT model is disabled in this deployment because of free-tier cloud limits, so the feedback loop is not functional here.

Features

AI Chat Privacy Protection

  • Real-time PII masking before sending data to cloud AI models
  • Sensitive information detection and anonymization
  • Gemini AI integration
  • Interactive chat interface
  • Active learning feedback system for improving masking quality
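
To make the masking step concrete, here is a minimal sketch. The project detects PII with BiLSTM and Transformer-based models; the regular expressions below are only a simplified stand-in. The function name `mask_pii` and the placeholder labels `[EMAIL]` and `[PHONE]` are assumptions for illustration, not code from this repository.

```python
import re

# Hypothetical patterns used only for illustration; the real pipeline
# relies on learned models rather than regular expressions.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{10}\b"),
}


def mask_pii(text: str) -> str:
    """Replace each matched span with a bracketed label such as [PHONE]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


print(mask_pii("My phone number is 9876543210"))
# prints: My phone number is [PHONE]
```

A pattern-based masker like this misses names and addresses, which is presumably why the project uses NER models for detection.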

Document OCR + Masking

  • Upload and process:

    • PDF
    • TXT
    • PNG
    • JPG
    • JPEG
    • DOCX
  • OCR support using Tesseract

  • Automatic sensitive data masking

  • Secure document processing
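
The supported formats above suggest that an upload is first classified by extension and then sent to a matching extraction step. The sketch below shows one way to do that. The function name `extraction_route` and the route labels are hypothetical, and the repository's actual routing logic may differ.

```python
from pathlib import Path

# Extensions taken from the supported-formats list above.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".png", ".jpg", ".jpeg", ".docx"}
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def extraction_route(filename: str) -> str:
    """Return a hypothetical route name for an uploaded file."""
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {suffix or 'none'}")
    if suffix in IMAGE_EXTENSIONS:
        return "ocr"         # images carry no text layer, so OCR is required
    if suffix == ".pdf":
        return "pdf"         # a PDF may need text extraction, OCR, or both
    if suffix == ".docx":
        return "docx"
    return "plain_text"      # .txt can be read directly


print(extraction_route("scan.JPG"))
```

Rejecting unknown extensions before any processing keeps unsupported files away from the OCR step.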

Deep Learning Engine

  • BiLSTM-based custom masking model
  • Transformer-based NER pipeline
  • Continuous learning workflow
  • Known entities memory storage

Tech Stack

Frontend

  • React.js
  • Vite
  • Lucide React
  • CSS
  • Vercel Deployment

Backend

  • FastAPI
  • Uvicorn
  • SQLAlchemy
  • Pydantic
  • Python

AI / Machine Learning

  • PyTorch
  • HuggingFace Transformers
  • BiLSTM
  • TorchCRF
  • Scikit-learn
  • SpaCy
  • Gemini API

OCR / Document Processing

  • Tesseract OCR
  • PyMuPDF
  • pdf2image
  • Pillow
  • python-docx

Deployment

  • Vercel (Frontend)
  • Railway (Backend + OCR Services)
  • Docker

System Architecture

Frontend (Vercel)
    ↓
Chat Backend (Railway + FastAPI)
    ↓
PII Detection & Masking
    ↓
Gemini AI Processing

Document OCR Service (Railway + Flask)
    ↓
OCR Extraction
    ↓
PII Masking
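
The chat flow in the diagram has one key property: masking happens before the cloud call, so the model only ever receives masked text. The sketch below shows that ordering with injected stand-ins. `handle_chat`, `fake_mask`, `fake_model`, and the response keys are all hypothetical and do not reflect the repository's actual code or response schema.

```python
from typing import Callable


def handle_chat(text: str,
                mask: Callable[[str], str],
                ask_model: Callable[[str], str]) -> dict:
    """Mask first, then forward only the masked text to the cloud model."""
    masked = mask(text)
    reply = ask_model(masked)
    return {"masked_text": masked, "reply": reply}


# Stand-ins for the real detector and the Gemini call (both hypothetical).
seen_by_model = []


def fake_mask(text: str) -> str:
    return text.replace("9876543210", "[PHONE]")


def fake_model(prompt: str) -> str:
    seen_by_model.append(prompt)  # record exactly what leaves the backend
    return "ok"


result = handle_chat("My phone number is 9876543210", fake_mask, fake_model)
print(seen_by_model)
# prints: ['My phone number is [PHONE]']
```

Passing the masker and the model call as parameters makes the ordering easy to check in a test without network access.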

Project Structure

Privacy_Guard_AI_Chat_System/
│
├── backend/
│   ├── model_output/
│   ├── main.py
│   ├── pii_pipeline.py
│   ├── bilstm_model.py
│   ├── database.py
│   ├── models.py
│   ├── requirements.txt
│   └── .env
│
├── DocumentMasker/
│   ├── utils/
│   │   └── ocr_extractor.py
│   ├── web/
│   │   ├── app.py
│   │   └── requirements.txt
│   └── Dockerfile
│
├── frontend/
│   ├── src/
│   │   ├── components/
│   │   ├── App.jsx
│   │   └── main.jsx
│   ├── package.json
│   └── vite.config.js
│
└── README.md

Installation Guide

Clone Repository

git clone https://github.com/Ramvignesh-R/Privacy_Guard_AI_Chat_System.git

cd Privacy_Guard_AI_Chat_System

Frontend Setup

cd frontend
npm install
npm run dev

Frontend runs on:

http://localhost:5173

Backend Setup

Install Dependencies

cd backend
pip install -r requirements.txt

Create .env File

GOOGLE_API_KEY=your_gemini_api_key
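
A backend that depends on this key benefits from failing early when the key is missing. The sketch below is one way to check it. The helper `load_gemini_key` is hypothetical, and it assumes the `.env` values have already been loaded into the process environment. How the repository loads the file is not shown here.

```python
import os


def load_gemini_key(env=os.environ) -> str:
    """Return GOOGLE_API_KEY, or raise a clear error if it is missing."""
    key = env.get("GOOGLE_API_KEY", "").strip()
    # Treat the template placeholder as "not configured".
    if not key or key == "your_gemini_api_key":
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to backend/.env")
    return key
```

Accepting the environment as a parameter keeps the check testable without changing real environment variables.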

Run Backend

uvicorn main:app --reload

Backend runs on:

http://localhost:8000

OCR Service Setup

Install Dependencies

cd DocumentMasker/web
pip install -r requirements.txt

Install OCR Requirements

Ubuntu / Debian

sudo apt-get update
sudo apt-get install -y tesseract-ocr poppler-utils

macOS

brew install tesseract poppler

Windows

Install:

  • Tesseract OCR
  • Poppler

Run OCR Service

python app.py

OCR service runs on:

http://localhost:5000

Environment Variables

Frontend (.env)

VITE_CHAT_API_URL=http://localhost:8000
VITE_OCR_API_URL=http://localhost:5000

Production Environment Variables

VITE_CHAT_API_URL=https://your-railway-chat-url.up.railway.app
VITE_OCR_API_URL=https://your-railway-ocr-url.up.railway.app

Deployment

Frontend Deployment (Vercel)

  1. Push frontend code to GitHub
  2. Import repository into Vercel
  3. Set Root Directory: frontend
  4. Add environment variables
  5. Deploy

Backend Deployment (Railway)

  1. Create Railway project
  2. Connect GitHub repository
  3. Set Root Directory: backend
  4. Add environment variables
  5. Deploy FastAPI service

OCR Service Deployment (Railway)

  1. Create second Railway project
  2. Set Root Directory: DocumentMasker
  3. Use Docker deployment
  4. Deploy OCR service

API Endpoints

Chat Endpoint

POST /chat

Request

{
  "text": "My phone number is 9876543210"
}
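
The request above can be built from Python with the standard library alone. This sketch assumes the backend is running locally on port 8000 as described in Backend Setup. The helper name `build_chat_request` is hypothetical, and the response format is not documented here, so the example only prints whatever the server returns.

```python
import json
import urllib.request


def build_chat_request(text: str,
                       base_url: str = "http://localhost:8000"
                       ) -> urllib.request.Request:
    """Build a POST /chat request carrying the JSON body shown above."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request("My phone number is 9876543210")

# To send it against a running backend, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```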

Feedback Endpoint

POST /report_bad_masking

Screenshots

Chat Interface

  • Real-time PII masking
  • Gemini integration
  • Secure AI interaction

Document OCR

  • Upload documents
  • OCR extraction
  • Masked output generation

Security Highlights

  • Sensitive information masking before AI processing
  • Privacy-first architecture
  • Local preprocessing workflow
  • Entity masking pipeline
  • Active learning retraining system

Future Improvements

  • Multi-language OCR support
  • Speech-to-text privacy masking
  • Real-time voice anonymization
  • Advanced entity detection
  • User authentication system
  • Analytics dashboard
  • PDF export support

Contributors

Ramvignesh R and Ritujaa BG

Integrated M.Sc. Data Science, PSG College of Technology


License

This project is developed for educational and research purposes.


Acknowledgements

  • Google Gemini API
  • HuggingFace Transformers
  • PyTorch
  • FastAPI
  • Railway
  • Vercel
  • Tesseract OCR

About

AI-powered privacy protection system that masks sensitive personal information from chats and documents before cloud AI processing using FastAPI, React, OCR, BiLSTM, and Transformer-based NLP models.
