An AI-powered privacy protection platform that masks Personally Identifiable Information (PII) in user conversations and uploaded documents before they are sent for cloud processing.
Frontend Application: https://privacy-guard-ai-chat-system-2tmzwgyo5-ramvignesh-rs-projects.vercel.app/
- The BERT model is disabled in this free-tier cloud deployment, so the feedback loop is not functional here.
- Real-time PII masking before sending data to cloud AI models
- Sensitive information detection and anonymization
- Gemini AI integration
- Interactive chat interface
- Active learning feedback system for improving masking quality
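As a rough illustration of masking before cloud processing, the sketch below swaps matches for typed placeholders such as `[PHONE]` before any text would leave the machine. It is a minimal sketch only: the regex patterns and the `mask_pii` name are hypothetical, and the project itself detects entities with its BiLSTM and transformer models rather than regexes.

```python
import re

# Hypothetical patterns for illustration only; the project's real detection
# relies on its BiLSTM / transformer models, not on these regexes.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{10}\b"),  # e.g. a 10-digit mobile number
}


def mask_pii(text: str) -> str:
    """Replace every pattern match with a typed placeholder such as [PHONE]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    print(mask_pii("My phone number is 9876543210"))
```

Only the masked string would then be forwarded to the cloud model.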
- Upload and process:
  - TXT
  - PNG
  - JPG
  - JPEG
  - DOCX
- OCR support using Tesseract
- Automatic sensitive data masking
- Secure document processing
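One plausible way to route the supported upload types to a text extractor is sketched below. The function name `extract_text` is hypothetical and is not taken from `DocumentMasker/utils/ocr_extractor.py`; the image branch assumes the `pytesseract` and Pillow packages plus the Tesseract binary, and the DOCX branch assumes python-docx.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def extract_text(path: str) -> str:
    """Return plain text from a TXT, PNG, JPG, JPEG, or DOCX upload (hypothetical helper)."""
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix == ".txt":
        return p.read_text(encoding="utf-8")
    if suffix in IMAGE_SUFFIXES:
        # Lazy imports so the module loads even where OCR dependencies are absent.
        # Assumes the Tesseract binary plus the pytesseract and Pillow packages.
        import pytesseract
        from PIL import Image

        with Image.open(p) as img:
            return pytesseract.image_to_string(img)
    if suffix == ".docx":
        import docx  # provided by the python-docx package

        return "\n".join(par.text for par in docx.Document(str(p)).paragraphs)
    raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
```

The extracted text would then go through the same masking step as chat messages.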
- BiLSTM-based custom masking model
- Transformer-based NER pipeline
- Continuous learning workflow
- Known entities memory storage
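The source does not describe how the known-entities memory is stored. One possible design, sketched here under that caveat, gives each entity value a stable numbered placeholder so the same name masks identically across messages, and keeps the reverse mapping so a reply could be restored locally. `EntityMemory` and its methods are hypothetical, and the `(value, label)` pairs stand in for whatever the detection models return.

```python
class EntityMemory:
    """Map each detected entity value to a stable placeholder and back again."""

    def __init__(self) -> None:
        self.to_placeholder: dict[str, str] = {}
        self.counts: dict[str, int] = {}

    def placeholder_for(self, value: str, label: str) -> str:
        # Reuse the placeholder if this exact value was seen before.
        if value not in self.to_placeholder:
            self.counts[label] = self.counts.get(label, 0) + 1
            self.to_placeholder[value] = f"[{label}_{self.counts[label]}]"
        return self.to_placeholder[value]

    def mask(self, text: str, entities: list[tuple[str, str]]) -> str:
        # entities: (value, label) pairs produced by some upstream detector.
        for value, label in entities:
            text = text.replace(value, self.placeholder_for(value, label))
        return text

    def unmask(self, text: str) -> str:
        # Restore original values in, e.g., a model reply that echoes placeholders.
        for value, placeholder in self.to_placeholder.items():
            text = text.replace(placeholder, value)
        return text
```

Because the mapping persists across calls, a name seen in an earlier message keeps the same placeholder in later ones.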
- React.js
- Vite
- Lucide React
- CSS
- Vercel Deployment
- FastAPI
- Uvicorn
- SQLAlchemy
- Pydantic
- Python
- PyTorch
- HuggingFace Transformers
- BiLSTM
- TorchCRF
- Scikit-learn
- SpaCy
- Gemini API
- Tesseract OCR
- PyMuPDF
- pdf2image
- Pillow
- python-docx
- Vercel (Frontend)
- Railway (Backend + OCR Services)
- Docker
Frontend (Vercel)
↓
Chat Backend (Railway + FastAPI)
↓
PII Detection & Masking
↓
Gemini AI Processing
Document OCR Service (Railway + Flask)
↓
OCR Extraction
↓
PII Masking
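In both flows above, masking runs before anything reaches Gemini or leaves the OCR service. A minimal sketch of that ordering, with the detector, the OCR step, and the Gemini call injected as plain callables (all hypothetical stand-ins, not the project's functions):

```python
from typing import Callable


def chat_flow(user_text: str,
              mask: Callable[[str], str],
              call_llm: Callable[[str], str]) -> str:
    """Mask first, then forward only the masked text to the cloud model."""
    masked = mask(user_text)
    return call_llm(masked)


def document_flow(path: str,
                  extract: Callable[[str], str],
                  mask: Callable[[str], str]) -> str:
    """Extract text (e.g. via OCR), then mask it before returning it."""
    return mask(extract(path))
```

Injecting the callables keeps the ordering testable without a real model or OCR engine.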
Privacy_Guard_AI_Chat_System/
│
├── backend/
│ ├── model_output/
│ ├── main.py
│ ├── pii_pipeline.py
│ ├── bilstm_model.py
│ ├── database.py
│ ├── models.py
│ ├── requirements.txt
│ └── .env
│
├── DocumentMasker/
│ ├── utils/
│ │ └── ocr_extractor.py
│ ├── web/
│ │ ├── app.py
│ │ └── requirements.txt
│ └── Dockerfile
│
├── frontend/
│ ├── src/
│ │ ├── components/
│ │ ├── App.jsx
│ │ └── main.jsx
│ ├── package.json
│ └── vite.config.js
│
└── README.md
git clone https://github.com/Ramvignesh-R/Privacy_Guard_AI_Chat_System.git
cd Privacy_Guard_AI_Chat_System
cd frontend
npm install
npm run dev
Frontend runs on:
http://localhost:5173
cd backend
pip install -r requirements.txt
Add your Gemini API key to backend/.env:
GOOGLE_API_KEY=your_gemini_api_key
Start the server:
uvicorn main:app --reload
Backend runs on:
http://localhost:8000
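How `main.py` loads `GOOGLE_API_KEY` is not shown here, and it may well use a library such as python-dotenv. As a stdlib-only sketch under that assumption, a backend could read `backend/.env` into the environment and fail fast when the key is missing; `load_env_file` and `require_api_key` are hypothetical names.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Copy KEY=VALUE lines from a .env file into os.environ (existing vars win)."""
    env_path = Path(path)
    if not env_path.exists():
        return
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, value = line.split("=", 1)
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def require_api_key() -> str:
    """Return the Gemini key or raise a clear error before serving requests."""
    key = os.environ.get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to backend/.env")
    return key
```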
cd DocumentMasker/web
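pip install -r requirements.txt
System dependencies on Debian/Ubuntu:
sudo apt-get update
sudo apt-get install -y tesseract-ocr poppler-utils
System dependencies on macOS (Homebrew):
brew install tesseract poppler
On other platforms, install manually:
- Tesseract OCR
- Poppler
Then start the service:
python app.py
OCR service runs on: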
http://localhost:5000
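If OCR fails after `python app.py`, one thing worth checking is whether the Tesseract and Poppler binaries are on `PATH`. This hypothetical helper looks for `tesseract` and for `pdftoppm`, the Poppler tool that pdf2image calls by default:

```python
import shutil


def check_ocr_binaries() -> dict[str, bool]:
    """Report whether the Tesseract and Poppler executables are on PATH."""
    # pdftoppm ships with Poppler (poppler-utils on Debian/Ubuntu).
    return {name: shutil.which(name) is not None for name in ("tesseract", "pdftoppm")}


if __name__ == "__main__":
    for name, found in check_ocr_binaries().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```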
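Frontend environment variables for local development:
VITE_CHAT_API_URL=http://localhost:8000
VITE_OCR_API_URL=http://localhost:5000
Frontend environment variables for production:
VITE_CHAT_API_URL=https://your-railway-chat-url.up.railway.app
VITE_OCR_API_URL=https://your-railway-ocr-url.up.railway.app
Frontend deployment (Vercel):
- Push frontend code to GitHub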
- Import the repository into Vercel
- Set Root Directory: frontend
- Add environment variables (VITE_CHAT_API_URL, VITE_OCR_API_URL)
- Deploy
Chat backend deployment (Railway):
- Create a Railway project
- Connect the GitHub repository
- Set Root Directory: backend
- Add environment variables (e.g. GOOGLE_API_KEY)
- Deploy the FastAPI service
OCR service deployment (Railway):
- Create a second Railway project
- Set Root Directory: DocumentMasker
- Use Docker deployment
- Deploy the OCR service
POST /chat
Request body:
{
  "text": "My phone number is 9876543210"
}
POST /report_bad_masking
- Real-time PII masking
- Gemini integration
- Secure AI interaction
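The `POST /chat` endpoint takes a JSON body with a `text` field. A minimal Python client sketch, assuming the local backend URL from the setup steps and that the response is JSON (its schema is not documented here); `build_chat_request` and `send_chat` are hypothetical helpers. The `/report_bad_masking` payload is not documented, so it is not shown.

```python
import json
import urllib.request

CHAT_API_URL = "http://localhost:8000"  # local backend from the setup steps


def build_chat_request(text: str, base_url: str = CHAT_API_URL) -> urllib.request.Request:
    """Build a POST /chat request with the documented {"text": ...} body."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(text: str, base_url: str = CHAT_API_URL) -> dict:
    # Assumes the backend answers with a JSON object; its fields are not documented.
    with urllib.request.urlopen(build_chat_request(text, base_url), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Against a running backend, `send_chat("My phone number is 9876543210")` would send the example request shown above.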
- Upload documents
- OCR extraction
- Masked output generation
- Sensitive information masking before AI processing
- Privacy-first architecture
- Local preprocessing workflow
- Entity masking pipeline
- Active learning retraining system
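The source does not describe how bad-masking reports feed retraining, and the feedback loop is disabled in the hosted deployment. One simple design, sketched here as an assumption, buffers reports and signals when a batch is large enough to retrain on; `FeedbackBuffer`, `batch_size`, and the report fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackBuffer:
    """Collect bad-masking reports and signal when a retraining batch is ready."""

    batch_size: int = 50
    reports: list[dict] = field(default_factory=list)

    def add(self, text: str, missed: list[str]) -> bool:
        # Each report keeps the original text and the spans the model failed to mask.
        self.reports.append({"text": text, "missed": missed})
        return len(self.reports) >= self.batch_size

    def drain(self) -> list[dict]:
        # Hand the accumulated reports to a (hypothetical) retraining job and reset.
        batch, self.reports = self.reports, []
        return batch
```

Batching keeps retraining from running on every single report.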
- Multi-language OCR support
- Speech-to-text privacy masking
- Real-time voice anonymization
- Advanced entity detection
- User authentication system
- Analytics dashboard
- PDF export support
Integrated M.Sc. Data Science, PSG College of Technology
This project is developed for educational and research purposes.
- Google Gemini API
- HuggingFace Transformers
- PyTorch
- FastAPI
- Railway
- Vercel
- Tesseract OCR