RAG Shield is an academic final-year project focused on improving the security of Retrieval-Augmented Generation (RAG) systems.
The project demonstrates the difference between a basic RAG pipeline and a secure RAG pipeline. The secure version applies access control, prompt injection detection, data masking, document quarantine, chunk exclusion, context minimisation, and security logging before information is sent to the language model.
Retrieval-Augmented Generation allows a language model to answer user questions using internal documents. However, this creates security risks because retrieved documents may contain sensitive information, secrets, infrastructure details, or malicious instructions.
In a basic RAG system, relevant chunks are often sent directly to the LLM. If those chunks contain private data or hidden malicious instructions, the model may expose sensitive information or follow unsafe instructions.
RAG Shield was developed as a proof-of-concept system to reduce these risks by filtering, classifying, and sanitising retrieved content before it reaches the LLM.
https://showcase.itcarlow.ie/C00290978/index.html
Key features:
- Basic RAG pipeline for comparison
- Secure RAG pipeline with security controls
- Query-level prompt injection detection
- Document-level prompt injection scanning
- Chunk-level prompt injection scanning
- Document quarantine for high-risk content
- Chunk exclusion for suspicious content
- Role-based access control for retrieved documents and chunks
- L0 and L1 masking layers for sensitive information
- Secret, PII, and internal infrastructure detection
- Context minimisation before sending data to the LLM
- PostgreSQL and pgvector-based semantic search
- Security logging
- Grafana dashboard support for monitoring security events
The system scans user queries, uploaded documents, and document chunks for suspicious instructions.
Examples of suspicious behaviour include:
- attempts to override previous instructions
- attempts to reveal the system prompt
- attempts to extract secrets or credentials
- malicious instructions hidden inside retrieved documents
- fake system or developer messages inside documents
- tool misuse instructions
- exfiltration-style instructions
Depending on the severity, the system can either exclude a specific chunk or quarantine the whole document.
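The scanning step described above can be illustrated with a small rule-based scanner. This is a minimal sketch only: the rule names, severity labels, regular expressions, and the `scan_text` function are all hypothetical and are not taken from the project's actual rule set.

```python
import re
from dataclasses import dataclass

# Hypothetical rules as (name, severity, pattern) tuples.
# The project's real rules are not shown in this README.
RULES = [
    ("override_instructions", "high",
     re.compile(r"\b(ignore|disregard|forget)\b.{0,40}\b(previous|prior|above)\b"
                r".{0,20}\binstructions?\b", re.I)),
    ("reveal_system_prompt", "high",
     re.compile(r"\b(reveal|show|print|repeat)\b.{0,40}\bsystem prompt\b", re.I)),
    ("extract_secrets", "high",
     re.compile(r"\b(reveal|send|give|list)\b.{0,40}"
                r"\b(passwords?|credentials?|api keys?|secrets?)\b", re.I)),
    ("fake_role_message", "medium",
     re.compile(r"^\s*(system|developer)\s*:", re.I | re.M)),
]


@dataclass(frozen=True)
class Finding:
    rule: str
    severity: str
    match: str


def scan_text(text: str) -> list[Finding]:
    """Return one finding per rule that matches the text (query, document, or chunk)."""
    findings = []
    for name, severity, pattern in RULES:
        m = pattern.search(text)
        if m:
            findings.append(Finding(name, severity, m.group(0)))
    return findings
```

The same function can be applied to a user query, a whole uploaded document, or a single chunk, since all three are plain text. Regex rules like these are easy to audit but can miss paraphrased attacks.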
If a document contains high-risk or repeated suspicious instructions, it can be quarantined.
A quarantined document is excluded from the retrieval pipeline and is not sent to the LLM.
This helps prevent indirect prompt injection attacks where malicious instructions are hidden inside documents.
If only one part of a document is suspicious, the system can exclude only the unsafe chunk while keeping the rest of the document available.
This allows the system to still use safe information without sending suspicious content to the LLM.
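The choice between quarantining a document and excluding a single chunk could be expressed as a small decision function. The sketch below assumes that each chunk has already been scanned and reduced to a list of severity labels. The `decide` function, the `DocumentDecision` type, the labels, and the threshold of three flagged chunks are illustrative assumptions, not the project's actual policy.

```python
from dataclasses import dataclass, field

# Assumed threshold: how many flagged chunks count as "repeated" suspicious content.
REPEATED_THRESHOLD = 3


@dataclass
class DocumentDecision:
    quarantined: bool
    excluded_chunks: list[int] = field(default_factory=list)


def decide(chunk_severities: dict[int, list[str]]) -> DocumentDecision:
    """Map chunk index -> severities of findings in that chunk to a decision."""
    # Chunks with at least one finding are always excluded.
    flagged = sorted(i for i, sev in chunk_severities.items() if sev)
    has_high = any("high" in sev for sev in chunk_severities.values())
    # High-risk or repeated findings quarantine the whole document.
    if has_high or len(flagged) >= REPEATED_THRESHOLD:
        return DocumentDecision(quarantined=True, excluded_chunks=flagged)
    return DocumentDecision(quarantined=False, excluded_chunks=flagged)
```

With this policy, a document with one medium-severity chunk stays searchable minus that chunk, while a single high-severity finding removes the whole document from retrieval.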
Documents and chunks are assigned access levels.
Users can only retrieve content that matches their role and permission level.
This helps prevent broken access control, where a low-privileged user could retrieve internal or sensitive information.
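As an illustration of role-based filtering, the sketch below compares a numeric level on each chunk with the level of the user's role. The role names, the `ROLE_LEVELS` mapping, the `Chunk` type, and `filter_by_role` are assumptions made for this example. The README does not list the project's actual roles.

```python
from dataclasses import dataclass

# Assumed role hierarchy: a higher number means more privilege.
ROLE_LEVELS = {"guest": 0, "employee": 1, "admin": 2}


@dataclass(frozen=True)
class Chunk:
    chunk_id: int
    text: str
    required_level: int


def filter_by_role(chunks: list[Chunk], role: str) -> list[Chunk]:
    """Keep only the chunks the given role is allowed to read."""
    # Unknown roles get level -1, so they see nothing (deny by default).
    user_level = ROLE_LEVELS.get(role, -1)
    return [c for c in chunks if c.required_level <= user_level]
```

In a database-backed setup, the same comparison would typically be part of the retrieval query itself, so that unauthorised rows never leave the database.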
The project supports different masking layers:
- L0 masking: stricter masking for lower-privileged users
- L1 masking: lighter masking for more privileged users
Sensitive values can be replaced with placeholders before being sent to the LLM.
Examples of masked data include:
- email addresses
- phone numbers
- private IP addresses
- internal hostnames
- tokens
- credentials
- database connection strings
- secrets
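Placeholder masking with two layers might look like the sketch below. Plain regular expressions stand in for the detectors here. The project lists Microsoft Presidio, which this example does not use. The patterns, the placeholder names, and the split of categories between L0 and L1 are assumptions for illustration.

```python
import re

# Illustrative detectors only; real detection would need broader coverage.
PATTERNS = {
    "DB_DSN": re.compile(r"\bpostgres(?:ql)?://\S+"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PRIVATE_IP": re.compile(
        r"\b(?:10\.\d{1,3}\.\d{1,3}\.\d{1,3}"
        r"|192\.168\.\d{1,3}\.\d{1,3}"
        r"|172\.(?:1[6-9]|2\d|3[01])\.\d{1,3}\.\d{1,3})\b"
    ),
}

# Assumption: L0 masks every category, L1 masks only connection strings.
# DB_DSN comes first so that "user:pw@host" inside a DSN is not treated as an email.
LAYERS = {
    "L0": ["DB_DSN", "EMAIL", "PRIVATE_IP"],
    "L1": ["DB_DSN"],
}


def mask(text: str, layer: str) -> str:
    """Replace sensitive values with placeholders according to the masking layer."""
    for label in LAYERS[layer]:
        text = PATTERNS[label].sub(f"[{label}]", text)
    return text
```

For example, `mask("Contact ops@example.com", "L0")` returns `"Contact [EMAIL]"`, while the same call with `"L1"` leaves the address visible.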
Instead of sending full documents to the LLM, the system only sends a small number of relevant, authorised, and sanitised chunks.
This reduces unnecessary data exposure and supports a more privacy-aware RAG design.
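One way to sketch context minimisation is to keep only the highest-scoring chunks, subject to a relevance floor and a size budget. The function name `minimise_context` and the default values for `top_k`, `min_score`, and `max_chars` are hypothetical and would need tuning against real data.

```python
def minimise_context(
    scored_chunks: list[tuple[float, str]],
    top_k: int = 3,
    min_score: float = 0.5,
    max_chars: int = 1200,
) -> list[str]:
    """Select a small context from (similarity, text) pairs.

    The chunks are assumed to be authorised and sanitised already.
    """
    selected: list[str] = []
    used = 0
    # Visit chunks from most to least similar.
    for score, text in sorted(scored_chunks, key=lambda p: p[0], reverse=True):
        if len(selected) == top_k or score < min_score:
            break  # enough chunks, or everything left is too weakly related
        if used + len(text) > max_chars:
            continue  # skip a chunk that would exceed the size budget
        selected.append(text)
        used += len(text)
    return selected
```

The size budget keeps a single long chunk from crowding out everything else, and the relevance floor avoids padding the prompt with loosely related text.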
The project is built with:
- Python
- FastAPI
- PostgreSQL
- pgvector
- Redis
- Docker / Docker Compose
- Ollama / local LLM integration
- Microsoft Presidio
- Grafana
- HTML / CSS / JavaScript
The project compares two RAG pipelines. The basic pipeline works as follows:
- User submits a query
- System retrieves relevant chunks
- Retrieved content is sent directly to the LLM
- LLM generates an answer
This version is intentionally insecure and is used for comparison. The secure pipeline works as follows:
- User submits a query
- Query is scanned for prompt injection
- Retrieval is filtered by access control
- Retrieved chunks are checked for security flags
- Unsafe chunks are excluded
- Sensitive data is masked
- Only safe and minimal context is sent to the LLM
- Security events are logged
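The secure pipeline steps above can be tied together in a single function. The sketch below is a deliberately simplified stand-in and not the project's code: it uses one injection pattern and one masking pattern, applies access control after retrieval rather than inside the search query, and assumes each retrieved item is a dict with `id`, `text`, and `level` keys. The names `secure_context` and `log_event` are hypothetical.

```python
import json
import logging
import re

logger = logging.getLogger("rag_shield.security")

# Single illustrative patterns; a real system would use a fuller rule set.
INJECTION = re.compile(r"ignore (all )?(previous|prior) instructions", re.I)
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def log_event(event: str, **details) -> None:
    # One JSON object per log line is easy to parse for dashboards later.
    logger.info(json.dumps({"event": event, **details}))


def secure_context(query: str, user_level: int, retrieved: list[dict],
                   top_k: int = 3) -> list[str]:
    """Return chunk texts that may be sent to the LLM, or [] if the query is blocked.

    `retrieved` is assumed to be ordered by relevance, most relevant first.
    """
    # 1. Scan the query for prompt injection.
    if INJECTION.search(query):
        log_event("query_blocked", reason="prompt_injection")
        return []
    context: list[str] = []
    for chunk in retrieved:
        # 2. Access control: drop chunks above the user's level.
        if chunk["level"] > user_level:
            log_event("chunk_denied", chunk_id=chunk["id"])
            continue
        # 3. Exclude chunks that carry suspicious instructions.
        if INJECTION.search(chunk["text"]):
            log_event("chunk_excluded", chunk_id=chunk["id"])
            continue
        # 4. Mask sensitive values before they reach the LLM.
        context.append(EMAIL.sub("[EMAIL]", chunk["text"]))
        # 5. Keep the context minimal.
        if len(context) == top_k:
            break
    log_event("context_built", chunks=len(context))
    return context
```

Every rejection path emits a log event, so blocked queries and excluded chunks stay visible for later review.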
The project demonstrates several RAG security risks and mitigations:
- Direct prompt injection
- Indirect prompt injection through retrieved documents
- Secret exposure
- Sensitive data leakage
- Broken access control
- Document quarantine
- Chunk-level exclusion
- Difference between basic and secure RAG behaviour
git clone https://github.com/777liza/RAG_Shield_Final.git
cd RAG_Shield_Final
python -m venv .venv
Windows:
.venv\Scripts\activate
macOS / Linux:
source .venv/bin/activate
pip install -r requirements.txt
Create a .env file in the project root.
Example:
DB_DSN=postgresql://user:password@localhost:5432/rag_db
REDIS_URL=redis://localhost:6379
OLLAMA_BASE_URL=http://localhost:11434/v1
MODEL_NAME=llama3.2
Do not commit the .env file to GitHub.
uvicorn main:app --reload
Then open:
http://localhost:8000
If the main application file is called app.py instead of main.py, use:
uvicorn app:app --reload
If Docker Compose is configured, the project can also be started with:
docker compose up --build
This project was developed as an academic final-year project for Cybercrime and IT Security.
The goal of the project is to demonstrate common security risks in Retrieval-Augmented Generation systems and show how security controls can be added before retrieved information is sent to the LLM.
This project is a proof of concept and is not intended to be used as a production-ready security product without further testing and hardening.
Known limitations include:
- Prompt injection detection is rule-based.
- Rule-based detection may produce false positives on harmless text that resembles an attack.
- Some malicious instructions, such as paraphrased or obfuscated ones, can be difficult to detect reliably.
- The system was designed for a local/demo environment.
- The security controls are designed for academic demonstration.
- More advanced machine-learning-based detection could be added in future work.
Possible improvements include:
- More advanced prompt injection detection
- Machine-learning-based detection of suspicious instructions
- More detailed policy engine for access control
- Better admin interface for reviewing quarantined documents
- Expanded evaluation using larger document collections
- More detailed performance and security metrics
- Improved deployment hardening
Developed by Liza as a final-year Cybercrime and IT Security project.