RAG Shield — Secure Retrieval-Augmented Generation System

RAG Shield is an academic final-year project focused on improving the security of Retrieval-Augmented Generation (RAG) systems.

The project demonstrates the difference between a basic RAG pipeline and a secure RAG pipeline. The secure version applies access control, prompt injection detection, data masking, document quarantine, chunk exclusion, context minimisation, and security logging before information is sent to the language model.

Project Overview

Retrieval-Augmented Generation allows a language model to answer user questions using internal documents. However, this creates security risks because retrieved documents may contain sensitive information, secrets, infrastructure details, or malicious instructions.

In a basic RAG system, relevant chunks are often sent directly to the LLM. If those chunks contain private data or hidden malicious instructions, the model may expose sensitive information or follow unsafe instructions.

RAG Shield was developed as a proof-of-concept system to reduce these risks by filtering, classifying, and sanitising retrieved content before it reaches the LLM.

Project showcase: https://showcase.itcarlow.ie/C00290978/index.html

Main Features

Basic RAG pipeline for comparison
Secure RAG pipeline with security controls
Query-level prompt injection detection
Document-level prompt injection scanning
Chunk-level prompt injection scanning
Document quarantine for high-risk content
Chunk exclusion for suspicious content
Role-based access control for retrieved documents and chunks
L0 and L1 masking layers for sensitive information
Secret, PII, and internal infrastructure detection
Context minimisation before sending data to the LLM
PostgreSQL and pgvector-based semantic search
Security logging
Grafana dashboard support for monitoring security events

Security Controls Implemented

1. Prompt Injection Detection

The system scans user queries, uploaded documents, and document chunks for suspicious instructions.

Examples of suspicious behaviour include:

attempts to override previous instructions
attempts to reveal the system prompt
attempts to extract secrets or credentials
malicious instructions hidden inside retrieved documents
fake system or developer messages inside documents
tool misuse instructions
exfiltration-style instructions

Depending on the severity, the system can either exclude a specific chunk or quarantine the whole document.
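
The scan described above could look roughly like the following rule-based sketch. The rule names, severity labels, regular expressions, and the `scan_text` function are illustrative assumptions, not the project's actual rule set.

```python
import re
from dataclasses import dataclass

# Hypothetical rules: (name, severity, pattern). The real rule set may differ.
RULES = [
    ("override_instructions", "high",
     re.compile(r"ignore\s+(all\s+)?(previous|prior|above)\s+instructions", re.I)),
    ("reveal_system_prompt", "high",
     re.compile(r"(reveal|show|print)\s+(me\s+)?(the\s+|your\s+)?system\s+prompt", re.I)),
    ("extract_secrets", "high",
     re.compile(r"(send|reveal|leak|print)\s+.{0,40}(password|api[_ ]?key|credentials?|secrets?)", re.I)),
    # A line that pretends to be a system or developer message.
    ("fake_role_message", "medium",
     re.compile(r"^\s*(system|developer)\s*:", re.I | re.M)),
]


@dataclass(frozen=True)
class Finding:
    rule: str
    severity: str
    excerpt: str


def scan_text(text: str) -> list[Finding]:
    """Return one finding per rule that matches the query, document, or chunk text."""
    findings = []
    for name, severity, pattern in RULES:
        match = pattern.search(text)
        if match:
            findings.append(Finding(name, severity, match.group(0).strip()))
    return findings
```

The same function can be applied to a user query, a whole document, or a single chunk. A caller would then use the severities in the findings to choose between chunk exclusion and quarantine.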

2. Document Quarantine

If a document contains high-risk or repeated suspicious instructions, it can be quarantined.

A quarantined document is excluded from the retrieval pipeline and is not sent to the LLM.

This helps prevent indirect prompt injection attacks where malicious instructions are hidden inside documents.
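
As a rough sketch of the quarantine decision, the function below treats a document as high-risk if any chunk has a high-severity finding, and as "repeated" if several chunks are flagged. The threshold value and the `ChunkScan` and `should_quarantine` names are assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical threshold: number of flagged chunks that counts as "repeated".
MAX_FLAGGED_CHUNKS = 2


@dataclass
class ChunkScan:
    chunk_id: int
    severities: list[str]  # e.g. ["high"], ["medium"], or [] when the chunk is clean


def should_quarantine(scans: list[ChunkScan]) -> bool:
    """Quarantine on any high-severity finding or on repeated flagged chunks."""
    flagged = [s for s in scans if s.severities]
    has_high = any("high" in s.severities for s in flagged)
    return has_high or len(flagged) >= MAX_FLAGGED_CHUNKS
```

A document with a single medium-severity chunk would not be quarantined under these assumed rules. It would fall through to chunk-level handling instead.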

3. Chunk-Level Exclusion

If only one part of a document is suspicious, the system can exclude only the unsafe chunk while keeping the rest of the document available.

This allows the system to still use safe information without sending suspicious content to the LLM.
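
A minimal sketch of chunk-level exclusion, assuming each chunk carries a `flagged` marker set by an earlier scanning step. The `Chunk` type and `split_safe_chunks` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: int
    text: str
    flagged: bool = False  # assumed to be set by an earlier scanning step


def split_safe_chunks(chunks: list[Chunk]) -> tuple[list[Chunk], list[int]]:
    """Keep unflagged chunks and report the IDs of the excluded ones."""
    safe = [c for c in chunks if not c.flagged]
    excluded_ids = [c.chunk_id for c in chunks if c.flagged]
    return safe, excluded_ids
```

Returning the excluded IDs alongside the safe chunks makes it easy to record each exclusion as a security event.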

4. Role-Based Access Control

Documents and chunks are assigned access levels.

Users can only retrieve content that matches their role and permission level.

This helps prevent broken access control, where a low-privileged user could retrieve internal or sensitive information.
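
One simple way to express this check is a numeric clearance per role, compared against an access level stored on each chunk. The role names, level numbers, and the `authorised_chunks` function below are assumptions for illustration. The project's actual role model may differ.

```python
from dataclasses import dataclass

# Hypothetical role-to-clearance mapping; higher numbers can see more.
ROLE_CLEARANCE = {"guest": 0, "employee": 1, "admin": 2}


@dataclass(frozen=True)
class Chunk:
    chunk_id: int
    text: str
    access_level: int  # minimum clearance needed to read this chunk


def authorised_chunks(role: str, chunks: list[Chunk]) -> list[Chunk]:
    """Return only the chunks the given role is cleared to read."""
    clearance = ROLE_CLEARANCE.get(role, -1)  # unknown roles are denied everything
    return [c for c in chunks if c.access_level <= clearance]
```

Unknown roles default to no access, so a misspelled or missing role fails closed rather than open.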

5. Data Masking

The project supports different masking layers:

L0 masking — stricter masking for lower-privileged users
L1 masking — lighter masking for more privileged users

Sensitive values can be replaced with placeholders before being sent to the LLM.

Examples of masked data include:

email addresses
phone numbers
private IP addresses
internal hostnames
tokens
credentials
database connection strings
secrets
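
The two layers could be sketched with plain regular expressions as shown below. This is an assumption-heavy illustration: the patterns, the placeholder labels, and the split of categories between L0 and L1 are made up for the example, and the project's own detection (for instance with Microsoft Presidio) is likely more thorough.

```python
import re

# Illustrative patterns only. Secrets are masked on both layers.
SECRET_PATTERNS = {
    "CONNECTION_STRING": re.compile(r"\b(?:postgresql|postgres|mysql|redis)://\S+"),
    "TOKEN": re.compile(r"\b(?:sk|ghp|xoxb)[-_][A-Za-z0-9_-]{16,}"),
}
# Assumed to be masked only on the stricter L0 layer.
PII_INFRA_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PRIVATE_IP": re.compile(
        r"\b(?:10\.\d{1,3}|192\.168|172\.(?:1[6-9]|2\d|3[01]))\.\d{1,3}\.\d{1,3}\b"
    ),
}


def mask(text: str, layer: str) -> str:
    """Replace sensitive values with placeholders for the given masking layer."""
    if layer not in ("L0", "L1"):
        raise ValueError(f"unknown masking layer: {layer}")
    patterns = dict(SECRET_PATTERNS)  # secrets first, so embedded values go with them
    if layer == "L0":
        patterns.update(PII_INFRA_PATTERNS)
    for label, pattern in patterns.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

In this sketch a more privileged user (L1) still sees email addresses and private IP addresses, while secrets are replaced for everyone.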

6. Context Minimisation

Instead of sending full documents to the LLM, the system only sends a small number of relevant, authorised, and sanitised chunks.

This reduces unnecessary data exposure and supports a more privacy-aware RAG design.
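
A minimal sketch of this selection step, assuming the chunks have already passed access control and sanitisation. The `max_chunks` and `max_chars` limits and the `minimise_context` name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredChunk:
    text: str
    score: float  # similarity to the query; higher means more relevant


def minimise_context(chunks: list[ScoredChunk], max_chunks: int = 3,
                     max_chars: int = 1200) -> list[str]:
    """Pick the most relevant chunks that fit a chunk count and character budget."""
    selected: list[str] = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        if len(selected) == max_chunks:
            break
        if used + len(chunk.text) > max_chars:
            continue  # skip a chunk that would exceed the character budget
        selected.append(chunk.text)
        used += len(chunk.text)
    return selected
```

Capping both the number of chunks and the total length keeps the prompt small even when individual chunks are long.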

Technology Stack

Python
FastAPI
PostgreSQL
pgvector
Redis
Docker / Docker Compose
Ollama / local LLM integration
Microsoft Presidio
Grafana
HTML / CSS / JavaScript

Architecture

The project compares two RAG pipelines.

Basic RAG Pipeline

User submits a query
System retrieves relevant chunks
Retrieved content is sent directly to the LLM
LLM generates an answer

This version is intentionally insecure and is used for comparison.

Secure RAG Pipeline

User submits a query
Query is scanned for prompt injection
Retrieval is filtered by access control
Retrieved chunks are checked for security flags
Unsafe chunks are excluded
Sensitive data is masked
Only safe and minimal context is sent to the LLM
Security events are logged

Example Security Scenarios

The project demonstrates several RAG security risks and mitigations:

Direct prompt injection
Indirect prompt injection through retrieved documents
Secret exposure
Sensitive data leakage
Broken access control
Document quarantine
Chunk-level exclusion
Difference between basic and secure RAG behaviour

How to Run

1. Clone the repository

git clone https://github.com/777liza/RAG_Shield_Final.git
cd RAG_Shield_Final

2. Create a virtual environment

python -m venv .venv

3. Activate the virtual environment

Windows:

.venv\Scripts\activate

macOS / Linux:

source .venv/bin/activate

4. Install dependencies

pip install -r requirements.txt

5. Configure environment variables

Create a .env file in the project root.

Example:

DB_DSN=postgresql://user:password@localhost:5432/rag_db
REDIS_URL=redis://localhost:6379
OLLAMA_BASE_URL=http://localhost:11434/v1
MODEL_NAME=llama3.2

Do not commit the .env file to GitHub.

6. Run the application

uvicorn main:app --reload

Then open:

http://localhost:8000

If the main application file is called app.py instead of main.py, use:

uvicorn app:app --reload

Docker

If Docker Compose is configured, the project can also be started with:

docker compose up --build

Academic Purpose

This project was developed as an academic final-year project for Cybercrime and IT Security.

The goal of the project is to demonstrate common security risks in Retrieval-Augmented Generation systems and show how security controls can be added before retrieved information is sent to the LLM.

Limitations

This project is a proof of concept. It is not a production-ready security product and would need further testing and hardening before real-world use.

Known limitations include:

Prompt injection detection is rule-based.
Rule-based detection may produce false positives.
Rule-based detection can also miss malicious instructions, especially when they are paraphrased or obfuscated.
The system was designed for a local/demo environment.
The security controls are designed for academic demonstration.
More advanced machine-learning-based detection could be added in future work.

Future Work

Possible improvements include:

More advanced prompt injection detection
Machine-learning-based detection of suspicious instructions
More detailed policy engine for access control
Better admin interface for reviewing quarantined documents
Expanded evaluation using larger document collections
More detailed performance and security metrics
Improved deployment hardening

Author

Developed by Liza as a final-year Cybercrime and IT Security project.
