A zero-trust AI gateway for real-time PII masking, dynamic LLM routing, and telemetry logging.
Sentinel provides a secure, asynchronous Python gateway that intercepts data-sensitive AI prompts. It detects and masks PII using Presidio, storing the original values in a Redis-backed token vault so they can be restored later. The system then dynamically routes queries via LangChain and logs telemetry to an embedded, file-based SQLite database.
Many organizations, from large enterprises to small teams, want to adopt AI assistants but face a hard constraint: sensitive data (names, emails, phone numbers) cannot be sent to third-party LLM providers. The typical answer is either to ban cloud LLMs entirely or to trust the provider's data handling.
Sentinel sits between the user-facing chatbot (Microsoft Copilot Studio) and the LLMs, acting as a governance layer: it strips user-sensitive data before any prompt leaves the network, vaults the original values in a TTL-scoped Redis store, and restores them in the response. Simple queries stay on a local Ollama instance that never touches the internet; only complex prompts that need a larger model are forwarded to Gemini, with PII already removed.
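The mask, vault, and restore cycle can be illustrated with a small standard-library sketch. It is not Sentinel's implementation: two regular expressions stand in for Presidio's analyzer, an in-memory dictionary with expiry timestamps stands in for Redis keys written with a TTL, and the placeholder format (<EMAIL_1>), the TokenVault class, and the 300-second default TTL are all hypothetical choices for illustration.

```python
from __future__ import annotations

import re
import time

# Hypothetical stand-ins for Presidio recognizers: two simple regexes, far less
# accurate than Presidio's NLP-backed analyzer.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}
PLACEHOLDER = re.compile(r"<[A-Z]+_\d+>")


class TokenVault:
    """In-memory stand-in for a TTL-scoped Redis vault (hypothetical)."""

    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[str, float]] = {}

    def put(self, key: str, value: str) -> None:
        # Similar in spirit to a Redis SET with an expiry: the value lives ttl seconds.
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key: str) -> str | None:
        item = self._store.get(key)
        if item is None or item[1] <= time.monotonic():
            self._store.pop(key, None)  # lazily drop expired entries
            return None
        return item[0]


def mask(text: str, session_id: str, vault: TokenVault) -> tuple[str, int]:
    """Swap each detected PII span for a placeholder and vault the original."""
    count = 0
    for entity, pattern in PII_PATTERNS.items():
        def replace(match: re.Match) -> str:
            nonlocal count
            count += 1
            token = f"<{entity}_{count}>"
            vault.put(f"{session_id}:{token}", match.group(0))
            return token

        text = pattern.sub(replace, text)
    return text, count


def unmask(text: str, session_id: str, vault: TokenVault) -> str:
    """Restore vaulted originals; unknown or expired placeholders stay masked."""
    def restore(match: re.Match) -> str:
        original = vault.get(f"{session_id}:{match.group(0)}")
        return original if original is not None else match.group(0)

    return PLACEHOLDER.sub(restore, text)
```

Here mask("Email jane.doe@example.com or call +1 555-010-0199.", "s1", vault) yields "Email <EMAIL_1> or call <PHONE_2>." with a count of 2, and unmask reverses it for the same session only. Keying vault entries by session_id keeps one conversation from resolving another's placeholders, and entries that outlive their TTL simply stay masked.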
- Frontend: Microsoft Copilot Studio (via Ngrok)
- API Gateway: Python 3.11, FastAPI, Uvicorn, Pydantic
- LLM Orchestration: LangChain (Ollama, Google Gemini)
- PII Engine: Microsoft Presidio + spaCy (en_core_web_sm)
- State Management: Redis (internal Docker network)
- Telemetry: SQLite (host-mounted volume)
Prerequisites:
- Docker & Docker Compose
- A Google Gemini API key
- Ngrok account (for Copilot Studio integration)
Setup:
- Clone the repository:
git clone https://github.com/<your-org>/sentinel.git
cd sentinel
- Configure environment variables in a .env file at the repository root:
SENTINEL_API_KEY=<your-secure-random-key>
GEMINI_API_KEY=<your-gemini-api-key>
- Launch services via Docker:
docker compose up --build -d
- Pull a local model:
docker exec -it llm-local ollama pull llama3
- Test the gateway:
curl -X POST http://localhost:8000/v1/chat \
  -H "Content-Type: application/json" \
  -H "X-API-Key: <your-sentinel-api-key>" \
  -d '{"session_id": "test-001", "message": "Hello, how are you?"}'
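For scripted smoke tests, the same call as the curl command can be made from Python's standard library. This is a hypothetical helper, not part of the repository: build_chat_request and send_chat are illustrative names, and the URL assumes the default localhost:8000 mapping used in the setup steps.

```python
import json
import urllib.request

GATEWAY_URL = "http://localhost:8000/v1/chat"  # assumed default port mapping


def build_chat_request(api_key: str, session_id: str, message: str) -> urllib.request.Request:
    """Build (but do not send) the same POST that the curl command issues."""
    body = json.dumps({"session_id": session_id, "message": message}).encode("utf-8")
    return urllib.request.Request(
        GATEWAY_URL,
        data=body,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def send_chat(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON reply (needs the gateway running)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Against a running stack, send_chat(build_chat_request("<your-sentinel-api-key>", "test-001", "Hello, how are you?")) should return an object with reply and metadata fields, matching the response format in the API section.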
Copilot Studio can use Sentinel as its backend by calling the /v1/chat endpoint through an Ngrok HTTPS tunnel.
Expose the local gateway to the public internet:
ngrok http 8000
Note: Copy the secure forwarding URL generated by Ngrok (e.g., https://abc123.ngrok-free.app).
Navigate to your Microsoft Copilot Studio portal and create a new Custom Connector to link your agent to the Sentinel gateway.
Configure the security and routing settings as follows:
- Base URL: <YOUR_NGROK_HTTPS_URL>
- Authentication Type: API Key
- API Key Parameter Name: X-API-Key
- Parameter Location: Header
- API Key Value: The value of SENTINEL_API_KEY in your .env file
Create a new POST action pointing to the /v1/chat endpoint. You must define the Request and Response schemas so Copilot Studio understands the API contract.
Request Payload Schema:
{
"session_id": "string",
"message": "string"
}
Response Payload Schema:
{
"reply": "string",
"metadata": {
"routed_to": "string",
"pii_entities_masked": 0,
"latency_ms": 0
}
}
Inside your Copilot Studio dialogue tree, add a "Call an action" node and select your Sentinel connector. Map the Copilot variables to the JSON request payload:
- message: Map to the user's raw text input (the current conversation turn).
- session_id: Map to System.Conversation.Id.
- Output: Map the API's reply response field to a standard Copilot chat bubble node.
The agent will now route all messages through Sentinel.
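The request and response contract that the Copilot connector maps onto can be pinned down in code. Sentinel itself defines Pydantic models in app/schemas.py; this standard-library dataclass version only mirrors the payload shapes shown in the schemas, and the non-empty-string check is an assumption rather than documented behavior.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass
class ChatRequest:
    session_id: str
    message: str

    @classmethod
    def from_json(cls, raw: str) -> ChatRequest:
        data = json.loads(raw)
        if not isinstance(data, dict):
            raise ValueError("payload must be a JSON object")
        # Assumption: both fields must be present as non-empty strings.
        for key in ("session_id", "message"):
            value = data.get(key)
            if not isinstance(value, str) or not value:
                raise ValueError(f"'{key}' must be a non-empty string")
        return cls(session_id=data["session_id"], message=data["message"])


@dataclass
class ResponseMetadata:
    routed_to: str
    pii_entities_masked: int
    latency_ms: int


@dataclass
class ChatResponse:
    reply: str
    metadata: ResponseMetadata

    def to_json(self) -> str:
        # asdict() recurses into the nested metadata dataclass.
        return json.dumps(asdict(self))
```

Keeping metadata as a nested object means the connector's output mapping only has to pick reply, while routing and latency details remain available to other nodes.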
               [ Client ]
                    |
                    | POST /v1/chat (X-API-Key)
                    v
+---------------------------------------+
| 1. Auth & Intercept (FastAPI)         |
+---------------------------------------+
                    |
                    | (Internal Network)
                    v
+---------------------------------------+
| 2. PII Masking (Presidio)             | <---> [ Redis Vault ]
+---------------------------------------+
                    |
                    v
+---------------------------------------+
| 3. Semantic Router (LangChain)        |
+---------------------------------------+
          /                   \
     (Simple)              (Complex)
        /                       \
[ Local Ollama ]        [ Google Gemini ]
        \                       /
         +----------+----------+
                    |
                    v
+---------------------------------------+
| 4. PII Unmasking                      | <---> [ Redis Vault ]
+---------------------------------------+
                    |
                    | (Async)
                    +------------> [ SQLite Telemetry DB ]
                    |                   (Host Volume)
                    v
            [ JSON Response ]
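Step 3 of the diagram can be sketched as a plain function. The README does not describe how the semantic router decides what counts as complex, so the word-count threshold and keyword list below are hypothetical placeholders for whatever LangChain-based classification Sentinel actually performs; only the two model labels come from the routed_to example in the API section. The function receives the prompt after masking, which is what keeps PII out of the Gemini branch.

```python
from __future__ import annotations

# Labels taken from the routed_to field in the API response example.
LOCAL_MODEL = "llama-3-local"
CLOUD_MODEL = "gemini-3.1-flash-lite-preview"

# Hypothetical signals of a "complex" prompt; not Sentinel's documented criteria.
COMPLEX_KEYWORDS = ("analyze", "compare", "summarize", "explain why", "step by step", "write code")
MAX_SIMPLE_WORDS = 40


def choose_route(masked_prompt: str) -> str:
    """Pick a backend for an already-masked prompt using a crude heuristic."""
    text = masked_prompt.lower()
    if len(text.split()) > MAX_SIMPLE_WORDS:
        return CLOUD_MODEL  # long prompts go to the larger model
    if any(keyword in text for keyword in COMPLEX_KEYWORDS):
        return CLOUD_MODEL  # task-style keywords suggest a harder request
    return LOCAL_MODEL  # everything else stays on the local Ollama model
```

A keyword heuristic is cheap but brittle; an embedding- or classifier-based router, as the "semantic" label suggests, would trade extra latency for better decisions.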
- redis-vault and llm-local are on an internal Docker bridge network with no ports published to the host.
- Gemini is called over HTTPS from the gateway container; PII is already stripped before the request leaves.
- The SQLite telemetry database under data/ is a host volume mount (./data:/data), not a network service.
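Because the telemetry store is just a file on a mounted volume, logging needs nothing beyond Python's sqlite3 module. The sketch below is an assumption about what app/database.py might do, not its actual contents: the table name, columns (mirroring the response metadata), and the init_db and log_request helpers are hypothetical.

```python
from __future__ import annotations

import sqlite3
import time

# Hypothetical table layout mirroring the metadata returned by /v1/chat.
SCHEMA = """
CREATE TABLE IF NOT EXISTS telemetry (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    created_at REAL NOT NULL,
    session_id TEXT NOT NULL,
    routed_to TEXT NOT NULL,
    pii_entities_masked INTEGER NOT NULL,
    latency_ms INTEGER NOT NULL
)
"""


def init_db(path: str) -> sqlite3.Connection:
    """Open (or create) the database file and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def log_request(
    conn: sqlite3.Connection,
    session_id: str,
    routed_to: str,
    pii_entities_masked: int,
    latency_ms: int,
) -> None:
    """Insert one row of counts and labels; no prompt or reply text is stored."""
    with conn:  # commits on success, rolls back if the insert raises
        conn.execute(
            "INSERT INTO telemetry"
            " (created_at, session_id, routed_to, pii_entities_masked, latency_ms)"
            " VALUES (?, ?, ?, ?, ?)",
            (time.time(), session_id, routed_to, pii_entities_masked, latency_ms),
        )
```

Given the ./data:/data mount, the in-container path would presumably be something like /data/telemetry.db. The diagram shows logging as asynchronous; this synchronous sketch leaves out moving the write off the request path.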
POST /v1/chat: requires the X-API-Key header.
Request:
{
"session_id": "unique-session-identifier",
"message": "user prompt text"
}
Response (200):
{
"reply": "unmasked AI response",
"metadata": {
"routed_to": "llama-3-local | gemini-3.1-flash-lite-preview",
"pii_entities_masked": 2,
"latency_ms": 1205
}
}
| Status | Meaning |
|---|---|
| 401 | Missing or invalid X-API-Key |
| 429 | Rate limit exceeded (>10 req/min/IP) |
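Both error rows correspond to checks that run before any masking or routing. The following is an illustrative sketch rather than the contents of app/security.py: it assumes a constant-time key comparison and an in-memory sliding window keyed by client IP, with the 10-requests-per-minute figure taken from the table; api_key_is_valid, SlidingWindowLimiter, and check_request are hypothetical names.

```python
from __future__ import annotations

import hmac
import time
from collections import defaultdict, deque

RATE_LIMIT = 10        # requests allowed per window (from the 429 row)
WINDOW_SECONDS = 60.0  # one-minute window


def api_key_is_valid(provided: str | None, expected: str) -> bool:
    """Constant-time comparison so response timing does not leak the key."""
    if not provided:
        return False
    return hmac.compare_digest(provided.encode(), expected.encode())


class SlidingWindowLimiter:
    """Per-IP sliding-window limiter kept in process memory (hypothetical)."""

    def __init__(self, limit: int = RATE_LIMIT, window: float = WINDOW_SECONDS) -> None:
        self.limit = limit
        self.window = window
        self._hits: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, client_ip: str, now: float | None = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[client_ip]
        # Forget requests that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


def check_request(
    provided_key: str | None, expected_key: str, client_ip: str, limiter: SlidingWindowLimiter
) -> int:
    """Return the HTTP status the gateway would answer with before routing."""
    if not api_key_is_valid(provided_key, expected_key):
        return 401
    if not limiter.allow(client_ip):
        return 429
    return 200
```

This sketch checks the key before the rate limit; counting requests before authentication instead would also throttle key-guessing attempts. An in-process dictionary resets on restart and is not shared across workers, which is why a shared store is often used for limits in multi-process deployments.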
sentinel/
├── .env # API keys (not committed)
├── docker-compose.yml # Zero-trust container orchestration
├── Dockerfile # Gateway image build
├── requirements.txt # Pinned Python dependencies
│
├── data/ # Host volume for persistent storage
│ └── telemetry.db # SQLite (auto-generated at runtime)
│
└── app/
    ├── config.py # Pydantic BaseSettings (cached)
    ├── main.py # FastAPI entry point & endpoint orchestration
    ├── schemas.py # Pydantic request/response models
    ├── security.py # API key verification & rate limiting
    ├── database.py # SQLite init & telemetry logging
    ├── vault.py # Redis + Presidio PII masking/unmasking
    └── router.py # LangChain LLM routing (Ollama vs Gemini)
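app/config.py is described as cached Pydantic BaseSettings. The pattern (read the .env-supplied variables once, fail fast if they are missing, and hand out the same object afterwards) can be approximated with the standard library alone. This is an illustrative analogue rather than the project's file; Settings and get_settings are hypothetical names, and only the two variable names come from the setup steps.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Settings:
    sentinel_api_key: str
    gemini_api_key: str


@lru_cache(maxsize=1)
def get_settings() -> Settings:
    """Read required keys from the environment once and reuse the result."""
    required = ("SENTINEL_API_KEY", "GEMINI_API_KEY")
    missing = [name for name in required if not os.environ.get(name)]
    if missing:
        # Failing at startup beats discovering a missing key mid-request.
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return Settings(
        sentinel_api_key=os.environ["SENTINEL_API_KEY"],
        gemini_api_key=os.environ["GEMINI_API_KEY"],
    )
```

Because of lru_cache, later changes to the environment are ignored until get_settings.cache_clear() is called, which is the same trade-off as the common FastAPI get_settings() dependency pattern.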