Cited RAG chat over a local corpus, routed through Cordon so PII and secret guardrails apply to every LLM call.
This is the companion app for cordon. It demonstrates a real, production-shaped pattern: a RAG service where every retrieval-augmented LLM call passes through a policy-aware gateway. The pitch in one sentence: your RAG output is only as safe as the prompts you feed the model, and "the retrieved context" is a prompt.
- A 220-line lexical retriever (regex tokenization + bigram phrase bonus + path/title bonus, no embeddings, no vector DB). See
src/cordon_rag/retrieval.py. - A seeded corpus of 45 Wikipedia-derived CC BY-SA documents across three categories: car manufacturing & repair, medical, tech/IT.
- A FastAPI service that retrieves, asks Cordon to answer with
[1]/[2]/...citations, persists the conversation, and surfaces the gateway's routing decision in the response. - A single-file HTML console at
/so you can drive it without a frontend project.
┌──────────┐ POST /api/query ┌───────────────────┐ retrieve ┌──────────────────┐
│ console │ ─────────────────▶ │ cordon-rag │ ──────────▶ │ local corpus │
│ / SDK │ │ │ │ 45 markdown │
└──────────┘ │ ┌─────────────┐ │ └──────────────────┘
│ │ build msgs │ │
│ │ persist │ │ ┌──────────────────┐
│ └─────────────┘ │ ─POST /v1─▶ │ cordon │
└───────────────────┘ │ policy engine │
postgres │ PII/secret │
│ guardrails │
└──────────────────┘
│
▼
openai / anthropic / ollama
Most RAG demos call the LLM provider directly. That's fine until your retrieved context happens to contain a customer email, an API key in a chat-log snippet, or a regulated identifier you didn't realize you had in your knowledge base. Once it's in the prompt, it's leaving your network.
Sending the LLM call through cordon means:
- Sensitive context auto-reroutes to local Llama instead of being shipped to OpenAI/Anthropic.
- Obvious secrets in retrieved chunks (API keys, credit cards) get blocked before they hit any provider.
- Every retrieval-augmented response lands in cordon's
gateway_request_logswith the matched rule, redactions, cost, and latency — auditable from one place.
The same compliance story that justifies cordon for chat APIs justifies cordon-rag for AI search.
cordon-rag needs cordon running. Easiest path: clone both and use the bundled compose stack.
git clone https://github.com/batbrainy/cordon.git
git clone https://github.com/batbrainy/cordon-rag.git
cd cordon-rag
docker compose upThe compose file starts cordon, postgres, redis, and cordon-rag together. Visit http://localhost:8090 for the console.
# Pick a category and ask a question — gets cited answer + gateway routing decision
curl -X POST http://localhost:8090/api/query \
-H "Content-Type: application/json" \
-d '{
"query": "How do regenerative braking systems recover energy?",
"category": "car_manufacturing_repair"
}'Response shape:
{
"answer": "Regenerative braking captures kinetic energy that would otherwise be lost as heat [1], using the electric motor in reverse as a generator [2]...",
"sources": [
{"index": 1, "doc_id": "car_manufacturing_repair-014", "title": "Regenerative braking", "path": "library/docs/...", "snippet": "...", "score": 12.4},
{"index": 2, "doc_id": "car_manufacturing_repair-022", "title": "Hybrid drivetrains", "path": "library/docs/...", "snippet": "...", "score": 8.1}
],
"gateway": {
"provider": "openai",
"model": "gpt-4o-mini",
"action": "allow",
"matched_rule": "default_allow",
"request_id": "req-abc123"
},
"conversation_id": "0190a8d4-..."
}The gateway block is the cordon audit trail — provider chosen, policy rule that matched, request id you can grep in gateway_request_logs. The same response shape applies whether cordon routed to OpenAI, Anthropic, or local Llama.
| Method | Path | Notes |
|---|---|---|
POST |
/api/query |
Run a RAG query. Pass conversation_id to continue a thread. |
GET |
/api/conversations |
List recent conversations. |
GET |
/api/conversations/{id} |
Fetch a conversation with all messages and citations. |
GET |
/api/docs |
List the seeded corpus. Filter with ?category=. |
GET |
/ |
Single-file HTML console. |
GET |
/health |
Liveness probe. |
| Env | Default | Notes |
|---|---|---|
DATABASE_URL |
postgres in compose | Any SQLAlchemy URL. SQLite works for dev. |
CORDON_BASE_URL |
http://gateway:8080 |
Where cordon lives. |
CORDON_API_KEY |
pk_live_dev_changeme |
Match the key issued by your cordon instance. |
CORDON_DEFAULT_MODEL |
gpt-4o-mini |
Forwarded to cordon; cordon picks the provider per policy. |
RETRIEVAL_TOP_K |
5 | Number of chunks to include per query. |
ALLOWED_ORIGINS |
http://localhost:8090 |
CORS allowlist. |
pip install -e ".[dev]"
ruff check srcMIT for the code. The bundled corpus under src/cordon_rag/seeded_data/library/ is CC BY-SA 4.0 (Wikipedia); see per-file frontmatter and LICENSE.