Skip to content

batbrainy/cordon-rag

Repository files navigation

cordon-rag

Cited RAG chat over a local corpus, routed through Cordon so PII and secret guardrails apply to every LLM call.

This is the companion app for cordon. It demonstrates a real, production-shaped pattern: a RAG service where every retrieval-augmented LLM call passes through a policy-aware gateway. The pitch in one sentence: your RAG output is only as safe as the prompts you feed the model, and "the retrieved context" is a prompt.

What's in the box

  • A 220-line lexical retriever (regex tokenization + bigram phrase bonus + path/title bonus, no embeddings, no vector DB). See src/cordon_rag/retrieval.py.
  • A seeded corpus of 45 Wikipedia-derived CC BY-SA documents across three categories: car manufacturing & repair, medical, tech/IT.
  • A FastAPI service that retrieves, asks Cordon to answer with [1]/[2]/... citations, persists the conversation, and surfaces the gateway's routing decision in the response.
  • A single-file HTML console at / so you can drive it without a frontend project.
┌──────────┐  POST /api/query   ┌───────────────────┐  retrieve   ┌──────────────────┐
│ console  │ ─────────────────▶ │   cordon-rag      │ ──────────▶ │  local corpus    │
│  / SDK   │                    │                   │             │  45 markdown     │
└──────────┘                    │  ┌─────────────┐  │             └──────────────────┘
                                │  │ build msgs  │  │
                                │  │ persist     │  │             ┌──────────────────┐
                                │  └─────────────┘  │ ─POST /v1─▶ │     cordon       │
                                └───────────────────┘             │  policy engine   │
                                       postgres                   │  PII/secret      │
                                                                  │  guardrails      │
                                                                  └──────────────────┘
                                                                       │
                                                                       ▼
                                                            openai / anthropic / ollama

Why this matters

Most RAG demos call the LLM provider directly. That's fine until your retrieved context happens to contain a customer email, an API key in a chat-log snippet, or a regulated identifier you didn't realize you had in your knowledge base. Once it's in the prompt, it's leaving your network.

Sending the LLM call through cordon means:

  • Sensitive context auto-reroutes to local Llama instead of being shipped to OpenAI/Anthropic.
  • Obvious secrets in retrieved chunks (API keys, credit cards) get blocked before they hit any provider.
  • Every retrieval-augmented response lands in cordon's gateway_request_logs with the matched rule, redactions, cost, and latency — auditable from one place.

The same compliance story that justifies cordon for chat APIs justifies cordon-rag for AI search.

Quickstart

cordon-rag needs cordon running. Easiest path: clone both and use the bundled compose stack.

git clone https://github.com/batbrainy/cordon.git
git clone https://github.com/batbrainy/cordon-rag.git
cd cordon-rag
docker compose up

The compose file starts cordon, postgres, redis, and cordon-rag together. Visit http://localhost:8090 for the console.

Try it (curl)

# Pick a category and ask a question — gets cited answer + gateway routing decision
curl -X POST http://localhost:8090/api/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "How do regenerative braking systems recover energy?",
    "category": "car_manufacturing_repair"
  }'

Response shape:

{
  "answer": "Regenerative braking captures kinetic energy that would otherwise be lost as heat [1], using the electric motor in reverse as a generator [2]...",
  "sources": [
    {"index": 1, "doc_id": "car_manufacturing_repair-014", "title": "Regenerative braking", "path": "library/docs/...", "snippet": "...", "score": 12.4},
    {"index": 2, "doc_id": "car_manufacturing_repair-022", "title": "Hybrid drivetrains", "path": "library/docs/...", "snippet": "...", "score": 8.1}
  ],
  "gateway": {
    "provider": "openai",
    "model": "gpt-4o-mini",
    "action": "allow",
    "matched_rule": "default_allow",
    "request_id": "req-abc123"
  },
  "conversation_id": "0190a8d4-..."
}

The gateway block is the cordon audit trail — provider chosen, policy rule that matched, request id you can grep in gateway_request_logs. The same response shape applies whether cordon routed to OpenAI, Anthropic, or local Llama.

API

Method Path Notes
POST /api/query Run a RAG query. Pass conversation_id to continue a thread.
GET /api/conversations List recent conversations.
GET /api/conversations/{id} Fetch a conversation with all messages and citations.
GET /api/docs List the seeded corpus. Filter with ?category=.
GET / Single-file HTML console.
GET /health Liveness probe.

Configuration

Env Default Notes
DATABASE_URL postgres in compose Any SQLAlchemy URL. SQLite works for dev.
CORDON_BASE_URL http://gateway:8080 Where cordon lives.
CORDON_API_KEY pk_live_dev_changeme Match the key issued by your cordon instance.
CORDON_DEFAULT_MODEL gpt-4o-mini Forwarded to cordon; cordon picks the provider per policy.
RETRIEVAL_TOP_K 5 Number of chunks to include per query.
ALLOWED_ORIGINS http://localhost:8090 CORS allowlist.

Development

pip install -e ".[dev]"
ruff check src

License

MIT for the code. The bundled corpus under src/cordon_rag/seeded_data/library/ is CC BY-SA 4.0 (Wikipedia); see per-file frontmatter and LICENSE.

About

Cited RAG chat over a local corpus, routed through cordon so PII and secret guardrails apply to every LLM call.

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages