Guarded RAG: answers grounded in retrieved context, refusal when there's no support, and an eval harness that puts a number on it.
The failure mode of RAG isn't bad retrieval. It's the confident answer with nothing behind it. rag-guard is a small, runnable pipeline that makes that hard: it refuses when retrieval finds no support, checks the answer against the context, redacts PII from the output, and traces every step. Pure-stdlib core, zero runtime dependencies, bring your own model.
🧩 One layer of a five-repo cost-governance stack for operating AI agents cost-efficiently; bow is the flagship that operates the stack in production.
"how long is shipping?" → grounded answer, sources=[ship] ✓
"quantum chromodynamics?" → refuses (no support), model not called ✓
- Refuse-when-unsupported. If the top retrieval score is below threshold, the pipeline refuses and never even calls the model. No support, no answer.
- Groundedness check. After the model answers, verify the answer is actually backed by the retrieved context; flag it if not. (Lexical-overlap proxy here, swappable for an NLI/LLM judge behind the same interface.)
- PII output filter. Emails, phones, SSNs, and card-like numbers are redacted from whatever the model returns.
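As an illustration of the output filter, here is a minimal sketch of regex-based redaction. The patterns, the `redact` name, and the `[REDACTED:...]` placeholders are assumptions made for this example, not rag-guard's actual rules, and real PII detection needs broader coverage than four patterns.

```python
import re

# Illustrative patterns only (assumptions, not rag-guard's actual rules).
# Order matters: the stricter SSN and card shapes run before the looser
# phone pattern so a long digit run is not partially consumed by it.
PII_PATTERNS = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    # 13 to 16 digits, optionally separated by single spaces or hyphens.
    ("CARD", re.compile(r"\b\d(?:[ -]?\d){12,15}\b")),
    ("PHONE", re.compile(
        r"(?<!\d)(?:\+?1[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}(?!\d)"
    )),
]


def redact(text: str) -> str:
    """Replace every match of every pattern with a labeled placeholder."""
    for label, pattern in PII_PATTERNS:
        text = pattern.sub(f"[REDACTED:{label}]", text)
    return text


print(redact("Mail jane.doe@example.com or call 555-867-5309."))
# Mail [REDACTED:EMAIL] or call [REDACTED:PHONE].
```

Because the filter runs on the model's output rather than on the prompt, it catches PII whether it came from the corpus or from the model itself.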
Every result carries a trace (what was retrieved + scores, refused?, grounded?) so the system is auditable.
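The first two guards and the trace can be sketched together. This is a rough sketch under stated assumptions, not the library's implementation: `guarded_answer`, `support`, the `(doc_id, text, score)` hit shape, the prompt wording, and both thresholds are hypothetical, and the shape of a refused result is a guess.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens; a deliberately crude tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def support(answer: str, context: str) -> float:
    """Share of the answer's distinct tokens that also occur in the context."""
    answer_tokens = tokens(answer)
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & tokens(context)) / len(answer_tokens)


def guarded_answer(query, hits, complete, min_score=0.2, min_support=0.6):
    """hits: [(doc_id, text, score), ...] sorted best first.
    complete: any callable prompt -> str. Thresholds are assumed values."""
    trace = {"retrieved": [(doc_id, score) for doc_id, _, score in hits]}

    # Gate 1: refuse before spending a model call when retrieval is weak.
    if not hits or hits[0][2] < min_score:
        trace.update(refused=True, grounded=None)
        return {"answer": None, "refused": True, "grounded": False,
                "support": 0.0, "sources": [], "trace": trace}

    context = "\n".join(text for _, text, _ in hits)
    answer = complete(f"Answer only from this context.\n{context}\n\nQuestion: {query}")

    # Gate 2: flag (rather than drop) answers the context does not back.
    score = support(answer, context)
    grounded = score >= min_support
    trace.update(refused=False, grounded=grounded)
    return {"answer": answer, "refused": False, "grounded": grounded,
            "support": score, "sources": [doc_id for doc_id, _, _ in hits],
            "trace": trace}
```

A lexical overlap score is cheap and deterministic but blind to paraphrase and negation, which is why it helps to keep it behind a function that an NLI or LLM judge could replace.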
pip install guarded-rag

Zero runtime dependencies: it's stdlib all the way down. (The PyPI name is guarded-rag; the import is still rag_guard.)
from rag_guard import Retriever, RagGuard
from rag_guard import FakeProvider # swap for a real model provider
ret = Retriever([
    {"id": "ship", "text": "Standard shipping takes 3 to 5 business days."},
    {"id": "returns", "text": "Return any item within 30 days for a full refund."},
])
rag = RagGuard(ret, FakeProvider("Shipping takes 3 to 5 business days."))
print(rag.answer("how long does shipping take"))
# {'answer': 'Shipping takes 3 to 5 business days.', 'refused': False,
# 'grounded': True, 'support': 1.0, 'sources': ['ship', 'returns'], 'trace': {...}}
print(rag.answer("quantum chromodynamics")["refused"]) # True: refuses, no support

from rag_guard.evaluate import evaluate
cases = [
    {"query": "how long does shipping take", "gold": "ship", "expect_refusal": False},
    {"query": "quantum chromodynamics", "expect_refusal": True},
]
print(evaluate(rag, cases))
# {'n': 2, 'refusal_accuracy': 1.0, 'retrieval_hit_rate': 1.0, 'grounded_rate': 1.0, 'cases': [...]}

Re-run the eval on any model or config change to catch regressions before a user does.
A real run, not a demo fixture. The two cases above are an illustration. They score 1.0 across the board, so don't read anything into them. bin/eval_real.py runs a 20-case labeled set over a 12-doc corpus through a live model (claude -p):
PYTHONPATH=. python3 bin/eval_real.py # requires claude CLI on PATH
# {'n': 20, 'refusal_accuracy': 0.9, 'retrieval_hit_rate': 1.0, 'grounded_rate': 0.8824}

The two refusal misses were out-of-corpus identity questions ("who's the CEO?") that scored just over threshold, but the groundedness guard still flagged both, so nothing unsupported got through unflagged. Full output lands in eval/results.json.
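To make the three headline numbers concrete, here is one plausible way to compute them from per-case results. The metric definitions, the `summarize` name, and the per-case keys are assumptions for illustration; they are not read from `rag_guard.evaluate`.

```python
def summarize(results):
    """results: list of dicts with keys
         expect_refusal (bool), refused (bool), grounded (bool),
         sources (list of doc ids), gold (doc id, optional).
    All three definitions below are assumptions, not the library's."""
    n = len(results)

    # Refusal accuracy: did the pipeline refuse exactly when it should have?
    refusal_ok = sum(r["refused"] == r["expect_refusal"] for r in results)

    # Retrieval hit rate: among cases that name a gold doc, was it retrieved?
    with_gold = [r for r in results if r.get("gold") is not None]
    hits = sum(r["gold"] in r["sources"] for r in with_gold)

    # Grounded rate: among answered (non-refused) cases, how many were grounded?
    answered = [r for r in results if not r["refused"]]
    grounded = sum(r["grounded"] for r in answered)

    return {
        "n": n,
        "refusal_accuracy": refusal_ok / n if n else 0.0,
        "retrieval_hit_rate": hits / len(with_gold) if with_gold else 0.0,
        "grounded_rate": round(grounded / len(answered), 4) if answered else 0.0,
    }
```

The reported 0.8824 equals 15/17, which would be consistent with groundedness being scored over answered cases only. That reading is an inference from the numbers, not something the run output states.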
The model sits behind a one-method seam: complete(prompt) -> str. FakeProvider keeps tests/CI deterministic and key-free; a real provider drops in without touching the pipeline or guards. Retrieval is the same: the stdlib TF-IDF Retriever is a stand-in for real embeddings / a vector DB behind retrieve().
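The two seams can be written down as structural types, with a tiny TF-IDF retriever standing behind one of them. This is a sketch only: `Provider`, `SupportsRetrieve`, `TinyTfidf`, and the `retrieve(query, k)` signature and return shape are hypothetical, since the text above names `retrieve()` without giving its parameters.

```python
import math
import re
from collections import Counter
from typing import Protocol, runtime_checkable


@runtime_checkable
class Provider(Protocol):
    def complete(self, prompt: str) -> str: ...


@runtime_checkable
class SupportsRetrieve(Protocol):
    # Assumed signature: returns (doc_id, score) pairs, best first.
    def retrieve(self, query: str, k: int = 3) -> list[tuple[str, float]]: ...


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyTfidf:
    """Cosine similarity over L2-normalized TF-IDF vectors, stdlib only."""

    def __init__(self, docs):
        self.docs = docs
        tokenized = [tokenize(d["text"]) for d in docs]
        df = Counter(t for toks in tokenized for t in set(toks))
        n = len(docs)
        # Smoothed IDF so a term found in every document keeps a positive weight.
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
        self.vecs = [self._vec(toks) for toks in tokenized]

    def _vec(self, toks):
        tf = Counter(t for t in toks if t in self.idf)
        vec = {t: count * self.idf[t] for t, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        return {t: w / norm for t, w in vec.items()}

    def retrieve(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        q = self._vec(tokenize(query))
        scored = [
            (d["id"], sum(w * dv.get(t, 0.0) for t, w in q.items()))
            for d, dv in zip(self.docs, self.vecs)
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]
```

A query with no vocabulary overlap scores 0.0 against every document, which is the case a refusal threshold is meant to catch.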
Any object with complete(prompt) -> str works. Here's an Anthropic provider in stdlib only, no SDK required:
import json, os, urllib.request

class AnthropicProvider:
    def __init__(self, model="claude-sonnet-4-5", max_tokens=512):
        self.model, self.max_tokens = model, max_tokens

    def complete(self, prompt: str) -> str:
        req = urllib.request.Request(
            "https://api.anthropic.com/v1/messages",
            data=json.dumps({
                "model": self.model,
                "max_tokens": self.max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            }).encode(),
            headers={
                "x-api-key": os.environ["ANTHROPIC_API_KEY"],
                "anthropic-version": "2023-06-01",
                "content-type": "application/json",
            },
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)["content"][0]["text"]

rag = RagGuard(ret, AnthropicProvider())

git clone https://github.com/Jott2121/rag-guard && cd rag-guard
pip install -e ".[dev]" && python -m pytest -q # tests pass on Python 3.11-3.13
python bin/demo.py # see grounded answer, refusal, PII redaction, eval

CI runs the same suite across Python 3.11, 3.12, and 3.13 on every push.
The core retriever + guards are surface-agnostic, so the repo also ships the pieces to run them over a real knowledge base with a three-tier truth ladder:
- Local notes: corpus.py turns a folder of markdown into {id, text, source, weight} chunks; index.py builds a persisted, fingerprinted TF-IDF index that rebuilds only when the corpus changes (a note is searchable the moment it's saved); service.py keeps a warm singleton and cli.py is a stdin → JSON query entry.
- The web: when local notes can't ground the answer, webverify.py escalates to web search (injected, so the core stays network-free and unit-testable).
- Corroboration: a claim is only WEB-VERIFIED when ≥2 independent, authority-weighted sources agree (official > established > social; social-only can't clear the bar).
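The "rebuilds only when the corpus changes" behavior of the local tier can be sketched as a content fingerprint stored next to the index. This is an assumption about the general technique, not a description of index.py: `corpus_fingerprint`, `load_or_build`, the JSON layout, and the `*.md` glob are all hypothetical.

```python
import hashlib
import json
from pathlib import Path


def corpus_fingerprint(folder: Path) -> str:
    """SHA-256 over every markdown file's relative path and bytes, in sorted order."""
    digest = hashlib.sha256()
    for path in sorted(folder.rglob("*.md")):
        digest.update(path.relative_to(folder).as_posix().encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def load_or_build(folder: Path, index_file: Path, build):
    """Return (index, rebuilt). `build` is any callable folder -> JSON-serializable index."""
    fingerprint = corpus_fingerprint(folder)
    if index_file.exists():
        saved = json.loads(index_file.read_text())
        if saved.get("fingerprint") == fingerprint:
            return saved["index"], False  # corpus unchanged: reuse persisted index
    index = build(folder)
    index_file.write_text(json.dumps({"fingerprint": fingerprint, "index": index}))
    return index, True
```

Hashing file contents rather than modification times means a touched but unchanged note does not trigger a rebuild, at the cost of reading every file on each check.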
Answers from the guarded pipeline carry a confidence stamp and never come back empty:
GROUNDED: backed by your notes
WEB-VERIFIED: ≥2 independent sources agree (cited)
SINGLE SOURCE: found on the web, one source only (cited)
SOURCES CONFLICT: sources disagree; both shown
UNVERIFIED: couldn't back it anywhere (best-effort, flagged)
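A minimal sketch of how a ladder like this could be decided, assuming each web source has already been labeled with a publisher, an authority tier, and whether it supports the claim. `WebSource`, `stamp`, and the numeric weights are hypothetical. The fallback for two or more social-only sources is a guess, since the text says only that they cannot reach WEB-VERIFIED.

```python
from dataclasses import dataclass

# Assumed ordering: official > established > social.
AUTHORITY = {"official": 3, "established": 2, "social": 1}


@dataclass(frozen=True)
class WebSource:
    publisher: str
    tier: str       # a key of AUTHORITY
    supports: bool  # True if it agrees with the claim, False if it contradicts


def stamp(grounded_locally: bool, sources: list[WebSource]) -> str:
    if grounded_locally:
        return "GROUNDED"

    # Independence is by publisher: keep one source per publisher,
    # preferring the most authoritative one.
    best: dict[str, WebSource] = {}
    for src in sources:
        current = best.get(src.publisher)
        if current is None or AUTHORITY[src.tier] > AUTHORITY[current.tier]:
            best[src.publisher] = src

    agree = [s for s in best.values() if s.supports]
    disagree = [s for s in best.values() if not s.supports]

    if agree and disagree:
        return "SOURCES CONFLICT"
    if not agree:
        return "UNVERIFIED"

    social_only = all(s.tier == "social" for s in agree)
    if len(agree) >= 2 and not social_only:
        return "WEB-VERIFIED"
    # One supporting publisher, or several that are all social (assumed fallback).
    return "SINGLE SOURCE"
```

Deduplicating by publisher before counting is what keeps two articles from the same outlet from counting as corroboration.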
How I actually run it: a Claude Code UserPromptSubmit hook (bin/hook_userpromptsubmit.py) grounds every terminal session against my own notes + wikis. It is advisory (a hook can't force a refusal) and silent when nothing relevant matches. That part is live. A companion assistant-side wrap (it lives in my chief-of-staff repo, not this one) turns the same core into real refuse-teeth: the returned string is the delivered message, with a corrective retry loop and labeled fallback. It's built and adversarially reviewed but not yet activated.
Honest v1 limits: retrieval is lexical TF-IDF (swap embeddings behind the same retrieve() seam);
content-level syndication detection is deferred (independence is by-publisher for now); the
contradicts_local flag is defined but not yet surfaced.
None of the ideas here are novel, and a guardrail library shouldn't pretend otherwise. What's uncommon is the packaging: a small, readable, zero-dependency implementation you can audit in one sitting, with an eval harness, rather than a trained model, a heavyweight framework, or a hosted service. The lineage:
- Corrective / Self / Agentic RAG. "Escalate to the web when local retrieval is weak" is literally the core of Corrective RAG (CRAG) (arXiv 2401.15884); the groundedness self-check echoes Self-RAG (arXiv 2310.11511). Frameworks operationalize the same loops: LangGraph agentic RAG (docs) and LlamaIndex agentic RAG.
- Guardrails & groundedness eval. Refuse-when-unsupported and groundedness scoring are commodity: Guardrails AI (repo), NVIDIA NeMo Guardrails (repo), Amazon Bedrock contextual grounding check (docs), and the eval libraries RAGAS faithfulness (docs) and TruLens (RAG triad).
- Cross-source corroboration is an active research area, not a solved primitive: resolving conflicting evidence with source credibility (arXiv 2505.17762), corroborating-and-refuting evidence retrieval (arXiv 2503.07937), and multi-source disagreement modeling (arXiv 2602.18693); consumer answer engines like Perplexity productize cited web answers. rag-guard sits downstream of this literature: it packages authority-weighted corroboration as a small guard tier; it does not contribute a new verification method.
The claim, precisely: not "I invented self-verifying RAG," but "here's a clean, tested, dependency-free implementation of the guardrail patterns, wired into a real workflow, with the confidence and limitations labeled honestly."
A guardrail library has to hold itself to its own bar, so the repo is gated:
- Coverage-gated test matrix: pytest on Python 3.11–3.13; the build fails below the coverage floor (currently 99% covered).
- CodeQL: security-extended static analysis on every push, PR, and weekly; findings surface in the Security tab.
- Pinned supply chain: GitHub Actions pinned to commit SHAs, kept current by Dependabot.
- Branch protection: main requires CI + CodeQL to pass before a merge.
- Disclosure policy: see SECURITY.md; private reporting is enabled.
Built by Jeff Otterson (Jott2121). Companion to agent-gate (an MCP gate for agent work), bow, fleet-mode, and agent-cost-attribution. This one's job is simple: if the context can't back the answer, the answer doesn't ship. MIT.
