AI-powered document Q&A with built-in trust verification.
Because confidence matters when AI answers your business questions.
Quick Start • Architecture • Benchmarks • Ecosystem
Real-time WebSocket streaming with multi-stage status: retrieve → generate → verify trust.
Generic AI chatbots confidently make up facts. For business-critical use cases — compliance, safety documentation, internal knowledge bases — this "hallucination" is a dealbreaker.
Existing RAG solutions retrieve relevant docs, but still blindly trust the LLM's output.
TrustRAG ships with a 4-factor Trust Score that tells users why each answer is (or isn't) reliable.
| Factor | Weight | What it catches |
|---|---|---|
| Retrieval Similarity | 40% | Is the source actually relevant? |
| Source Count | 20% | Is the answer backed by multiple sources? |
| Source Agreement | 20% | Do sources agree with each other? |
| Hallucination Check | 20% | Does a second LLM agree the answer is grounded? |
Every answer ships with:
- 0-100 confidence score
- Source tracing down to page/paragraph
- 3-pass consistency check
- Secondary-LLM hallucination verification
- Full audit trail
graph TD
A[User Question] --> B[FastAPI Backend]
B --> C[Query Embedding<br/>fastembed local]
B --> D[Keyword Search<br/>Postgres tsvector]
C --> E[(pgvector<br/>semantic)]
E & D --> F[RRF Fusion]
F --> G[Top-K Sources]
G --> H[Groq Llama 3.3 70B<br/>Streaming Generation]
H --> I{Trust Score Engine}
I --> J[Retrieval Similarity 40%]
I --> K[Source Count 20%]
I --> L[Source Agreement 20%]
I --> M[Secondary LLM<br/>Hallucination Check 20%]
J & K & L & M --> N[Trust Score 0-100]
N --> O[Audit Trail]
O --> P[Response to User]
style I fill:#ffd700,stroke:#333,stroke-width:3px
style N fill:#90ee90,stroke:#333,stroke-width:2px
- Frontend: https://trustrag.vercel.app
- Backend: https://trustrag-production.up.railway.app
- Health probe:
/health(HEAD + GET both 200)
Pipeline runs Llama 3.3 70B Versatile via Groq, with merged self-check (HTTP path) and Postgres-backed query cache. Optimized to 5-10s cache-miss / sub-300ms cache-hit on Railway free tier (1GB RAM, 0.5 vCPU) via embedding cleanup + cache + UptimeRobot keep-alive. Latency engineering details →
Measured 2026-04-23 on 15-query synthetic construction-safety subset
(5 semantic + 5 keyword + 5 hybrid). Pipeline ran on
llama-3.1-8b-instant due to 70B daily-quota exhaustion that day;
RAGAS judged by Groq 8B for free-tier-friendly throughput.
| Metric | Semantic-only | Hybrid (RRF k=60) | Δ |
|---|---|---|---|
| RAGAS Faithfulness | 0.241 | 0.377 | +13.6pp ✓ |
| RAGAS Answer Relevancy | 0.729 | 0.596 | -13.3pp |
| RAGAS Context Precision | 0.128 | 0.101 | -2.7pp |
| RAGAS Context Recall | 0.377 | 0.273 | -10.4pp |
| Substring Match (overall) | 0.333 | 0.357 | +2.4pp ✓ |
| ↳ Semantic-leaning q | 0.300 | 0.400 | +10pp ✓ |
| ↳ Keyword-leaning q | 0.400 | 0.200 | -20pp |
Honest read: hybrid genuinely improves faithfulness (less hallucination, +13.6pp) and substring match on semantic queries (+10pp). Keyword-query degradation reflects 8B's difficulty synthesizing from broader RRF retrieval — likely closes on 70B. Sample is small (14-15q valid each side); deltas have ±5-10pp noise.
See docs/releases/v0.3.0-hybrid.md
for full methodology + raw JSONs in eval/results/.
git clone https://github.com/jigangz/trustrag
cd trustrag
cp .env.example .env
# Add your free Groq API key
docker compose up
# Frontend: http://localhost:5173Uses:
- Groq (free tier) — Llama 3.3 70B for generation
- fastembed (local) — BAAI/bge-small-en-v1.5, no API key
- pgvector + tsvector (self-hosted Postgres)
$0 to run.
# LangChain integration
pip install trustrag-langchain
# MCP server (for Claude Desktop / Cursor)
pip install trustrag-mcp
# Evaluation pipeline
pip install trustrag-eval| Integration | Package | Status |
|---|---|---|
| LangChain (Retriever + Tool + LangGraph Agent w/ trust budget) | trustrag-langchain |
v0.1.0 |
| MCP (Claude Desktop, Cursor, Claude Code; 3 tools) | trustrag-mcp |
v0.1.2 |
| RAGAS Eval Pipeline (Groq + Gemini judge variants) | trustrag-eval |
v0.1.0 |
| n8n Workflow Templates | integrations/n8n/ | 3 workflows |
Three tools available end-to-end (verified live against production
Railway with trustrag-mcp 0.1.2 — full I/O log in
docs/mcp-verification.md):
trustrag_query— knowledge base lookup with trust score + 4-factor breakdown + citationstrustrag_upload_document— PDF ingestion (parsed, chunked, embedded, indexed in pgvector)trustrag_get_audit_log— fetch low-trust query history with client-side trust/time filtering
Sample trustrag_query output (real, from production):
**Answer** (Trust: 72.8/100):
According to the sources, OSHA requires fall protection for employees on a
scaffold more than 10 feet above a lower level [Source: OSHA3150.pdf, page 41]...
**Sources**:
- OSHA3150.pdf (page 41, similarity 0.75)
- OSHA3150.pdf (page 5, similarity 0.74)
- OSHA3150.pdf (page 13, similarity 0.73)
**Trust Breakdown**: {agreement: 18.1, retrieval: 74.2, source_count: 15.0, hallucination: 10.0}
Setup: add to claude_desktop_config.json:
{
"mcpServers": {
"trustrag": {
"command": "uvx",
"args": ["trustrag-mcp"],
"env": { "TRUSTRAG_BACKEND_URL": "https://trustrag-production.up.railway.app" }
}
}
}See docs/releases/v0.5.0-mcp.md for details.
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/documents/upload |
Upload and process a PDF |
| GET | /api/documents |
List uploaded documents |
| DELETE | /api/documents/{id} |
Remove a document |
| POST | /api/query |
Ask a question with trust verification |
| WS | /api/ws |
WebSocket streaming queries |
| GET | /api/audit |
View query audit trail |
| GET | /api/health |
Health check |
- v0.2 Enhancement Design
- v2 Completion Design (latency + benchmark + release)
- Implementation Plan (11 tasks)
- Benchmark Results
- v0.2.0-streaming — WebSocket streaming
- v0.3.0-hybrid — Hybrid retrieval (measured)
- v0.4.0-langchain — LangChain + LangGraph agent
- v0.5.0-mcp — MCP Claude Desktop demo
- v1.0.0 — Production-grade
MIT — see LICENSE.
If you find TrustRAG useful, please star — it helps others discover it.
