A collection of production-quality RAG systems built with Weaviate — exploring what it actually takes to go beyond "vector DB plus a prompt."
Robust RAG is a layered decision system in which each stage has a clear responsibility:
| Stage | Responsibility |
|---|---|
| Ingest | Normalize, parse structure, attach metadata — every source is versioned for traceability |
| Index | Embeddings + BM25; parent-child storage preserves context without sacrificing retrieval precision |
| Retrieve | Hybrid search, MMR diversity filtering, relevance thresholding; low-relevance evidence is filtered out before it reaches the model |
| Generate | Bounded prompt; the model is instructed to answer only from retrieved context, not from parametric memory |
| Evaluate | Instrument faithfulness and relevance at every stage; silent failures are the hardest to catch in production |
This layering targets three common root causes of production RAG failures: bad evidence, weak retrieval, and poor uncertainty handling.
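To make the Retrieve stage concrete, here is a minimal, dependency-free sketch of relevance thresholding followed by MMR (maximal marginal relevance) selection. It is an illustration, not the code used in these projects: the function names `cosine` and `mmr_select`, the parameters `lambda_` and `min_relevance`, and their default values are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, candidates, k=3, lambda_=0.7, min_relevance=0.3):
    """Pick up to k chunk ids, trading relevance against redundancy.

    candidates: list of (chunk_id, vector) pairs (hypothetical shape).
    lambda_: 1.0 ranks purely by relevance, 0.0 purely by diversity.
    min_relevance: chunks scoring below this against the query are dropped.
    """
    # Relevance thresholding: discard weak evidence before any ranking.
    scored = [(cid, vec, cosine(query_vec, vec)) for cid, vec in candidates]
    pool = [c for c in scored if c[2] >= min_relevance]

    selected = []
    while pool and len(selected) < k:

        def mmr_score(c):
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max((cosine(c[1], s[1]) for s in selected), default=0.0)
            return lambda_ * c[2] - (1 - lambda_) * redundancy

        best = max(pool, key=mmr_score)
        selected.append(best)
        pool.remove(best)

    return [cid for cid, _, _ in selected]
```

With a lower `lambda_`, a near-duplicate of an already selected chunk loses out to a less similar but still relevant one, which is the behaviour the "MMR diversity filtering" row describes.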
The guiding principle in this repo is simple: use frameworks only where they remove undifferentiated plumbing, and avoid them where they obscure the critical path.
LlamaIndex fits this principle at the ingestion boundary. It provides fast, flexible document parsing and chunking, eliminating routine wiring that adds no real value.
When a use case pushes beyond generic framework behaviour — tighter latency, customisation needs, or stability constraints — the strategy shifts to a minimal custom orchestration layer, where retrieval logic, prompt construction, and observability remain explicit and fully transparent.
pdf-rag-ts — TypeScript
PDF Q&A with hybrid BM25 + semantic search, four chunking strategies, query expansion, MMR diversity filtering, relevance thresholding, and HyDE. Powered by Gemini for embeddings and generation, LlamaParse for structured PDF parsing.
Stack: TypeScript · Weaviate · Gemini · LlamaParse
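The idea behind hybrid BM25 + semantic search can be sketched as a weighted blend of two normalised score lists. The example below is written in Python for brevity and is a simplified illustration, not Weaviate's internal implementation and not the project's code: `min_max`, `hybrid_fuse`, and the `alpha` weighting convention (1.0 = vector only, 0.0 = keyword only) are assumptions chosen for the sketch.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so treat every document as a full match.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_fuse(bm25_scores, vector_scores, alpha=0.5):
    """Blend keyword and vector scores into one ranking.

    bm25_scores / vector_scores: {doc_id: raw_score}, higher is better.
    A document missing from one list contributes 0.0 for that signal.
    Returns (doc_id, fused_score) pairs, best first, ties broken by doc_id.
    """
    bm25_n = min_max(bm25_scores)
    vec_n = min_max(vector_scores)
    fused = {
        doc_id: alpha * vec_n.get(doc_id, 0.0) + (1 - alpha) * bm25_n.get(doc_id, 0.0)
        for doc_id in set(bm25_n) | set(vec_n)
    }
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
```

Normalising before blending matters because raw BM25 scores are unbounded while similarity scores typically sit in a narrow range; without it, one signal would dominate regardless of `alpha`.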
pdf-rag-python — Python
The Python sibling of pdf-rag-ts: the same hybrid search pipeline, plus a Redis-backed semantic cache that serves repeated or semantically similar questions without calling the LLM again.
Stack: Python · Weaviate · Ollama · LlamaParse · Redis
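The semantic cache idea can be illustrated with a small in-memory stand-in. This sketch does not use Redis and is not the project's implementation: the `SemanticCache` class, the `answer_with_cache` helper, and the 0.9 similarity threshold are hypothetical, and the query embedding is assumed to be computed elsewhere.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """In-memory stand-in for a Redis-backed cache keyed by query embedding."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (query_embedding, answer)

    def get(self, query_vec):
        """Return the answer of the most similar cached query, or None on a miss."""
        best_answer, best_sim = None, self.threshold
        for vec, answer in self.entries:
            sim = cosine(query_vec, vec)
            if sim >= best_sim:
                best_answer, best_sim = answer, sim
        return best_answer

    def put(self, query_vec, answer):
        self.entries.append((list(query_vec), answer))


def answer_with_cache(cache, query_vec, generate):
    """Serve from the cache when possible; otherwise call generate() and store the result."""
    cached = cache.get(query_vec)
    if cached is not None:
        return cached
    answer = generate()  # the expensive retrieval + LLM call
    cache.put(query_vec, answer)
    return answer
```

The threshold is the main tuning knob: set it too low and unrelated questions receive each other's answers, set it too high and only near-identical phrasings hit the cache.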
rag-tutorial — Tutorial
Builds a RAG pipeline from scratch over a 7k-book dataset. Covers collection setup, ingestion, semantic search, and generative search — a clean starting point for understanding how the pieces fit together.
Stack: TypeScript · Weaviate · Ollama
See each project's README for full setup instructions.