An advanced Retrieval-Augmented Generation (RAG) system combining agentic AI, semantic search, and lexical ranking for intelligent document retrieval and synthesis.
This project extends the foundational RAG concepts from the Anthropic Academy RAG Course with production-grade implementations of multiple retrieval strategies and agentic decision-making.
This system demonstrates four complementary retrieval approaches:
- Agentic Search: Claude makes intelligent decisions about when and what to retrieve using tool use
- Semantic Retrieval: VoyageAI embeddings with vector similarity search
- Lexical Retrieval: BM25 keyword-based ranking
- Hybrid Ranking: Reciprocal Rank Fusion combining semantic and lexical results
The combination supports queries across complex, multi-domain documents, pairing the exact-term precision of lexical search with the broader recall of semantic search.
- Agentic reasoning with Claude Sonnet 4.6 (tool use)
- Dual retrieval mechanisms (semantic + lexical)
- Hybrid ranking with Reciprocal Rank Fusion (RRF)
- Persistent local vector database (Chroma)
- Custom VectorIndex with cosine/euclidean distance metrics
- Production-grade BM25 implementation
- Streamlit web interface
- Type-safe, validated codebase
git clone https://github.com/adwibha/rag-agentic-search.git
cd rag-agentic-search
pip install -r requirements.txt
cp .env.example .env
# Edit .env with your API keys:
# ANTHROPIC_API_KEY=sk-ant-...
# VOYAGE_API_KEY=pa-...
python -m src.ingest
This will:
- Read and chunk the report document
- Generate embeddings using VoyageAI
- Populate Chroma vector database
- Create persistent storage in ./chroma_db/
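The chunking step in the ingestion list above could look something like the following minimal sketch, which splits a Markdown document on level-2 headings. The function name `chunk_by_section` and the `## ` heading convention are assumptions made for illustration; `src/chunker.py` provides three strategies whose details are not shown here.

```python
import re


def chunk_by_section(text: str) -> list[str]:
    """Split Markdown text into chunks, one per level-2 ("## ") heading.

    Hypothetical helper for illustration; not the project's chunker API.
    """
    # Split immediately before every line that starts with "## ".
    parts = re.split(r"(?m)^(?=## )", text)
    # Drop empty fragments and trim surrounding whitespace.
    return [part.strip() for part in parts if part.strip()]
```

Any text before the first heading becomes its own chunk, so an introduction is not silently dropped.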
streamlit run app.py
Open http://localhost:8501 in your browser.
src/
├── agent.py - Claude agentic loop with tool use
├── chunker.py - Document segmentation (3 strategies)
├── embedder.py - VoyageAI embedding wrapper
├── ingest.py - Data ingestion pipeline
├── retrieval.py - VectorIndex, BM25Index, HybridRetriever
└── vector_store.py - Chroma persistence wrapper
notebooks/
├── 001_chunking.ipynb - Document chunking exploration
├── 002_embeddings.ipynb - Embedding generation
├── 003_vectordb.ipynb - Vector database implementation
├── 004_bm25.ipynb - BM25 keyword search
└── 005_hybrid.ipynb - Hybrid retrieval with RRF
app.py - Streamlit web interface
report.md - Sample interdisciplinary research document
requirements.txt - Python dependencies
.env.example - API key template
Semantic similarity using embeddings with configurable distance metrics.
from src.retrieval import VectorIndex
from src.embedder import VoyageEmbedder
embedder = VoyageEmbedder()
index = VectorIndex(
    distance_metric="cosine",
    embedding_fn=embedder.embed_documents
)
index.add_documents(documents)
results = index.search("query text", k=5)
Features:
- Cosine and Euclidean distance metrics
- Batch embedding support
- Dimension validation
- Custom embedding function support
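To make the two distance metrics and the dimension check concrete, here is a minimal pure-Python sketch. The function names are hypothetical and the code illustrates the standard definitions; it is not taken from `VectorIndex`, whose internals may differ.

```python
import math


def _check_dims(a, b):
    # Dimension validation: both vectors must have the same length.
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")


def cosine_distance(a, b):
    """Return 1 - cosine similarity (0.0 means same direction)."""
    _check_dims(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: treat a zero vector as maximally dissimilar.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Return the straight-line (L2) distance between two vectors."""
    _check_dims(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Cosine distance ignores vector magnitude and compares direction only, while Euclidean distance is sensitive to both. For embeddings normalized to unit length, the two metrics produce the same ranking.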
Keyword-based ranking using the BM25 algorithm.
from src.retrieval import BM25Index
index = BM25Index(k1=1.5, b=0.75)
index.add_documents(documents)
results = index.search("query text", k=5)
Features:
- Configurable k1 and b parameters
- Custom tokenization
- IDF calculation and scoring
- Score normalization
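For readers unfamiliar with how `k1` and `b` enter the score, the following is a minimal sketch of Okapi BM25 with whitespace tokenization. The function name `bm25_scores`, the tokenizer, and the non-negative IDF variant `ln((N - n + 0.5) / (n + 0.5) + 1)` are assumptions chosen for illustration; `BM25Index` may tokenize and normalize differently.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document in docs against query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avgdl = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = query.lower().split()
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # k1 controls term-frequency saturation; b controls how
            # strongly scores are normalized by document length.
            denom = tf[term] + k1 * (1.0 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1.0) / denom
        scores.append(score)
    return scores
```

Raising `k1` lets repeated terms keep adding to the score for longer, and lowering `b` toward 0 reduces the penalty applied to long documents.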
Reciprocal Rank Fusion combining multiple indexes.
from src.retrieval import VectorIndex, BM25Index, HybridRetriever
hybrid = HybridRetriever(bm25_index, vector_index)
results = hybrid.search("query text", k=5, k_rrf=60)
Features:
- Balanced scoring from multiple sources
- Duplicate removal
- Configurable k_rrf parameter
- Well suited to diverse query types
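Reciprocal Rank Fusion itself is compact enough to sketch in a few lines: each document receives `1 / (k_rrf + rank)` from every ranking it appears in, and the contributions are summed. The function below is an illustrative sketch that operates on lists of document IDs; the name `reciprocal_rank_fusion` and the ID-based interface are assumptions, not the `HybridRetriever` API.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k_rrf=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Returns (doc_id, score) pairs sorted by descending fused score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top result contributes 1 / (k_rrf + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k_rrf + rank)
    # Each ID appears once in the output, so duplicates are merged.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Because only ranks are used, the raw BM25 and vector scores never need to be put on a common scale. A larger `k_rrf` flattens the difference between top-ranked and lower-ranked results.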
Claude decides when and what to retrieve using tool use.
from src.agent import AgenticRAG
agent = AgenticRAG(vector_store, embedder)
answer, retrieved_chunks = agent.query("What about XDR-471?")
Features:
- Tool use pattern for agent reasoning
- Multi-turn query refinement capability
- Retrieved chunk transparency
- Grounded responses backed by document content
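The tool use pattern can be sketched without any network calls. The tool definition below follows the Anthropic Messages API shape (`name`, `description`, `input_schema`), and the handler builds the `tool_result` block that is sent back to the model. The tool name `search_document`, the handler `handle_tool_use`, and the `search_fn` callback are hypothetical names for illustration, and blocks are shown as plain dicts for simplicity; the actual `AgenticRAG` implementation may be organized differently.

```python
# Tool definition passed to the model so it can decide when to search.
SEARCH_TOOL = {
    "name": "search_document",  # hypothetical tool name
    "description": "Search the ingested report for passages relevant to a query.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search query text."},
            "k": {"type": "integer", "description": "Number of passages to return."},
        },
        "required": ["query"],
    },
}


def handle_tool_use(block, search_fn):
    """Run the search requested in a tool_use block and build the reply.

    search_fn is any callable (query, k) -> list of passage strings.
    """
    if block["name"] != SEARCH_TOOL["name"]:
        raise ValueError(f"unknown tool: {block['name']}")
    args = block["input"]
    passages = search_fn(args["query"], args.get("k", 5))
    # The tool_use_id ties this result to the model's request.
    return {
        "type": "tool_result",
        "tool_use_id": block["id"],
        "content": "\n\n".join(passages),
    }
```

In an agentic loop, the application keeps calling the model, answering each `tool_use` request with a `tool_result`, until the model returns a final text answer instead of another tool call.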
The sample document (report.md) contains an interdisciplinary research review covering:
- Medical Research - XDR-471 syndrome findings
- Software Engineering - Project Phoenix stability
- Financial Analysis - Quarterly performance review
- Scientific Experimentation - Material composite properties
- Legal Developments - IP and regulatory compliance
- Product Engineering - Hardware specifications
- Historical Research - Galveston Accords analysis
- Project Management - Multi-phase project tracking
- Pharmaceutical Development - Clinical trial data
- Cybersecurity Analysis - Incident response documentation
This realistic multi-domain document demonstrates the system's ability to handle complex cross-domain queries.
- LLM: Claude Sonnet 4.6 (Anthropic API)
- Embeddings: VoyageAI voyage-3-large
- Vector Database: Chroma (local, persistent)
- Search Algorithms: Custom implementations (BM25, RRF)
- Frontend: Streamlit
- Language: Python 3.9+
graph TD
A["User Query<br/>(Streamlit Interface)"]
B["Claude Agent<br/>(Tool Use Pattern)"]
C{"Search<br/>Needed?"}
D["HybridRetriever"]
E["VectorIndex<br/>(Semantic)"]
F["VoyageAI<br/>Embeddings"]
G["BM25Index<br/>(Lexical)"]
H["Keyword<br/>Matching"]
I["Reciprocal Rank<br/>Fusion"]
J["Claude Answer<br/>Generation"]
K["Results Display<br/>(Streamlit)"]
A --> B
B --> C
C -->|Yes| D
C -->|No| J
D --> E
D --> G
E --> F
G --> H
F --> I
H --> I
I --> J
J --> K
style A fill:#4A90E2,stroke:#2E5C8A,color:#fff
style B fill:#7B68EE,stroke:#4B3B9B,color:#fff
style C fill:#FF6B6B,stroke:#C92A2A,color:#fff
style D fill:#50C878,stroke:#2D7A4A,color:#fff
style E fill:#87CEEB,stroke:#4A7C9E,color:#fff
style F fill:#FFB347,stroke:#B8860B,color:#000
style G fill:#87CEEB,stroke:#4A7C9E,color:#fff
style H fill:#FFB347,stroke:#B8860B,color:#000
style I fill:#DDA0DD,stroke:#8B6B8B,color:#fff
style J fill:#90EE90,stroke:#4B7D4B,color:#000
style K fill:#4A90E2,stroke:#2E5C8A,color:#fff
from src.embedder import VoyageEmbedder
from src.retrieval import VectorIndex
embedder = VoyageEmbedder()
index = VectorIndex(embedding_fn=embedder.embed_documents)
index.add_documents([{"content": text} for text in chunks])
results = index.search("XDR-471 findings", k=3)
for doc, score in results:
    print(f"Score: {score:.3f} | {doc['content'][:100]}")

from src.retrieval import BM25Index
bm25 = BM25Index()
bm25.add_documents([{"content": text} for text in chunks])
results = bm25.search("Project Phoenix stability", k=5)
for doc, score in results:
    print(f"BM25 Score: {score:.3f} | {doc['content'][:100]}")

from src.retrieval import VectorIndex, BM25Index, HybridRetriever
hybrid = HybridRetriever(bm25_index, vector_index)
results = hybrid.search("research findings", k=5, k_rrf=60)
for doc, score in results:
    print(f"RRF Score: {score:.3f} | {doc['content'][:100]}")

from src.agent import AgenticRAG
agent = AgenticRAG(vector_store, embedder)
answer, retrieved = agent.query(
    "What are the key research findings?"
)
print("Answer:")
print(answer)
print(f"\nRetrieved {len(retrieved)} sections")

| Aspect | Semantic | Lexical | Hybrid |
|---|---|---|---|
| Speed | ~100ms | <50ms | ~200ms |
| Query Understanding | High | Low | High |
| Exact Matches | Poor | Excellent | Good |
| Semantic Understanding | Excellent | None | Excellent |
| API Cost | High | None | High |
| Best For | Paraphrased, abstract | Specific terms | Mixed queries |
jupyter notebook
Notebooks demonstrate:
- Document chunking strategies (001_chunking)
- Embedding generation (002_embeddings)
- Vector database implementation (003_vectordb)
- BM25 ranking algorithm (004_bm25)
- Hybrid retrieval with RRF (005_hybrid)
Verify that all Python files compile:
python -m py_compile src/*.py app.py
Use mypy for static type analysis (optional):
pip install mypy
mypy src/ app.py
ANTHROPIC_API_KEY=sk-ant-... # Claude API key
VOYAGE_API_KEY=pa-... # VoyageAI API key
Adjust behavior by modifying code or environment:
# Distance metric for semantic search
index = VectorIndex(distance_metric="euclidean")
# BM25 tuning
bm25 = BM25Index(k1=2.0, b=0.5)
# Hybrid ranking
retriever.search(query, k=10, k_rrf=60)
Typical query latencies:
- Embedding generation: 2-5 seconds (API call)
- Vector search: <100ms (local)
- BM25 search: <50ms (local)
- Claude response: 3-10 seconds
- Total end-to-end: 8-25 seconds
Memory usage: ~500MB for Chroma with 11 sections
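Latency figures like those above depend on hardware, network conditions, and API load, so it can be useful to measure each stage directly. The helper below is a minimal sketch built on `time.perf_counter`; the name `timed` is hypothetical and not part of this codebase.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Example (assumes an index object like the ones shown earlier):
# results, seconds = timed(index.search, "query text", k=5)
```

Wrapping the embedding call, the local search, and the Claude request separately shows which stage dominates end-to-end latency.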
- Single document only (extensible to multiple documents)
- No conversation history (each query independent)
- Batch API calls limited by VoyageAI rate limits
- Claude responses are limited to ~2000 tokens (a cap on output length; the model's context window is much larger)
- Multi-document support with source tracking
- Conversation memory and context preservation
- Query expansion and refinement
- Caching layer for repeated queries
- Streaming responses for better UX
- Cloud deployment (Hugging Face Spaces)
- Advanced query analysis and reformulation
- Anthropic Academy RAG Course
- Claude API Documentation
- VoyageAI Documentation
- Chroma Vector Database
- BM25 Algorithm
- Reciprocal Rank Fusion
By studying this codebase, you will understand:
- How RAG systems combine retrieval and generation
- Agentic patterns with Claude's tool use feature
- Semantic search with embeddings
- Lexical search with BM25
- Hybrid ranking strategies
- Production-grade Python practices
- Integration with multiple APIs
- Building user interfaces for AI systems
MIT
This project extends the RAG concepts from the Anthropic Academy RAG Course with custom implementations of advanced retrieval strategies and agentic reasoning patterns.