Agentic RAG System

An advanced Retrieval-Augmented Generation (RAG) system combining agentic AI, semantic search, and lexical ranking for intelligent document retrieval and synthesis.

This project extends the foundational RAG concepts from the Anthropic Academy RAG Course with production-grade implementations of multiple retrieval strategies and agentic decision-making.

Overview

This system demonstrates three complementary retrieval approaches:

Agentic Search: Claude makes intelligent decisions about when and what to retrieve using tool use
Semantic Retrieval: VoyageAI embeddings with vector similarity search
Lexical Retrieval: BM25 keyword-based ranking
Hybrid Ranking: Reciprocal Rank Fusion combining semantic and lexical results

The combination enables sophisticated queries across complex, multi-domain documents with high precision and recall.

Features

Agentic reasoning with Claude Sonnet 4.6 (tool use)
Dual retrieval mechanisms (semantic + lexical)
Hybrid ranking with Reciprocal Rank Fusion (RRF)
Persistent local vector database (Chroma)
Custom VectorIndex with cosine/euclidean distance metrics
Production-grade BM25 implementation
Streamlit web interface
Type-safe, validated codebase

Quick Start

Installation

git clone https://github.com/adwibha/rag-agentic-search.git
cd rag-agentic-search
pip install -r requirements.txt

Configuration

cp .env.example .env
# Edit .env with your API keys:
# ANTHROPIC_API_KEY=sk-ant-...
# VOYAGE_API_KEY=pa-...

Ingest Document

python -m src.ingest

This will:

Read and chunk the report document
Generate embeddings using VoyageAI
Populate Chroma vector database
Create persistent storage in ./chroma_db/

Run Application

streamlit run app.py

Open http://localhost:8501 in your browser.

Project Structure

src/
├── agent.py         - Claude agentic loop with tool use
├── chunker.py       - Document segmentation (3 strategies)
├── embedder.py      - VoyageAI embedding wrapper
├── ingest.py        - Data ingestion pipeline
├── retrieval.py     - VectorIndex, BM25Index, HybridRetriever
└── vector_store.py  - Chroma persistence wrapper

notebooks/
├── 001_chunking.ipynb      - Document chunking exploration
├── 002_embeddings.ipynb    - Embedding generation
├── 003_vectordb.ipynb      - Vector database implementation
├── 004_bm25.ipynb          - BM25 keyword search
└── 005_hybrid.ipynb        - Hybrid retrieval with RRF

app.py              - Streamlit web interface
report.md           - Sample interdisciplinary research document
requirements.txt    - Python dependencies
.env.example        - API key template

Core Components

VectorIndex (Semantic Search)

Semantic similarity using embeddings with configurable distance metrics.

from src.retrieval import VectorIndex
from src.embedder import VoyageEmbedder

embedder = VoyageEmbedder()
index = VectorIndex(
    distance_metric="cosine",
    embedding_fn=embedder.embed_documents
)
index.add_documents(documents)
results = index.search("query text", k=5)

Features:

Cosine and Euclidean distance metrics
Batch embedding support
Dimension validation
Custom embedding function support

BM25Index (Lexical Search)

Keyword-based ranking using the BM25 algorithm.

from src.retrieval import BM25Index

index = BM25Index(k1=1.5, b=0.75)
index.add_documents(documents)
results = index.search("query text", k=5)

Features:

Configurable k1 and b parameters
Custom tokenization
IDF calculation and scoring
Score normalization

HybridRetriever (Combined Strategy)

Reciprocal Rank Fusion combining multiple indexes.

from src.retrieval import VectorIndex, BM25Index, HybridRetriever

hybrid = HybridRetriever(bm25_index, vector_index)
results = hybrid.search("query text", k=5, k_rrf=60)

Features:

Balanced scoring from multiple sources
Duplicate removal
Configurable k_rrf parameter
Optimal for diverse query types

AgenticRAG (Intelligent Retrieval)

Claude decides when and what to retrieve using tool use.

from src.agent import AgenticRAG

agent = AgenticRAG(vector_store, embedder)
answer, retrieved_chunks = agent.query("What about XDR-471?")

Features:

Tool use pattern for agent reasoning
Multi-turn query refinement capability
Retrieved chunk transparency
Grounded responses backed by document content

Document

The sample document (report.md) contains an interdisciplinary research review covering:

Medical Research - XDR-471 syndrome findings
Software Engineering - Project Phoenix stability
Financial Analysis - Quarterly performance review
Scientific Experimentation - Material composite properties
Legal Developments - IP and regulatory compliance
Product Engineering - Hardware specifications
Historical Research - Galveston Accords analysis
Project Management - Multi-phase project tracking
Pharmaceutical Development - Clinical trial data
Cybersecurity Analysis - Incident response documentation

This realistic multi-domain document demonstrates the system's ability to handle complex cross-domain queries.

Technology Stack

LLM: Claude Sonnet 4.6 (Anthropic API)
Embeddings: VoyageAI voyage-3-large
Vector Database: Chroma (local, persistent)
Search Algorithms: Custom implementations (BM25, RRF)
Frontend: Streamlit
Language: Python 3.9+

Architecture

graph TD
    A["User Query<br/>(Streamlit Interface)"]
    B["Claude Agent<br/>(Tool Use Pattern)"]
    C{"Search<br/>Needed?"}
    D["HybridRetriever"]
    E["VectorIndex<br/>(Semantic)"]
    F["VoyageAI<br/>Embeddings"]
    G["BM25Index<br/>(Lexical)"]
    H["Keyword<br/>Matching"]
    I["Reciprocal Rank<br/>Fusion"]
    J["Claude Answer<br/>Generation"]
    K["Results Display<br/>(Streamlit)"]

    A --> B
    B --> C
    C -->|Yes| D
    C -->|No| J
    D --> E
    D --> G
    E --> F
    G --> H
    F --> I
    H --> I
    I --> J
    J --> K

    style A fill:#4A90E2,stroke:#2E5C8A,color:#fff
    style B fill:#7B68EE,stroke:#4B3B9B,color:#fff
    style C fill:#FF6B6B,stroke:#C92A2A,color:#fff
    style D fill:#50C878,stroke:#2D7A4A,color:#fff
    style E fill:#87CEEB,stroke:#4A7C9E,color:#fff
    style F fill:#FFB347,stroke:#B8860B,color:#000
    style G fill:#87CEEB,stroke:#4A7C9E,color:#fff
    style H fill:#FFB347,stroke:#B8860B,color:#000
    style I fill:#DDA0DD,stroke:#8B6B8B,color:#fff
    style J fill:#90EE90,stroke:#4B7D4B,color:#000
    style K fill:#4A90E2,stroke:#2E5C8A,color:#fff

Usage Examples

Basic Semantic Search

from src.embedder import VoyageEmbedder
from src.retrieval import VectorIndex

embedder = VoyageEmbedder()
index = VectorIndex(embedding_fn=embedder.embed_documents)
index.add_documents([{"content": text} for text in chunks])

results = index.search("XDR-471 findings", k=3)
for doc, score in results:
    print(f"Score: {score:.3f} | {doc['content'][:100]}")

Keyword Search with BM25

from src.retrieval import BM25Index

bm25 = BM25Index()
bm25.add_documents([{"content": text} for text in chunks])

results = bm25.search("Project Phoenix stability", k=5)
for doc, score in results:
    print(f"BM25 Score: {score:.3f} | {doc['content'][:100]}")

Hybrid Search with RRF

from src.retrieval import VectorIndex, BM25Index, HybridRetriever

hybrid = HybridRetriever(bm25_index, vector_index)
results = hybrid.search("research findings", k=5, k_rrf=60)
for doc, score in results:
    print(f"RRF Score: {score:.3f} | {doc['content'][:100]}")

Agentic Query

agent = AgenticRAG(vector_store, embedder)
answer, retrieved = agent.query(
    "What are the key research findings?"
)
print("Answer:")
print(answer)
print(f"\nRetrieved {len(retrieved)} sections")

Retrieval Strategy Comparison

Aspect	Semantic	Lexical	Hybrid
Speed	~100ms	<50ms	~200ms
Query Understanding	High	Low	High
Exact Matches	Poor	Excellent	Good
Semantic Understanding	Excellent	None	Excellent
API Cost	High	None	High
Best For	Paraphrased, abstract	Specific terms	Mixed queries

Development

Running Jupyter Notebooks

jupyter notebook

Notebooks demonstrate:

Document chunking strategies (001_chunking)
Embedding generation (002_embeddings)
Vector database implementation (003_vectordb)
BM25 ranking algorithm (004_bm25)
Hybrid retrieval with RRF (005_hybrid)

Code Validation

All Python files compile and import correctly:

python -m py_compile src/*.py app.py

Type Checking

Use mypy for static type analysis (optional):

pip install mypy
mypy src/ app.py

Configuration

Environment Variables

ANTHROPIC_API_KEY=sk-ant-...      # Claude API key
VOYAGE_API_KEY=pa-...              # VoyageAI API key

Retrieval Parameters

Adjust behavior by modifying code or environment:

# Distance metric for semantic search
index = VectorIndex(distance_metric="euclidean")

# BM25 tuning
bm25 = BM25Index(k1=2.0, b=0.5)

# Hybrid ranking
retriever.search(query, k=10, k_rrf=60)

Performance

Typical query latencies:

Embedding generation: 2-5 seconds (API call)
Vector search: <100ms (local)
BM25 search: <50ms (local)
Claude response: 3-10 seconds
Total end-to-end: 8-25 seconds

Memory usage: ~500MB for Chroma with 11 sections

Known Limitations

Single document only (extensible to multiple documents)
No conversation history (each query independent)
Batch API calls limited by VoyageAI rate limits
Claude context window limits responses to ~2000 tokens

Future Enhancements

Multi-document support with source tracking
Conversation memory and context preservation
Query expansion and refinement
Caching layer for repeated queries
Streaming responses for better UX
Cloud deployment (Hugging Face Spaces)
Advanced query analysis and reformulation

Related Resources

Learning Outcomes

By studying this codebase, you will understand:

How RAG systems combine retrieval and generation
Agentic patterns with Claude's tool use feature
Semantic search with embeddings
Lexical search with BM25
Hybrid ranking strategies
Production-grade Python practices
Integration with multiple APIs
Building user interfaces for AI systems

License

MIT

Attribution

This project extends the RAG concepts from the Anthropic Academy RAG Course with custom implementations of advanced retrieval strategies and agentic reasoning patterns.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
src		src
.env.example		.env.example
.gitignore		.gitignore
001_chunking.ipynb		001_chunking.ipynb
002_embeddings.ipynb		002_embeddings.ipynb
003_vectordb.ipynb		003_vectordb.ipynb
004_bm25.ipynb		004_bm25.ipynb
005_hybrid.ipynb		005_hybrid.ipynb
NOTEBOOKS.md		NOTEBOOKS.md
README.md		README.md
app.py		app.py
report.md		report.md
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

Agentic RAG System

Overview

Features

Quick Start

Installation

Configuration

Ingest Document

Run Application

Project Structure

Core Components

VectorIndex (Semantic Search)

BM25Index (Lexical Search)

HybridRetriever (Combined Strategy)

AgenticRAG (Intelligent Retrieval)

Document

Technology Stack

Architecture

Usage Examples

Basic Semantic Search

Keyword Search with BM25

Hybrid Search with RRF

Agentic Query

Retrieval Strategy Comparison

Development

Running Jupyter Notebooks

Code Validation

Type Checking

Configuration

Environment Variables

Retrieval Parameters

Performance

Known Limitations

Future Enhancements

Related Resources

Learning Outcomes

License

Attribution

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages