adwibha/rag-agentic-search

Agentic RAG System

An advanced Retrieval-Augmented Generation (RAG) system combining agentic AI, semantic search, and lexical ranking for intelligent document retrieval and synthesis.

This project extends the foundational RAG concepts from the Anthropic Academy RAG Course with production-grade implementations of multiple retrieval strategies and agentic decision-making.

Overview

This system demonstrates three complementary retrieval approaches, plus a hybrid ranking step that fuses the semantic and lexical results:

  • Agentic Search: Claude makes intelligent decisions about when and what to retrieve using tool use
  • Semantic Retrieval: VoyageAI embeddings with vector similarity search
  • Lexical Retrieval: BM25 keyword-based ranking
  • Hybrid Ranking: Reciprocal Rank Fusion combining semantic and lexical results

Combining them lets both exact-term queries (for example, an identifier such as XDR-471) and paraphrased queries be answered against a complex, multi-domain document.

Features

  • Agentic reasoning with Claude Sonnet 4.6 (tool use)
  • Dual retrieval mechanisms (semantic + lexical)
  • Hybrid ranking with Reciprocal Rank Fusion (RRF)
  • Persistent local vector database (Chroma)
  • Custom VectorIndex with cosine/euclidean distance metrics
  • Production-grade BM25 implementation
  • Streamlit web interface
  • Type-annotated codebase, checkable with mypy

Quick Start

Installation

git clone https://github.com/adwibha/rag-agentic-search.git
cd rag-agentic-search
pip install -r requirements.txt

Configuration

cp .env.example .env
# Edit .env with your API keys:
# ANTHROPIC_API_KEY=sk-ant-...
# VOYAGE_API_KEY=pa-...
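
Before running ingestion, it can help to confirm that both keys are actually visible to the process. The sketch below is only an illustration: `REQUIRED_KEYS` and `missing_keys` are hypothetical names that are not part of this repository, and the check reads the process environment, so it assumes the `.env` file has already been loaded by whatever mechanism the project uses.

```python
import os

# Hypothetical helper, not part of the repository: the two variable names
# come from .env.example as described above.
REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "VOYAGE_API_KEY")


def missing_keys(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("All API keys are set.")
```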

Ingest Document

python -m src.ingest

This will:

  • Read and chunk the report document
  • Generate embeddings using VoyageAI
  • Populate Chroma vector database
  • Create persistent storage in ./chroma_db/
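
The chunking step listed above could look roughly like the following fixed-size splitter with overlap. This is a minimal sketch under stated assumptions: `chunk_by_chars`, `chunk_size`, and `overlap` are hypothetical names, and the strategy shown is not necessarily one of the three implemented in src/chunker.py.

```python
def chunk_by_chars(text, chunk_size=500, overlap=50):
    """Split text into windows of chunk_size characters.

    Consecutive windows share `overlap` characters so that a sentence cut
    at a boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text, so no
        # trailing chunk is made up purely of already-covered characters.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


print(chunk_by_chars("abcdefghij", chunk_size=4, overlap=1))
# ['abcd', 'defg', 'ghij']
```

Each chunk would then be embedded and written to the vector store; larger overlaps improve boundary coverage at the cost of more embeddings.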

Run Application

streamlit run app.py

Open http://localhost:8501 in your browser.

Project Structure

src/
├── agent.py         - Claude agentic loop with tool use
├── chunker.py       - Document segmentation (3 strategies)
├── embedder.py      - VoyageAI embedding wrapper
├── ingest.py        - Data ingestion pipeline
├── retrieval.py     - VectorIndex, BM25Index, HybridRetriever
└── vector_store.py  - Chroma persistence wrapper

notebooks/
├── 001_chunking.ipynb      - Document chunking exploration
├── 002_embeddings.ipynb    - Embedding generation
├── 003_vectordb.ipynb      - Vector database implementation
├── 004_bm25.ipynb          - BM25 keyword search
└── 005_hybrid.ipynb        - Hybrid retrieval with RRF

app.py              - Streamlit web interface
report.md           - Sample interdisciplinary research document
requirements.txt    - Python dependencies
.env.example        - API key template

Core Components

VectorIndex (Semantic Search)

Semantic similarity using embeddings with configurable distance metrics.

from src.retrieval import VectorIndex
from src.embedder import VoyageEmbedder

embedder = VoyageEmbedder()
index = VectorIndex(
    distance_metric="cosine",
    embedding_fn=embedder.embed_documents
)
index.add_documents(documents)
results = index.search("query text", k=5)

Features:

  • Cosine and Euclidean distance metrics
  • Batch embedding support
  • Dimension validation
  • Custom embedding function support
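
For reference, the two distance metrics and the dimension check can be sketched in plain Python as below. The function names are illustrative and are not taken from src/retrieval.py; the actual VectorIndex may compute these differently (for example with NumPy).

```python
import math


def _check_dims(a, b):
    # Dimension validation: both vectors must have the same length.
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")


def cosine_distance(a, b):
    """1 - cosine similarity: 0.0 for same direction, 1.0 for orthogonal."""
    _check_dims(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two vectors."""
    _check_dims(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```

Cosine distance ignores vector magnitude, which is why it is the usual default for text embeddings; Euclidean distance gives the same ranking only when the embeddings are normalized to unit length.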

BM25Index (Lexical Search)

Keyword-based ranking using the BM25 algorithm.

from src.retrieval import BM25Index

index = BM25Index(k1=1.5, b=0.75)
index.add_documents(documents)
results = index.search("query text", k=5)

Features:

  • Configurable k1 and b parameters
  • Custom tokenization
  • IDF calculation and scoring
  • Score normalization
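
To make the role of k1 and b concrete, here is a compact sketch of BM25 scoring. It is an illustration under assumptions, not the repository's BM25Index: `bm25_scores` is a hypothetical function, tokenization is a plain lowercase whitespace split, and the IDF uses the non-negative `log(1 + ...)` variant, which may differ from the variant in src/retrieval.py.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # b controls length normalization; k1 controls tf saturation.
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "project phoenix stability report",
    "quarterly financial review",
    "phoenix phoenix",
]
print(bm25_scores("phoenix", docs))
```

A higher k1 lets repeated terms keep adding to the score for longer; b=0 disables length normalization and b=1 applies it fully.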

HybridRetriever (Combined Strategy)

Reciprocal Rank Fusion combining multiple indexes.

from src.retrieval import VectorIndex, BM25Index, HybridRetriever

# bm25_index and vector_index are populated BM25Index and VectorIndex instances
hybrid = HybridRetriever(bm25_index, vector_index)
results = hybrid.search("query text", k=5, k_rrf=60)

Features:

  • Balanced scoring from multiple sources
  • Duplicate removal
  • Configurable k_rrf parameter
  • Optimal for diverse query types
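
Reciprocal Rank Fusion scores each document as the sum of 1 / (k_rrf + rank) over every ranking in which it appears, so only rank positions matter and the raw BM25 and similarity scores never need to be put on a common scale. The sketch below shows the idea on plain document IDs; `reciprocal_rank_fusion` is a hypothetical helper and not the HybridRetriever code itself.

```python
def reciprocal_rank_fusion(rankings, k_rrf=60):
    """Fuse several ranked lists of document IDs into one ranking.

    rankings: list of lists, each ordered best-first.
    Returns (doc_id, score) pairs sorted by descending fused score.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document found by several retrievers accumulates score once
            # per list, which also merges duplicates into a single entry.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k_rrf + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


semantic = ["a", "b", "c"]  # best-first IDs from the vector index
lexical = ["b", "c", "a"]   # best-first IDs from the BM25 index
for doc_id, score in reciprocal_rank_fusion([semantic, lexical]):
    print(doc_id, round(score, 5))
```

Here "b" ranks first because it sits near the top of both lists. A larger k_rrf flattens the differences between ranks; 60 is the commonly used default.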

AgenticRAG (Intelligent Retrieval)

Claude decides when and what to retrieve using tool use.

from src.agent import AgenticRAG

agent = AgenticRAG(vector_store, embedder)
answer, retrieved_chunks = agent.query("What about XDR-471?")

Features:

  • Tool use pattern for agent reasoning
  • Multi-turn query refinement capability
  • Retrieved chunk transparency
  • Grounded responses backed by document content
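
The tool use pattern can be sketched as a tool definition plus a function that answers the model's tool calls. The field names (`name`, `description`, `input_schema`, and `tool_use` / `tool_result` blocks linked by `tool_use_id`) follow the Anthropic Messages API tool use format; everything else is an assumption for illustration: the tool name `search_document`, the helper `run_tool_calls`, and the `search_fn` callback are hypothetical and not taken from src/agent.py. Content blocks are shown as plain dicts to keep the example self-contained, whereas the SDK returns objects carrying the same fields as attributes.

```python
# Hypothetical tool definition passed to the model in the `tools` parameter.
SEARCH_TOOL = {
    "name": "search_document",
    "description": "Search the ingested report and return the most relevant chunks.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search query text"},
        },
        "required": ["query"],
    },
}


def run_tool_calls(content_blocks, search_fn):
    """Build tool_result blocks for every tool_use block in a response.

    search_fn(query) is assumed to return a list of chunk strings.
    """
    results = []
    for block in content_blocks:
        if block.get("type") != "tool_use":
            continue  # text blocks need no reply
        if block["name"] != SEARCH_TOOL["name"]:
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": f"Unknown tool: {block['name']}",
                "is_error": True,
            })
            continue
        chunks = search_fn(block["input"]["query"])
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],
            "content": "\n\n".join(chunks),
        })
    return results


# Example with a stubbed response and retriever (no API call is made).
blocks = [
    {"type": "text", "text": "Let me search the report."},
    {"type": "tool_use", "id": "toolu_1", "name": "search_document",
     "input": {"query": "XDR-471"}},
]
print(run_tool_calls(blocks, lambda q: [f"chunk about {q}"]))
```

In a full loop, the returned blocks would be sent back in a user message, and the exchange would repeat until the model stops requesting tools and produces its final answer.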

Document

The sample document (report.md) contains an interdisciplinary research review covering:

  1. Medical Research - XDR-471 syndrome findings
  2. Software Engineering - Project Phoenix stability
  3. Financial Analysis - Quarterly performance review
  4. Scientific Experimentation - Material composite properties
  5. Legal Developments - IP and regulatory compliance
  6. Product Engineering - Hardware specifications
  7. Historical Research - Galveston Accords analysis
  8. Project Management - Multi-phase project tracking
  9. Pharmaceutical Development - Clinical trial data
  10. Cybersecurity Analysis - Incident response documentation

This realistic multi-domain document demonstrates the system's ability to handle complex cross-domain queries.

Technology Stack

  • LLM: Claude Sonnet 4.6 (Anthropic API)
  • Embeddings: VoyageAI voyage-3-large
  • Vector Database: Chroma (local, persistent)
  • Search Algorithms: Custom implementations (BM25, RRF)
  • Frontend: Streamlit
  • Language: Python 3.9+

Architecture

graph TD
    A["User Query<br/>(Streamlit Interface)"]
    B["Claude Agent<br/>(Tool Use Pattern)"]
    C{"Search<br/>Needed?"}
    D["HybridRetriever"]
    E["VectorIndex<br/>(Semantic)"]
    F["VoyageAI<br/>Embeddings"]
    G["BM25Index<br/>(Lexical)"]
    H["Keyword<br/>Matching"]
    I["Reciprocal Rank<br/>Fusion"]
    J["Claude Answer<br/>Generation"]
    K["Results Display<br/>(Streamlit)"]

    A --> B
    B --> C
    C -->|Yes| D
    C -->|No| J
    D --> E
    D --> G
    E --> F
    G --> H
    F --> I
    H --> I
    I --> J
    J --> K

    style A fill:#4A90E2,stroke:#2E5C8A,color:#fff
    style B fill:#7B68EE,stroke:#4B3B9B,color:#fff
    style C fill:#FF6B6B,stroke:#C92A2A,color:#fff
    style D fill:#50C878,stroke:#2D7A4A,color:#fff
    style E fill:#87CEEB,stroke:#4A7C9E,color:#fff
    style F fill:#FFB347,stroke:#B8860B,color:#000
    style G fill:#87CEEB,stroke:#4A7C9E,color:#fff
    style H fill:#FFB347,stroke:#B8860B,color:#000
    style I fill:#DDA0DD,stroke:#8B6B8B,color:#fff
    style J fill:#90EE90,stroke:#4B7D4B,color:#000
    style K fill:#4A90E2,stroke:#2E5C8A,color:#fff

Usage Examples

Basic Semantic Search

from src.embedder import VoyageEmbedder
from src.retrieval import VectorIndex

embedder = VoyageEmbedder()
index = VectorIndex(embedding_fn=embedder.embed_documents)
index.add_documents([{"content": text} for text in chunks])

results = index.search("XDR-471 findings", k=3)
for doc, score in results:
    print(f"Score: {score:.3f} | {doc['content'][:100]}")

Keyword Search with BM25

from src.retrieval import BM25Index

bm25 = BM25Index()
bm25.add_documents([{"content": text} for text in chunks])

results = bm25.search("Project Phoenix stability", k=5)
for doc, score in results:
    print(f"BM25 Score: {score:.3f} | {doc['content'][:100]}")

Hybrid Search with RRF

from src.retrieval import HybridRetriever

# Reuses bm25 and index, built and populated in the two examples above
hybrid = HybridRetriever(bm25, index)
results = hybrid.search("research findings", k=5, k_rrf=60)
for doc, score in results:
    print(f"RRF Score: {score:.3f} | {doc['content'][:100]}")

Agentic Query

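from src.agent import AgenticRAG

# Assumes vector_store (Chroma wrapper) and embedder (VoyageEmbedder) are already initialized
agent = AgenticRAG(vector_store, embedder)
answer, retrieved = agent.query(
    "What are the key research findings?"
)
print("Answer:")
print(answer)
print(f"\nRetrieved {len(retrieved)} sections")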

Retrieval Strategy Comparison

Aspect                  Semantic               Lexical         Hybrid
Speed                   ~100ms                 <50ms           ~200ms
Query Understanding     High                   Low             High
Exact Matches           Poor                   Excellent       Good
Semantic Understanding  Excellent              None            Excellent
API Cost                High                   None            High
Best For                Paraphrased, abstract  Specific terms  Mixed queries

Development

Running Jupyter Notebooks

jupyter notebook

Notebooks demonstrate:

  • Document chunking strategies (001_chunking)
  • Embedding generation (002_embeddings)
  • Vector database implementation (003_vectordb)
  • BM25 ranking algorithm (004_bm25)
  • Hybrid retrieval with RRF (005_hybrid)

Code Validation

Check that all Python files compile (this is a syntax check only; it does not execute imports):

python -m py_compile src/*.py app.py

Type Checking

Use mypy for static type analysis (optional):

pip install mypy
mypy src/ app.py

Configuration

Environment Variables

ANTHROPIC_API_KEY=sk-ant-...      # Claude API key
VOYAGE_API_KEY=pa-...              # VoyageAI API key

Retrieval Parameters

Adjust retrieval behavior through constructor and search arguments in code:

# Distance metric for semantic search
index = VectorIndex(distance_metric="euclidean")

# BM25 tuning
bm25 = BM25Index(k1=2.0, b=0.5)

# Hybrid ranking (hybrid is a HybridRetriever instance)
hybrid.search(query, k=10, k_rrf=60)

Performance

Typical query latencies:

  • Embedding generation: 2-5 seconds (API call)
  • Vector search: <100ms (local)
  • BM25 search: <50ms (local)
  • Claude response: 3-10 seconds
  • Total end-to-end: 8-25 seconds

Memory usage: ~500MB for Chroma with 11 sections

Known Limitations

  • Single document only (extensible to multiple documents)
  • No conversation history (each query independent)
  • Batch embedding calls are subject to VoyageAI rate limits
  • Responses are limited to ~2000 tokens (an output-token cap, not a limit of Claude's context window)

Future Enhancements

  • Multi-document support with source tracking
  • Conversation memory and context preservation
  • Query expansion and refinement
  • Caching layer for repeated queries
  • Streaming responses for better UX
  • Cloud deployment (Hugging Face Spaces)
  • Advanced query analysis and reformulation

Learning Outcomes

By studying this codebase, you will understand:

  1. How RAG systems combine retrieval and generation
  2. Agentic patterns with Claude's tool use feature
  3. Semantic search with embeddings
  4. Lexical search with BM25
  5. Hybrid ranking strategies
  6. Production-grade Python practices
  7. Integration with multiple APIs
  8. Building user interfaces for AI systems

License

MIT

Attribution

This project extends the RAG concepts from the Anthropic Academy RAG Course with custom implementations of advanced retrieval strategies and agentic reasoning patterns.
