AtomicRAG

"Vector search finds what looks similar. AtomicRAG finds what's actually connected."

Stop retrieving chunks. Start retrieving facts.

AtomicRAG is a graph-based Retrieval-Augmented Generation (RAG) library that breaks documents into atomic knowledge units, links them through an entity graph, and retrieves answers via iterative graph traversal. It targets complex, multi-hop queries, where traditional vector-only RAG tends to fall short.

Inspired by the Clue-RAG research paper. Reimagined as a production-ready, model-agnostic Python library.


Why AtomicRAG?

Traditional RAG retrieves chunks -- large blocks of text matched by cosine similarity. This works for simple queries but fails when:

  • The answer requires connecting information across multiple documents
  • Two chunks look similar but are about different things (disambiguation)
  • The query requires multi-hop reasoning ("Who founded the company that acquired X?")

AtomicRAG solves this by:

  1. Decomposing chunks into atomic facts (Knowledge Units) -- self-contained statements
  2. Linking facts through an entity graph -- products, people, features, versions
  3. Traversing the graph to find connected information, not just similar text
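
To make the idea concrete, here is a toy sketch in plain Python. It does not use AtomicRAG's API; the facts, the entity names, and the `connected_facts` helper are invented for illustration. It shows how facts that share entities can be reached by hopping through a graph, even when they share no wording with the starting point.

```python
from collections import defaultdict

# Hypothetical atomic facts (knowledge units), each tagged with the entities it mentions.
facts = {
    "ku1": ("Acme Corp acquired Widgetly in 2021.", {"Acme Corp", "Widgetly"}),
    "ku2": ("Acme Corp was founded by Jane Doe.", {"Acme Corp", "Jane Doe"}),
    "ku3": ("Widgetly builds scheduling software.", {"Widgetly"}),
}

# Entity -> ids of the facts that mention it.
entity_index = defaultdict(set)
for ku_id, (_, entities) in facts.items():
    for entity in entities:
        entity_index[entity].add(ku_id)

def connected_facts(start_entity, hops):
    """Collect fact ids reachable from start_entity within `hops` entity -> fact expansions."""
    frontier, seen_entities, found = {start_entity}, set(), set()
    for _ in range(hops):
        next_frontier = set()
        for entity in frontier - seen_entities:
            seen_entities.add(entity)
            for ku_id in entity_index[entity]:
                found.add(ku_id)
                next_frontier |= facts[ku_id][1]  # entities mentioned by this fact
        frontier = next_frontier
    return found

one_hop = connected_facts("Widgetly", hops=1)   # facts that mention Widgetly directly
two_hops = connected_facts("Widgetly", hops=2)  # also reaches the founder fact via Acme Corp
```

A question like "Who founded the company that acquired Widgetly?" needs `ku2`, which never mentions Widgetly; the second hop through the shared entity `Acme Corp` is what surfaces it.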

Features

  • Graph-based RAG -- multi-partite graph (Chunks -> Knowledge Units -> Entities)
  • 3 extraction methods -- llm (best quality), vocabulary (near-zero cost), sentence (zero cost)
  • Model-agnostic -- bring any LLM and embedding model (OpenAI, Gemini, Ollama, HuggingFace, or your own)
  • Parallel processing -- configurable concurrency for LLM calls and embedding batches
  • Minimal dependencies -- the core needs only numpy. Everything else is optional.
  • Every parameter configurable -- chunk size, prompts, traversal depth, beam size, scoring, callbacks
  • DB-agnostic output -- export to JSON, dict, PostgreSQL, or any storage you want
  • Protocol-based -- no forced inheritance. If your class has a generate() method, it works.
  • Save and reload -- build the graph once, query it forever without re-indexing
  • Scales to 100K+ chunks -- vocabulary method: 88 LLM calls for 100K documents (measured), vs 100K+ for per-chunk systems


Benchmark Results

Evaluated on GraphRAG-Bench (Medical) — 2,062 questions across 4 difficulty levels from NCCN clinical guidelines.

Answer Accuracy (ACC) — GraphRAG-Bench Medical

| Model | Fact Retrieval | Complex Reasoning | Contextual Summarize | Creative Generation | Avg ACC |
|---|---|---|---|---|---|
| HippoRAG2 | 66.28 | 61.98 | 63.08 | 68.05 | 64.85 |
| Fast-GraphRAG | 60.93 | 61.73 | 67.88 | 65.93 | 64.12 |
| LightRAG | 63.32 | 61.32 | 63.14 | 67.91 | 63.92 |
| RAG (w/ rerank) | 64.73 | 58.64 | 65.75 | 60.61 | 62.43 |
| HippoRAG | 56.14 | 55.87 | 59.86 | 64.43 | 59.08 |
| RAPTOR | 54.07 | 53.20 | 58.73 | 62.38 | 57.10 |
| AtomicRAG (vocab) | 61.45 | 55.85 | 62.35 | 45.87 | 56.38 |
| MS-GraphRAG | 38.63 | 47.04 | 41.87 | 53.11 | 45.16 |

Leaderboard scores use GPT-4o-mini as judge; AtomicRAG uses Gemini 2.5 Flash. See detailed results for full analysis.

Retrieval Quality — GraphRAG-Bench Medical

| Metric | Fact Retrieval | Complex Reasoning | Contextual Summarize | Creative Generation | Average |
|---|---|---|---|---|---|
| Context Relevance | 85.54 | 80.60 | 82.53 | 78.61 | 81.82 |
| Context Recall | 77.37 | 62.77 | 46.71 | 47.56 | 58.60 |
| Faithfulness | 91.29 | 85.49 | 89.40 | 74.30 | 85.12 |

Evaluated on all 2,062 questions with Gemini 2.5 Pro as judge. See detailed results for methodology and per-type breakdowns.

The Efficiency Advantage — Measured on Public Corpora

AtomicRAG's vocabulary method builds the knowledge graph with a number of LLM calls that grows far more slowly than the corpus. Per-chunk GraphRAG systems, by contrast, make 1-3 LLM calls per chunk.

Measured results (pipeline: spaCy NER + noun chunks → frequency filter → batched LLM filtering, with vocab_max_terms_per_llm_call=500):

| Corpus | Documents | Chunks | AtomicRAG (vocab) | Per-Chunk Systems | Reduction |
|---|---|---|---|---|---|
| PubMedQA | 1,000 | 2,043 | 7 calls | 2,043 calls | 292x |
| MS MARCO | 10,000 | 10,001 | 12 calls | 10,001 calls | 833x |
| MS MARCO | 50,000 | 50,008 | 49 calls | 50,008 calls | 1,020x |
| MS MARCO | 100,000 | 100,020 | 88 calls | 100,020 calls | 1,137x |
| WikiText-103 | 10,000 | 67,796 | 226 calls | 67,796 calls | 300x |
| WikiText-103 | 29,023 | 198,145 | 523 calls | 198,145 calls | 379x |

100x more documents = only ~12x more LLM calls. At enterprise scale, AtomicRAG achieves competitive accuracy at <0.1% of the indexing cost of other GraphRAG systems.
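
The arithmetic behind these figures can be checked with a few lines of Python. The numbers are copied from the measured results above; the `reduction` helper is only an illustration and assumes one LLM call per chunk for the per-chunk baseline, as the table does.

```python
# (corpus, chunks, vocabulary-method LLM calls) copied from the measured results.
measurements = [
    ("PubMedQA 1K", 2_043, 7),
    ("MS MARCO 10K", 10_001, 12),
    ("MS MARCO 50K", 50_008, 49),
    ("MS MARCO 100K", 100_020, 88),
    ("WikiText-103 10K", 67_796, 226),
    ("WikiText-103 29K", 198_145, 523),
]

def reduction(chunks, vocab_calls, calls_per_chunk=1):
    """Ratio of per-chunk LLM calls to vocabulary-method LLM calls."""
    return chunks * calls_per_chunk / vocab_calls

reductions = {name: reduction(chunks, calls) for name, chunks, calls in measurements}

# Share of the per-chunk call count needed at 100K documents (below 0.1%).
call_fraction_100k = 88 / 100_020

# Growth in LLM calls when going from 1K to 100K documents (roughly 12x).
call_growth = 88 / 7
```

Note that the 1K and 100K rows come from different corpora (PubMedQA and MS MARCO), so the growth factor is indicative rather than a controlled comparison.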

See benchmark/README.md for the complete analysis with cost breakdowns, retrieval metrics, and reproduction steps.


Installation

Prerequisites

  • Python 3.10 or higher
  • pip package manager

Install

# Core library (only requires numpy)
pip install atomicrag

# With a specific LLM/embedding provider:
pip install atomicrag[openai]        # OpenAI (GPT-4, text-embedding-3)
pip install atomicrag[gemini]        # Google Gemini (gemini-2.5-flash, gemini-embedding-001)
pip install atomicrag[ollama]        # Ollama (local models like llama3)
pip install atomicrag[huggingface]   # HuggingFace (sentence-transformers)
pip install atomicrag[langchain]     # LangChain adapter (wrap any LangChain model)

# With database storage:
pip install atomicrag[pgvector]      # PostgreSQL + pgvector

# Everything:
pip install atomicrag[all]

Install from source (development)

git clone https://github.com/Rahuljangs/atomicrag.git
cd atomicrag
pip install -e ".[dev]"

Quick Start

The fastest way to get running -- index and query in a few lines of code:

from atomicrag import IndexPipeline, RetrievePipeline
from atomicrag.integrations.openai import OpenAILLM, OpenAIEmbedding

# 1. Build the knowledge graph from your documents
graph = IndexPipeline(
    llm=OpenAILLM(api_key="sk-..."),
    embedding=OpenAIEmbedding(api_key="sk-..."),
).run(["Your document text here...", "Another document..."])

# 2. Query it
results = RetrievePipeline(
    graph=graph,
    llm=OpenAILLM(api_key="sk-..."),
    embedding=OpenAIEmbedding(api_key="sk-..."),
).search("What are the key features?")

# 3. Use the results
for item in results.items:
    print(f"{item.score:.3f}: {item.content}")

Extraction Methods

AtomicRAG offers 3 extraction methods to balance cost, speed, and quality. Choose the one that fits your scale:

llm -- Best Quality (Default)

Uses your LLM to extract atomic facts from every chunk. Produces disambiguated, self-contained knowledge units.

config = AtomicRAGConfig(ku_extraction_method="llm")
  • Quality: Highest -- LLM resolves pronouns, simplifies complex sentences
  • Cost: ~$0.07 per chunk (Gemini Flash) / ~$0.09 (GPT-4o-mini)
  • Speed: ~10-30 seconds per chunk
  • Best for: Small to medium corpora (<10K chunks)

vocabulary -- Near-Zero Cost (Recommended for Large Corpora)

Builds a global entity vocabulary from the entire corpus using NLP (spaCy NER + noun chunks + n-grams), sends the vocabulary to the LLM in batches for filtering and categorisation, then sentence-splits all chunks and matches entities via fast string search.

config = AtomicRAGConfig(ku_extraction_method="vocabulary")
  • Quality: ~93% of LLM quality (tested on real documents)
  • Cost: 7–88 LLM calls for the entire corpus (measured on 1K–100K docs)
  • Speed: Minutes for 100K+ chunks
  • Best for: Large corpora (10K-1M+ chunks)

sentence -- Zero Cost (Pure NLP)

Uses spaCy sentence splitting with NER-based entity extraction. No LLM calls at all. Falls back to regex splitting if spaCy is not installed.

config = AtomicRAGConfig(ku_extraction_method="sentence")
  • Quality: 70-80% of LLM quality
  • Cost: $0 -- no API calls
  • Speed: Fastest -- processes 100K chunks in seconds
  • Best for: Prototyping, budget-zero scenarios, or when LLM access is unavailable

Comparison

| Method | Quality | Cost (300K chunks) | LLM Calls | Speed |
|---|---|---|---|---|
| llm | 100% | ~$270 | 300,000 | ~22 hours |
| vocabulary | 93% | ~$5-15 | 1-5 | ~30 minutes |
| sentence | 70-80% | $0 | 0 | ~2 minutes |

Parallel Processing

Speed up indexing with concurrent LLM calls and embedding batches:

config = AtomicRAGConfig(
    ku_concurrency=10,          # 10 parallel LLM calls for KU extraction
    embedding_concurrency=10,   # 10 parallel embedding batch calls
    embedding_batch_size=200,   # 200 texts per batch
)

Benchmark (real document, Gemini 2.5 Flash)

| Mode | Time | Speedup |
|---|---|---|
| Sequential (concurrency=1) | 352s | 1x |
| Parallel (concurrency=10) | 53s | 6.7x |

Parallel processing works with all three extraction methods. For the vocabulary method, it parallelises the LLM filtering calls when the candidate list is large enough to require multiple batches.
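
The effect of a concurrency setting on network-bound calls can be sketched with the standard library. This is an illustration under assumptions, not AtomicRAG's implementation: the README does not say whether the library uses threads or asyncio, and `fake_llm_call` and `extract_all` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_llm_call(chunk: str) -> str:
    """Stand-in for a network-bound LLM request (hypothetical)."""
    time.sleep(0.05)  # simulate request latency
    return chunk.upper()

def extract_all(chunks, concurrency):
    """Run one call per chunk with at most `concurrency` in flight; input order is preserved."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(fake_llm_call, chunks))

chunks = [f"chunk {i}" for i in range(10)]

start = time.perf_counter()
sequential = extract_all(chunks, concurrency=1)
sequential_time = time.perf_counter() - start

start = time.perf_counter()
parallel = extract_all(chunks, concurrency=10)
parallel_time = time.perf_counter() - start
```

Both runs return the same results in the same order; only the wall-clock time differs. Real speedups are usually lower than the ideal factor because of provider rate limits and non-parallel stages.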


Step-by-Step Guide

Step 1: Choose your models

Pick an LLM and an embedding model. AtomicRAG supports any provider:

# Option A: OpenAI
from atomicrag.integrations.openai import OpenAILLM, OpenAIEmbedding
llm = OpenAILLM(api_key="sk-...", model="gpt-4o-mini")
embedding = OpenAIEmbedding(api_key="sk-...", model="text-embedding-3-small")

# Option B: Google Gemini
from atomicrag.integrations.gemini import GeminiLLM, GeminiEmbedding
llm = GeminiLLM(api_key="AIza...", model="gemini-2.5-flash")
embedding = GeminiEmbedding(api_key="AIza...", model="models/gemini-embedding-001")

# Option C: Local Ollama (no API key needed)
from atomicrag.integrations.ollama import OllamaLLM, OllamaEmbedding
llm = OllamaLLM(model="llama3")
embedding = OllamaEmbedding(model="nomic-embed-text")

# Option D: Mix and match (e.g., Ollama LLM + HuggingFace embeddings)
from atomicrag.integrations.ollama import OllamaLLM
from atomicrag.integrations.huggingface import HuggingFaceEmbedding
llm = OllamaLLM(model="llama3")
embedding = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

Step 2: Prepare your documents

Documents can be plain strings or dicts with metadata:

# Simple: list of strings
documents = [
    "First document text...",
    "Second document text...",
]

# With metadata: list of dicts
documents = [
    {"text": "First document text...", "doc_id": "doc-001"},
    {"text": "Second document text...", "doc_id": "doc-002"},
]

Step 3: Build the knowledge graph

from atomicrag import IndexPipeline, AtomicRAGConfig

config = AtomicRAGConfig(
    chunk_size=800,
    ku_extraction_method="vocabulary",  # Cost-efficient for large corpora
    ku_concurrency=10,                  # Parallel processing
    verbose=True,
)

graph = IndexPipeline(llm=llm, embedding=embedding, config=config).run(documents)

print(graph.stats())
# {'chunks': 15, 'knowledge_units': 142, 'entities': 87, 'edges': 312}

Step 4: Save the graph (optional but recommended)

# Save to JSON -- you never need to re-index
graph.to_json("my_knowledge_graph.json")

# Load it back later
from atomicrag.models.graph import KnowledgeGraph
graph = KnowledgeGraph.from_json("my_knowledge_graph.json")

Step 5: Query the graph

from atomicrag import RetrievePipeline

retriever = RetrievePipeline(graph=graph, llm=llm, embedding=embedding, config=config)

results = retriever.search("What are the security certifications?")

for item in results.items:
    print(f"Score: {item.score:.3f}")
    print(f"Content: {item.content[:200]}")
    print(f"Entities: {item.entity_names}")
    print()

Step 6: Use the results in your application

# Get as JSON (for APIs)
json_str = results.to_json()

# Get as dict (for further processing)
data = results.to_dict()

# Access individual fields
results.query                # The original query
results.entities_extracted   # Entities found in the query
results.graph_stats          # How many nodes were traversed
results.items[0].content     # Top result text
results.items[0].score       # Relevance score (0-1)
results.items[0].entity_names  # Entities in the retrieval path

Bring Your Own Model

AtomicRAG uses Python Protocol classes -- no inheritance required. Any object with the right methods works automatically.
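
As a minimal illustration of structural typing with `typing.Protocol`: the `LLMLike` protocol and both classes below are hypothetical and are not AtomicRAG's actual protocol definitions.

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class LLMLike(Protocol):
    """Hypothetical protocol: anything with a generate(prompt) -> str method."""
    def generate(self, prompt: str) -> str: ...

class EchoLLM:
    # No base class: having a matching generate() method is enough.
    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"

class NotAnLLM:
    # Wrong method name, so it does not satisfy the protocol.
    def complete(self, prompt: str) -> str:
        return prompt

def summarise(llm: LLMLike, text: str) -> str:
    """Accepts any object that structurally matches LLMLike."""
    return llm.generate(f"Summarise: {text}")

echo_ok = isinstance(EchoLLM(), LLMLike)    # True: generate() is present
other_ok = isinstance(NotAnLLM(), LLMLike)  # False: generate() is missing
answer = summarise(EchoLLM(), "hello")
```

With `@runtime_checkable`, `isinstance` only checks that the method exists, not its signature, so static type checkers remain the stricter safeguard.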

Custom LLM

Your class just needs a generate(prompt: str) -> str method:

import requests

from atomicrag import IndexPipeline

class MyCustomLLM:
    def __init__(self, endpoint):
        self.endpoint = endpoint

    def generate(self, prompt: str) -> str:
        # POST the prompt to your own endpoint; fail loudly on HTTP errors.
        response = requests.post(self.endpoint, json={"prompt": prompt}, timeout=60)
        response.raise_for_status()
        return response.json()["text"]

# Use it directly
graph = IndexPipeline(llm=MyCustomLLM("http://my-api/v1/generate"), embedding=...).run(docs)

Custom Embedding

Your class needs embed_text(text: str) -> list[float] and embed_batch(texts: list[str]) -> list[list[float]]:

from sentence_transformers import SentenceTransformer

class MyEmbedding:
    def __init__(self):
        self.model = SentenceTransformer("all-MiniLM-L6-v2")

    def embed_text(self, text: str) -> list[float]:
        return self.model.encode(text).tolist()

    def embed_batch(self, texts: list[str]) -> list[list[float]]:
        return self.model.encode(texts).tolist()

Wrap an existing LangChain model

from atomicrag.integrations.langchain import LangChainLLMAdapter, LangChainEmbeddingAdapter
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

llm = LangChainLLMAdapter(ChatOpenAI(model="gpt-4o-mini"))
embedding = LangChainEmbeddingAdapter(OpenAIEmbeddings())

How It Works

Documents --> Chunks --> [LLM / NLP] --> Knowledge Units --> [NER] --> Entities
                                              |                         |
                                        Embed (vectors)           Embed (vectors)
                                              |                         |
                                        +------------- Graph ---------------+
                                        |  KU <--> Entity <--> KU <--> ... |
                                        +-----------------------------------+
                                                    |
                 Query --> Entity Anchoring --> Graph Traversal --> Ranking --> Results

Index Pipeline (offline, run once)

| Step | Component | What it does |
|---|---|---|
| 1 | TextChunker | Splits documents into overlapping chunks (configurable size, overlap, strategy) |
| 2 | KnowledgeUnitExtractor | Extracts atomic facts -- via LLM (llm method), NLP sentence split + LLM vocab filter (vocabulary method), or pure NLP (sentence method) |
| 3 | EntityExtractor | Identifies entities (products, people, features) from each fact via LLM metadata or spaCy NER |
| 4 | GraphBuilder | Embeds all KUs and entities (with parallel batch support), assembles the multi-partite graph |

Retrieve Pipeline (online, per query)

| Step | Component | What it does |
|---|---|---|
| 1 | Anchoring | Extracts entities from query, finds matching graph nodes + top-K similar KUs |
| 2 | GraphTraversal | Q-Iter algorithm: iteratively expands through entity-KU connections with beam search |
| 3 | ResultRanker | Scores KUs by similarity, groups by source chunk, returns top-N |

The Q-Iter Algorithm

Based on the Clue-RAG paper, Q-Iter improves retrieval through:

  1. Entity Anchoring -- find where the query connects to the graph
  2. Iterative Expansion -- hop through Entity -> KU -> Entity connections (configurable depth)
  3. Query Updating -- subtract retrieved embeddings from the query vector to find diverse (not redundant) information
  4. Beam Search Pruning -- keep only the top-M most relevant paths at each depth
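
The query-updating and beam-pruning steps can be sketched in plain Python. This is a simplified illustration under assumptions, not the exact formula used by Clue-RAG or AtomicRAG: here the update subtracts the mean of the retrieved embeddings scaled by a weight, and the helper names and 2-D toy vectors are invented.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def update_query(query, retrieved, weight=1.0):
    """Subtract the mean of the retrieved vectors, scaled by `weight`, from the query."""
    if not retrieved:
        return list(query)
    mean = [sum(col) / len(retrieved) for col in zip(*retrieved)]
    return [q - weight * m for q, m in zip(query, mean)]

def beam_step(query, candidates, beam_size):
    """Keep the `beam_size` candidate names most similar to the current query."""
    ranked = sorted(candidates, key=lambda name: cosine(query, candidates[name]), reverse=True)
    return ranked[:beam_size]

# Toy 2-D embeddings (hypothetical): two near-duplicate facts and one different fact.
kus = {"x": [1.0, 0.0], "x_paraphrase": [0.9, 0.1], "y": [0.0, 1.0]}
query = [1.0, 0.6]

first = beam_step(query, kus, beam_size=1)
remaining = {name: vec for name, vec in kus.items() if name not in first}

without_update = beam_step(query, remaining, beam_size=1)   # picks the near-duplicate
updated = update_query(query, [kus[first[0]]], weight=1.0)
second = beam_step(updated, remaining, beam_size=1)         # picks the different fact
```

Without the update, the second pick is the near-duplicate `x`; after subtracting what was already retrieved, the query points toward the part of the question not yet covered and `y` wins.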

The Vocabulary Extraction Method

For large corpora where per-chunk LLM calls are too expensive:

  1. NLP Scan -- Extract candidate terms from all chunks using spaCy NER + noun chunks + n-grams
  2. Frequency Filter -- Remove terms appearing fewer than vocab_min_term_freq times (corpus-wide)
  3. LLM Filter -- Send candidates to LLM in batches of 500 for filtering and type categorisation (7–88 total calls for 1K–100K documents, measured on PubMedQA, MS MARCO, and WikiText-103)
  4. Sentence Split -- Split all chunks into sentences (one KU per sentence, no LLM)
  5. Entity Matching -- Fast regex-based matching of entities to KUs using compiled patterns sorted by length (longest-first to avoid substring collisions)

This pipeline decouples indexing cost from corpus size. LLM calls scale with vocabulary size (bounded by natural language), not document count.
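
Steps 2, 3 (batching only), and 5 can be sketched with the standard library. This is a simplified illustration, not the library's code: candidate extraction with spaCy and the LLM call itself are omitted, and the function names and sample terms are hypothetical.

```python
import re
from collections import Counter

def frequency_filter(candidates_per_chunk, min_freq):
    """Keep terms that occur at least `min_freq` times across the corpus."""
    counts = Counter(term for terms in candidates_per_chunk for term in terms)
    return sorted(term for term, n in counts.items() if n >= min_freq)

def batches(terms, max_terms_per_call):
    """Split the vocabulary into batches; each batch would be one LLM filtering call."""
    return [terms[i:i + max_terms_per_call] for i in range(0, len(terms), max_terms_per_call)]

def compile_matcher(vocabulary):
    """One regex alternation, longest terms first, so 'New York City' wins over 'New York'."""
    ordered = sorted(vocabulary, key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(re.escape(term) for term in ordered) + r")\b")

# Candidate terms per chunk, as an NLP scan might produce them (hypothetical data).
candidates = [
    ["New York", "New York City", "subway"],
    ["New York City", "subway"],
    ["New York", "taxi"],
]

vocab = frequency_filter(candidates, min_freq=2)   # "taxi" appears once and is dropped
calls = batches(vocab, max_terms_per_call=2)       # 3 terms -> 2 batches
matcher = compile_matcher(vocab)
matches = matcher.findall("The New York City subway is older than New York taxis.")
```

The number of batches depends only on how many distinct terms survive the frequency filter, which is why the LLM call count tracks vocabulary size rather than chunk count.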


Configuration

Every parameter has a sensible default. Override only what you need:

from atomicrag import AtomicRAGConfig

config = AtomicRAGConfig(
    # -- Extraction Method --
    ku_extraction_method="vocabulary",  # "llm", "vocabulary", or "sentence"

    # -- Chunking --
    chunk_size=500,
    chunk_overlap=100,
    chunk_strategy="sentence",         # "recursive", "sentence", or "fixed"

    # -- Knowledge Unit Extraction --
    ku_extraction_prompt="Your custom prompt here: {text_chunk}",
    ku_max_units_per_chunk=30,
    ku_batch_size=5,

    # -- Parallel Processing --
    ku_concurrency=10,                 # Parallel LLM calls
    embedding_concurrency=10,          # Parallel embedding batches
    embedding_batch_size=200,

    # -- Vocabulary Method --
    vocab_min_term_freq=2,             # Min corpus frequency for candidates
    vocab_max_terms_per_llm_call=500,  # Terms per LLM filtering call

    # -- Entity Extraction --
    entity_extraction_method="llm",    # "llm" or "spacy"
    entity_merge_similar=True,

    # -- Retrieval --
    anchor_top_k=10,
    traversal_depth=3,                 # 1=fast, 2=balanced, 3=thorough
    beam_size=15,
    query_update_weight=1.0,           # 0=off, 1=full diversity
    result_top_n=10,
    min_score_threshold=0.3,
    score_aggregation="max",           # "mean", "max", or "sum"

    # -- Progress --
    verbose=True,
    on_progress=lambda cur, total, stage: print(f"{stage}: {cur}/{total}"),
)

Load config from file

# From JSON
config = AtomicRAGConfig.from_json("config.json")

# From YAML (requires pyyaml)
config = AtomicRAGConfig.from_yaml("config.yaml")

# From environment variables (prefix: ATOMICRAG_)
# e.g., ATOMICRAG_CHUNK_SIZE=500, ATOMICRAG_VERBOSE=true
config = AtomicRAGConfig.from_env()

# From dict
config = AtomicRAGConfig.from_dict({"chunk_size": 500, "traversal_depth": 3})
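
For intuition, prefix-based environment loading can be sketched as below. This is an assumption about the general technique, not the actual `from_env` implementation; `config_from_env` and its parsing rules are hypothetical.

```python
def config_from_env(environ, prefix="ATOMICRAG_"):
    """Turn PREFIX_SOME_KEY=value pairs into {'some_key': parsed_value}."""
    def parse(raw):
        lowered = raw.strip().lower()
        if lowered in ("true", "false"):
            return lowered == "true"
        for cast in (int, float):
            try:
                return cast(raw)
            except ValueError:
                pass
        return raw  # leave anything else as a string

    return {
        key[len(prefix):].lower(): parse(value)
        for key, value in environ.items()
        if key.startswith(prefix)
    }

# A plain dict stands in for os.environ so the example is deterministic.
example_env = {"ATOMICRAG_CHUNK_SIZE": "500", "ATOMICRAG_VERBOSE": "true", "PATH": "/usr/bin"}
overrides = config_from_env(example_env)
```

A dict like `overrides` has the same shape as the argument shown for `AtomicRAGConfig.from_dict` above; passing `os.environ` instead of `example_env` would read the real environment.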

Custom prompts

Override any LLM prompt used internally:

config = AtomicRAGConfig(
    # Custom KU extraction prompt (must contain {text_chunk})
    ku_extraction_prompt="""
    Break this text into individual facts. Each fact should be self-contained.
    Output JSON: {{"knowledge_units": [{{"content": "...", "entities": ["..."]}}]}}
    Text: {text_chunk}
    """,

    # Custom entity extraction prompt (must contain {text})
    entity_extraction_prompt="Find all entities in: {text}\nJSON: ...",

    # Custom query entity prompt (must contain {query})
    query_entity_prompt="Extract key concepts from: {query}\nJSON: ...",
)

You can also set prompts via environment variables:

  • ATOMICRAG_KU_PROMPT -- overrides KU extraction prompt
  • ATOMICRAG_ENTITY_PROMPT -- overrides entity extraction prompt
  • ATOMICRAG_QUERY_ENTITY_PROMPT -- overrides query entity prompt

Built-in Integrations

| Provider | LLM Class | Embedding Class | Install |
|---|---|---|---|
| OpenAI | OpenAILLM(api_key, model) | OpenAIEmbedding(api_key, model) | pip install atomicrag[openai] |
| Google Gemini | GeminiLLM(api_key, model) | GeminiEmbedding(api_key, model) | pip install atomicrag[gemini] |
| Ollama | OllamaLLM(host, model) | OllamaEmbedding(host, model) | pip install atomicrag[ollama] |
| HuggingFace | -- | HuggingFaceEmbedding(model_name) | pip install atomicrag[huggingface] |
| LangChain | LangChainLLMAdapter(lc_llm) | LangChainEmbeddingAdapter(lc_emb) | pip install atomicrag[langchain] |

All integrations import lazily -- you only need the SDK for the provider you use. The Gemini integration automatically detects and supports both the new google-genai SDK and the legacy google-generativeai SDK.


Storage Adapters

JSON (built-in, no extra dependencies)

from atomicrag.storage.json_storage import JSONStorage

storage = JSONStorage("my_graph.json")
storage.save(graph)                  # Save
graph = storage.load()               # Load
print(storage.exists())              # Check if file exists

PostgreSQL + pgvector

from atomicrag.storage.pgvector_storage import PGVectorStorage

storage = PGVectorStorage("postgresql://user:pass@localhost:5432/mydb", schema="atomicrag")
storage.create_tables()              # Create tables (idempotent)
storage.save(graph)                  # Save graph to DB
graph = storage.load()               # Load graph from DB

Direct serialization (no adapter needed)

from atomicrag.models.graph import KnowledgeGraph

# To/from JSON file
graph.to_json("graph.json")
graph = KnowledgeGraph.from_json("graph.json")

# To/from Python dict
data = graph.to_dict()
graph = KnowledgeGraph.from_dict(data)

Output Format

All outputs are plain Python dataclasses with no vendor lock-in.

RetrievalResult

results = retriever.search("my query")

results.query                # "my query"
results.entities_extracted   # ["Entity1", "Entity2"] -- extracted from query
results.graph_stats          # {"kus_retrieved": 47, "kus_scored": 42, "items_returned": 6}
results.items                # List[RetrievalItem]

RetrievalItem

item = results.items[0]

item.content                 # The retrieved text (original chunk or aggregated KUs)
item.score                   # Relevance score (0.0 to 1.0)
item.source_chunk_id         # UUID of the source chunk
item.knowledge_unit_ids      # List of KU UUIDs that contributed to this result
item.entity_names            # Entity names in the retrieval path
item.metadata                # Any additional metadata

Export

results.to_json()            # Returns JSON string
results.to_json("out.json")  # Writes to file
results.to_dict()            # Returns plain dict
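
Because the results are plain dataclasses, this kind of export needs nothing beyond the standard library. The sketch below uses simplified, hypothetical stand-ins (`Item`, `Result`) rather than the real `RetrievalItem` and `RetrievalResult` classes, and it assumes `to_json` returns the string even when a path is given.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class Item:  # hypothetical stand-in for RetrievalItem
    content: str
    score: float
    entity_names: list = field(default_factory=list)

@dataclass
class Result:  # hypothetical stand-in for RetrievalResult
    query: str
    items: list = field(default_factory=list)

    def to_dict(self):
        # asdict() recurses into nested dataclasses, including those inside lists.
        return asdict(self)

    def to_json(self, path=None):
        text = json.dumps(self.to_dict(), indent=2)
        if path is not None:
            with open(path, "w", encoding="utf-8") as fh:
                fh.write(text)
        return text

result = Result(query="my query", items=[Item("Some fact.", 0.87, ["Entity1"])])
round_trip = json.loads(result.to_json())
```

The round-tripped value is an ordinary dict of strings, numbers, and lists, which is what makes the output easy to store in any database or return from an API.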

Architecture

atomicrag/
├── __init__.py                  # Public API: IndexPipeline, RetrievePipeline, AtomicRAGConfig
├── config.py                    # AtomicRAGConfig -- every tunable parameter
├── models/
│   ├── protocols.py             # BaseLLM, BaseEmbedding (Protocol classes)
│   ├── graph.py                 # Chunk, KnowledgeUnit, Entity, KnowledgeGraph
│   └── results.py               # RetrievalItem, RetrievalResult
├── index/
│   ├── pipeline.py              # IndexPipeline orchestrator
│   ├── chunker.py               # TextChunker (recursive, sentence, fixed)
│   ├── extractor.py             # KnowledgeUnitExtractor (LLM-based, per-chunk)
│   ├── vocabulary_extractor.py  # VocabularyExtractor (NLP + batched LLM filtering)
│   ├── entity_extractor.py      # EntityExtractor (LLM or spaCy)
│   └── graph_builder.py         # GraphBuilder (parallel embed + assemble)
├── retrieve/
│   ├── pipeline.py              # RetrievePipeline orchestrator
│   ├── anchoring.py             # Entity + Semantic anchoring
│   ├── traversal.py             # Q-Iter graph traversal with query updating
│   └── ranking.py               # ResultRanker (score + group + sort)
├── integrations/                # OpenAI, Gemini, Ollama, LangChain, HuggingFace
├── storage/                     # JSON file, PostgreSQL adapters
└── utils/
    ├── prompts.py               # Default prompts (all overridable)
    └── similarity.py            # Cosine similarity, top-K search

Configuration Reference

Extraction Method

| Parameter | Type | Default | Description |
|---|---|---|---|
| ku_extraction_method | str | "llm" | Extraction strategy: "llm" (per-chunk LLM), "vocabulary" (global vocab + sentence split), or "sentence" (pure NLP) |

Chunking

| Parameter | Type | Default | Description |
|---|---|---|---|
| chunk_size | int | 1000 | Maximum characters per chunk |
| chunk_overlap | int | 200 | Character overlap between consecutive chunks |
| chunk_strategy | str | "recursive" | Splitting strategy: "recursive", "sentence", or "fixed" |
| chunk_separators | list | None | Custom separator list for recursive splitting |
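
To show how chunk_size and chunk_overlap interact, here is a minimal sketch of fixed-size chunking with overlap. It illustrates the general technique only; `fixed_chunks` is a hypothetical helper, and the library's own "fixed" strategy may handle boundaries differently.

```python
def fixed_chunks(text, chunk_size, chunk_overlap):
    """Slide a window of `chunk_size` characters, stepping by chunk_size - chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks

text = "abcdefghijklmnopqrstuvwxyz"
chunks = fixed_chunks(text, chunk_size=10, chunk_overlap=4)
# Each chunk starts with the last 4 characters of the previous one.
```

A larger overlap lowers the chance that a fact is split across a chunk boundary, at the cost of more chunks to process.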

Knowledge Unit Extraction

| Parameter | Type | Default | Description |
|---|---|---|---|
| ku_extraction_prompt | str | None | Custom prompt template (must contain {text_chunk}) |
| ku_max_units_per_chunk | int | 50 | Maximum KUs to extract per chunk |
| ku_batch_size | int | 10 | Chunks to process per LLM batch |
| ku_concurrency | int | 1 | Parallel LLM calls (1 = sequential) |

Vocabulary Method

| Parameter | Type | Default | Description |
|---|---|---|---|
| vocab_min_term_freq | int | 2 | Minimum corpus frequency for a candidate term |
| vocab_max_terms_per_llm_call | int | 500 | Maximum terms to send per LLM filtering call |

Entity Extraction

| Parameter | Type | Default | Description |
|---|---|---|---|
| entity_extraction_prompt | str | None | Custom prompt (must contain {text}) |
| entity_extraction_method | str | "llm" | Method: "llm" or "spacy" |
| entity_merge_similar | bool | True | Merge entities with same normalized name |
| entity_types | list | None | Restrict to specific entity types |

Embedding

| Parameter | Type | Default | Description |
|---|---|---|---|
| embedding_batch_size | int | 100 | Texts to embed per API batch call |
| embedding_concurrency | int | 1 | Parallel embedding batch calls (1 = sequential) |
| embedding_dimensions | int | None | Expected dimensions (auto-detected if None) |

Graph Building

| Parameter | Type | Default | Description |
|---|---|---|---|
| deduplicate_entities | bool | True | Merge duplicate entity nodes |
| min_entity_occurrences | int | 1 | Discard entities appearing fewer times |

Retrieval: Anchoring

| Parameter | Type | Default | Description |
|---|---|---|---|
| query_entity_prompt | str | None | Custom prompt for query entity extraction |
| anchor_top_k | int | 10 | Top-K KUs for semantic anchoring |
| entity_match_threshold | float | 0.8 | Min similarity for fuzzy entity matching |

Retrieval: Q-Iter Traversal

| Parameter | Type | Default | Description |
|---|---|---|---|
| traversal_depth | int | 2 | Number of graph hops (1=fast, 2=balanced, 3=thorough) |
| beam_size | int | 10 | Beam search width per depth level |
| query_update_weight | float | 1.0 | Weight for query updating (0=disabled, 1=full) |
| max_kus_per_depth | int | 50 | Cap on KUs collected per depth level |

Retrieval: Ranking

| Parameter | Type | Default | Description |
|---|---|---|---|
| result_top_n | int | 6 | Number of final results to return |
| min_score_threshold | float | 0.0 | Minimum cosine similarity score to include |
| group_by_chunk | bool | True | Aggregate KU scores per source chunk |
| score_aggregation | str | "mean" | Aggregation: "mean", "max", or "sum" |
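
The three aggregation modes can rank the same chunks differently. The sketch below illustrates this with a hypothetical `rank_chunks` helper; it is not the library's ResultRanker, and it assumes the score threshold is applied to individual KU scores before grouping.

```python
from collections import defaultdict
from statistics import mean

AGGREGATORS = {"mean": mean, "max": max, "sum": sum}

def rank_chunks(ku_scores, aggregation="mean", top_n=6, min_score=0.0):
    """ku_scores: iterable of (chunk_id, score). Returns [(chunk_id, score)], best first."""
    by_chunk = defaultdict(list)
    for chunk_id, score in ku_scores:
        if score >= min_score:  # assumption: threshold applies per KU, before grouping
            by_chunk[chunk_id].append(score)
    aggregate = AGGREGATORS[aggregation]
    ranked = sorted(
        ((chunk_id, aggregate(scores)) for chunk_id, scores in by_chunk.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:top_n]

# KU-level similarity scores tagged with their source chunk (hypothetical data).
scores = [("c1", 0.9), ("c1", 0.1), ("c2", 0.6), ("c2", 0.6), ("c3", 0.2)]

by_mean = rank_chunks(scores, "mean")  # favours chunks that are consistently relevant
by_max = rank_chunks(scores, "max")    # favours chunks with one standout fact
by_sum = rank_chunks(scores, "sum")    # favours chunks with many relevant facts
```

Here "max" puts `c1` first because of its single strong fact, while "mean" and "sum" prefer `c2`, whose facts are all moderately relevant.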

Progress / Callbacks

| Parameter | Type | Default | Description |
|---|---|---|---|
| verbose | bool | False | Print progress messages to stdout |
| on_chunk_processed | callable | None | fn(chunk_index, total_chunks) |
| on_batch_complete | callable | None | fn(batch_index, total_batches) |
| on_progress | callable | None | fn(current, total, stage_name) |

License

Apache 2.0 -- use freely in commercial and open-source projects.

Citation

If you use AtomicRAG in your research or projects, please cite:

@software{atomicrag,
  title={AtomicRAG: Graph-based RAG using Atomic Knowledge Units},
  author={Rahul Jangir},
  year={2026},
  url={https://github.com/Rahuljangs/atomicrag}
}

Contributing

Contributions are welcome! See CONTRIBUTING.md for development setup, pre-commit hooks, and guidelines.
