This repository implements an Agentic GraphRAG system for Medical Diagnosis. It ingests medical literature, extracts structured clinical knowledge into a Neo4j knowledge graph with hierarchical communities, and answers diagnostic questions through multi-strategy retrieval and an agentic plan–research–verify reasoning loop.
- Schema-driven knowledge graph built from a runtime-injectable schema of medical entity and relation types that drives every extractor, resolver, summarizer, and prompt.
- Three-extractor fusion combining GLiNER NER, GLiREL relation extraction, and LLM extraction, merged with configurable union, intersection, max-score, or GLiNER-primary strategies.
- Two-stage entity resolution using a deterministic SemHash/MinHash pre-filter followed by clustering, BM25 + cosine candidate retrieval, and LLM deduplication.
- Hierarchical community detection using the Leiden algorithm (Graspologic or Neo4j GDS), with LLM-generated community reports summarising each cluster.
- Four vector collections (entity, relation, chunk, community) for complementary semantic search.
- Layered retrieval combining atomic search methods, pluggable rerankers, and data-only recipes, fused with Reciprocal Rank Fusion or a cross-encoder.
- Agentic plan–research–verify loop that decomposes the question, runs parallel researchers over retrieval tools, and gates synthesis on a deterministic sufficiency check.
- Pluggable storage with Neo4j for the graph and Qdrant or Weaviate for vectors.
The system is built using:
- GLiNER and GLiREL for local zero-shot entity and relation extraction.
- Graspologic / Neo4j GDS for Hierarchical Leiden community detection.
- Neo4j for the persistent knowledge graph with typed nodes, edges, and hierarchical communities.
- Qdrant / Weaviate for vector search and hybrid (dense + BM25) retrieval.
- BAML for type-safe, schema-injected LLM functions.
- LangGraph for orchestrating the extraction, embedding, and search pipelines.
- DeepAgents for the multi-agent reasoning loop.
- ZeroEntropy for embeddings and reranking.
The system stores everything in a single typed property graph in Neo4j with five node labels (Document, Chunk, Entity, Community, CommunityReport) and seven edge types:

    (:Document) <-[:PART_OF]- (:Chunk) -[:HAS_ENTITY]-> (:Entity:<Type>)
    (:Chunk) -[:NEXT_CHUNK]-> (:Chunk)
    (:Entity) -[:RELATES_TO {type, description, score}]-> (:Entity)
    (:Entity) -[:IN_COMMUNITY]-> (:Community)
    (:Community) -[:PARENT_COMMUNITY]-> (:Community)
    (:Community) -[:HAS_REPORT]-> (:CommunityReport)

- An entity carries its `name`, medical `label` (`Disease`, `Drug`, …), an optional `description`, an extraction `score`, and free-form `schema_properties`. Its identity is `(name, label)`.
- Resolution later fills in a `canonical_name` and a list of `aliases` (for example, metformin → metformin HCl, Glucophage).
- A `provenance` record tracks which extractors produced the entity, the surface forms seen in the text, and the source chunk ids and offsets.
- A relation is a directed subject–predicate–object triple (`head → type → tail`) with a `description`, a `score`, and properties.
- Before resolution, endpoints are known only by name; resolution links them to canonical entity ids.
- The schema constrains each relation's valid head and tail types (for example, `TREATED_BY` only connects `Disease → {Drug, DrugClass, Procedure}`), which the extractor uses to reject implausible triples.
- All relations persist as generic `:RELATES_TO` edges with the medical type in the `type` property.
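
The head/tail constraint can be illustrated with a minimal sketch. The `RelationRule` class, the `RULES` table, and `is_valid_triple` are hypothetical names chosen for illustration, not the repository's API; only the `TREATED_BY` rule is taken from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationRule:
    """Allowed head and tail entity labels for one relation type (hypothetical)."""
    head_labels: frozenset[str]
    tail_labels: frozenset[str]


# Only the TREATED_BY rule comes from the README; a full schema would
# carry one rule per relation type.
RULES = {
    "TREATED_BY": RelationRule(
        head_labels=frozenset({"Disease"}),
        tail_labels=frozenset({"Drug", "DrugClass", "Procedure"}),
    ),
}


def is_valid_triple(head_label: str, rel_type: str, tail_label: str) -> bool:
    """Return True when the triple respects the schema's type constraints."""
    rule = RULES.get(rel_type)
    if rule is None:
        return False  # assumption: unknown relation types are rejected outright
    return head_label in rule.head_labels and tail_label in rule.tail_labels


print(is_valid_triple("Disease", "TREATED_BY", "Drug"))  # True
print(is_valid_triple("Symptom", "TREATED_BY", "Gene"))  # False
```

A check like this is cheap enough to run on every candidate triple before it reaches resolution.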
- After resolution, hierarchical Leiden partitions the resolved relation graph into nested communities. Each `Community` records its level, its parent community, and the entities and relations it contains.
- An LLM generates a `CommunityReport` for each community, bottom-up by level: a title, a summary, structured findings, and a clinical-importance rating. Lower-level reports roll up into higher-level ones.
- Community reports give the agent a thematic, cluster-level view, so a broad question can be answered from a single summary instead of many low-level facts.
The schema is a first-class runtime value. It is injected into every extractor, resolver, summarizer, and prompt, and into BAML as dynamic enum types, so the LLM is constrained to the schema rather than merely prompted with it. Each type carries natural-language hints used to steer GLiNER and GLiREL, descriptions used in LLM prompts, and (for relations) the allowed head and tail label sets.
The default schema defines 13 entity types:
`Disease`, `Drug`, `DrugClass`, `Symptom`, `Pathogen`, `AnatomicalStructure`, `Procedure`, `DiagnosticTest`, `RiskFactor`, `Gene`, `Protein`, `Pathway`, `MechanismOfAction`
and 25 relation types, grouped by clinical role:
| Group | Relations |
|---|---|
| Clinical (disease-centered) | HAS_SYMPTOM, TREATED_BY, DIAGNOSED_BY, CAUSED_BY, HAS_GENETIC_CAUSE, AFFECTS, HAS_COMPLICATION, DIFFERENTIAL_FOR, HAS_RISK_FACTOR |
| Pharmacological (drug-centered) | BELONGS_TO_CLASS, TARGETS, INHIBITS, ACTIVATES, METABOLIZED_BY, INTERACTS_WITH, CONTRAINDICATED_IN, CAUSES_ADVERSE_EFFECT, MONITORED_BY, HAS_MECHANISM |
| Molecular | ENCODES, PARTICIPATES_IN |
| Structural | IS_A, PART_OF, INNERVATED_BY, SUPPLIED_BY |
The schema is inspired by SNOMED CT relationship types, the UMLS semantic network, and clinical reasoning patterns.
The system has two data flows: ingestion (text into a knowledge graph and vector store) and retrieval using an agent.
Corpus ingestion streams documents, chunks them, and runs a LangGraph pipeline of graph construction components. State accumulates into a KnowledgeGraph that is written to Neo4j and the vector store. The stages run in order:
- Extract: Each chunk is processed by up to three extractors that share the injected schema:
  - GLiNER runs batch zero-shot NER, steered by each entity type's natural-language label.
  - GLiREL runs zero-shot relation extraction over GLiNER's entity spans, so GLiREL requires GLiNER.
  - LLM runs a BAML extraction function with the valid entity and relation types injected as dynamic enums, constraining the model to the schema. Chunks are processed concurrently with bounded concurrency and isolated retries.
- Combine: The three extractor outputs are merged per chunk using a configurable strategy: `union` (superset, merging provenance), `intersection` (only items every extractor found), `max_score` (highest-confidence version), or `gliner_primary` (GLiNER spans supplemented by the LLM).
- Normalize: Within a chunk, entities are deduplicated by `(normalized name, label)` and relations by `(normalized head, normalized tail, type)`, merging sources, surface forms, chunk ids, scores, and properties. Low-confidence and too-short entities are filtered.
- Aggregate: A deterministic, zero-LLM set-union across all chunks collapses the same keys into one cross-chunk candidate set.
- Resolve: Entity and relation-type names are resolved in two stages: a deterministic deduplicator collapses exact and near-exact variants (SemHash, MinHash-LSH) and clusters the residual names; then, within each cluster, BM25 + cosine fusion retrieves candidates and an LLM selects exact duplicates and a single canonical alias. The result writes `canonical_name` and `aliases` onto entities and links relation endpoints to canonical entity ids.
- Detect communities: Hierarchical Leiden runs over the resolved relation graph. Two interchangeable backends share one base class: graspologic-native (default) and Neo4j GDS. The output is a tree of communities with levels and parents.
- Summarize: Communities are walked bottom-up by level; a degree-ranked, token-budgeted context is built for each and passed to an LLM that produces a titled report with structured findings and a clinical-importance rating.
Resolution runs before community detection so the graph is clustered over canonical entities.
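
The Combine and Normalize stages can be sketched as follows. This is an illustrative approximation under stated assumptions: `normalize_key` and `combine` are hypothetical helpers, the normalization rule (lowercase, collapsed whitespace) is a guess, and only the `union` and `intersection` strategies are shown. The `max_score` and `gliner_primary` strategies are omitted.

```python
def normalize_key(name: str, label: str) -> tuple[str, str]:
    """Build a dedup key; lowercasing and whitespace collapsing are assumptions."""
    return (" ".join(name.lower().split()), label)


def combine(outputs: dict[str, dict[tuple[str, str], float]],
            strategy: str = "union") -> dict[tuple[str, str], dict]:
    """Merge per-extractor {key: score} maps for a single chunk.

    Each merged entry keeps the best score seen and the set of extractors
    that produced the item (a simplified stand-in for provenance).
    """
    merged: dict[tuple[str, str], dict] = {}
    for extractor, items in outputs.items():
        for key, score in items.items():
            entry = merged.setdefault(key, {"score": score, "sources": set()})
            entry["score"] = max(entry["score"], score)
            entry["sources"].add(extractor)
    if strategy == "union":
        return merged
    if strategy == "intersection":
        # Keep only items that every extractor found.
        return {k: v for k, v in merged.items() if len(v["sources"]) == len(outputs)}
    raise ValueError(f"unsupported strategy in this sketch: {strategy}")


outputs = {
    "gliner": {normalize_key("Metformin", "Drug"): 0.91,
               normalize_key("nausea", "Symptom"): 0.55},
    "llm": {normalize_key("  metformin ", "Drug"): 0.97},
}
union = combine(outputs, "union")
both = combine(outputs, "intersection")
print(sorted(union))  # two keys: metformin/Drug and nausea/Symptom
print(sorted(both))   # one key: metformin/Drug
```

Because both stages key on the same `(normalized name, label)` tuple, the cross-chunk Aggregate step can reuse the same merge logic over chunks instead of extractors.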
After graph construction, the embeddable fields of each model are vectorised and upserted into four vector collections.
| Source | Text embedded |
|---|---|
| Entity | canonical name |
| Relation | three representations (below) |
| Chunk | chunk text |
| CommunityReport | report summary |
A single relation is embedded three ways, all keyed by the same relation id, because one vector cannot capture the predicate, the participants, and the full statement at once:
- Edge fact - the relation description (for example, metformin treats type 2 diabetes).
- Edge type - the predicate alone (for example, TREATED_BY).
- Full SPO - the subject–predicate–object sentence.
Collections support lazy creation, batch upsert, and native vector quantization.
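
As a rough illustration of the three relation representations, the sketch below builds the texts to embed for one relation. The function name, the representation keys, and the way the SPO sentence is rendered are assumptions for illustration; the repository may phrase them differently.

```python
def relation_texts(relation_id: str, head: str, rel_type: str,
                   tail: str, description: str) -> list[dict]:
    """Return three embeddable texts, all keyed by the same relation id."""
    # Assumption: the predicate is verbalised by lowercasing and replacing
    # underscores, e.g. TREATED_BY -> "treated by".
    predicate = rel_type.replace("_", " ").lower()
    texts = {
        "edge_fact": description,               # the relation description
        "edge_type": rel_type,                  # the predicate alone
        "full_spo": f"{head} {predicate} {tail}",  # subject-predicate-object
    }
    return [
        {"relation_id": relation_id, "representation": name, "text": text}
        for name, text in texts.items()
    ]


points = relation_texts(
    "rel-1", "type 2 diabetes", "TREATED_BY", "metformin",
    "metformin treats type 2 diabetes",
)
for point in points:
    print(point["representation"], "->", point["text"])
```

Keeping the shared `relation_id` on every point lets a hit on any of the three vectors be hydrated back to the same graph edge.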
Retrieval uses a layered design: atomic search methods (vector, hybrid, full-text, BFS) compose into retrievers; a Search Engine fans out over the retrievers that a recipe selects and fuses their results with a pluggable reranker.
- Entity retriever runs hybrid search over entity names, optionally expanding from the matched seed entities with bounded, degree-aware BFS.
- Relation retriever runs hybrid search over the relation collection, then hydrates full edges (head, tail, type, description) from the graph.
- Chunk retriever runs hybrid search over raw passages.
- Community retriever searches community report summaries for thematic answers.
- Text-to-Cypher retriever has an LLM translate the question into a read-only Cypher query against the schema, with few-shot examples and bounded retry.
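
The bounded, degree-aware expansion used by the entity retriever might look like the sketch below. It runs over an in-memory adjacency map rather than Neo4j, and the interpretation of "degree-aware" (do not expand through nodes whose degree exceeds a cap) and all parameter names and defaults are assumptions.

```python
from collections import deque


def bounded_bfs(adjacency: dict[str, list[str]], seeds: list[str],
                max_depth: int = 2, max_degree: int = 50,
                max_nodes: int = 100) -> dict[str, int]:
    """Breadth-first expansion from seed entities; returns node -> depth."""
    visited: dict[str, int] = {}
    queue: deque[str] = deque()
    for seed in seeds:
        if seed not in visited:
            visited[seed] = 0
            queue.append(seed)
    while queue and len(visited) < max_nodes:
        node = queue.popleft()
        depth = visited[node]
        neighbours = adjacency.get(node, [])
        # Stop at the depth bound, and do not expand through hub nodes.
        if depth >= max_depth or len(neighbours) > max_degree:
            continue
        for neighbour in neighbours:
            if neighbour not in visited:
                visited[neighbour] = depth + 1
                queue.append(neighbour)
                if len(visited) >= max_nodes:
                    break
    return visited


graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": ["e"]}
print(bounded_bfs(graph, ["a"], max_depth=2))   # {'a': 0, 'b': 1, 'c': 1, 'd': 2}
print(bounded_bfs(graph, ["a"], max_degree=1))  # {'a': 0}
```

Capping expansion at high-degree nodes keeps a single hub entity from flooding the result set with loosely related neighbours.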
Recipes are data-only constants that select which methods to run and how to fuse them, so a new strategy requires no orchestration change.
| Recipe | Methods | Reranker |
|---|---|---|
| `entity` / `relation` / `chunk` / `community` | single method | RRF |
| `hybrid_rrf` | entity + relation + chunk + community | RRF |
| `hybrid_cross_encoder` | all four + BFS | cross-encoder |
| `bfs_expand` | entity + BFS | RRF |
| `text2cypher` | text-to-Cypher | RRF |
Results from the recipe's methods are gathered concurrently and fused with Reciprocal Rank Fusion or a cross-encoder reranker. MMR and node-distance rerankers are also available. Fusion degrades gracefully, ignoring any method that returns no results.
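
For readers unfamiliar with Reciprocal Rank Fusion, here is a minimal standalone sketch of the standard formula, where each item scores the sum of `1 / (k + rank)` over the rankings that contain it. The function name and the conventional default `k = 60` are assumptions; the repository's reranker may use different parameters.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of result ids into one list, best first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # An empty ranking contributes nothing, so a method that returns
        # no results is simply ignored.
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


entity_hits = ["a", "b", "c"]
chunk_hits = ["b", "c"]
community_hits: list[str] = []  # a method that found nothing

fused = reciprocal_rank_fusion([entity_hits, chunk_hits, community_hits])
print(fused)  # ['b', 'c', 'a']
```

Items that appear in several rankings outrank an item that tops only one, which is why `a` drops to last place here.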
The agent is a DeepAgents harness that coordinates three subagents with an orchestrator agent:
- Planner decomposes the clinical question into focused sub-questions, each mapped to a retrieval recipe. It has no tools and works from the question text.
- Researcher is spawned in parallel, one per sub-question. Each holds the retrieval tools (entity, relation, chunk, community, hybrid, text-to-Cypher search, and a community map-reduce tool) and returns evidence items with citations.
- Verifier reads the gathered evidence and returns a structured assessment - a coverage score, an evidence-depth score, missing pieces, targeted follow-ups, and unsupported claims. A deterministic numeric gate decides whether the evidence is sufficient.
When the gate fails, the targeted follow-ups seed another planning round, so the loop converges on missing information rather than repeating searches. When it passes (or iterations are exhausted), the orchestrator synthesises a structured answer with answer text, source citations, a confidence score, clinical caveats, and an answerability flag.
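
The deterministic gate can be pictured as a plain numeric check over the verifier's structured assessment. The sketch below is hypothetical: the field names mirror the description above, but the thresholds, the iteration cap, and the function names are illustrative assumptions rather than the project's actual settings.

```python
from dataclasses import dataclass, field


@dataclass
class Assessment:
    """Structured verifier output (field names are assumptions)."""
    coverage: float
    evidence_depth: float
    missing: list[str] = field(default_factory=list)
    follow_ups: list[str] = field(default_factory=list)
    unsupported_claims: list[str] = field(default_factory=list)


def is_sufficient(a: Assessment, min_coverage: float = 0.8,
                  min_depth: float = 0.6, max_unsupported: int = 0) -> bool:
    """Numeric gate: no LLM call, only threshold comparisons."""
    return (a.coverage >= min_coverage
            and a.evidence_depth >= min_depth
            and len(a.unsupported_claims) <= max_unsupported)


def next_step(a: Assessment, iteration: int, max_iterations: int = 3) -> str:
    """Synthesise when the gate passes or iterations run out; else replan."""
    if is_sufficient(a) or iteration >= max_iterations:
        return "synthesise"
    return "replan"  # the follow_ups would seed the next planning round


weak = Assessment(coverage=0.5, evidence_depth=0.7,
                  follow_ups=["Which ACE inhibitors are renally dosed?"])
strong = Assessment(coverage=0.9, evidence_depth=0.8)
print(next_step(weak, iteration=1))    # replan
print(next_step(strong, iteration=1))  # synthesise
print(next_step(weak, iteration=3))    # synthesise
```

Keeping the decision outside the LLM makes the loop's termination behaviour reproducible for a given assessment.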
Every LLM interaction outside the agent loop is a typed BAML function with explicit inputs, outputs, retry policy, and provider configuration.
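We provide loaders for medical question-answering benchmarks across three evaluation formats: closed-form (multiple-choice and yes/no/maybe), rubric-scored, and open-ended. The three tables that follow list them in that order.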
| Dataset | Description | Format |
|---|---|---|
| MedQA (USMLE) | USMLE-style clinical vignettes testing broad medical knowledge and diagnostic reasoning. | 4-option MCQ |
| MedMCQA | AIIMS and NEET-PG entrance questions covering medical subjects, topics, and expert explanations. | 4-option MCQ |
| PubMedQA | Biomedical research questions answered from linked PubMed abstracts. | yes/no/maybe |
| MMLU-Med | Medical and biology subset of MMLU covering clinical, anatomy, genetics, and professional medicine topics. | 4-option MCQ |
| MMLU-Pro (Health) | Health-domain professional and biomedical questions from the more challenging MMLU-Pro benchmark. | 10-option MCQ |
| MedXpertQA (Text) | Text-only specialty-board style questions across clinical tasks, specialties, and body systems. | ~10-option MCQ |
| CareQA (MCQ) | English healthcare exam questions derived from Spain's MIR/FSE specialist training exams. | 4-option MCQ |
| NEJM Q&A | Translated Israeli residency board exam questions across clinical specialties. | 4–5-option MCQ |
| PubHealthBench | UKHSA public-health guidance questions grounded in UK government source documents. | 4-option MCQ |
| SuperGPQA-Med | Graduate-level medical knowledge questions from the medicine subset of SuperGPQA. | up to 10-option MCQ |
| Dataset | Description | Format |
|---|---|---|
| HealthBench | Realistic health conversations with physician-written criteria for safe, complete responses. | Multi-turn rubric-scored conversations |
| RAR-Med | Medical reasoning prompts paired with checklist-style rubrics for structured reward scoring. | Instance-specific rubrics per prompt |
| Dataset | Description | Format |
|---|---|---|
| MedCaseReasoning | PMC case-report benchmark for final diagnosis and clinician-aligned reasoning. | Diagnosis from structured case prompts |
| CareQA (Reasoning) | Open-ended English questions rephrased from Spanish MIR/FSE healthcare exams. | Open-ended clinical questions |
| PubHealthBench (Freeform) | Free-text public-health answers grounded in UKHSA guidance documents. | Free-form public-health answers |
| NEJM Diagnostic Reasoning | Open-ended diagnosis generation from full NEJM clinicopathological case records. | Diagnosis from full CPC vignettes |
We provide streaming loaders for medical text corpora.
| Corpus | Description |
|---|---|
| USMLE Textbooks | English USMLE preparation textbooks covering core preclinical and clinical medicine. |
| StatPearls | Peer-reviewed point-of-care clinical reference articles from NCBI Bookshelf. |
| PubMed Abstracts | Biomedical literature titles and abstracts from PubMed. |
| PMC Case Reports | Full-text PubMed Central case reports describing patient presentations, workups, diagnoses, and outcomes. |
| Meditron Clinical Guidelines | Clinical practice guidelines from authoritative health organizations for diagnosis, treatment, and care management. |
The project uses uv for dependency management.

    git clone https://github.com/avnlp/agentic-med-diag.git
    cd agentic-med-diag
    pip install uv && uv sync

Create a `.env` file with the required credentials. Settings are env-overridable per subsystem, for example:

    NEO4J_URI=bolt://localhost:7687
    NEO4J_USER=neo4j
    NEO4J_PASSWORD=your_password
    QDRANT_URL=http://localhost:6333
    AGENT_BASE_URL=https://api.openai.com/v1
    AGENT_API_KEY=your_api_key
    AGENT_MODEL=gpt-5.5

We provide end-to-end pipelines for ingestion and question answering.


    # Ingestion
    uv run am-diag-ingest
    uv run am-diag-ingest --corpus pubmed,statpearls --batch-size 50

    # QA runner
    uv run am-diag-qa
    uv run am-diag-qa --datasets careqa,medqa --limit 10

Ingest a corpus into the knowledge graph:


    from am_diag.loaders.corpus import StatPearlsCorpusLoader
    from am_diag.db.graph import create_neo4j_client
    from am_diag.vector.embedding import ZembedEmbedder
    from am_diag.ingestion import run_corpus_ingestion

    # Run inside an async function or an asyncio-aware REPL (top-level await).
    # vector_store is assumed to be an already-constructed Qdrant or Weaviate store.
    report = await run_corpus_ingestion(
        corpus_loader=StatPearlsCorpusLoader(),
        graph_store=create_neo4j_client(),
        vector_store=vector_store,
        embedder=ZembedEmbedder(),
        batch_size=100,
    )

Search with multi-strategy retrieval:


    from am_diag.retrieval import SearchEngine, RetrievalConfig

    # vector_store, graph_store, embedder, reranker, and MEDICAL_GRAPHRAG_SCHEMA
    # are assumed to be created or imported earlier.
    engine = SearchEngine(
        config=RetrievalConfig(),
        vector_store=vector_store,
        graph_store=graph_store,
        embedder=embedder,
        schema=MEDICAL_GRAPHRAG_SCHEMA,
        reranker=reranker,
    )
    results = await engine.search(
        "What treats hypertension in chronic kidney disease?",
        recipe="hybrid_rrf",
    )

Answer a clinical question with the agent:


    from am_diag.agents import answer_question, AgentSettings

    answer = await answer_question(
        "What are first-line treatments for hypertension in a patient with type 2 diabetes?",
        search_engine=engine,
        settings=AgentSettings(),
    )

The package is laid out as follows:

    am_diag/
    ├── common/
    │   ├── data_models/     # all data models (Entity, Relation, Community, Chunk, ...)
    │   ├── cypher/          # Cypher files
    │   └── schema/          # Medical Schema
    ├── chunking/            # Recursive-character + markitdown chunkers
    ├── graph_construction/
    │   ├── extract/         # GLiNER, GLiREL, LLM extractors + combiner
    │   ├── normalize.py     # Per-chunk dedup/normalization
    │   ├── aggregate.py     # Cross-chunk set-union
    │   ├── resolve/         # Deterministic + cluster + LLM resolution
    │   └── community/       # Leiden / GDS detection + Summarization
    ├── ingestion/           # LangGraph extraction / embedding / search pipelines
    ├── pipelines/           # End-to-end ingestion + Question Answering
    ├── db/
    │   ├── graph/           # Neo4j client + record serialization
    │   └── vector/          # Qdrant / Weaviate stores
    ├── vector/              # Embedders + rerankers
    ├── retrieval/           # methods · retrievers · rerankers · recipes · SearchEngine
    ├── agents/              # Agent harness
    ├── llm/                 # BAML sources + generated client
    └── loaders/             # Corpus loaders + Dataset loaders

Please see the CONTRIBUTING.md for contribution guidelines.
- Microsoft GraphRAG - hierarchical communities and community reports
- Graphiti - layered graph retrieval and search recipes
- Cognee - DataPoint-based graph memory and ECL pipelines
- KG-Gen - knowledge graph extraction with cluster + LLM alias deduplication
- OptimusKG - biomedical knowledge-graph construction and reasoning
- YouTu-GraphRAG - hierarchical agentic graph retrieval
- LightRAG - dual-level graph + vector retrieval
- PathRAG - relational-path pruning over the graph
- MIRAGE / MedRAG - medical RAG ablations and corpora
- MEDITRON-70B - medical pretraining with the GAP-Replay corpus
- GLiNER / GLiREL - generalist NER and relation extraction
- BAML, LangGraph, DeepAgents, ZeroEntropy
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

