feat: graph-aware retrieval planner (opt-in capability) by Programmer7129 · Pull Request #1 · Vrin-cloud/engram

Programmer7129 · 2026-06-08T19:15:42Z

Summary

Adds a one-LLM-call up-front retrieval planner that produces a structured RetrievalPlan (expected answer type, priority predicates, optional hop sequence) from a compressed view of the relevant fact-graph slice. The plan biases — never replaces — the existing PPR + beam + triple-ANN + Cohere Rerank pipeline.

This is the first capability in the OSS RAG starter space that pairs HippoRAG-2-style PPR with an explicit pre-retrieval plan. Existing public KG-RAG systems (HippoRAG 2, MS GraphRAG, LightRAG, DAVIS) either skip planning entirely or do reactive agentic loops (latency cliffs).

What's in

New modules:

src/engram/core/graph_view.py — CompressedGraphView, EntityNeighborhood + build_query_graph_view. Pure logic over backend.fact_graph.
src/engram/dialogue/prompts/retrieval_plan.py — RetrievalPlan / HopStep schemas + prompt with 4 worked examples + confidence calibration.
src/engram/dialogue/retrieval_planner.py — async plan_retrieval with confidence-floor abstention (0.5), graceful no-op on empty view or LLM error.
benchmarks/failure_tagger.py — Phase 0 LLM classifier; tagged 120 n=200 failures, 22.5% planner-addressable.

Plumbing:

kg_hybrid_neighbors gains plan kwarg → drives predicate_boost in beam_search, post-fusion fact-type filter (capped at 30% removal), plan-aware Cohere Rerank query suffix.
beam_search_facts gains predicate_boost + multiplier (default 1.5x).
answer_one caches plan per question across IRCoT rounds.
--retrieval-planner flag in benchmarks/musique.py, default OFF.

Pre-existing main hardening (ported):

_LMDB_MAX_KEY_BYTES (480) guards in entity/alias/fact upsert paths — prevents cold-path BadValsizeError on LLM-extracted runaway names.
exc_info=True on the swallowed background-task warning so cold-path failures surface tracebacks.

n=100 ablation (kg-hybrid + IRCoT + synth OFF)

Run	EM	F1	Plan fire rate
No-planner baseline	0.40	0.5475	n/a
Planner-on (refined prompt)	0.39	0.5389	8% confident plans

Flat metrics within run-to-run variance (±0.04 EM at n=100). Shipping as opt-in capability, not as a metric-lift feature. The planner is correct in isolation; the lift is bottlenecked by the upstream entity extractor producing 30-40% query-slot noise and by n=100 sample variance exceeding the +0.02 gate.

Positioning: Engram becomes the only OSS RAG starter with explicit graph-aware planning + structured retrieval traces — capability differentiation, not benchmark dominance.

Test plan

366/366 unit + integration tests pass
9 new graph_view tests, 8 planner-dialogue tests, 3 plan-biased-retrieval integration tests
ruff check + ruff format --check clean
Default --retrieval-planner OFF — runs without the flag are byte-identical to main
LMDB guards verified end-to-end on n=100 store build
README mention of the new opt-in capability (follow-up commit on main)

🤖 Generated with Claude Code

Adds a one-LLM-call up-front retrieval planner that produces a structured `RetrievalPlan` (expected answer type, priority predicates, optional hop sequence) from a compressed view of the relevant fact- graph slice. The plan biases — never replaces — the existing PPR + beam + triple-ANN + Cohere Rerank pipeline. This is the first capability in the OSS RAG starter space that pairs HippoRAG-2-style PPR with an explicit pre-retrieval plan, decided in one LLM call before any retrieval runs. Existing public KG-RAG systems (HippoRAG 2, MS GraphRAG, LightRAG, DAVIS) either skip planning entirely or do reactive agentic loops (latency cliffs). New modules: - src/engram/core/graph_view.py — CompressedGraphView, EntityNeighborhood, EdgeSummary + build_query_graph_view(). Pure logic over backend.fact_graph + backend.get_entity. Top-K by edge confidence, corpus-wide predicate histogram. - src/engram/dialogue/prompts/retrieval_plan.py — RetrievalPlan, HopStep schemas + build_retrieval_plan_prompt with 4 worked examples. Confidence calibration guidance (0.8-1.0 high, <0.3 rare-abstention). - src/engram/dialogue/retrieval_planner.py — async plan_retrieval with confidence-floor abstention (default 0.5), graceful no-op on empty view or LLM error, optional raw_plan_sink for diagnostics. - benchmarks/failure_tagger.py — Phase 0 LLM classifier that tagged 120 n=200 failures by mode; 22.5% are planner-addressable (mostly answer-type mismatches). Plumbing: - benchmarks/retrieval.py — kg_hybrid_neighbors gains `plan` kwarg. Plan drives: predicate_boost in beam_search_facts, post-fusion fact-type filter (capped at 30% removal so a wrong plan can't starve the reader), plan-aware Cohere Rerank query suffix. - src/engram/core/kg_retrieval.py — beam_search_facts gains predicate_boost + multiplier (default 1.5x). - benchmarks/runner.py — answer_one builds view + plan once per question (cached across IRCoT rounds), threads to retrieval. - benchmarks/musique.py — `--retrieval-planner` (default OFF) and `--trace-retrieval-plan PATH` flags. Pre-existing main-branch hardening, ported in this commit: - src/engram/backends/memory.py — _LMDB_MAX_KEY_BYTES (480) + _key_too_long() guards in entity / alias / fact upsert paths. Skip-with-warning when an LLM-extracted name would exceed LMDB's 511-byte key cap. Prevents cold-path BadValsizeError that was silently killing graph builds on the n=100 fixture. - src/engram/dialogue/orchestrator.py — exc_info=True on the swallowed background-task warning so future cold-path failures surface with tracebacks. Tests: 366/366 pass. 9 unit tests for graph_view, 8 for the planner dialogue, 3 integration tests for plan-biased retrieval (Plankton voiced_by chain, plan=None passthrough, filter-cap safety). n=100 ablation (kg-hybrid + IRCoT + synth OFF, same store): - No-planner baseline: EM 0.40, F1 0.5475 - Planner-on (refined prompt): EM 0.39, F1 0.5389; 8/100 plans fire confidently, run-to-run variance ±0.04 EM exceeds plausible signal. Verdict: shipping as opt-in capability, not metric-lift feature. Default OFF. The planner is correct in isolation (tests pass, fires when input is good) but the lift is bottlenecked by the upstream entity extractor producing 30-40% query-slot noise and by n=100 sample variance exceeding the +0.02 gate. Engram's selling point becomes "the only OSS RAG starter with explicit graph-aware planning + structured retrieval traces" — capability differentiation, not benchmark dominance. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

CI fixes after the planner commit: - src/engram/backends/memory.py: add module-level `logger = logging.getLogger(__name__)`. The LMDB key-length guards (cherry-picked from feat/slm-voices) called logger.warning but the original main-branch module had no logger import. - ruff check --fix: remove unused imports in the new test modules (EntityNeighborhood / DEFAULT_PREDICATE_TOP_N from test_core_graph_view.py; HopStep was unused in one test). - ruff format: standardize formatting on new files + a few existing benchmarks files that had drifted (decomposition.py, reranker.py whitespace). Verified: ruff check clean, ruff format --check clean, 366/366 tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

CI runs pytest -m "not integration and not slow", but the new integration test test_kg_retrieval_with_plan.py wasn't marked, so CI collected it and the test failed importing BM25Index (which requires the `benchmarks` extra not installed in CI). Add tests/integration/conftest.py that scopes pytest_collection_modify items to items whose file path is under tests/integration/, then adds the integration marker. Path-scoped (not session-global) so tests outside this directory keep their existing markers. Verified: pytest tests/ -m "not integration and not slow" selects 363 unit tests (deselects 3 integration), all 366 still pass when run without the filter. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Programmer7129 and others added 3 commits June 8, 2026 12:02

Programmer7129 force-pushed the feat/graph-aware-planner branch from 88ac426 to ff9cff8 Compare June 8, 2026 19:21

Programmer7129 merged commit 259dacf into main Jun 8, 2026
4 checks passed

Programmer7129 deleted the feat/graph-aware-planner branch June 8, 2026 19:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: graph-aware retrieval planner (opt-in capability)#1

feat: graph-aware retrieval planner (opt-in capability)#1
Programmer7129 merged 3 commits into
mainfrom
feat/graph-aware-planner

Programmer7129 commented Jun 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

Programmer7129 commented Jun 8, 2026

Summary

What's in

n=100 ablation (kg-hybrid + IRCoT + synth OFF)

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant