Skip to content

fix(decompose): multi-hop search via LLM query decomposition (P0 #3 retrieval lift)#189

Draft
7xuanlu wants to merge 7 commits into
mainfrom
feature/multi-hop-decompose-v2
Draft

fix(decompose): multi-hop search via LLM query decomposition (P0 #3 retrieval lift)#189
7xuanlu wants to merge 7 commits into
mainfrom
feature/multi-hop-decompose-v2

Conversation

@7xuanlu
Copy link
Copy Markdown
Owner

@7xuanlu 7xuanlu commented May 25, 2026

Summary

  • New decompose.rs module — LLM rewrites compound queries into 2-4 standalone sub-queries. Compound-vs-not classifier baked into single-pass prompt. Salvaged design from prior feature/query-decomposition branch (now superseded).
  • MemoryDB::search_memory_decomposed fans sub-queries out via search_memory, merges by id keeping max score, sorts + truncates.
  • HTTP decompose: bool flag on /api/memory/search. MCP RecallParams.decompose opt-in. RetrievalConfig.decomposition_enabled global kill switch (defaults true).
  • Token-cost telemetry: DecomposeOutput { sub_queries, input_tokens, output_tokens } + estimate_tokens helper. search_memory_decomposed_with_stats returns the stats.
  • Eval-harness variants: save_locomo_decomposed_baseline, save_longmemeval_decomposed_baseline (both #[ignore]d).

Why

Origin LoCoMo multi-hop = 37%. agentmemory hits 95.2% R@5 on LongMemEval-S with 3-channel (BM25 + vec + graph BFS depth=2). SuperLocalMemory V3.3 reports +23.8pp on multi-hop from spreading-activation. P0 #3 attacks multi-hop via query rewriting (Angle A); KG-tier independent retrieval (Angle B) is deferred to Phase 2 pending measurement.

Failure-mode policy preserved verbatim from salvaged design: "Never errors on normal failure paths (timeout, provider error, malformed JSON, empty array). All such cases return Ok(vec![query.to_string()]) after a log::warn! so the caller can detect 'not decomposed' via len() <= 1."

Test plan

  • 19 unit + integration tests pass (cargo test -p origin-core --lib decompose, search_memory_decomposed, cargo test -p origin-server --lib search_decompose_routing_tests, cargo test -p origin-mcp --lib)
  • cargo check --workspace clean post-rebase on main
  • All 4 routing branches tested: decompose=false / decompose=true+llm=Some / decompose=true+llm=None (fallback) / decompose=true+disabled-by-config (fallback)
  • Manual: search with decompose=true on a compound query ("What changed about my opinion on X after Y?") → verify multiple sub-queries logged + merged results
  • GPU eval validation: LoCoMo multi-hop lift (target 37% → ≥55%) via cargo test -p origin-core --test eval_harness save_locomo_decomposed_baseline -- --ignored --nocapture
  • Cost gate: per-query token report verifies decompose adds ≤256 input tokens

Follow-ups (not in this PR)

  • Surface DecomposeOutput through wire types so HTTP clients see token-cost.
  • KG-tier independent retrieval (Angle B) as the second multi-hop attack — entities + observations + pages surface as parallel scored channel.

🤖 Generated with Claude Code

7xuanlu and others added 7 commits May 24, 2026 15:40
Phase A foundation for P0 #3 — multi-hop via query decomposition. Adds
crates/origin-core/src/decompose.rs with the salvaged compound-vs-not
classify+decompose prompt and a defensive single-LLM-call rewriter that
returns 1..=4 sub-queries.

Scaffold scope only: module + prompt + 3 unit tests covering valid JSON,
malformed JSON fallback, and single-element passthrough. Search
integration (search_memory_decomposed), surface exposure (HTTP route,
MCP tool), and cost gating land in subsequent commits.

Failure-mode policy: every failure path (timeout, provider error,
missing JSON brackets, malformed JSON, empty array) logs a warn! and
returns Ok(vec![query.to_string()]) so the caller can detect
"not decomposed" via len() <= 1. Decomposition is a quality lift, not
a correctness gate; degrades silently to single-query.

Salvaged design from feature/query-decomposition (greenfield rewrite).
References docs/superpowers/p0-plan-retrieval-fixes-2026-05-24.md
section "P0 #3 — Multi-hop via query decomposition".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P0 #3 Phase B foundation. Adds MemoryDB::search_memory_decomposed which
calls crate::decompose::decompose_query (Phase A scaffold from 695828b),
then fans sub-query searches out and merges by SearchResult.id, keeping
the max score per id. Sorts descending and truncates to limit.

If llm is None or the decomposer returns a single-element vec (its
silent-fallback contract on any failure), passthrough to plain
search_memory — no behavior change for the common single-hop path.

Two passthrough tests added: None-llm and mock single-element JSON.
The compound case (mock returning a 2-element array) is deferred to a
follow-up commit that wires this through the eval harness — the search
fixtures used here only seed two memories so the merge path is exercised
but the multi-hop quality lift can only be measured against LoCoMo.

See docs/superpowers/p0-plan-retrieval-fixes-2026-05-24.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P0 #3 Phase C surface. Wires the search_memory_decomposed fan-out from
f46666a through the daemon's `/api/memory/search` endpoint and adds a
config knob so cost-sensitive deployments can disable the per-query LLM
overhead globally.

Wire change: `SearchMemoryRequest` gains a `decompose: bool` field
(serde-default false), so existing callers and the on-disk wire shape
are unchanged. Three call sites in `crates/origin-mcp/src/tools.rs` are
updated to construct the new field explicitly (no Default impl yet — adding
one would be speculative surface beyond this task).

Routing logic in `handle_search_memory`:

  - `decompose=false` (default) -> plain `search_memory`. Unchanged.
  - `decompose=true` AND `state.llm.is_some()` AND
    `tuning.retrieval.decomposition_enabled` -> `search_memory_decomposed`,
    passing the cloned LLM provider through.
  - `decompose=true` but `state.llm.is_none()` -> warn + fall back to
    plain search.
  - `decompose=true` but `decomposition_enabled=false` -> warn + fall back
    to plain search.

Every fallback degrades the request silently to plain search rather than
erroring. Decomposition is a quality lift, not a correctness gate — same
contract as the underlying `decompose_query` (see 695828b).

Cost gate: new `RetrievalConfig { decomposition_enabled: bool }` under
`TuningConfig.retrieval`, defaulting to `true` (non-disruptive — the
feature only runs when the caller opts in via the request param). Lets a
deployment disable the extra LLM call per request without rebuilding.

Locking: snapshots `db`, `llm`, and `decomposition_enabled` out of the
state read guard up front so the guard drops before the (potentially
LLM-bound) `search_memory_decomposed` await. Matches the AGENTS.md
"never hold a RwLock guard across .await" rule.

Tests: 3 axum integration tests in
`memory_routes::search_decompose_routing_tests` cover the routing
branches that don't require a live LLM (decompose=false; decompose=true
+ llm=None; decompose=true + decomposition_enabled=false). The
compound-query lift path (mock LLM returns 2+ sub-queries) is already
covered by `search_memory_decomposed` unit tests in
`crates/origin-core/src/db.rs:22295` — `MockProvider` is `#[cfg(test)]`
inside origin-core and not reachable from origin-server tests, so
replicating it here would force a test-util surface expansion outside
the scope of this commit. Noted as a follow-up if useful.

Verification:
  cargo check --workspace                                 OK
  cargo test -p origin-server --lib                       60 passed
  cargo test -p origin-types --lib                        57 passed
  cargo test -p origin-core --lib tuning::tests           11 passed
  cargo test -p origin-mcp --lib search_memory_request    1 passed
  cargo clippy -p origin-{types,core,server,mcp} --lib --tests  clean
  cargo fmt --all -- --check                              clean

References docs/superpowers/p0-plan-retrieval-fixes-2026-05-24.md
section "P0 #3 — Multi-hop via query decomposition". Closes the
Phase C "HTTP surface + cost gating" task on top of the Phase A
scaffold (695828b) and Phase B fan-out (f46666a).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Surface multi-hop decomposition to MCP clients and the offline eval
suite so the new search_memory_decomposed path can actually be measured
and used end-to-end.

MCP wrapper (origin-mcp):
- RecallParams gains `decompose: Option<bool>` with schemars description
  noting LLM cost and multi-hop use case.
- recall_impl forwards the flag to SearchMemoryRequest.decompose.
- Tool description mentions decompose=true triggers multi-hop split.
- 2 new unit tests: forwarding from JSON in -> wire request out, and
  schema-advertises-decompose drift guard.
- 2 existing call sites in tests (space_roundtrip_e2e, type_contract)
  fixed to pass decompose: None.

Eval harness (origin-core):
- run_locomo_eval_decomposed and run_longmemeval_eval_decomposed mirror
  the _expanded runners; only difference is search_memory_decomposed in
  place of search_memory_expanded, and retrieval_method stamped as
  "search_memory_decomposed" in ReportEnv.
- save_locomo_decomposed_baseline and save_longmemeval_decomposed_baseline
  added as #[ignore]'d baseline-save tests in tests/eval_harness.rs.

Verification:
- cargo check --workspace --tests: clean.
- cargo clippy -p origin-mcp -p origin-core --tests: no warnings.
- cargo test -p origin-mcp --lib: 190 passed (3 new decompose tests).
- cargo test -p origin-core --lib eval::: 176 passed.
- cargo test -p origin-core --test eval_harness -- --list | grep
  decomposed: shows both new baselines.

Cost telemetry deferred: ReportEnv has no total_tokens_used or cost_usd
field today. Adding one would touch every variant of every report shape
and is out of scope for this commit. Tracked as follow-up — judge
already records token spend in EnrichmentMode::from_env, so the surface
exists; just needs unification across retrieval and judge phases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per-query LLM cost on the decompose path was invisible. Add `chars / 4`
token estimation and emit one `log::info!` line per call under the
`[decompose]` tag covering input and output, including the timeout +
provider-error fallback paths (output=0, with a parenthetical reason).

Also exports a forward-looking `DecomposeOutput` struct + public
`estimate_tokens(&str)` helper. Neither is wired into
`search_memory_decomposed` yet — the return type stays
`Vec<SearchResult>` and the stats land in logs only. Surfacing tokens
through wire types so deployments can gate the decompose path under a
cost budget is the follow-up commit.

Scope decisions:
- Used the recommended scope-down from the brief: log internally, do
  not change `search_memory_decomposed` return shape, do not touch any
  caller of `decompose_query`. One-file diff stays surgical.
- `LlmProvider::generate` returns `Result<String, LlmError>` with no
  token counts on the trait, so the heuristic is the only honest
  option today. Documented as a gating estimate, not an exact figure.
- Failure-path logs also emit the input-only estimate so a deployment
  watching the log stream still sees the cost of failed decompose
  attempts (timeouts, provider errors).

Tests: 3 new (`test_estimate_tokens_basic`,
`test_decompose_output_estimates_tokens`,
`test_decompose_logs_token_estimate`), 3 existing stay green.
`cargo check --workspace`, `cargo clippy -p origin-core --all-targets`,
and the full `cargo test -p origin-core --lib` (1156 passed) all clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add `decompose_query_with_stats` (returns full DecomposeOutput) and
`MemoryDB::search_memory_decomposed_with_stats` (returns results plus
Option<DecomposeOutput>) as sibling APIs. Existing `decompose_query`
and `search_memory_decomposed` are now thin wrappers that drop the
stats — no caller changes required.

Stats are Some(stats) only when the multi-hop fan-out actually ran
(LLM returned 2+ sub-queries). On the no-LLM shortcut and the
single-element passthrough, the result is None; the decompose call's
token estimates still emit via log::info! under the [decompose] tag.

Wire-types exposure (origin-types + origin-server + origin-mcp) is a
follow-up commit. This commit lands the in-core surface so eval
harness + server handlers can opt into the stats variant.

Tests:
  - decompose::tests::test_decompose_query_with_stats_returns_stats
  - decompose::tests::test_decompose_query_returns_just_sub_queries
  - db::tests::test_search_memory_decomposed_with_stats_returns_stats

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant