fix(decompose): multi-hop search via LLM query decomposition (P0 #3 retrieval lift) by 7xuanlu · Pull Request #189 · 7xuanlu/origin

7xuanlu · 2026-05-25T07:45:55Z

Summary

New decompose.rs module — LLM rewrites compound queries into 2-4 standalone sub-queries. Compound-vs-not classifier baked into single-pass prompt. Salvaged design from prior feature/query-decomposition branch (now superseded).
MemoryDB::search_memory_decomposed fans sub-queries out via search_memory, merges by id keeping max score, sorts + truncates.
HTTP decompose: bool flag on /api/memory/search. MCP RecallParams.decompose opt-in. RetrievalConfig.decomposition_enabled global kill switch (defaults true).
Token-cost telemetry: DecomposeOutput { sub_queries, input_tokens, output_tokens } + estimate_tokens helper. search_memory_decomposed_with_stats returns the stats.
Eval-harness variants: save_locomo_decomposed_baseline, save_longmemeval_decomposed_baseline (both #[ignore]d).

Why

Origin LoCoMo multi-hop = 37%. agentmemory hits 95.2% R@5 on LongMemEval-S with 3-channel (BM25 + vec + graph BFS depth=2). SuperLocalMemory V3.3 reports +23.8pp on multi-hop from spreading-activation. P0 #3 attacks multi-hop via query rewriting (Angle A); KG-tier independent retrieval (Angle B) is deferred to Phase 2 pending measurement.

Failure-mode policy preserved verbatim from salvaged design: "Never errors on normal failure paths (timeout, provider error, malformed JSON, empty array). All such cases return Ok(vec![query.to_string()]) after a log::warn! so the caller can detect 'not decomposed' via len() <= 1."
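The fallback contract quoted above can be sketched as follows. This is a minimal illustration, not the code in decompose.rs: `LlmOutcome`, `sub_queries_or_fallback`, and `was_decomposed` are hypothetical names, `eprintln!` stands in for `log::warn!`, and the function returns a plain `Vec` where the real one returns `Ok(vec![...])`.

```rust
/// Hypothetical stand-in for the outcomes of one decomposer LLM call.
#[derive(Debug)]
enum LlmOutcome {
    /// The model's reply parsed into a JSON array of strings (possibly empty).
    Parsed(Vec<String>),
    Timeout,
    ProviderError(String),
    MalformedJson,
}

/// Every failure path collapses to a single-element vec holding the original
/// query, so callers never have to handle an error from decomposition.
fn sub_queries_or_fallback(query: &str, outcome: LlmOutcome) -> Vec<String> {
    let fallback = || vec![query.to_string()];
    match outcome {
        LlmOutcome::Parsed(subs) if !subs.is_empty() => subs,
        LlmOutcome::Parsed(_) => {
            eprintln!("[decompose] empty array, falling back to single query");
            fallback()
        }
        LlmOutcome::Timeout => {
            eprintln!("[decompose] timeout, falling back to single query");
            fallback()
        }
        LlmOutcome::ProviderError(err) => {
            eprintln!("[decompose] provider error ({err}), falling back");
            fallback()
        }
        LlmOutcome::MalformedJson => {
            eprintln!("[decompose] malformed JSON, falling back");
            fallback()
        }
    }
}

/// The caller-side "not decomposed" check from the policy: len() <= 1.
fn was_decomposed(sub_queries: &[String]) -> bool {
    sub_queries.len() > 1
}
```

Collapsing every failure into the same single-element shape means the search path needs only one branch: a length check decides between fan-out and plain search.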

Test plan

19 unit + integration tests pass (cargo test -p origin-core --lib decompose, search_memory_decomposed, cargo test -p origin-server --lib search_decompose_routing_tests, cargo test -p origin-mcp --lib)
cargo check --workspace clean post-rebase on main
All 4 routing branches tested: decompose=false / decompose=true+llm=Some / decompose=true+llm=None (fallback) / decompose=true+disabled-by-config (fallback)
Manual: search with decompose=true on a compound query ("What changed about my opinion on X after Y?") → verify multiple sub-queries logged + merged results
GPU eval validation: LoCoMo multi-hop lift (target 37% → ≥55%) via cargo test -p origin-core --test eval_harness save_locomo_decomposed_baseline -- --ignored --nocapture
Cost gate: per-query token report verifies decompose adds ≤256 input tokens

Follow-ups (not in this PR)

Surface DecomposeOutput through wire types so HTTP clients see token-cost.
KG-tier independent retrieval (Angle B) as the second multi-hop attack — entities + observations + pages surface as parallel scored channel.

🤖 Generated with Claude Code

Phase A foundation for P0 #3: multi-hop via query decomposition.

Adds crates/origin-core/src/decompose.rs with the salvaged compound-vs-not classify+decompose prompt and a defensive single-LLM-call rewriter that returns 1..=4 sub-queries. Scaffold scope only: module + prompt + 3 unit tests covering valid JSON, malformed JSON fallback, and single-element passthrough. Search integration (search_memory_decomposed), surface exposure (HTTP route, MCP tool), and cost gating land in subsequent commits.

Failure-mode policy: every failure path (timeout, provider error, missing JSON brackets, malformed JSON, empty array) logs a warn! and returns Ok(vec![query.to_string()]) so the caller can detect "not decomposed" via len() <= 1. Decomposition is a quality lift, not a correctness gate; it degrades silently to single-query.

Salvaged design from feature/query-decomposition (greenfield rewrite).

References docs/superpowers/p0-plan-retrieval-fixes-2026-05-24.md section "P0 #3 — Multi-hop via query decomposition".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

P0 #3 Phase B foundation.

Adds MemoryDB::search_memory_decomposed which calls crate::decompose::decompose_query (Phase A scaffold from 695828b), then fans sub-query searches out and merges by SearchResult.id, keeping the max score per id. Sorts descending and truncates to limit.

If llm is None or the decomposer returns a single-element vec (its silent-fallback contract on any failure), passthrough to plain search_memory: no behavior change for the common single-hop path.

Two passthrough tests added: None-llm and mock single-element JSON. The compound case (mock returning a 2-element array) is deferred to a follow-up commit that wires this through the eval harness. The search fixtures used here only seed two memories, so the merge path is exercised but the multi-hop quality lift can only be measured against LoCoMo.

See docs/superpowers/p0-plan-retrieval-fixes-2026-05-24.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
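The merge step described in this commit (merge by id, keep the max score, sort descending, truncate to limit) can be sketched roughly as below. This is an illustration under assumptions: the real `SearchResult` carries more fields than `id` and `score`, `merge_by_max_score` is a hypothetical name, and the id tie-break on equal scores is added here only to make the ordering deterministic.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the real SearchResult (assumed fields only).
#[derive(Debug, Clone, PartialEq)]
struct SearchResult {
    id: String,
    score: f32,
}

/// Merge the result lists of all sub-queries: one entry per id, keeping the
/// highest-scoring hit, then sort by score descending and truncate to `limit`.
fn merge_by_max_score(per_sub_query: Vec<Vec<SearchResult>>, limit: usize) -> Vec<SearchResult> {
    let mut best: HashMap<String, SearchResult> = HashMap::new();
    for hit in per_sub_query.into_iter().flatten() {
        // Keep the hit if the id is new or it beats the stored score.
        let keep = best.get(&hit.id).map_or(true, |cur| hit.score > cur.score);
        if keep {
            best.insert(hit.id.clone(), hit);
        }
    }
    let mut merged: Vec<SearchResult> = best.into_values().collect();
    // Descending by score; ties broken by id (an assumption of this sketch).
    merged.sort_by(|a, b| b.score.total_cmp(&a.score).then_with(|| a.id.cmp(&b.id)));
    merged.truncate(limit);
    merged
}
```

Taking the max rather than summing scores means a memory that matches several sub-queries is not boosted above one that matches a single sub-query very strongly; whether that is the right trade-off is what the LoCoMo measurement is meant to show.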

P0 #3 Phase C surface. Wires the search_memory_decomposed fan-out from f46666a through the daemon's `/api/memory/search` endpoint and adds a config knob so cost-sensitive deployments can disable the per-query LLM overhead globally.

Wire change: `SearchMemoryRequest` gains a `decompose: bool` field (serde-default false), so existing callers and the on-disk wire shape are unchanged. Three call sites in `crates/origin-mcp/src/tools.rs` are updated to construct the new field explicitly (no Default impl yet; adding one would be speculative surface beyond this task).

Routing logic in `handle_search_memory`:
- `decompose=false` (default) -> plain `search_memory`. Unchanged.
- `decompose=true` AND `state.llm.is_some()` AND `tuning.retrieval.decomposition_enabled` -> `search_memory_decomposed`, passing the cloned LLM provider through.
- `decompose=true` but `state.llm.is_none()` -> warn + fall back to plain search.
- `decompose=true` but `decomposition_enabled=false` -> warn + fall back to plain search.

Every fallback degrades the request silently to plain search rather than erroring. Decomposition is a quality lift, not a correctness gate: same contract as the underlying `decompose_query` (see 695828b).

Cost gate: new `RetrievalConfig { decomposition_enabled: bool }` under `TuningConfig.retrieval`, defaulting to `true` (non-disruptive: the feature only runs when the caller opts in via the request param). Lets a deployment disable the extra LLM call per request without rebuilding.

Locking: snapshots `db`, `llm`, and `decomposition_enabled` out of the state read guard up front so the guard drops before the (potentially LLM-bound) `search_memory_decomposed` await. Matches the AGENTS.md "never hold a RwLock guard across .await" rule.

Tests: 3 axum integration tests in `memory_routes::search_decompose_routing_tests` cover the routing branches that don't require a live LLM (decompose=false; decompose=true + llm=None; decompose=true + decomposition_enabled=false). The compound-query lift path (mock LLM returns 2+ sub-queries) is already covered by `search_memory_decomposed` unit tests in `crates/origin-core/src/db.rs:22295`. `MockProvider` is `#[cfg(test)]` inside origin-core and not reachable from origin-server tests, so replicating it here would force a test-util surface expansion outside the scope of this commit. Noted as a follow-up if useful.

Verification:
- cargo check --workspace: OK
- cargo test -p origin-server --lib: 60 passed
- cargo test -p origin-types --lib: 57 passed
- cargo test -p origin-core --lib tuning::tests: 11 passed
- cargo test -p origin-mcp --lib search_memory_request: 1 passed
- cargo clippy -p origin-{types,core,server,mcp} --lib --tests: clean
- cargo fmt --all -- --check: clean

References docs/superpowers/p0-plan-retrieval-fixes-2026-05-24.md section "P0 #3 — Multi-hop via query decomposition". Closes the Phase C "HTTP surface + cost gating" task on top of the Phase A scaffold (695828b) and Phase B fan-out (f46666a).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
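The four routing branches listed for `handle_search_memory` reduce to a pure decision over three booleans, which can be sketched as below. `SearchRoute` and `choose_route` are hypothetical names for illustration; the actual handler works on the snapshotted `db`, `llm`, and `decomposition_enabled` values, and the order in which the two fallback conditions are checked is an assumption of this sketch.

```rust
/// Hypothetical summary of which search path a request takes.
#[derive(Debug, PartialEq)]
enum SearchRoute {
    /// decompose=false: plain search_memory, unchanged behavior.
    Plain,
    /// All preconditions hold: search_memory_decomposed.
    Decomposed,
    /// decompose=true but a precondition failed: warn, then plain search.
    PlainFallback(&'static str),
}

/// Decide the route from the request flag and two values snapshotted out of
/// shared state before any await point.
fn choose_route(decompose: bool, llm_available: bool, decomposition_enabled: bool) -> SearchRoute {
    if !decompose {
        return SearchRoute::Plain;
    }
    if !llm_available {
        return SearchRoute::PlainFallback("no LLM provider configured");
    }
    if !decomposition_enabled {
        return SearchRoute::PlainFallback("decomposition disabled by config");
    }
    SearchRoute::Decomposed
}
```

Because the decision takes only plain values, it can run after the read guard has been dropped, which is consistent with the locking note in the commit message.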

Surface multi-hop decomposition to MCP clients and the offline eval suite so the new search_memory_decomposed path can actually be measured and used end-to-end.

MCP wrapper (origin-mcp):
- RecallParams gains `decompose: Option<bool>` with schemars description noting LLM cost and multi-hop use case.
- recall_impl forwards the flag to SearchMemoryRequest.decompose.
- Tool description mentions decompose=true triggers multi-hop split.
- 2 new unit tests: forwarding from JSON in -> wire request out, and schema-advertises-decompose drift guard.
- 2 existing call sites in tests (space_roundtrip_e2e, type_contract) fixed to pass decompose: None.

Eval harness (origin-core):
- run_locomo_eval_decomposed and run_longmemeval_eval_decomposed mirror the _expanded runners; only difference is search_memory_decomposed in place of search_memory_expanded, and retrieval_method stamped as "search_memory_decomposed" in ReportEnv.
- save_locomo_decomposed_baseline and save_longmemeval_decomposed_baseline added as #[ignore]'d baseline-save tests in tests/eval_harness.rs.

Verification:
- cargo check --workspace --tests: clean.
- cargo clippy -p origin-mcp -p origin-core --tests: no warnings.
- cargo test -p origin-mcp --lib: 190 passed (3 new decompose tests).
- cargo test -p origin-core --lib eval::: 176 passed.
- cargo test -p origin-core --test eval_harness -- --list | grep decomposed: shows both new baselines.

Cost telemetry deferred: ReportEnv has no total_tokens_used or cost_usd field today. Adding one would touch every variant of every report shape and is out of scope for this commit. Tracked as follow-up: judge already records token spend in EnrichmentMode::from_env, so the surface exists; it just needs unification across retrieval and judge phases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Per-query LLM cost on the decompose path was invisible. Add `chars / 4` token estimation and emit one `log::info!` line per call under the `[decompose]` tag covering input and output, including the timeout + provider-error fallback paths (output=0, with a parenthetical reason).

Also exports a forward-looking `DecomposeOutput` struct + public `estimate_tokens(&str)` helper. Neither is wired into `search_memory_decomposed` yet: the return type stays `Vec<SearchResult>` and the stats land in logs only. Surfacing tokens through wire types so deployments can gate the decompose path under a cost budget is the follow-up commit.

Scope decisions:
- Used the recommended scope-down from the brief: log internally, do not change `search_memory_decomposed` return shape, do not touch any caller of `decompose_query`. One-file diff stays surgical.
- `LlmProvider::generate` returns `Result<String, LlmError>` with no token counts on the trait, so the heuristic is the only honest option today. Documented as a gating estimate, not an exact figure.
- Failure-path logs also emit the input-only estimate so a deployment watching the log stream still sees the cost of failed decompose attempts (timeouts, provider errors).

Tests: 3 new (`test_estimate_tokens_basic`, `test_decompose_output_estimates_tokens`, `test_decompose_logs_token_estimate`), 3 existing stay green. `cargo check --workspace`, `cargo clippy -p origin-core --all-targets`, and the full `cargo test -p origin-core --lib` (1156 passed) all clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
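The `chars / 4` heuristic and the per-call log line can be sketched as follows. Only the name `estimate_tokens(&str)` and the `[decompose]` tag come from the commit message; the return type, the floor division, counting Unicode scalar values rather than bytes, the `decompose_log_line` helper, and the exact log wording are all assumptions made for illustration.

```rust
/// Rough token estimate: character count divided by four, rounded down.
/// (Assumption: the real helper may count bytes or round differently.)
pub fn estimate_tokens(text: &str) -> usize {
    text.chars().count() / 4
}

/// Hypothetical helper that builds the one-line cost report for a call.
/// `response` is None on failure paths, which report output=0 plus a reason.
pub fn decompose_log_line(prompt: &str, response: Option<&str>, reason: Option<&str>) -> String {
    let input = estimate_tokens(prompt);
    let output = response.map(estimate_tokens).unwrap_or(0);
    match reason {
        Some(r) => format!("[decompose] input_tokens~{input} output_tokens~{output} ({r})"),
        None => format!("[decompose] input_tokens~{input} output_tokens~{output}"),
    }
}
```

With floor division, any string shorter than four characters estimates to zero tokens, which is acceptable for a gating estimate but is one reason the figure should not be read as an exact count.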

Add `decompose_query_with_stats` (returns full DecomposeOutput) and `MemoryDB::search_memory_decomposed_with_stats` (returns results plus Option<DecomposeOutput>) as sibling APIs. Existing `decompose_query` and `search_memory_decomposed` are now thin wrappers that drop the stats, so no caller changes are required.

Stats are Some(stats) only when the multi-hop fan-out actually ran (LLM returned 2+ sub-queries). On the no-LLM shortcut and the single-element passthrough, the result is None; the decompose call's token estimates still emit via log::info! under the [decompose] tag.

Wire-types exposure (origin-types + origin-server + origin-mcp) is a follow-up commit. This commit lands the in-core surface so eval harness + server handlers can opt into the stats variant.

Tests:
- decompose::tests::test_decompose_query_with_stats_returns_stats
- decompose::tests::test_decompose_query_returns_just_sub_queries
- db::tests::test_search_memory_decomposed_with_stats_returns_stats

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
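The sibling-API pattern in this commit (a stats-returning function plus a thin wrapper that drops the stats, with `Some` only when fan-out ran) might look roughly like this. The function names mirror the commit message, but the signatures are simplified assumptions: the real functions are async and call an LLM provider, whereas here the model's sub-queries are passed in directly, and `stats_if_fanned_out` is a hypothetical helper.

```rust
/// Field names follow the PR summary; the integer type is an assumption.
#[derive(Debug, Clone, PartialEq)]
pub struct DecomposeOutput {
    pub sub_queries: Vec<String>,
    pub input_tokens: usize,
    pub output_tokens: usize,
}

/// Stats-returning variant. `model_sub_queries` stands in for the LLM reply;
/// an empty reply falls back to the original query as the only sub-query.
pub fn decompose_query_with_stats(query: &str, model_sub_queries: Vec<String>) -> DecomposeOutput {
    let sub_queries = if model_sub_queries.is_empty() {
        vec![query.to_string()]
    } else {
        model_sub_queries
    };
    let output_chars: usize = sub_queries.iter().map(|s| s.chars().count()).sum();
    DecomposeOutput {
        input_tokens: query.chars().count() / 4,
        output_tokens: output_chars / 4,
        sub_queries,
    }
}

/// Thin wrapper: same behavior, stats dropped, so existing callers compile unchanged.
pub fn decompose_query(query: &str, model_sub_queries: Vec<String>) -> Vec<String> {
    decompose_query_with_stats(query, model_sub_queries).sub_queries
}

/// Stats are surfaced only when the fan-out actually ran (2+ sub-queries).
pub fn stats_if_fanned_out(output: DecomposeOutput) -> Option<DecomposeOutput> {
    if output.sub_queries.len() >= 2 {
        Some(output)
    } else {
        None
    }
}
```

Keeping the original functions as wrappers means the stats variant can be adopted one call site at a time, starting with the eval harness and server handlers.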


7xuanlu and others added 7 commits May 24, 2026 15:40

Merge remote-tracking branch 'origin/main' into feature/multi-hop-dec…

4c4565f

…ompose-v2

7xuanlu wants to merge 7 commits into main from feature/multi-hop-decompose-v2
