fix(decompose): multi-hop search via LLM query decomposition (P0 #3 retrieval lift)#189
Draft
7xuanlu wants to merge 7 commits into
Draft
fix(decompose): multi-hop search via LLM query decomposition (P0 #3 retrieval lift)#1897xuanlu wants to merge 7 commits into
7xuanlu wants to merge 7 commits into
Conversation
Phase A foundation for P0 #3 — multi-hop via query decomposition. Adds crates/origin-core/src/decompose.rs with the salvaged compound-vs-not classify+decompose prompt and a defensive single-LLM-call rewriter that returns 1..=4 sub-queries. Scaffold scope only: module + prompt + 3 unit tests covering valid JSON, malformed JSON fallback, and single-element passthrough. Search integration (search_memory_decomposed), surface exposure (HTTP route, MCP tool), and cost gating land in subsequent commits. Failure-mode policy: every failure path (timeout, provider error, missing JSON brackets, malformed JSON, empty array) logs a warn! and returns Ok(vec![query.to_string()]) so the caller can detect "not decomposed" via len() <= 1. Decomposition is a quality lift, not a correctness gate; degrades silently to single-query. Salvaged design from feature/query-decomposition (greenfield rewrite). References docs/superpowers/p0-plan-retrieval-fixes-2026-05-24.md section "P0 #3 — Multi-hop via query decomposition". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P0 #3 Phase B foundation. Adds MemoryDB::search_memory_decomposed which calls crate::decompose::decompose_query (Phase A scaffold from 695828b), then fans sub-query searches out and merges by SearchResult.id, keeping the max score per id. Sorts descending and truncates to limit. If llm is None or the decomposer returns a single-element vec (its silent-fallback contract on any failure), passthrough to plain search_memory — no behavior change for the common single-hop path. Two passthrough tests added: None-llm and mock single-element JSON. The compound case (mock returning a 2-element array) is deferred to a follow-up commit that wires this through the eval harness — the search fixtures used here only seed two memories so the merge path is exercised but the multi-hop quality lift can only be measured against LoCoMo. See docs/superpowers/p0-plan-retrieval-fixes-2026-05-24.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P0 #3 Phase C surface. Wires the search_memory_decomposed fan-out from f46666a through the daemon's `/api/memory/search` endpoint and adds a config knob so cost-sensitive deployments can disable the per-query LLM overhead globally. Wire change: `SearchMemoryRequest` gains a `decompose: bool` field (serde-default false), so existing callers and the on-disk wire shape are unchanged. Three call sites in `crates/origin-mcp/src/tools.rs` are updated to construct the new field explicitly (no Default impl yet — adding one would be speculative surface beyond this task). Routing logic in `handle_search_memory`: - `decompose=false` (default) -> plain `search_memory`. Unchanged. - `decompose=true` AND `state.llm.is_some()` AND `tuning.retrieval.decomposition_enabled` -> `search_memory_decomposed`, passing the cloned LLM provider through. - `decompose=true` but `state.llm.is_none()` -> warn + fall back to plain search. - `decompose=true` but `decomposition_enabled=false` -> warn + fall back to plain search. Every fallback degrades the request silently to plain search rather than erroring. Decomposition is a quality lift, not a correctness gate — same contract as the underlying `decompose_query` (see 695828b). Cost gate: new `RetrievalConfig { decomposition_enabled: bool }` under `TuningConfig.retrieval`, defaulting to `true` (non-disruptive — the feature only runs when the caller opts in via the request param). Lets a deployment disable the extra LLM call per request without rebuilding. Locking: snapshots `db`, `llm`, and `decomposition_enabled` out of the state read guard up front so the guard drops before the (potentially LLM-bound) `search_memory_decomposed` await. Matches the AGENTS.md "never hold a RwLock guard across .await" rule. Tests: 3 axum integration tests in `memory_routes::search_decompose_routing_tests` cover the routing branches that don't require a live LLM (decompose=false; decompose=true + llm=None; decompose=true + decomposition_enabled=false). The compound-query lift path (mock LLM returns 2+ sub-queries) is already covered by `search_memory_decomposed` unit tests in `crates/origin-core/src/db.rs:22295` — `MockProvider` is `#[cfg(test)]` inside origin-core and not reachable from origin-server tests, so replicating it here would force a test-util surface expansion outside the scope of this commit. Noted as a follow-up if useful. Verification: cargo check --workspace OK cargo test -p origin-server --lib 60 passed cargo test -p origin-types --lib 57 passed cargo test -p origin-core --lib tuning::tests 11 passed cargo test -p origin-mcp --lib search_memory_request 1 passed cargo clippy -p origin-{types,core,server,mcp} --lib --tests clean cargo fmt --all -- --check clean References docs/superpowers/p0-plan-retrieval-fixes-2026-05-24.md section "P0 #3 — Multi-hop via query decomposition". Closes the Phase C "HTTP surface + cost gating" task on top of the Phase A scaffold (695828b) and Phase B fan-out (f46666a). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Surface multi-hop decomposition to MCP clients and the offline eval suite so the new search_memory_decomposed path can actually be measured and used end-to-end. MCP wrapper (origin-mcp): - RecallParams gains `decompose: Option<bool>` with schemars description noting LLM cost and multi-hop use case. - recall_impl forwards the flag to SearchMemoryRequest.decompose. - Tool description mentions decompose=true triggers multi-hop split. - 2 new unit tests: forwarding from JSON in -> wire request out, and schema-advertises-decompose drift guard. - 2 existing call sites in tests (space_roundtrip_e2e, type_contract) fixed to pass decompose: None. Eval harness (origin-core): - run_locomo_eval_decomposed and run_longmemeval_eval_decomposed mirror the _expanded runners; only difference is search_memory_decomposed in place of search_memory_expanded, and retrieval_method stamped as "search_memory_decomposed" in ReportEnv. - save_locomo_decomposed_baseline and save_longmemeval_decomposed_baseline added as #[ignore]'d baseline-save tests in tests/eval_harness.rs. Verification: - cargo check --workspace --tests: clean. - cargo clippy -p origin-mcp -p origin-core --tests: no warnings. - cargo test -p origin-mcp --lib: 190 passed (3 new decompose tests). - cargo test -p origin-core --lib eval::: 176 passed. - cargo test -p origin-core --test eval_harness -- --list | grep decomposed: shows both new baselines. Cost telemetry deferred: ReportEnv has no total_tokens_used or cost_usd field today. Adding one would touch every variant of every report shape and is out of scope for this commit. Tracked as follow-up — judge already records token spend in EnrichmentMode::from_env, so the surface exists; just needs unification across retrieval and judge phases. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per-query LLM cost on the decompose path was invisible. Add `chars / 4` token estimation and emit one `log::info!` line per call under the `[decompose]` tag covering input and output, including the timeout + provider-error fallback paths (output=0, with a parenthetical reason). Also exports a forward-looking `DecomposeOutput` struct + public `estimate_tokens(&str)` helper. Neither is wired into `search_memory_decomposed` yet — the return type stays `Vec<SearchResult>` and the stats land in logs only. Surfacing tokens through wire types so deployments can gate the decompose path under a cost budget is the follow-up commit. Scope decisions: - Used the recommended scope-down from the brief: log internally, do not change `search_memory_decomposed` return shape, do not touch any caller of `decompose_query`. One-file diff stays surgical. - `LlmProvider::generate` returns `Result<String, LlmError>` with no token counts on the trait, so the heuristic is the only honest option today. Documented as a gating estimate, not an exact figure. - Failure-path logs also emit the input-only estimate so a deployment watching the log stream still sees the cost of failed decompose attempts (timeouts, provider errors). Tests: 3 new (`test_estimate_tokens_basic`, `test_decompose_output_estimates_tokens`, `test_decompose_logs_token_estimate`), 3 existing stay green. `cargo check --workspace`, `cargo clippy -p origin-core --all-targets`, and the full `cargo test -p origin-core --lib` (1156 passed) all clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add `decompose_query_with_stats` (returns full DecomposeOutput) and `MemoryDB::search_memory_decomposed_with_stats` (returns results plus Option<DecomposeOutput>) as sibling APIs. Existing `decompose_query` and `search_memory_decomposed` are now thin wrappers that drop the stats — no caller changes required. Stats are Some(stats) only when the multi-hop fan-out actually ran (LLM returned 2+ sub-queries). On the no-LLM shortcut and the single-element passthrough, the result is None; the decompose call's token estimates still emit via log::info! under the [decompose] tag. Wire-types exposure (origin-types + origin-server + origin-mcp) is a follow-up commit. This commit lands the in-core surface so eval harness + server handlers can opt into the stats variant. Tests: - decompose::tests::test_decompose_query_with_stats_returns_stats - decompose::tests::test_decompose_query_returns_just_sub_queries - db::tests::test_search_memory_decomposed_with_stats_returns_stats Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
decompose.rsmodule — LLM rewrites compound queries into 2-4 standalone sub-queries. Compound-vs-not classifier baked into single-pass prompt. Salvaged design from priorfeature/query-decompositionbranch (now superseded).MemoryDB::search_memory_decomposedfans sub-queries out viasearch_memory, merges by id keeping max score, sorts + truncates.decompose: boolflag on/api/memory/search. MCPRecallParams.decomposeopt-in.RetrievalConfig.decomposition_enabledglobal kill switch (defaultstrue).DecomposeOutput { sub_queries, input_tokens, output_tokens }+estimate_tokenshelper.search_memory_decomposed_with_statsreturns the stats.save_locomo_decomposed_baseline,save_longmemeval_decomposed_baseline(both#[ignore]d).Why
Origin LoCoMo multi-hop = 37%. agentmemory hits 95.2% R@5 on LongMemEval-S with 3-channel (BM25 + vec + graph BFS depth=2). SuperLocalMemory V3.3 reports +23.8pp on multi-hop from spreading-activation. P0 #3 attacks multi-hop via query rewriting (Angle A); KG-tier independent retrieval (Angle B) is deferred to Phase 2 pending measurement.
Failure-mode policy preserved verbatim from salvaged design: "Never errors on normal failure paths (timeout, provider error, malformed JSON, empty array). All such cases return
Ok(vec![query.to_string()])after a log::warn! so the caller can detect 'not decomposed' vialen() <= 1."Test plan
cargo test -p origin-core --lib decompose,search_memory_decomposed,cargo test -p origin-server --lib search_decompose_routing_tests,cargo test -p origin-mcp --lib)cargo check --workspaceclean post-rebase on maindecompose=trueon a compound query ("What changed about my opinion on X after Y?") → verify multiple sub-queries logged + merged resultscargo test -p origin-core --test eval_harness save_locomo_decomposed_baseline -- --ignored --nocaptureFollow-ups (not in this PR)
DecomposeOutputthrough wire types so HTTP clients see token-cost.🤖 Generated with Claude Code