feat(core): config-gated entity-aware ranking boost for hybrid search#969
feat(core): config-gated entity-aware ranking boost for hybrid search#969groksrc wants to merge 3 commits into
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 20299915ad
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| repo = SQLiteSearchRepository( | ||
| session_maker, | ||
| project_id=test_project.id, | ||
| app_config=app_config, | ||
| ) |
There was a problem hiding this comment.
Pass the stub provider into SQLiteSearchRepository
When these tests run on the SQLite path, app_config has semantic_search_enabled=True but the repository is constructed without embedding_provider, so SQLiteSearchRepository.__init__ creates the configured real provider before lines 113-115 replace it with the stub. That defeats the deterministic stub setup and can make this test load FastEmbed/cache artifacts or fail in environments without usable semantic dependencies; pass embedding_provider=provider here as the Postgres branch already does.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Good catch. Verified: SQLiteSearchRepository.init calls create_embedding_provider() at line 71 when semantic_search_enabled=True and no provider is passed — exactly the real-provider instantiation you flagged.
Fixed in dcc4a7a: pass embedding_provider=provider to the SQLiteSearchRepository constructor at construction time (matching what the Postgres branch already did), and removed the three after-the-fact attribute swaps (_semantic_enabled, _embedding_provider, _vector_dimensions) which are now redundant since init sets them correctly from the supplied provider.
All 60 tests in the suite pass; ruff and ty check clean.
Benchmark + tuning report for the entity-aware ranking boostI benchmarked the boost, swept the weight, ran an adversarial check, and updated the docs/config rationale. Decision: keep Setup
Weight sweep — LoCoMo
|
| config | recall@5 | recall@10 | MRR | content-hit |
|---|---|---|---|---|
| baseline (off) | 0.8370 | 0.9207 | 0.7591 | 0.1910 |
| w=0.15 (on) | 0.8370 | 0.9207 | 0.7591 | 0.1910 |
| w=0.3 (on) | 0.8370 | 0.9207 | 0.7591 | 0.1910 |
| w=0.5 (on) | 0.8370 | 0.9207 | 0.7591 | 0.1910 |
| w=1.0 (on) | 0.8370 | 0.9207 | 0.7591 | 0.1910 |
| w=2.0 (on) | 0.8370 | 0.9207 | 0.7591 | 0.1910 |
Per-category recall@5 / MRR were identical at every weight as well (single_hop 0.5490/0.5298, multi_hop 0.9189/0.7962, temporal 0.6154/0.5085, open_domain 0.9143/0.8386, adversarial 0.9149/0.8369). A per-query diff of the top-5 ordering between baseline and w=2.0 showed 0 / 199 queries changed, and per-hit scores were byte-identical.
Finding 1 — LoCoMo is blind to the boost. LoCoMo docs are titled by session id (locomo-c00-s01) and surface speaker names only in body text, never as entity titles or relation names. The boost matches query proper nouns against a candidate's title / linked relation names, so it never fires on this corpus — e.g. for "When did Caroline go to the LGBTQ support group?" the term caroline is correctly extracted but matches no title. The boost machinery is confirmed live (env → config → _extract_query_entity_terms all verified), it simply has nothing to match. LoCoMo therefore gives no signal to raise the default weight — and confirms the boost is non-regressive there (literally a no-op).
Adversarial check (LoCoMo can't see this)
Hand-built corpus where entity terms appear in titles, boost off vs on (w=0.15/0.5/2.0):
| query type | result |
|---|---|
German caps — Katze (gold: doc titled Katze) |
Gold stays #1 at all weights; boost amplifies proportionally. No harmful reorder. |
CamelCase — getUserById (gold: doc titled getUserById) |
Identical to boost-off at every weight — getUserById starts lowercase, so it's rejected as a non-entity token. Correctly inert. |
Title-Case — What Is The Plan For Q3 (gold: third quarter roadmap doc) |
Regression. Q3 is extracted as an entity term; even at w=0.15 a distractor that literally contains "Q3" is promoted above the more relevant "third quarter" doc. Worsens with weight. |
Finding 2 — the capitalization-only heuristic has a real false-positive mode. Title-Cased query phrasing injects spurious entity terms (any capitalized non-entity token), and that can flip a correct top result. So the adversarial check is not clean.
Decision & rationale
- Default: OFF (unchanged). No LoCoMo evidence supports enabling it (inert), and the adversarial check found a genuine Title-Case regression.
- Weight: 0.15 (unchanged). LoCoMo provides zero signal to tune the weight; the only datapoint favoring a higher weight was the single manual "Joanna" QA case, which the benchmark cannot corroborate. Raising the default would be unjustified by data and would only amplify the Title-Case false positive.
- No code/default changes — the PR's existing defaults are already the data-driven choice. I updated
docs/semantic-search.md(new "Benchmark findings" section + guidance) and theconfig.pyrationale comment/description to record why it stays off and where the boost actually helps (entity-heavy corpora with entity-titled notes / natural-case queries, i.e. the Entity-aware ranking boost: cross-conversation entity confusion dominates hard retrieval misses in LoCoMo benchmark #951 "Joanna" case).
Files changed: docs/semantic-search.md, src/basic_memory/config.py (commit b07af9b9).
Gates: ruff check + ruff format --check clean on changed files; uv run ty check src tests test-int → all checks passed; targeted pytest (entity-boost service, hybrid fusion, config) → 118 passed.
🤖 Generated with Claude Code
Status: back to draft — deferred until the boost is demonstrably usefulMaintainer decision following the benchmark + adversarial findings above: the implementation is sound and non-regressive, but there is currently no evidence it improves retrieval, and the capitalization heuristic has a confirmed false-positive class (Title-Case queries promoting literal-token distractors, e.g. "Q3"). We won't land dormant machinery. Likely path to make it land-able
🤖 Generated with Claude Code |
…arch Proper nouns in a query carry no extra weight against generic semantic similarity, so documents about a different entity on the same topic can outrank the document that actually names the queried entity (#951 cross-conversation confusion in the LoCoMo benchmark). Add an optional, lexical-only re-scoring pass to hybrid fusion: - Extract candidate entity terms from the query (capitalized / proper-noun tokens that are not common stopwords; trailing possessives stripped). - Count how many distinct query entity terms appear in each fused candidate's entity name (title) or a relation row's linked entity names. - Multiply matching candidates' fused scores by 1 + weight * min(matches, max_terms), promoting entity-matching docs. The boost runs over the full fused candidate set before the limit/offset cut, so a matching doc below the cutoff can be promoted into the returned window. It adds no model inference (index/lexical lookups only), so per-query latency overhead is trivial, and only affects hybrid retrieval. Behind three config flags, DEFAULT OFF pending LoCoMo benchmark validation: search_entity_boost_enabled, search_entity_boost_weight, search_entity_boost_max_terms. Documented in docs/semantic-search.md. Tests: unit coverage for entity-term extraction and the boost math; a hybrid-pipeline test showing reordering when enabled and unchanged ordering when disabled; and a service-level integration test over a real DB with a deterministic stub embedding provider proving an entity-matching doc outranks a higher-similarity non-matching doc only when enabled. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Drew Cain <groksrc@gmail.com>
… construction time On the SQLite path, SQLiteSearchRepository.__init__ calls create_embedding_provider() when semantic_search_enabled=True and no provider is supplied, which could pull FastEmbed/cache artifacts or fail where semantic deps are unusable. The Postgres branch already passed the stub provider at construction; bring the SQLite branch in line by passing embedding_provider=provider to the constructor and removing the redundant after-the-fact attribute swaps. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Drew Cain <groksrc@gmail.com>
Benchmarked the #951 entity-aware ranking boost against the LoCoMo retrieval suite (hybrid mode) and a hand-built adversarial corpus. LoCoMo is insensitive to the boost: sweeping the weight across 0.15/0.3/0.5/1.0/2.0 produced identical recall@5, recall@10, MRR, and content-hit at every point (no query reordered, no score changed). LoCoMo docs are keyed by session id and expose speaker names only in body text, never as entity titles or relation names, so the title/relation-matching boost never fires there. An adversarial check found a real regression mode: Title-Case queries inject spurious entity terms. 'What Is The Plan For Q3' extracts 'Q3' and, even at weight 0.15, promotes a literal-'Q3' document over the more relevant 'third quarter' document. Clean proper nouns (Katze) work; lowercase-leading identifiers (getUserById) are correctly ignored. Decision: keep search_entity_boost_enabled default off and the weight at 0.15. LoCoMo provides no signal to raise the weight, and the adversarial check is not clean. Document the findings and guidance; no code/default changes. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Drew Cain <groksrc@gmail.com>
b07af9b to
d175879
Compare
Closes #951
Summary
Implements a config-gated, entity-aware ranking boost for hybrid search results (Option A from the issue triage). The boost is a post-fusion re-scoring pass applied to the full fused candidate set before the limit/offset cut, so a matching document below the cutoff can be promoted without changing the underlying FTS or vector retrieval.
Feature is default-off. No schema changes. No model inference — purely lexical/index lookups against existing data.
What changed
config.py— three newBasicMemoryConfigfields, all default-off:search_entity_boost_enabled(False) — master switchsearch_entity_boost_weight(0.15) — multiplicative weight (ge=0.0)search_entity_boost_max_terms(3) — cap on matched terms counted (gt=0)All three respect the existing
BASIC_MEMORY_env-var prefix.search_repository_base.py— new helpers and hook:_extract_query_entity_terms— extracts capitalized/proper-noun tokens from the query, strips possessives, drops stopwords_row_entity_match_count— matches query entity terms against a result row's title (relation rows carry a syntheticFromEntity -> ToEntitytitle from the search service, so this covers entity and relation hits)_apply_entity_boost— appliesscore * (1 + weight * min(matches, max_terms))to matched rows; non-matching rows are untouched so their relative order is preserved_search_hybridright before the final ranking sortsqlite_search_repository.py/postgres_search_repository.py— populate the three_entity_boost_*instance attributes from config, parallel to the existing_semantic_*wiring. The base class declares safe off-by-default defaults so other subclasses and test doubles are unaffected.docs/semantic-search.md— three new config rows added to the Configuration Reference table; new "Entity-Aware Ranking Boost" section explaining the mechanism, env vars, and the default-off/benchmark-pending status.Testing
tests/repository/test_search_repository_entity_boost.py— new dedicated test module covering: disabled-by-default (no score change), boost applied when enabled, boost capped atmax_terms, mixed results (matching vs non-matching docs), zero-weight no-op, and relation title matching both endpoints (test_relation_title_matches_both_endpoints)tests/repository/test_search_repository.py— all passing; hybrid search behaviour unchanged when boost is offuv run pytest tests/repository/ tests/test_config.py -xpasses cleanReview notes
_row_entity_match_countinternal construction is slightly misleading.docs/semantic-search.mdand the helper's docstring both say matching is performed against "a relation row's linked entity names". This is correct in practice — relation rows are indexed with a synthetic title of the formFromEntity -> ToEntityatsrc/basic_memory/services/search_service.py:921, andtest_relation_title_matches_both_endpointscovers it. However, the implementation readshaystack_parts = [row.title or ""], a single-element list with a leftover comment (# the relation_type itself is not an entity name, so it is excluded) describing a multi-source merge that never materialises. Worth simplifying tohaystack = row.title or ""to avoid implying dead/abandoned logic.No dedicated
test_config.pyassertions for the three new fields. The claimed test run includestest_config.pybut it passes only because it was unchanged and does not exercise the new fields. The defaults and Pydantic validators (ge=0.0,gt=0) are covered only indirectly via the repository tests. A small explicit default + env-var test would harden the config surface, though the Pydantic constraints make this low risk.Feature is dormant on merge (default-off, not benchmarked). This is consistent with the issue's "measurable commit-to-commit" framing and the triage's Option A recommendation, but reviewers should be aware the new code path is not active in any default configuration and has not been validated against the LoCoMo benchmark. The intent is to enable it once a benchmark run confirms neutral-or-positive impact.
🤖 Generated with Claude Code