fix(core): repair FTS half of hybrid search for natural-language queries #994
groksrc wants to merge 2 commits
Hybrid search was silently running vector-only on natural-language queries: the FTS branch contributed zero candidates. Two causes in the SQLite (and parallel Postgres) FTS query preparation:

1. Sentence punctuation forced phrase matching. A question like "When did Melanie paint a sunrise?" reached FTS5 as the exact phrase '"When did Melanie paint a sunrise?"*', which matches no document. The FTS5 tokenizer ignores this punctuation in the index, so stripping it from word edges loses nothing, whereas leaving it in disabled the entire FTS contribution. _prepare_single_term now strips ?!.,;: from word edges of multi-word queries (interior characters such as hyphens and slashes in permalinks/paths are untouched).

2. No relaxation when strict all-terms-AND matched nothing. Questions rarely have every word in one document, so even after (1) the strict AND returned zero rows. The hybrid path now retries once with an OR-joined, stopword-filtered, content-term query when the strict query is empty. bm25/ts_rank still rank multi-term matches first, and fusion with the vector branch keeps relaxed lexical candidates from dominating precision.

The relaxation is gated behind a new allow_relaxed=False parameter on SearchRepositoryBase.search; only _search_hybrid opts in. Strict FTS behavior (search_type=text, title, permalink, link resolution) is unchanged; the service layer keeps its own conservative fallback. No config flag, default-safe.

Discovered via the benchmark harness: two different fusion algorithms produced byte-identical rankings across 1,986 queries (impossible with two live sources), and instrumentation confirmed fts=0 on 40/40 sampled LoCoMo queries.

Benchmark impact (corrected LoCoMo, 1,986 queries, same index, retrieval metrics; every category improves, no regression):

   recall@5  0.745 -> 0.823 (+7.9)
   MRR       0.618 -> 0.718 (+10.0)
   headline r5 0.734 -> 0.801, MRR 0.621 -> 0.706

Largest gains on open_domain (+0.10 r5) and adversarial (+0.12 r5); smallest on temporal (+0.003 r5 / +0.02 MRR).

Tests: punctuation no longer phrase-quotes; relaxation builds the expected OR query and respects boolean/quoted/short-query intent; the hybrid opt-in surfaces a partial-overlap document while the default strict path still returns empty. Parallel coverage for Postgres. Full SQLite unit suite green (2968 passed); ty + ruff clean.

Signed-off-by: Drew Cain <groksrc@gmail.com>
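The relaxed retry described in this commit message could look roughly like the sketch below. It is only an illustration: the function name relaxed_or_query, the stopword list, and the two-term minimum are assumptions, not the code in this PR, and the real relaxation may format terms differently.

```python
import re
from typing import Optional

# Hypothetical stopword list; the list used by the PR is not shown.
STOPWORDS = frozenset({
    "a", "an", "the", "did", "do", "does", "is", "are", "was", "were",
    "when", "what", "who", "where", "why", "how", "of", "in", "on", "to",
})

EDGE_PUNCT = "?!.,;:"


def relaxed_or_query(query: str) -> Optional[str]:
    """Build an OR-joined FTS5 query from content terms, or None to skip relaxation."""
    # Respect explicit intent: quoted phrases and boolean operators are left alone.
    if '"' in query or re.search(r"\b(AND|OR|NOT)\b", query):
        return None
    # Strip sentence punctuation from word edges only; interior characters stay.
    words = [w.strip(EDGE_PUNCT) for w in query.split()]
    terms = [w for w in words if w and w.lower() not in STOPWORDS]
    # Assumed threshold: a single remaining term gains nothing from OR-joining.
    if len(terms) < 2:
        return None
    # Quote each term so FTS5 reads it as a string, then join with OR.
    return " OR ".join(f'"{t}"' for t in terms)


print(relaxed_or_query("When did Melanie paint a sunrise?"))
```

Returning None for quoted, boolean, and very short queries mirrors the "respects boolean/quoted/short-query intent" behavior listed under Tests; the caller would simply keep the empty strict result in those cases.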
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2462844beb
    retrieval_mode=SearchRetrievalMode.FTS,
    limit=candidate_limit,
    offset=0,
    allow_relaxed=True,
Gate relaxed hybrid FTS with existing eligibility
When a HYBRID query has a strict FTS miss, this now enables OR-relaxation for every query shape, including short titles and numeric identifiers such as SPEC 16 or root note 1. The service-level relaxed FTS path explicitly rejects those cases in SearchService._is_relaxed_fts_fallback_eligible because OR-relaxing them over-broadens results; in hybrid, the relaxed FTS-only rows are then normalized up to 1.0 and can outrank the vector result the user actually needed. Please apply the same eligibility constraints before opting the hybrid FTS branch into relaxation.
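One way to picture the requested gate is the sketch below. It does not reproduce SearchService._is_relaxed_fts_fallback_eligible, whose actual rules are not shown in this thread; the function name, the four-token minimum, and the numeric-token check are all assumptions chosen to reject the examples named in the review.

```python
EDGE_PUNCT = "?!.,;:"


def is_relaxation_eligible(query: str, min_tokens: int = 4) -> bool:
    """Hypothetical gate: allow OR-relaxation only for longer, identifier-free queries."""
    tokens = [t.strip(EDGE_PUNCT) for t in query.split()]
    tokens = [t for t in tokens if t]
    # Short titles such as "SPEC 16" or "root note 1" over-broaden when OR-joined.
    if len(tokens) < min_tokens:
        return False
    # A purely numeric token usually marks an identifier, so stay strict.
    if any(t.isdigit() for t in tokens):
        return False
    return True


for q in ("When did Melanie paint a sunrise?", "SPEC 16", "root note 1"):
    print(q, "->", is_relaxation_eligible(q))
```

Under this sketch the hybrid path would pass allow_relaxed=is_relaxation_eligible(query) instead of a hard-coded True. The numeric check is deliberately conservative and would also keep questions containing a year on the strict path.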
CI Postgres shard caught two issues invisible to the local SQLite suite:
1. Postgres _prepare_single_term regression: the new edge-punctuation
strip ran after special-character cleaning, so an all-special-char
term ("()&!:") collapsed to empty and skipped the existing
NOSPECIALCHARS:* guard, emitting a malformed ":*". Folded the strip
into the word handlers so every guard survives, and added a
single-word empty guard.
2. Backend-specific test assumptions. Four tests in
test_search_repository.py (run under both backends via the
search_repository fixture) asserted SQLite FTS5 syntax and
SQLite-only strict-miss behavior. Postgres to_tsquery('english', ...)
auto-strips stopwords, so "When did Melanie paint a sunrise?" already
matches under strict AND. Made the four tests backend-aware via the
existing is_postgres_backend() helper, and switched the relaxation
integration test to a query with a word absent from the doc
("hiking") so the strict miss holds on both backends.
Reproduced and fixed against real Postgres (testcontainers): full
search test surface green on both backends (53 passed Postgres,
2968 SQLite), ruff + ty clean.
Signed-off-by: Drew Cain <groksrc@gmail.com>
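The ordering problem in item 1 can be illustrated with a small sketch. This is not the Postgres _prepare_single_term from the PR: prepare_word and the SPECIAL character set are hypothetical, and only the NOSPECIALCHARS:* placeholder and the ?!.,;: edge set come from the commit messages.

```python
EDGE_PUNCT = "?!.,;:"
# Assumed set of characters that to_tsquery would treat as operators or syntax.
SPECIAL = set("&|!():<>'\"*\\")


def prepare_word(word: str) -> str:
    """Turn one raw word into a prefix tsquery term without ever emitting a bare ':*'."""
    # Edge-punctuation strip happens inside the word handler, before the guard.
    stripped = word.strip(EDGE_PUNCT)
    cleaned = "".join(ch for ch in stripped if ch not in SPECIAL).strip()
    if not cleaned:
        # Guard runs after all cleaning, so an all-special-char word still
        # falls back to the placeholder instead of collapsing to ":*".
        return "NOSPECIALCHARS:*"
    return f"{cleaned}:*"


print(prepare_word("()&!:"))
print(prepare_word("sunrise?"))
```

Keeping the empty check as the last step before the suffix is appended means any cleaning added later cannot bypass it, which is the property the regression broke.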
df79850 to 8d4d1f1
Summary
Hybrid search has been silently running vector-only on natural-language queries — the full-text (FTS) branch contributed zero candidates. This restores it, with a large, regression-free retrieval improvement.
Two causes in FTS query preparation (SQLite, with parallel Postgres fixes):
1. Sentence punctuation forced exact-phrase matching. A question like "When did Melanie paint a sunrise?" reached FTS5 as the phrase '"When did Melanie paint a sunrise?"*', which matches no document. The FTS5 tokenizer ignores that punctuation in the index anyway, so stripping it from word edges loses nothing, whereas leaving it in disabled the entire FTS contribution. _prepare_single_term now strips ?!.,;: from the edges of multi-word query terms (interior - and / for permalinks/paths untouched).
2. No relaxation when strict all-terms-AND matched nothing. Questions rarely have every word in one document, so even after (1) the strict AND returned zero rows. The hybrid path now retries once with an OR-joined, stopword-filtered, content-term query when the strict query is empty. bm25/ts_rank still rank multi-term matches first, and fusion with the vector branch keeps relaxed lexical candidates from dominating precision.
Scope / safety
- The relaxation is gated behind a new allow_relaxed=False parameter on SearchRepositoryBase.search; only _search_hybrid opts in.
- Strict FTS behavior (search_type=text, title, permalink, link resolution) is unchanged; the service layer keeps its own conservative fallback.
How it was found
The benchmark harness produced two byte-identical rankings across 1,986 queries for two different fusion algorithms, which is impossible with two live retrieval sources. Instrumentation then confirmed fts=0 candidates on 40/40 sampled LoCoMo queries.
Benchmark impact
Corrected LoCoMo (1,986 queries, same index, deterministic retrieval metrics). Every category improves; no regression.
Per category (recall@5): open_domain +0.10, adversarial +0.12, multi_hop +0.05, single_hop +0.03, temporal +0.003. (Retrieval metrics only; QA-accuracy re-run is queued separately.)
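The byte-identical-rankings signal under "How it was found" can be reproduced in miniature. The two fusion functions below are generic stand-ins (reciprocal rank fusion and a weighted rank-score sum); which algorithms the benchmark harness actually compared is not stated here, so treat both as assumptions.

```python
def rrf(rankings: list, k: int = 60) -> list:
    """Reciprocal rank fusion over several ranked lists of document ids."""
    scores: dict = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: -scores[d])


def weighted_rank_fusion(rankings: list, weights: list) -> list:
    """Weighted sum of linearly decaying rank scores, one weight per source."""
    scores: dict = {}
    for ranking, weight in zip(rankings, weights):
        n = len(ranking)
        for i, doc in enumerate(ranking):
            scores[doc] = scores.get(doc, 0.0) + weight * (n - i) / n
    return sorted(scores, key=lambda d: -scores[d])


vector = ["a", "b", "c"]
weights = [0.3, 0.7]

# Dead FTS branch: both algorithms collapse to the vector order.
dead = (rrf([vector, []]), weighted_rank_fusion([vector, []], weights))
# Live FTS branch: the two algorithms disagree.
fts = ["c", "a"]
live = (rrf([vector, fts]), weighted_rank_fusion([vector, fts], weights))

print("dead FTS:", dead)
print("live FTS:", live)
```

With only one contributing source, any monotone fusion reduces to that source's order, so agreement across every query is a cheap health check that both branches are returning candidates.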
Tests
- … _relaxed_tsquery_text, punctuation, relaxed retry).
- ty + ruff clean.
🤖 Generated with Claude Code