
Fix search_graph query= multi-minute latency: two-step FTS5 subquery #302

Open
awconstable wants to merge 1 commit into DeusData:main from arbor-education:fix/301-bm25-fts5-early-termination


awconstable commented Apr 30, 2026

Fixes #301

Root cause

search_graph with a query= argument uses SQLite FTS5 for BM25-ranked full-text search. The previous flat query:

SELECT ... FROM nodes_fts
JOIN nodes n ON n.id = nodes_fts.rowid
WHERE nodes_fts MATCH ?
  AND n.project = ?
  AND n.label NOT IN ('File','Folder',...)
ORDER BY bm25(nodes_fts) LIMIT 20

blocks FTS5's WAND/MaxScore early-exit optimisation. FTS5 can short-circuit ORDER BY bm25() LIMIT N only when it drives the entire query plan. The outer JOIN and the WHERE n.project = ? predicate are invisible to the FTS5 planner, so FTS5 must score every matching document before the outer filter can discard any of them. On a large codebase with 100K+ matches this causes 2–16 minute queries.

The same problem applied to the count query, making each search_graph call pay the full scan cost twice.

Changes

Two-step subquery (bm25_search in src/mcp/mcp.c)

The inner FTS5-only subquery has no outer predicates, so SQLite can terminate it early:

SELECT ...
FROM (
    SELECT rowid, bm25(nodes_fts) AS base_rank
    FROM nodes_fts WHERE nodes_fts MATCH ?1
    ORDER BY base_rank LIMIT 2000          -- FTS5 early-terminates here
) fts
JOIN nodes n ON n.id = fts.rowid
WHERE n.project = ?2
  AND n.label NOT IN ('File','Folder',...)
ORDER BY rank LIMIT ?3 OFFSET ?4

The count query uses the same inner-limit subquery structure.
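
To try the two-step shape outside the server, here is a minimal sketch using Python's standard sqlite3 module rather than the C code in src/mcp/mcp.c. The schema, sample rows, and the build_demo_db helper are assumptions made for illustration; only the query structure and BM25_INNER_LIMIT come from this PR. The inner rowid is aliased to node_id so the outer join does not depend on how SQLite names that column, and the outer ORDER BY uses base_rank directly because the PR's select list is elided. It assumes a SQLite build with FTS5 enabled.

```python
import sqlite3

BM25_INNER_LIMIT = 2000  # candidate cap named in the PR description


def build_demo_db():
    """Build a tiny in-memory database; the schema is a guess, not the project's."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE nodes (id INTEGER PRIMARY KEY, project TEXT, label TEXT, name TEXT);
        CREATE VIRTUAL TABLE nodes_fts USING fts5(name);
        """
    )
    rows = [
        (1, "demo", "Function", "approve integration request"),
        (2, "demo", "File", "approve.c"),
        (3, "other", "Function", "approve user"),
        (4, "demo", "Class", "integration approve handler"),
        (5, "demo", "Function", "unrelated helper"),
    ]
    con.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?)", rows)
    con.executemany(
        "INSERT INTO nodes_fts(rowid, name) VALUES (?, ?)",
        [(node_id, name) for node_id, _, _, name in rows],
    )
    return con


def bm25_search(con, query, project, limit=20, offset=0):
    """Step 1 ranks inside FTS5 only; step 2 joins and filters the capped candidates."""
    sql = """
        SELECT n.id, n.name, fts.base_rank
        FROM (
            SELECT rowid AS node_id, bm25(nodes_fts) AS base_rank
            FROM nodes_fts WHERE nodes_fts MATCH ?
            ORDER BY base_rank LIMIT ?
        ) fts
        JOIN nodes n ON n.id = fts.node_id
        WHERE n.project = ?
          AND n.label NOT IN ('File', 'Folder')
        ORDER BY fts.base_rank LIMIT ? OFFSET ?
    """
    params = (query, BM25_INNER_LIMIT, project, limit, offset)
    return con.execute(sql, params).fetchall()


if __name__ == "__main__":
    con = build_demo_db()
    for node_id, name, rank in bm25_search(con, "approve", "demo"):
        print(node_id, name, rank)
```

In this toy data, a search for "approve" in project "demo" returns nodes 1 and 4: node 2 is dropped by the label filter and node 3 by the project filter, both after the inner subquery has already been capped.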

Trade-off: total in the response is now capped at BM25_INNER_LIMIT (2000). It reflects how many of the top 2000 BM25 candidates passed the project/label filters, not the full count of matching nodes. For a code search tool, getting the top 20 most relevant results in 500ms is far more useful than an exact count after 16 minutes.
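
The capped total can be illustrated with a small sketch, again in Python's sqlite3 rather than the project's C. The count query text is not shown in this PR, so the SQL below is an assumed reconstruction of "the same inner-limit subquery structure"; capped_total, build_count_demo, and the schema are hypothetical names used only for this example.

```python
import sqlite3


def capped_total(con, query, project, inner_limit):
    """Count rows that pass the filters among at most inner_limit BM25 candidates."""
    sql = """
        SELECT COUNT(*)
        FROM (
            SELECT rowid AS node_id, bm25(nodes_fts) AS base_rank
            FROM nodes_fts WHERE nodes_fts MATCH ?
            ORDER BY base_rank LIMIT ?
        ) fts
        JOIN nodes n ON n.id = fts.node_id
        WHERE n.project = ?
          AND n.label NOT IN ('File', 'Folder')
    """
    return con.execute(sql, (query, inner_limit, project)).fetchone()[0]


def build_count_demo(matching_nodes=50):
    """Create an in-memory database where every node matches 'approve' (assumed schema)."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE nodes (id INTEGER PRIMARY KEY, project TEXT, label TEXT, name TEXT);
        CREATE VIRTUAL TABLE nodes_fts USING fts5(name);
        """
    )
    for node_id in range(1, matching_nodes + 1):
        name = f"approve handler {node_id}"
        con.execute(
            "INSERT INTO nodes VALUES (?, 'demo', 'Function', ?)", (node_id, name)
        )
        con.execute(
            "INSERT INTO nodes_fts(rowid, name) VALUES (?, ?)", (node_id, name)
        )
    return con


if __name__ == "__main__":
    con = build_count_demo()
    print(capped_total(con, "approve", "demo", inner_limit=10))    # capped count
    print(capped_total(con, "approve", "demo", inner_limit=2000))  # full count
```

With 50 matching nodes, an inner limit of 10 reports a total of 10, while an inner limit of 2000 reports all 50. The same effect applies at scale: once more than BM25_INNER_LIMIT documents match, total stops growing.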

Benchmark

Tested on a large codebase (~200K nodes, ~500MB database):

| Query | Before | After | Speedup |
|---|---|---|---|
| query=approve apps authorization school | 18 023ms | 569ms | 32× |
| query=Group User Details Manage All Users | 120 036ms | 508ms | 236× |
| query=dev portal approve integration third party | 1 015 180ms | 1 009ms | 1006× |
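
The speedup column is the before/after ratio rounded to the nearest whole number. A short Python check of that arithmetic, using only the timings reported in the benchmark (the speedup helper is a name introduced here):

```python
# Timings in milliseconds, copied from the benchmark in this PR.
BENCHMARKS = [
    ("approve apps authorization school", 18_023, 569),
    ("Group User Details Manage All Users", 120_036, 508),
    ("dev portal approve integration third party", 1_015_180, 1_009),
]


def speedup(before_ms, after_ms):
    """Return the before/after ratio rounded to the nearest whole number."""
    return round(before_ms / after_ms)


speedups = [speedup(before, after) for _, before, after in BENCHMARKS]

if __name__ == "__main__":
    for (query, before, after), factor in zip(BENCHMARKS, speedups):
        print(f"{query}: {before} ms -> {after} ms ({factor}x)")
```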

The ~500ms floor is cold-start I/O when spawning a fresh process against a ~500MB database. In the long-running MCP server (warm file cache) BM25 queries return in sub-millisecond time.

Tests

All store search tests pass. The MCP test suite has a pre-existing stack buffer overflow in build_project_list_error (unrelated to this change) that kills the test runner before MCP-layer tests run; the store-layer tests all complete cleanly.

Flat BM25 queries of the form:
  SELECT ... FROM nodes_fts JOIN nodes WHERE MATCH ? AND project=? ORDER BY bm25() LIMIT N
block FTS5 WAND/MaxScore early-exit — the outer JOIN+WHERE is invisible to
the FTS5 planner, so it scores every matching document before any filter fires.
On a large codebase with 100K+ matches this causes 2–16 minute queries.

Fix: two-step subquery.  The inner FTS5-only query:
  SELECT rowid, bm25(nodes_fts) FROM nodes_fts WHERE MATCH ? ORDER BY bm25() LIMIT 2000
can early-terminate because no outer predicate blocks it.  The outer query
then joins and filters at most BM25_INNER_LIMIT (2000) candidates.

The count query uses the identical inner-limit subquery, so it benefits too.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
DeusData added the bug and stability/performance labels on May 4, 2026

Linked issue:

search_graph query= takes minutes on large codebases — FTS5 early termination blocked by outer JOIN
