fix: param binding in search_similar_chunks with filters (regression from #655)#16
Open
dklymentiev wants to merge 2 commits into
Open
fix: param binding in search_similar_chunks with filters (regression from #655)#16dklymentiev wants to merge 2 commits into
dklymentiev wants to merge 2 commits into
Conversation
LLM agents sometimes pass tags as comma-separated string instead of array. All tag parameters now accept both formats via _parse_tags() helper. Fully backward-compatible -- arrays work as before. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…from #655)
When any of tags/date_from/date_to was set in SearchRequest, the filter
branch of search_similar_chunks bound the LIMIT integer to the
workspace_id slot (and vice versa), causing:
invalid input for query argument $N: 30 (expected str, got int)
and a 500 on every /search that carried a filter.
Root cause: #655 added a workspace filter inside the inner WHERE of the
filter branch, but left `params.append(limit)` above the workspace
append. Because this branch labels parameters through a running `idx`,
the early `limit` push shifted every subsequent slot by one:
$idx intended actually
---- ---------------- ----------------
$5 c.workspace_id limit (int) <- 500: expected str
$6 outer LIMIT ws (str)
Fix: append `limit` after the workspace parameter and reference it via
an explicit `outer_limit_ref` so the textual query and *params stay in
sync. No change to the no-filter branch (it pins positions 1..5
literally and is unaffected).
Reproduce before fix:
curl -X POST http://localhost:8000/search \
-H "X-Workspace: any-ws" \
-d '{"query":"x","date_from":"2026-01-01T00:00:00"}'
# 500 Internal Server Error
After fix: 200 with results scoped to the requested workspace.
Existing tests mock crud.EmbeddingCRUD at the class level
(tests/conftest.py), so the real SQL path was not covered; an
integration test for the filter branches is left to a follow-up.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
POST /searchreturns 500 Internal Server Error on every request that carries any oftags,date_from, ordate_to. The failure comes from the workspace-scoped filter branch ofsearch_similar_chunksinmesh/crud.py, which binds the outerLIMITinteger to theworkspace_idslot (and vice versa).This is a regression introduced by PR #655 (chunk vector search broken with workspace RLS): that PR added a workspace predicate inside the dynamic filter branch but left
params.append(limit)above the workspace append, shifting every subsequent$Nslot by one.Root cause
The filter branch builds parameters through a running
idx. With, say, onlydate_fromset, the runtime state ends up as:$N$1$2$3$4$5$6asyncpg raises on
$5:The no-filter
elsebranch is unaffected because it pins positions 1..5 literally and does not go through the dynamicidxbuilder.Fix
Append
limitafter the workspace parameter and reference it via an explicitouter_limit_refso the textual query and*paramsstay in sync. Traced through for all 7 filter combinations (tags only, date_from only, date_to only, and every pair/triple) — bindings align in every case.Reproduction
Before this PR:
After this PR: same request returns 200 with documents correctly scoped to
any-workspaceand filtered by thecreated_atpredicate.Blast radius
mesh/crud.py)search_similar_chunks)Any caller that was previously getting a 500 from a filtered search (e.g. scout-matcher, HQ memory search with tags, MCP
memory_searchwith tags) will start getting correct results. No caller sees a behavior downgrade — the filter branch was 100% broken before this change.Tests
Existing test suite mocks
crud.EmbeddingCRUDat the class level (tests/conftest.py), so the real SQL path is not exercised and the regression slipped through. A dedicated integration test for the three filter axes (tags, date_from, date_to) would have caught this — intentionally not added in this PR to keep it focused on the regression fix; happy to follow up with a test-only PR.Verification performed
POST /searchwithout filters: 200, unchanged results ✓POST /searchwithdate_from: 200 (was 500) ✓POST /searchwithtags: 200 (was 500) ✓POST /searchwithtags + date_from + date_to: 200 (was 500) ✓SUCCESS duration=110.48s(wasFAILED error=500on first subscriber) ✓Out of scope
🤖 Generated with Claude Code