fix: support MatchPhrase filter in local mode#1213
Conversation
Local mode raised "Unknown match condition" when a filter used MatchPhrase, even though phrase matching is supported by the server and by the gRPC/REST converters. Handle MatchPhrase in check_match by matching the phrase tokens as a contiguous, ordered sub-sequence of the field value tokens, and add tests covering it. Signed-off-by: ATOM00blue <219721791+ATOM00blue@users.noreply.github.com>
✅ Deploy Preview for poetic-froyo-8baba7 ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
📝 WalkthroughWalkthroughThis PR introduces phrase-based text matching to payload filter evaluation in the qdrant-client. A new Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
qdrant_client/local/tests/test_payload_filters.py (1)
192-225: ⚡ Quick winAdd a non-string payload regression case for
MatchPhrase.This test is solid, but it currently doesn’t assert behavior when
textis non-string (e.g.,123), which is a realistic payload shape and guards against crashes.Proposed test addition
def test_match_phrase_filter_query(): @@ # missing field does not match assert matches("brown fox", {"other": "value"}) is False + + # non-string field does not match + assert matches("brown fox", {"text": 123}) is False🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@qdrant_client/local/tests/test_payload_filters.py` around lines 192 - 225, Test test_match_phrase_filter_query lacks a regression case for non-string payloads and may not cover crashes when the field is an int; update the test (in test_match_phrase_filter_query and its nested matches helper usage) to include at least one assertion where payload's "text" is a non-string (e.g., 123) and assert that matches("brown fox", {"text": 123}) returns False (and does not raise), ensuring MatchPhrase handling of non-string/list values is validated.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@qdrant_client/local/payload_filters.py`:
- Around line 171-172: The MatchPhrase branch currently calls
check_phrase_match(condition.phrase, value) for any non-None payload value which
will raise if value is not a str; update the MatchPhrase handler (the branch
that checks isinstance(condition, models.MatchPhrase)) to first ensure value is
an instance of str (e.g., isinstance(value, str)) and only call
check_phrase_match when it is, returning False for non-string payloads so
non-string values are safely treated as no match.
---
Nitpick comments:
In `@qdrant_client/local/tests/test_payload_filters.py`:
- Around line 192-225: Test test_match_phrase_filter_query lacks a regression
case for non-string payloads and may not cover crashes when the field is an int;
update the test (in test_match_phrase_filter_query and its nested matches helper
usage) to include at least one assertion where payload's "text" is a non-string
(e.g., 123) and assert that matches("brown fox", {"text": 123}) returns False
(and does not raise), ensuring MatchPhrase handling of non-string/list values is
validated.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2afcab61-352e-4904-b58e-ec0702ff02db
📒 Files selected for processing (2)
qdrant_client/local/payload_filters.pyqdrant_client/local/tests/test_payload_filters.py
| if isinstance(condition, models.MatchPhrase): | ||
| return value is not None and check_phrase_match(condition.phrase, value) |
There was a problem hiding this comment.
Guard MatchPhrase against non-string payload values to avoid runtime errors.
At Line 172, check_phrase_match(..., value) is called for any non-None value, but check_phrase_match expects str and calls .split(). Non-string payloads will raise at runtime instead of evaluating to False.
Proposed fix
if isinstance(condition, models.MatchPhrase):
- return value is not None and check_phrase_match(condition.phrase, value)
+ return isinstance(value, str) and check_phrase_match(condition.phrase, value)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if isinstance(condition, models.MatchPhrase): | |
| return value is not None and check_phrase_match(condition.phrase, value) | |
| if isinstance(condition, models.MatchPhrase): | |
| return isinstance(value, str) and check_phrase_match(condition.phrase, value) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@qdrant_client/local/payload_filters.py` around lines 171 - 172, The
MatchPhrase branch currently calls check_phrase_match(condition.phrase, value)
for any non-None payload value which will raise if value is not a str; update
the MatchPhrase handler (the branch that checks isinstance(condition,
models.MatchPhrase)) to first ensure value is an instance of str (e.g.,
isinstance(value, str)) and only call check_phrase_match when it is, returning
False for non-string payloads so non-string values are safely treated as no
match.
There was a problem hiding this comment.
Pull request overview
This PR brings local-mode payload filtering in line with server behavior by adding support for models.MatchPhrase in the local filter evaluation logic.
Changes:
- Add a
check_phrase_matchhelper and handlemodels.MatchPhraseincheck_match. - Add a dedicated local-mode test covering phrase matching semantics and edge cases.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
qdrant_client/local/payload_filters.py |
Implements local evaluation for MatchPhrase via whitespace tokenization and contiguous, order-preserving matching. |
qdrant_client/local/tests/test_payload_filters.py |
Adds coverage for phrase matching behavior (contiguous matches, wrong order, non-contiguous tokens, list fields, missing fields). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| return value is not None and condition.text in value | ||
| if isinstance(condition, models.MatchTextAny): | ||
| return value is not None and any(word in value for word in condition.text_any.split()) | ||
| if isinstance(condition, models.MatchPhrase): | ||
| return value is not None and check_phrase_match(condition.phrase, value) |
|
Hey @ATOM00blue Thank you for pointing it out! The current implementation is a bit different from the docs, so we'd need to update the PR to match Qdrant server's behaviour. |
Problem
Using a
MatchPhrasecondition in a filter against a local-mode client(
QdrantClient(":memory:")or local persistence) raises:Phrase matching is supported by the server and is already handled by the
gRPC/REST converters (
RestToGrpc/GrpcToRest), butcheck_matchinqdrant_client/local/payload_filters.pydoes not handlemodels.MatchPhrase,so it falls through to the "Unknown match condition" error. This makes local
mode diverge from server behavior for any filter that uses a phrase match.
Minimal reproduction:
Fix
Handle
MatchPhraseincheck_match. The phrase is matched as a contiguous,order-preserving sub-sequence of the field value's tokens (whitespace
tokenization, consistent with the existing
MatchTextAnyhandling). Thismatches the documented phrase semantics, e.g.
"quick brown fox"is matchedby
"brown fox"but not by"fox brown".Tests
Added
test_match_phrase_filter_queryinqdrant_client/local/tests/test_payload_filters.pycovering contiguousmatches, wrong order, non-contiguous tokens, partial tokens, list-valued
fields and missing fields. The test fails before the change (with the
ValueError) and passes after.Checklist
devbranch.pre-commit(ruff-format) andmypypass locally; local-mode test suite passes.