fix: support MatchPhrase filter in local mode #1213

Open
ATOM00blue wants to merge 1 commit into qdrant:dev from ATOM00blue:fix-matchphrase-local

Conversation

@ATOM00blue

Problem

Using a MatchPhrase condition in a filter against a local-mode client
(QdrantClient(":memory:") or local persistence) raises:

ValueError: Unknown match condition: phrase='...'

Phrase matching is supported by the server and is already handled by the
gRPC/REST converters (RestToGrpc/GrpcToRest), but check_match in
qdrant_client/local/payload_filters.py does not handle models.MatchPhrase,
so it falls through to the "Unknown match condition" error. This makes local
mode diverge from server behavior for any filter that uses a phrase match.

Minimal reproduction:

from qdrant_client import QdrantClient, models

client = QdrantClient(":memory:")
client.create_collection(
    "t", vectors_config=models.VectorParams(size=2, distance=models.Distance.COSINE)
)
client.upsert("t", [
    models.PointStruct(id=1, vector=[0.1, 0.2], payload={"text": "quick brown fox"}),
])

client.scroll(
    "t",
    scroll_filter=models.Filter(
        must=[models.FieldCondition(key="text", match=models.MatchPhrase(phrase="brown fox"))]
    ),
)  # -> ValueError: Unknown match condition

Fix

Handle MatchPhrase in check_match. The phrase is matched as a contiguous,
order-preserving sub-sequence of the field value's tokens (whitespace
tokenization, consistent with the existing MatchTextAny handling). This
matches the documented phrase semantics, e.g. "quick brown fox" is matched
by "brown fox" but not by "fox brown".
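
The matching rule described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the helper name `phrase_tokens_match` is hypothetical, tokenization is assumed to be plain `str.split()`, and treating an empty phrase as a match is an assumption.

```python
def phrase_tokens_match(phrase: str, value: str) -> bool:
    """Hypothetical helper: True if the phrase tokens occur contiguously
    and in order within the value tokens (whitespace tokenization)."""
    phrase_tokens = phrase.split()
    value_tokens = value.split()
    if not phrase_tokens:
        return True  # assumption: an empty phrase matches any string
    n = len(phrase_tokens)
    # Slide a window of n tokens over the value; if the phrase is longer
    # than the value, the range is empty and the result is False.
    return any(
        value_tokens[i : i + n] == phrase_tokens
        for i in range(len(value_tokens) - n + 1)
    )


print(phrase_tokens_match("brown fox", "quick brown fox"))  # contiguous, in order
print(phrase_tokens_match("fox brown", "quick brown fox"))  # wrong order
print(phrase_tokens_match("quick fox", "quick brown fox"))  # not contiguous
```

Under these assumptions the first call matches and the other two do not; partial tokens such as "brow" do not match either, because whole tokens are compared.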

Tests

Added test_match_phrase_filter_query in
qdrant_client/local/tests/test_payload_filters.py covering contiguous
matches, wrong order, non-contiguous tokens, partial tokens, list-valued
fields and missing fields. The test fails before the change (with the
ValueError) and passes after.

Checklist

  • Targets the dev branch.
  • Added a test for the change.
  • pre-commit (ruff-format) and mypy pass locally; local-mode test suite passes.

Local mode raised "Unknown match condition" when a filter used
MatchPhrase, even though phrase matching is supported by the server
and by the gRPC/REST converters. Handle MatchPhrase in check_match by
matching the phrase tokens as a contiguous, ordered sub-sequence of the
field value tokens, and add tests covering it.

Signed-off-by: ATOM00blue <219721791+ATOM00blue@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 22, 2026 02:30
@netlify

netlify Bot commented May 22, 2026

Deploy Preview for poetic-froyo-8baba7 ready!

Name Link
🔨 Latest commit d123317
🔍 Latest deploy log https://app.netlify.com/projects/poetic-froyo-8baba7/deploys/6a0fbfc1fce5b80008fe0d61
😎 Deploy Preview https://deploy-preview-1213--poetic-froyo-8baba7.netlify.app

@coderabbitai

coderabbitai Bot commented May 22, 2026

Walkthrough

This PR introduces phrase-based text matching to payload filter evaluation in the qdrant-client. A new check_phrase_match helper function tokenizes both the search phrase and the candidate value, returns True for empty phrases, rejects cases where the phrase has more tokens than the value, and checks whether the phrase token sequence occurs contiguously within the value tokens. The existing check_match function is extended to recognize models.MatchPhrase and evaluate it using the new helper, while preserving all existing match behavior. Test coverage validates correct matching of contiguous sub-phrases, rejection of wrong token order and non-contiguous tokens, handling of list-valued fields, and missing fields.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name | Status | Explanation
Title check | ✅ Passed | The title accurately and concisely summarizes the main change: adding support for MatchPhrase filtering in local mode, which directly addresses the bug described in the pull request.
Description check | ✅ Passed | The description thoroughly explains the problem (ValueError when using MatchPhrase in local mode), provides a minimal reproduction case, details the fix implementation, and documents comprehensive test coverage.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
qdrant_client/local/tests/test_payload_filters.py (1)

192-225: ⚡ Quick win

Add a non-string payload regression case for MatchPhrase.

This test is solid, but it currently doesn’t assert behavior when text is non-string (e.g., 123), which is a realistic payload shape and guards against crashes.

Proposed test addition
 def test_match_phrase_filter_query():
@@
     # missing field does not match
     assert matches("brown fox", {"other": "value"}) is False
+
+    # non-string field does not match
+    assert matches("brown fox", {"text": 123}) is False
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@qdrant_client/local/tests/test_payload_filters.py` around lines 192 - 225,
Test test_match_phrase_filter_query lacks a regression case for non-string
payloads and may not cover crashes when the field is an int; update the test (in
test_match_phrase_filter_query and its nested matches helper usage) to include
at least one assertion where payload's "text" is a non-string (e.g., 123) and
assert that matches("brown fox", {"text": 123}) returns False (and does not
raise), ensuring MatchPhrase handling of non-string/list values is validated.

📥 Commits

Reviewing files that changed from the base of the PR and between 790328b and d123317.

📒 Files selected for processing (2)
  • qdrant_client/local/payload_filters.py
  • qdrant_client/local/tests/test_payload_filters.py

Comment on lines +171 to +172
    if isinstance(condition, models.MatchPhrase):
        return value is not None and check_phrase_match(condition.phrase, value)

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard MatchPhrase against non-string payload values to avoid runtime errors.

At Line 172, check_phrase_match(..., value) is called for any non-None value, but check_phrase_match expects str and calls .split(). Non-string payloads will raise at runtime instead of evaluating to False.

Proposed fix
     if isinstance(condition, models.MatchPhrase):
-        return value is not None and check_phrase_match(condition.phrase, value)
+        return isinstance(value, str) and check_phrase_match(condition.phrase, value)
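
To see why the guard matters, the sketch below mimics both variants of the branch with stand-in functions. `check_phrase_match` here is a simplified stand-in written for this example (an assumption, not the PR's helper); the only property it shares with the real one, per the comment above, is that it calls `.split()` on the value. The names `unguarded` and `guarded` are hypothetical.

```python
def check_phrase_match(phrase: str, value: str) -> bool:
    # Simplified stand-in (assumption): contiguous whitespace-token match.
    p, v = phrase.split(), value.split()
    return any(v[i : i + len(p)] == p for i in range(len(v) - len(p) + 1))


def unguarded(phrase, value):
    # Mirrors the original branch: any non-None value reaches .split().
    return value is not None and check_phrase_match(phrase, value)


def guarded(phrase, value):
    # Mirrors the proposed fix: non-string payloads short-circuit to False.
    return isinstance(value, str) and check_phrase_match(phrase, value)


try:
    unguarded("brown fox", 123)
except AttributeError as exc:
    print(f"unguarded raised: {exc}")

print(guarded("brown fox", 123))                # False, no exception
print(guarded("brown fox", "quick brown fox"))  # True
```

With an integer payload the unguarded variant raises `AttributeError` because `int` has no `split` method, while the guarded variant evaluates to `False`.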
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@qdrant_client/local/payload_filters.py` around lines 171 - 172, The
MatchPhrase branch currently calls check_phrase_match(condition.phrase, value)
for any non-None payload value which will raise if value is not a str; update
the MatchPhrase handler (the branch that checks isinstance(condition,
models.MatchPhrase)) to first ensure value is an instance of str (e.g.,
isinstance(value, str)) and only call check_phrase_match when it is, returning
False for non-string payloads so non-string values are safely treated as no
match.

Copilot AI left a comment

Pull request overview

This PR brings local-mode payload filtering in line with server behavior by adding support for models.MatchPhrase in the local filter evaluation logic.

Changes:

  • Add a check_phrase_match helper and handle models.MatchPhrase in check_match.
  • Add a dedicated local-mode test covering phrase matching semantics and edge cases.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File | Description
qdrant_client/local/payload_filters.py | Implements local evaluation for MatchPhrase via whitespace tokenization and contiguous, order-preserving matching.
qdrant_client/local/tests/test_payload_filters.py | Adds coverage for phrase matching behavior (contiguous matches, wrong order, non-contiguous tokens, list fields, missing fields).

Comment on lines 168 to +172
        return value is not None and condition.text in value
    if isinstance(condition, models.MatchTextAny):
        return value is not None and any(word in value for word in condition.text_any.split())
    if isinstance(condition, models.MatchPhrase):
        return value is not None and check_phrase_match(condition.phrase, value)
@joein
Member

joein commented May 22, 2026

Hey @ATOM00blue

Thank you for pointing it out!
According to the docs, when there is no full-text index, phrase matching is supposed to work as an exact substring match.

The current implementation is a bit different from the docs, so we'd need to update the PR to match Qdrant server's behaviour.
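
The gap between the two semantics can be illustrated with a small comparison. Both functions are hypothetical sketches written for this example: `substring_match` follows the exact-substring behaviour described in this comment, and `token_match` approximates the PR's whitespace-token approach. Neither is taken from the Qdrant server or client code.

```python
def substring_match(phrase: str, value: object) -> bool:
    # Exact substring semantics (assumed behaviour without a full-text index).
    return isinstance(value, str) and phrase in value


def token_match(phrase: str, value: object) -> bool:
    # Approximation of the PR's approach: contiguous whitespace tokens.
    if not isinstance(value, str):
        return False
    p, v = phrase.split(), value.split()
    return any(v[i : i + len(p)] == p for i in range(len(v) - len(p) + 1))


text = "quick brown fox"
for phrase in ("brown fox", "own fo", "brown  fox"):
    print(repr(phrase), substring_match(phrase, text), token_match(phrase, text))
```

The two agree on "brown fox" but diverge on the other inputs: "own fo" is a substring yet not a token sequence, while "brown  fox" (two spaces) is a token sequence yet not a substring.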

3 participants