fix: restore simple FTS tokenizer default #7006

Merged: BubbleCal merged 2 commits into main from xuanwo/revert-fts-default-tokenizer on Jun 1, 2026

Conversation

Xuanwo (Collaborator) commented May 29, 2026

This restores the native FTS default tokenizer to `simple` after ICU showed behavior differences that are too large for the default path. ICU remains available through an explicit `base_tokenizer="icu"`, while docs and tests now describe the default as `simple` again.
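For readers who want index behavior to stay stable regardless of which tokenizer is the default, one option is to always pass the tokenizer explicitly. The sketch below is illustrative only: `fts_index_kwargs` and `DEFAULT_BASE_TOKENIZER` are hypothetical names, not part of Lance, and the commented-out call assumes the Python `create_scalar_index` API forwards `base_tokenizer` as a keyword argument, as the description above implies.

```python
from typing import Optional

# Default as stated in the PR description: the native FTS default is "simple" again.
DEFAULT_BASE_TOKENIZER = "simple"


def fts_index_kwargs(base_tokenizer: Optional[str] = None) -> dict:
    """Hypothetical helper: build keyword arguments for an FTS index build.

    Falls back to the "simple" tokenizer when no tokenizer is given, and
    passes through an explicit choice such as "icu" unchanged.
    """
    return {"base_tokenizer": base_tokenizer or DEFAULT_BASE_TOKENIZER}


default_kwargs = fts_index_kwargs()      # relies on the restored default
icu_kwargs = fts_index_kwargs("icu")     # explicit opt-in to ICU

# Assumed usage with the Lance Python package (not executed here):
#   import lance
#   ds = lance.dataset("path/to/dataset")   # placeholder path
#   ds.create_scalar_index("text", index_type="INVERTED", **icu_kwargs)
```

Pinning the tokenizer at the call site means a future change to the default would not silently alter how an existing pipeline tokenizes text.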

@github-actions (Contributor)

Important

This PR touches the Lance format specification.

Substantive changes to the format specification — the .proto definitions
and the spec docs under docs/src/format/ — require a PMC vote before merge.
Minor edits such as typo fixes, wording, or formatting are excluded; use your
judgment.

If this is a meaningful format change:

  • Start a vote following the Lance community voting process.
    Format specification modifications need 3 binding +1 votes (excluding the
    proposer), held on GitHub Discussions, with a minimum voting period of 1 week.
  • Once the vote passes, link the completed vote in this PR. It should not be
    merged until the vote is linked.

github-actions bot added the bug and python labels May 29, 2026
Xuanwo marked this pull request as ready for review May 29, 2026 18:22

claude bot left a comment

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Xuanwo (Collaborator, Author) commented May 29, 2026

> This PR touches the Lance format specification.

Not really

wjones127 (Contributor) left a comment

+1

codecov bot commented May 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

BubbleCal merged commit ba04846 into main Jun 1, 2026 (30 checks passed)
BubbleCal deleted the xuanwo/revert-fts-default-tokenizer branch June 1, 2026 04:27