Avoid ONNX session load for token counts by Mirochill · Pull Request #640 · qdrant/fastembed

Mirochill · 2026-05-27T09:10:10Z

All Submissions:

Have you followed the guidelines in our Contributing document?
Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

Does your submission pass the existing tests?
Have you added tests for your feature?
Have you installed pre-commit with pip3 install pre-commit and set up hooks with pre-commit install?

Summary

Fixes #584.

This updates lazy token_count() paths so they load tokenizer files from the resolved model directory without initializing ONNX sessions. The change covers text embeddings, cross-encoder rerankers, sparse BM42, late-interaction embeddings, and multimodal late-interaction wrappers.
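A minimal sketch of the tokenizer-only path described above, using a hypothetical `LazyTextModel` class rather than fastembed's actual classes. The `vocab.json` file, the whitespace tokenizer, and the placeholder session object are all assumptions made so that the example runs with the standard library only.

```python
import json
import tempfile
from pathlib import Path


class LazyTextModel:
    """Hypothetical stand-in for an embedding wrapper (not fastembed's real class)."""

    def __init__(self, model_dir: Path) -> None:
        self.model_dir = Path(model_dir)
        self.tokenizer = None  # loaded on first token_count() or embed()
        self.session = None    # stands in for the ONNX session; only embed() loads it

    def _load_tokenizer(self) -> None:
        # Assumption: a toy vocabulary stored as a JSON list in the model directory.
        vocab = json.loads((self.model_dir / "vocab.json").read_text())
        self.tokenizer = {token: index for index, token in enumerate(vocab)}

    def _load_session(self) -> None:
        # Placeholder for the expensive ONNX session initialization.
        self.session = object()

    def token_count(self, texts: list[str]) -> int:
        # Only the tokenizer files are read here; the session stays untouched.
        if self.tokenizer is None:
            self._load_tokenizer()
        return sum(len(text.split()) for text in texts)

    def embed(self, texts: list[str]) -> list[list[float]]:
        # Embedding needs both the tokenizer and the session.
        if self.tokenizer is None:
            self._load_tokenizer()
        if self.session is None:
            self._load_session()
        return [[float(self.tokenizer.get(w, -1)) for w in t.split()] for t in texts]


def demo() -> tuple[int, bool]:
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "vocab.json").write_text(json.dumps(["hello", "world"]))
        model = LazyTextModel(Path(tmp))
        count = model.token_count(["hello world", "hello"])
        return count, model.session is None
```

Calling `demo()` counts tokens for two short texts and reports whether the placeholder session is still unloaded afterwards.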

The lazy-load regression tests now exercise token_count() before embedding/reranking and still assert that the ONNX model session has not been loaded yet.
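The shape of such a regression test might look like the sketch below. `FakeReranker`, its `model` attribute, and the `session_loads` counter are hypothetical names chosen for illustration; the real tests run against fastembed's own model classes.

```python
class FakeReranker:
    """Hypothetical wrapper; `model` plays the role of a lazily created ONNX session."""

    def __init__(self) -> None:
        self.model = None
        self.tokenizer = None
        self.session_loads = 0  # counts how often the session was initialized

    def _ensure_tokenizer(self) -> None:
        if self.tokenizer is None:
            self.tokenizer = str.split  # toy whitespace tokenizer

    def _ensure_session(self) -> None:
        if self.model is None:
            self.session_loads += 1
            self.model = object()

    def token_count(self, pairs: list[tuple[str, str]]) -> int:
        # Tokenizer only; must not touch the session.
        self._ensure_tokenizer()
        return sum(len(self.tokenizer(q)) + len(self.tokenizer(d)) for q, d in pairs)

    def rerank(self, query: str, documents: list[str]) -> list[float]:
        self._ensure_tokenizer()
        self._ensure_session()
        return [float(len(self.tokenizer(doc))) for doc in documents]


def test_token_count_keeps_session_lazy() -> None:
    reranker = FakeReranker()

    # Counting tokens first must leave the session unloaded.
    assert reranker.token_count([("what is onnx", "a model format")]) == 6
    assert reranker.model is None
    assert reranker.session_loads == 0

    # The first real inference call is what loads the session, exactly once.
    scores = reranker.rerank("what is onnx", ["a model format"])
    assert reranker.model is not None
    assert reranker.session_loads == 1
    assert scores == [3.0]
```

Ordering matters here: the assertion on the unloaded session comes after `token_count()` and before any inference call, which is the sequence the summary describes.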

Validation

git diff --check
git diff --cached --check
git show --check --stat --oneline HEAD
Static search of the targeted token_count() paths for remaining ONNX-session loading calls
Not run locally: tests, docs build, linter, pre-commit
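The static search listed above could be approximated with a short `ast` scan such as the one below. This is only a sketch: the forbidden method names (`load_onnx_model`, `_load_onnx_model`) are assumptions about what a session-loading call looks like, and the PR does not say how the search was actually performed.

```python
import ast

# Assumed names of session-loading calls; adjust to the codebase being checked.
FORBIDDEN = ("load_onnx_model", "_load_onnx_model")


def session_calls_in_token_count(source: str, forbidden=FORBIDDEN) -> list[tuple[str, int]]:
    """Return (call name, line number) for forbidden calls inside token_count()."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if not (is_function and node.name == "token_count"):
            continue
        for inner in ast.walk(node):
            if not isinstance(inner, ast.Call):
                continue
            func = inner.func
            # Handles both `self.load_onnx_model()` and a bare `load_onnx_model()`.
            name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
            if name in forbidden:
                hits.append((name, inner.lineno))
    return hits


SAMPLE = '''
class A:
    def token_count(self, texts):
        self.load_onnx_model()
        return 0

class B:
    def token_count(self, texts):
        self._ensure_tokenizer()
        return 0

    def embed(self, texts):
        self.load_onnx_model()
'''
```

On `SAMPLE`, only the call inside `A.token_count` is reported; the call in `B.embed` is ignored because it sits outside a `token_count()` body. A scan like this does not follow helper functions, so it complements the regression tests and does not replace them.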

Avoid ONNX session load for token counts

0de9c01

Mirochill wants to merge 1 commit into qdrant:main from Mirochill:fix-584-token-count-tokenizer-only
