Avoid ONNX session load for token counts #640

Draft
Mirochill wants to merge 1 commit into qdrant:main from Mirochill:fix-584-token-count-tokenizer-only

@Mirochill

All Submissions:

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  • Does your submission pass the existing tests?
  • Have you added tests for your feature?
  • Have you installed pre-commit with pip3 install pre-commit and set up hooks with pre-commit install?

Summary

Fixes #584.

This PR updates the lazy token_count() paths so that they load only the tokenizer files from the resolved model directory, without initializing ONNX sessions. The change covers text embeddings, cross-encoder rerankers, sparse BM42, late-interaction embeddings, and multimodal late-interaction wrappers.

The lazy-load regression tests now exercise token_count() before embedding/reranking and still assert that the ONNX model session has not been loaded yet.
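
For readers who want the idea in miniature, the pattern and the regression check described above might look roughly like the sketch below. It is a stdlib-only illustration under stated assumptions, not the code in this PR: `LazyModel`, `_ensure_tokenizer`, `_ensure_session`, the whitespace tokenizer, and the fake session are all hypothetical stand-ins for the real tokenizer files and ONNX session.

```python
from typing import Callable, List, Optional


class LazyModel:
    """Hypothetical wrapper that loads its tokenizer and session separately."""

    def __init__(
        self,
        load_tokenizer: Callable[[], Callable[[str], List[str]]],
        load_session: Callable[[], Callable[[List[str]], List[float]]],
    ) -> None:
        self._load_tokenizer = load_tokenizer
        self._load_session = load_session
        self.tokenizer: Optional[Callable[[str], List[str]]] = None
        self.session: Optional[Callable[[List[str]], List[float]]] = None

    def _ensure_tokenizer(self) -> None:
        # Cheap step: only tokenizer state is created here.
        if self.tokenizer is None:
            self.tokenizer = self._load_tokenizer()

    def _ensure_session(self) -> None:
        # Expensive step: stands in for creating the inference session.
        self._ensure_tokenizer()
        if self.session is None:
            self.session = self._load_session()

    def token_count(self, texts: List[str]) -> int:
        # Needs the tokenizer only, so the session is never touched.
        self._ensure_tokenizer()
        return sum(len(self.tokenizer(text)) for text in texts)

    def embed(self, texts: List[str]) -> List[List[float]]:
        self._ensure_session()
        return [self.session(self.tokenizer(text)) for text in texts]


def whitespace_tokenizer() -> Callable[[str], List[str]]:
    # Stand-in for reading tokenizer files from the model directory.
    return str.split


def fake_session() -> Callable[[List[str]], List[float]]:
    # Stand-in for an inference session; returns a 1-dimensional "embedding".
    return lambda tokens: [float(len(tokens))]


model = LazyModel(whitespace_tokenizer, fake_session)
count = model.token_count(["hello world", "lazy loading works"])

# Shape of the regression check: counting tokens must not load the session.
assert model.session is None
```

Splitting the two `_ensure_*` steps is what lets `token_count()` stay cheap while `embed()` still triggers the full load on first use.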

Validation

  • git diff --check
  • git diff --cached --check
  • git show --check --stat --oneline HEAD
  • Static search of the targeted token_count() paths for remaining ONNX-session loading calls
  • Not run locally: tests, docs build, linter, pre-commit
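
The PR does not say how the static search was performed. One possible way to automate such a check is sketched below, assuming the session-loading methods have names like `load_onnx_model` or `_load_onnx_model` (assumed names for illustration, not confirmed by this PR). The sketch only catches direct calls inside a `token_count` body and does not follow indirect calls through helpers.

```python
import ast
from typing import List, Tuple

# Assumed names of session-loading methods; adjust to the real codebase.
SESSION_LOADERS = {"load_onnx_model", "_load_onnx_model"}


def session_calls_in_token_count(source: str) -> List[Tuple[int, str]]:
    """Return (line, name) for session-loading calls made directly in token_count()."""
    hits: List[Tuple[int, str]] = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if func.name != "token_count":
            continue
        for node in ast.walk(func):
            if not isinstance(node, ast.Call):
                continue
            callee = node.func
            if isinstance(callee, ast.Attribute):
                name = callee.attr  # e.g. self.load_onnx_model()
            else:
                name = getattr(callee, "id", None)  # e.g. load_onnx_model()
            if name in SESSION_LOADERS:
                hits.append((node.lineno, name))
    return hits


# Hypothetical sample: class A still loads the session, class B does not.
SAMPLE = '''
class A:
    def token_count(self, texts):
        self.load_onnx_model()
        return 0

class B:
    def token_count(self, texts):
        self._load_tokenizer()
        return 0
'''

hits = session_calls_in_token_count(SAMPLE)
```

An empty `hits` list for every targeted module would be consistent with the claim that no `token_count()` path calls a session loader directly.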

Development

Successfully merging this pull request may close these issues.

[Feature]: make token count accessible without loading the model