feat(embeddings): support offline FastEmbed startup #44
Merged
Allow deployments with pre-populated FastEmbed caches to opt into local-only model loading so blocked HuggingFace/GCS access fails with actionable guidance instead of a startup download attempt.
Owner
Please can we add a aside from that LGTM.
ScottRBK approved these changes on May 17, 2026
Contributor
Author
Thanks for the review! Added in c2d9e06: FASTEMBED_LOCAL_FILES_ONLY=false is now in docker/.env.example right after the FastEmbed embedding block, with a short comment noting that the flag covers both the embedding model and the FastEmbed reranker (so contributors don't have to read docs/OFFLINE_SETUP.md to know it applies to the reranker too). Ready for another look when you have a minute.
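For readers skimming the thread, here is a minimal sketch of how one flag could feed both the embedding model and the reranker. The helper names (`local_files_only_from_env`, `shared_model_kwargs`) and the set of accepted truthy strings are assumptions made for illustration; the actual settings code in this PR may parse the value differently.

```python
import os

# Assumed set of truthy spellings; the project's real parsing may differ.
_TRUTHY = {"1", "true", "yes", "on"}


def local_files_only_from_env(environ=None) -> bool:
    """Read FASTEMBED_LOCAL_FILES_ONLY, defaulting to false when unset."""
    env = os.environ if environ is None else environ
    raw = env.get("FASTEMBED_LOCAL_FILES_ONLY", "false")
    return raw.strip().lower() in _TRUTHY


def shared_model_kwargs(environ=None) -> dict:
    """Build keyword arguments shared by the embedding model and the reranker."""
    env = os.environ if environ is None else environ
    kwargs = {"local_files_only": local_files_only_from_env(env)}
    cache_dir = env.get("FASTEMBED_CACHE_DIR")
    if cache_dir:
        kwargs["cache_dir"] = cache_dir
    return kwargs


# Both loaders receive the same flag, so one setting covers both models.
example_env = {"FASTEMBED_LOCAL_FILES_ONLY": "true"}
embedding_kwargs = shared_model_kwargs(example_env)
reranker_kwargs = shared_model_kwargs(example_env)
```

Deriving both sets of arguments from one helper keeps the embedding model and the reranker from drifting apart when the flag changes.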
Summary
- FASTEMBED_LOCAL_FILES_ONLY=true for FastEmbed embedding and reranking model loading.
- local_files_only=True when offline mode is enabled.

Why
In corporate or offline environments, HuggingFace/GCS downloads may be blocked. Forgetful already supports FASTEMBED_CACHE_DIR, but if the cache is missing or incomplete, startup can still attempt remote model resolution. This gives deployments a deterministic cache-only mode that fails fast instead of attempting network access.

Validation
- uv run pytest tests/integration/test_fastembed_offline.py tests/integration/test_openai_embeddings_adapter.py tests/integration/test_http_reranker_adapter.py -> 26 passed.
- uv run python -m compileall ... on changed modules/tests -> passed.
- FASTEMBED_LOCAL_FILES_ONLY=true returns the new actionable RuntimeError.
- FASTEMBED_LOCAL_FILES_ONLY=true and RERANKING_ENABLED=false still exposes the 3 meta-tools.
- .onnx model files.

Compatibility
- FASTEMBED_LOCAL_FILES_ONLY=false.