feat(embeddings): support offline FastEmbed startup #44

Merged
ScottRBK merged 3 commits into ScottRBK:main from bertheto:feat/offline-fastembed-startup on May 17, 2026

@bertheto

Contributor

Summary

  • Adds opt-in FASTEMBED_LOCAL_FILES_ONLY=true for FastEmbed embedding and reranking model loading.
  • Preserves default behavior for users who rely on first-run model downloads.
  • Delays FastEmbed imports until adapter construction and passes local_files_only=True when offline mode is enabled.
  • Wraps local-cache load failures with actionable guidance that includes the model and cache path.
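A minimal sketch of how the pieces above could fit together. This is not the PR's actual code: the function names, the truthy-string parsing, and the injectable `factory` parameter are assumptions made for illustration. It assumes FastEmbed's `TextEmbedding` constructor forwards a `local_files_only` keyword to its model loader, as the summary describes.

```python
import os
from typing import Any, Callable, Mapping


def local_files_only_enabled(env: Mapping[str, str]) -> bool:
    """Read FASTEMBED_LOCAL_FILES_ONLY; absent or unrecognised values mean off.

    The accepted truthy strings are an assumption, not the project's parser.
    """
    raw = env.get("FASTEMBED_LOCAL_FILES_ONLY", "false")
    return raw.strip().lower() in {"1", "true", "yes"}


def _default_factory(**kwargs: Any) -> Any:
    # Deferred import: fastembed is only imported when a model is constructed,
    # so other embedding providers never pay the import cost.
    from fastembed import TextEmbedding

    return TextEmbedding(**kwargs)


def load_embedding_model(
    model_name: str,
    cache_dir: str,
    env: Mapping[str, str] = os.environ,
    factory: Callable[..., Any] = _default_factory,  # hypothetical test seam
) -> Any:
    """Build the model, adding local_files_only=True only in offline mode."""
    kwargs: dict = {"model_name": model_name, "cache_dir": cache_dir}
    offline = local_files_only_enabled(env)
    if offline:
        kwargs["local_files_only"] = True
    try:
        return factory(**kwargs)
    except Exception as exc:
        if not offline:
            raise  # default behaviour: surface the original error unchanged
        # Offline mode: name the model and cache path so the fix is obvious.
        raise RuntimeError(
            f"Could not load FastEmbed model '{model_name}' from local cache "
            f"'{cache_dir}' with FASTEMBED_LOCAL_FILES_ONLY=true. "
            "Pre-populate the cache or unset the flag to allow downloads."
        ) from exc
```

Injecting the factory keeps the default path untouched and lets the offline branch be exercised without FastEmbed installed or a network connection.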

Why

In corporate or offline environments, HuggingFace/GCS downloads may be blocked. Forgetful already supports FASTEMBED_CACHE_DIR, but if the cache is missing or incomplete, startup can still attempt remote model resolution. This gives deployments a deterministic cache-only mode that fails fast instead of attempting network access.

Validation

  • uv run pytest tests/integration/test_fastembed_offline.py tests/integration/test_openai_embeddings_adapter.py tests/integration/test_http_reranker_adapter.py -> 26 passed.
  • uv run python -m compileall ... on changed modules/tests -> passed.
  • Empty-cache negative scenario with FASTEMBED_LOCAL_FILES_ONLY=true returns the new actionable RuntimeError.
  • Offline MCP tool-list smoke with FASTEMBED_LOCAL_FILES_ONLY=true and RERANKING_ENABLED=false still exposes the 3 meta-tools.
  • Full cache-present FastEmbed startup was not run locally because the configured cache exists but has no .onnx model files.
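The last point depends on whether the cache directory actually holds `.onnx` model files. A pre-flight check along these lines could confirm that before enabling offline mode. The helper name `cached_onnx_files` is hypothetical and not part of this PR, and the presence of an `.onnx` file does not by itself guarantee that the cache is complete.

```python
from pathlib import Path


def cached_onnx_files(cache_dir: str) -> list[Path]:
    """Return every .onnx file found anywhere under cache_dir, sorted by path.

    A missing or non-directory path yields an empty list rather than an error.
    """
    root = Path(cache_dir)
    if not root.is_dir():
        return []
    # is_file() follows symlinks, so linked files with a valid target count.
    return sorted(p for p in root.rglob("*.onnx") if p.is_file())
```

An empty result means a startup with FASTEMBED_LOCAL_FILES_ONLY=true would be expected to fail with the RuntimeError described above.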

Compatibility

  • Default remains FASTEMBED_LOCAL_FILES_ONLY=false.
  • Existing Azure, Google, OpenAI, Ollama, and HTTP reranker paths are unchanged.
  • This PR is independent from feat(mcp): add compact descriptor mode #43, which only addresses compact MCP descriptors.

bertheto added 2 commits May 10, 2026 19:41
Allow deployments with pre-populated FastEmbed caches to opt into local-only model loading so blocked HuggingFace/GCS access fails with actionable guidance instead of a startup download attempt.
@ScottRBK

ScottRBK commented May 14, 2026

Owner

Please can we add a docker/.env.example entry for FASTEMBED_LOCAL_FILES_ONLY=false?

Aside from that, LGTM.

@bertheto

Contributor Author

Thanks for the review! Added in c2d9e06 — FASTEMBED_LOCAL_FILES_ONLY=false is now in docker/.env.example right after the FastEmbed embedding block, with a short comment noting that the flag covers both the embedding model and the FastEmbed reranker (so contributors don't have to read docs/OFFLINE_SETUP.md to know it applies to the reranker too).

Ready for another look when you have a minute.

@ScottRBK ScottRBK merged commit 5fc8650 into ScottRBK:main May 17, 2026
2 checks passed
@bertheto bertheto deleted the feat/offline-fastembed-startup branch May 18, 2026 09:49

2 participants