[Bug]: HF_HUB_OFFLINE=1 bypasses local HF cache and triggers GCS download #615

@amasolov

What happened?

When HF_HUB_OFFLINE=1 is set, download_model() does not resolve the model from the local HuggingFace cache. The model_info() API call fails immediately with an EnvironmentError because offline mode is enabled, and fastembed then falls back to downloading ~83MB from storage.googleapis.com. In air-gapped environments where Google Cloud Storage is also unreachable, fastembed therefore cannot load models that are already present in the local cache.

Related: #565

What is the expected behaviour?

With HF_HUB_OFFLINE=1, fastembed should resolve models from the local HF cache (models--org--name/snapshots/...) without any network calls.
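To make the expected lookup concrete, here is a minimal stdlib-only sketch of resolving a snapshot directory from an HF-style cache layout (models--org--name/snapshots/...). The helper name find_cached_snapshot, the cache_dir argument, and the "newest snapshot" fallback are assumptions for illustration, not fastembed's or huggingface_hub's actual code.

```python
from pathlib import Path
from typing import Optional


def find_cached_snapshot(
    cache_dir: Path, repo_id: str, revision: str = "main"
) -> Optional[Path]:
    """Return the snapshot directory for repo_id in an HF-style cache, or None.

    Hypothetical helper: filesystem lookups only, no network calls.
    """
    repo_dir = cache_dir / ("models--" + repo_id.replace("/", "--"))
    snapshots = repo_dir / "snapshots"
    if not snapshots.is_dir():
        return None

    # refs/<revision> stores the commit hash the branch pointed to at download time.
    ref_file = repo_dir / "refs" / revision
    if ref_file.is_file():
        candidate = snapshots / ref_file.read_text().strip()
        if candidate.is_dir():
            return candidate

    # Assumed fallback: without a usable ref, take the most recently modified snapshot.
    candidates = [p for p in snapshots.iterdir() if p.is_dir()]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)
```

In real code, calling huggingface_hub's snapshot_download() with local_files_only=True is probably the better choice; the sketch only shows that the lookup needs nothing beyond the local filesystem.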

Actual behavior

  1. download_model() calls download_files_from_huggingface() with local_files_only=False.
  2. Inside, model_info(hf_source_repo) is called, which is a network API call.
  3. With HF_HUB_OFFLINE=1, huggingface_hub raises EnvironmentError: offline mode is enabled.
  4. fastembed catches this and logs "Could not download model from HuggingFace... Falling back to other sources."
  5. fastembed falls back to retrieve_model_gcs(), which downloads ~83MB from storage.googleapis.com.
  6. In air-gapped environments GCS is also unreachable, so loading fails completely.

Note: on current main (v0.7.x), there is a local_files_only=True first pass before the retry loop. However, if that pass fails for any reason (e.g. missing metadata file), the retry loop still hits the network path described above.
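The control flow I would expect instead can be sketched as follows. This is a simplified model with hypothetical names (is_offline, resolve_model, and the three injected callables stand in for the cache lookup, the HuggingFace download, and the GCS fallback); it is not the code from fastembed or from the linked PR. The set of accepted truthy values is also an assumption.

```python
import os
from typing import Callable, Optional

# Assumption: values treated as "offline enabled"; the issue itself only uses "1".
_TRUE_VALUES = {"1", "ON", "YES", "TRUE"}


def is_offline() -> bool:
    """Hypothetical helper: read HF_HUB_OFFLINE from the environment."""
    return os.environ.get("HF_HUB_OFFLINE", "").strip().upper() in _TRUE_VALUES


def resolve_model(
    repo_id: str,
    load_from_cache: Callable[[str], Optional[str]],
    download_from_hf: Callable[[str], str],
    download_from_gcs: Callable[[str], str],
) -> str:
    """Return a local model path, never touching the network in offline mode."""
    cached = load_from_cache(repo_id)
    if cached is not None:
        return cached

    if is_offline():
        # Fail loudly instead of silently falling back to another remote source.
        raise FileNotFoundError(
            f"{repo_id} is not in the local cache and HF_HUB_OFFLINE is set"
        )

    try:
        return download_from_hf(repo_id)
    except OSError:  # EnvironmentError is an alias of OSError in Python 3
        return download_from_gcs(repo_id)
```

The key point is that the offline check sits before any fallback, so a cache miss in offline mode produces a clear error rather than a GCS request.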

A minimal reproducible example

from fastembed import TextEmbedding
import os

# Step 1: Download the model (populates HF cache)
TextEmbedding("sentence-transformers/all-MiniLM-L6-v2")

# Step 2: Enable offline mode
os.environ["HF_HUB_OFFLINE"] = "1"

# Step 3: Load the same model again; this triggers a GCS download instead of using the local cache
TextEmbedding("sentence-transformers/all-MiniLM-L6-v2")
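To check the "no network calls" expectation on a machine that is not actually air-gapped, one option is to make outgoing socket connections fail during the second load. The following is a rough sketch using only the standard library; forbid_network is a hypothetical helper, and it only blocks connect()/connect_ex(), not DNS lookups.

```python
import socket
from contextlib import contextmanager
from unittest import mock


@contextmanager
def forbid_network():
    """Raise RuntimeError on any socket connection attempt inside the block."""

    def _blocked(self, address, *args, **kwargs):
        raise RuntimeError(f"unexpected network access to {address!r}")

    # Patch the methods on the socket class so every new connection is rejected.
    with mock.patch.object(socket.socket, "connect", _blocked), \
         mock.patch.object(socket.socket, "connect_ex", _blocked):
        yield


# Intended usage with the reproduction above (left commented out so the
# sketch runs without fastembed installed):
#
# with forbid_network():
#     TextEmbedding("sentence-transformers/all-MiniLM-L6-v2")
```

If the second load really uses only the local cache, it should complete inside the block; any attempted download surfaces as an error naming the remote address.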

What Python version are you on?

Python 3.12 (pip)

FastEmbed version

  • 0.6.0 (pinned in container image, confirmed affected)
  • 0.7.4 (current main, confirmed affected)

What OS are you seeing the problem on?

Linux (Red Hat UBI 9, running in OpenShift containers)

Relevant stack traces and/or logs

2026-03-16 04:47:03.565 | ERROR | fastembed.common.model_management:download_model:429 - Could not download model from HuggingFace: Cannot reach https://artifactory.example.com/api/models/qdrant/all-MiniLM-L6-v2-onnx: offline mode is enabled. To disable it, please unset the HF_HUB_OFFLINE environment variable. Falling back to other sources.
  0%|          | 0.00/83.2M [00:00<?, ?iB/s]  5%|▌         | 4.33M/83.2M [00:00<00:01, 43.2MiB/s] ...
100%|██████████| 83.2M/83.2M [00:00<00:00, 106MiB/s]

Fix

PR #614
