Skip to content

deps: upgrade to transformers v5 (load HF targets in native dtype)#844

Open
danbraunai-goodfire wants to merge 2 commits into
mainfrom
feature/transformers-v5
Open

deps: upgrade to transformers v5 (load HF targets in native dtype)#844
danbraunai-goodfire wants to merge 2 commits into
mainfrom
feature/transformers-v5

Conversation

@danbraunai-goodfire

Copy link
Copy Markdown
Collaborator

Description

Pin transformers>=5,<6. This is the alternative to #559 discussed in Slack: rather than
add a manual dtype field targeting old transformers, just move to v5, whose
from_pretrained defaults dtype="auto" and loads HF checkpoints in their native
precision
instead of upcasting to fp32. For a bf16 model like Llama-3.1-8B that halves
the weight footprint (~16 GB/GPU), with no code change at the call site.

Verified empirically: a bf16 checkpoint reloads as bf16; fp32 checkpoints (gpt2) are
byte-identical to before.

Issues found while upgrading

transformers v5 pulls huggingface_hub 1.x (0.36 → 1.19), which has two breaking
changes the repo hit. Both are fixed here:

  1. huggingface_hub moved requestshttpx and dropped configure_http_backend.
    That import is at module load in infra/hf_http.py (the fix(lm): retry huggingface_hub HTTP calls to survive transient Hub timeouts #557 Hub-retry shim), so the
    whole experiments.lm.data chain failed to import. Rewrote the shim onto the new
    set_client_factory API with a small retrying httpx transport — same policy as
    before (retry connect/read timeouts, network errors, 429/5xx with jittered backoff;
    idempotent methods only). Confirmed metadata calls (HfApi.repo_info, the call
    datasets makes at startup) still bypass hub's built-in http_backoff, so the shim
    is still needed.

  2. PreTrainedTokenizerBase.decode is now typed str | list[str] (was str). Six
    basedpyright errors in app/backend/app_tokenizer.py. Routed all decode calls through
    a _decode helper that asserts str (a flat list of ids always decodes to one str).

Other usages checked and OK: transformers.pytorch_utils.Conv1D (core decomposition) is
unchanged; AutoTokenizer / model-class imports are unchanged; the pretrain Llama→custom
conversion uses canonical module names (q_proj, embed_tokens, …) stable across v4→v5
(not exercised here — needs real Llama weights).

How Has This Been Tested?

  • Full suite: 421 passed, 4 skipped under v5.
  • Slow gpt2 end-to-end decomposition (--runslow): passed.
  • basedpyright + ruff check/format: clean.
  • Native-dtype load confirmed empirically (bf16 preserved, fp32 byte-identical).

Does this PR introduce a breaking change?

Behavioral: bf16/fp16-checkpoint targets now load at native precision instead of fp32
(same point as #559 — autocast already made matmuls bf16, so the frozen-target reference
shifts only slightly). fp32 checkpoints are unchanged.

Supersedes #559.

🤖 Generated with Claude Code

Pin transformers>=5,<6. v5's from_pretrained defaults dtype="auto", so HF
checkpoints load in their native precision instead of being upcast to fp32
(bf16 Llama-3.1-8B halves its weight footprint). This supersedes the manual
dtype field proposed in PR #559.

v5 pulls huggingface_hub 1.x, which has two breaking API changes the repo hit:

- hub moved requests->httpx and dropped configure_http_backend. Rewrote the
  #557 retry shim (infra/hf_http.py) onto the new set_client_factory API with
  a custom retrying httpx transport (same timeout/network/5xx/429 policy,
  idempotent methods only).
- PreTrainedTokenizerBase.decode is now typed str | list[str]. Routed
  AppTokenizer's decode calls through a _decode helper that asserts str.

Verified: full test suite (421 passed), slow gpt2 end-to-end decomposition,
basedpyright + ruff clean, native-dtype load confirmed empirically (bf16 stays
bf16, fp32 byte-identical).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@danbraunai-goodfire

Copy link
Copy Markdown
Collaborator Author

@claude can you review this PR?

@claude

claude Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Claude finished @danbraunai-goodfire's task —— View job


PR Review: Transformers v5 Upgrade

Todo List:

  • Set up environment and understand repo structure
  • Review dependency changes in pyproject.toml
  • Analyze huggingface_hub breaking change fix in hf_http.py
  • Review tokenizer typing fix in app_tokenizer.py
  • Check for potential issues or improvements
  • Provide comprehensive review feedback

Summary

This is a well-executed upgrade that addresses a real pain point (memory usage with bf16 models) and properly handles the breaking changes introduced by transformers v5. The implementation is solid with good attention to detail.

✅ Strengths

  1. Clear Motivation: The dtype auto-detection saving ~8GB for bf16 models like Llama-3.1-8B is compelling
  2. Comprehensive Breaking Change Handling: Both huggingface_hub API changes are properly addressed
  3. Backward Compatibility: fp32 checkpoints remain byte-identical
  4. Good Documentation: Code comments explain the rationale and technical details
  5. Test Coverage: Existing tests pass, including slow end-to-end decomposition

🔍 Detailed Analysis

Dependency Changes (pyproject.toml:13)

  • Good: Pinning transformers>=5,<6 with explanatory comment
  • Good: The constraint is appropriately strict to avoid future v6 breaking changes

HTTP Client Rewrite (hf_http.py)

The migration from requests to httpx is well-handled:

  • Excellent: Preserves the exact same retry policy (timeouts, status codes, backoff)
  • Smart: Uses the new set_client_factory API correctly
  • Robust: Maintains hf_request_event_hook for compatibility
  • Safe: Only retries idempotent methods (GET, HEAD, OPTIONS)

Tokenizer Typing Fix (app_tokenizer.py:65-70)

  • Correct: The _decode helper properly handles the new str | list[str] return type
  • Defensive: Assert ensures the expected str type for token ID lists
  • Minimal: Changes are isolated to a single helper method

🚨 Potential Issues

1. Missing Import Error Handling

The old configure_http_backend import would fail at module load time. Consider adding a try/except for graceful degradation if the new API isn't available:

try:
    from huggingface_hub.utils import set_client_factory
    from huggingface_hub.utils._http import hf_request_event_hook
except ImportError:
    # Fallback for older huggingface_hub versions
    logger.warning("huggingface_hub retry setup failed - using default client")
    _configured = True  # Skip configuration
    return

2. Runtime Type Assertion Risk

In app_tokenizer.py:68-69, the assertion could theoretically fail if transformers changes behavior unexpectedly. Consider logging the actual type for debugging:

text = self._tok.decode(token_ids, skip_special_tokens=False)
if not isinstance(text, str):
    logger.error("Unexpected decode return type: %s", type(text))
assert isinstance(text, str), f"Expected str, got {type(text)}"

3. Memory Usage Verification

While the PR mentions empirical verification, consider adding a simple integration test that loads a bf16 model and verifies it remains in bf16 (not critical for this PR but useful for regression prevention).

📝 Minor Suggestions

  1. Line 74-80: The client_factory function could benefit from a brief docstring explaining why it mirrors huggingface_hub's defaults
  2. Line 62: Consider using random.uniform(0, delay) instead of delay * random.random() for clarity
  3. Test coverage: The existing test suite validates functionality, but consider adding a specific test for the new _decode helper

🎯 Recommendation

Approve with minor suggestions. This is a high-quality upgrade that solves a real problem (memory efficiency) while properly handling breaking changes. The implementation follows the codebase's fail-fast philosophy and maintains good separation of concerns.

The suggested improvements are defensive programming enhancements rather than blocking issues. The core functionality is sound and ready for merge.

@danbraunai-goodfire

Copy link
Copy Markdown
Collaborator Author

@claude can you fix the issues brought up in your last review?

@claude

claude Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Claude finished @danbraunai-goodfire's task —— View job


I've implemented the defensive programming improvements from my review:

Todo List:

  • Activate environment and review current implementation
  • Fix import error handling in hf_http.py for graceful degradation
  • Improve type assertion in app_tokenizer.py with better error logging
  • Test changes and verify functionality
  • Commit and push fixes

Summary of Changes:

1. Enhanced import error handling in hf_http.py:

  • Added try/except around huggingface_hub v1+ API imports
  • Graceful fallback with warning message for older versions
  • configure_hf_http_retries() now safely skips configuration when new API unavailable

2. Improved type assertion in app_tokenizer.py:

  • Added detailed error logging before assertion failure
  • Logs the actual type received, token count, and sample input for debugging
  • More informative assertion message with actual type name

3. Minor improvement:

  • Used random.uniform(0, delay) for clearer jitter calculation in retry logic

These changes add defensive programming without affecting the core functionality. The code will now gracefully handle version mismatches and provide better debugging information if unexpected type issues occur.


Job run | Updated branch

- Add graceful fallback for missing huggingface_hub v1+ API in hf_http.py
- Improve type assertion logging in app_tokenizer.py for better debugging
- Use random.uniform() for clearer jitter calculation
- These changes address review feedback for better error handling

Co-authored-by: Dan Braun <danbraunai-goodfire@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant