deps: upgrade to transformers v5 (load HF targets in native dtype) by danbraunai-goodfire · Pull Request #844 · goodfire-ai/param-decomp

danbraunai-goodfire · 2026-06-16T16:37:51Z

Description

Pin transformers>=5,<6. This is the alternative to #559 discussed in Slack: rather than
add a manual dtype field targeting old transformers, just move to v5, whose
from_pretrained defaults dtype="auto" and loads HF checkpoints in their native
precision instead of upcasting to fp32. For a bf16 model like Llama-3.1-8B that halves
the weight footprint (~16 GB/GPU), with no code change at the call site.

Verified empirically: a bf16 checkpoint reloads as bf16; fp32 checkpoints (gpt2) are
byte-identical to before.

Issues found while upgrading

transformers v5 pulls huggingface_hub 1.x (0.36 → 1.19), which has two breaking
changes the repo hit. Both are fixed here:

huggingface_hub moved requests → httpx and dropped configure_http_backend.
That import is at module load in infra/hf_http.py (the fix(lm): retry huggingface_hub HTTP calls to survive transient Hub timeouts #557 Hub-retry shim), so the
whole experiments.lm.data chain failed to import. Rewrote the shim onto the new
set_client_factory API with a small retrying httpx transport — same policy as
before (retry connect/read timeouts, network errors, 429/5xx with jittered backoff;
idempotent methods only). Confirmed metadata calls (HfApi.repo_info, the call
datasets makes at startup) still bypass hub's built-in http_backoff, so the shim
is still needed.
PreTrainedTokenizerBase.decode is now typed str | list[str] (was str). Six
basedpyright errors in app/backend/app_tokenizer.py. Routed all decode calls through
a _decode helper that asserts str (a flat list of ids always decodes to one str).

Other usages checked and OK: transformers.pytorch_utils.Conv1D (core decomposition) is
unchanged; AutoTokenizer / model-class imports are unchanged; the pretrain Llama→custom
conversion uses canonical module names (q_proj, embed_tokens, …) stable across v4→v5
(not exercised here — needs real Llama weights).

How Has This Been Tested?

Full suite: 421 passed, 4 skipped under v5.
Slow gpt2 end-to-end decomposition (--runslow): passed.
basedpyright + ruff check/format: clean.
Native-dtype load confirmed empirically (bf16 preserved, fp32 byte-identical).

Does this PR introduce a breaking change?

Behavioral: bf16/fp16-checkpoint targets now load at native precision instead of fp32
(same point as #559 — autocast already made matmuls bf16, so the frozen-target reference
shifts only slightly). fp32 checkpoints are unchanged.

Supersedes #559.

🤖 Generated with Claude Code

Pin transformers>=5,<6. v5's from_pretrained defaults dtype="auto", so HF checkpoints load in their native precision instead of being upcast to fp32 (bf16 Llama-3.1-8B halves its weight footprint). This supersedes the manual dtype field proposed in PR #559. v5 pulls huggingface_hub 1.x, which has two breaking API changes the repo hit: - hub moved requests->httpx and dropped configure_http_backend. Rewrote the #557 retry shim (infra/hf_http.py) onto the new set_client_factory API with a custom retrying httpx transport (same timeout/network/5xx/429 policy, idempotent methods only). - PreTrainedTokenizerBase.decode is now typed str | list[str]. Routed AppTokenizer's decode calls through a _decode helper that asserts str. Verified: full test suite (421 passed), slow gpt2 end-to-end decomposition, basedpyright + ruff clean, native-dtype load confirmed empirically (bf16 stays bf16, fp32 byte-identical). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

danbraunai-goodfire · 2026-06-17T14:26:15Z

@claude can you review this PR?

claude · 2026-06-17T14:26:35Z

Claude finished @danbraunai-goodfire's task —— View job

PR Review: Transformers v5 Upgrade

Todo List:

Set up environment and understand repo structure
Review dependency changes in pyproject.toml
Analyze huggingface_hub breaking change fix in hf_http.py
Review tokenizer typing fix in app_tokenizer.py
Check for potential issues or improvements
Provide comprehensive review feedback

Summary

This is a well-executed upgrade that addresses a real pain point (memory usage with bf16 models) and properly handles the breaking changes introduced by transformers v5. The implementation is solid with good attention to detail.

✅ Strengths

Clear Motivation: The dtype auto-detection saving ~8GB for bf16 models like Llama-3.1-8B is compelling
Comprehensive Breaking Change Handling: Both huggingface_hub API changes are properly addressed
Backward Compatibility: fp32 checkpoints remain byte-identical
Good Documentation: Code comments explain the rationale and technical details
Test Coverage: Existing tests pass, including slow end-to-end decomposition

🔍 Detailed Analysis

Dependency Changes (`pyproject.toml:13`)

Good: Pinning transformers>=5,<6 with explanatory comment
Good: The constraint is appropriately strict to avoid future v6 breaking changes

HTTP Client Rewrite (`hf_http.py`)

The migration from requests to httpx is well-handled:

Excellent: Preserves the exact same retry policy (timeouts, status codes, backoff)
Smart: Uses the new set_client_factory API correctly
Robust: Maintains hf_request_event_hook for compatibility
Safe: Only retries idempotent methods (GET, HEAD, OPTIONS)

Tokenizer Typing Fix (`app_tokenizer.py:65-70`)

Correct: The _decode helper properly handles the new str | list[str] return type
Defensive: Assert ensures the expected str type for token ID lists
Minimal: Changes are isolated to a single helper method

🚨 Potential Issues

1. Missing Import Error Handling

The old configure_http_backend import would fail at module load time. Consider adding a try/except for graceful degradation if the new API isn't available:

try:
    from huggingface_hub.utils import set_client_factory
    from huggingface_hub.utils._http import hf_request_event_hook
except ImportError:
    # Fallback for older huggingface_hub versions
    logger.warning("huggingface_hub retry setup failed - using default client")
    _configured = True  # Skip configuration
    return

2. Runtime Type Assertion Risk

In app_tokenizer.py:68-69, the assertion could theoretically fail if transformers changes behavior unexpectedly. Consider logging the actual type for debugging:

text = self._tok.decode(token_ids, skip_special_tokens=False)
if not isinstance(text, str):
    logger.error("Unexpected decode return type: %s", type(text))
assert isinstance(text, str), f"Expected str, got {type(text)}"

3. Memory Usage Verification

While the PR mentions empirical verification, consider adding a simple integration test that loads a bf16 model and verifies it remains in bf16 (not critical for this PR but useful for regression prevention).

📝 Minor Suggestions

Line 74-80: The client_factory function could benefit from a brief docstring explaining why it mirrors huggingface_hub's defaults
Line 62: Consider using random.uniform(0, delay) instead of delay * random.random() for clarity
Test coverage: The existing test suite validates functionality, but consider adding a specific test for the new _decode helper

🎯 Recommendation

Approve with minor suggestions. This is a high-quality upgrade that solves a real problem (memory efficiency) while properly handling breaking changes. The implementation follows the codebase's fail-fast philosophy and maintains good separation of concerns.

The suggested improvements are defensive programming enhancements rather than blocking issues. The core functionality is sound and ready for merge.

danbraunai-goodfire · 2026-06-17T16:21:01Z

@claude can you fix the issues brought up in your last review?

claude · 2026-06-17T16:21:22Z

Claude finished @danbraunai-goodfire's task —— View job

I've implemented the defensive programming improvements from my review:

Todo List:

Activate environment and review current implementation
Fix import error handling in hf_http.py for graceful degradation
Improve type assertion in app_tokenizer.py with better error logging
Test changes and verify functionality
Commit and push fixes

Summary of Changes:

1. Enhanced import error handling in `hf_http.py`:

Added try/except around huggingface_hub v1+ API imports
Graceful fallback with warning message for older versions
configure_hf_http_retries() now safely skips configuration when new API unavailable

2. Improved type assertion in `app_tokenizer.py`:

Added detailed error logging before assertion failure
Logs the actual type received, token count, and sample input for debugging
More informative assertion message with actual type name

3. Minor improvement:

Used random.uniform(0, delay) for clearer jitter calculation in retry logic

These changes add defensive programming without affecting the core functionality. The code will now gracefully handle version mismatches and provide better debugging information if unexpected type issues occur.

Job run | Updated branch

- Add graceful fallback for missing huggingface_hub v1+ API in hf_http.py - Improve type assertion logging in app_tokenizer.py for better debugging - Use random.uniform() for clearer jitter calculation - These changes address review feedback for better error handling Co-authored-by: Dan Braun <danbraunai-goodfire@users.noreply.github.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

deps: upgrade to transformers v5 (load HF targets in native dtype)#844

deps: upgrade to transformers v5 (load HF targets in native dtype)#844
danbraunai-goodfire wants to merge 2 commits into
mainfrom
feature/transformers-v5

danbraunai-goodfire commented Jun 16, 2026

Uh oh!

danbraunai-goodfire commented Jun 17, 2026

Uh oh!

claude Bot commented Jun 17, 2026 •

edited

Loading

Uh oh!

danbraunai-goodfire commented Jun 17, 2026

Uh oh!

claude Bot commented Jun 17, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

danbraunai-goodfire commented Jun 16, 2026

Description

Issues found while upgrading

How Has This Been Tested?

Does this PR introduce a breaking change?

Uh oh!

danbraunai-goodfire commented Jun 17, 2026

Uh oh!

claude Bot commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Transformers v5 Upgrade

Summary

✅ Strengths

🔍 Detailed Analysis

Dependency Changes (pyproject.toml:13)

HTTP Client Rewrite (hf_http.py)

Tokenizer Typing Fix (app_tokenizer.py:65-70)

🚨 Potential Issues

1. Missing Import Error Handling

2. Runtime Type Assertion Risk

3. Memory Usage Verification

📝 Minor Suggestions

🎯 Recommendation

Uh oh!

danbraunai-goodfire commented Jun 17, 2026

Uh oh!

claude Bot commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Todo List:

Summary of Changes:

1. Enhanced import error handling in hf_http.py:

2. Improved type assertion in app_tokenizer.py:

3. Minor improvement:

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude Bot commented Jun 17, 2026 •

edited

Loading

Dependency Changes (`pyproject.toml:13`)

HTTP Client Rewrite (`hf_http.py`)

Tokenizer Typing Fix (`app_tokenizer.py:65-70`)

claude Bot commented Jun 17, 2026 •

edited

Loading

1. Enhanced import error handling in `hf_http.py`:

2. Improved type assertion in `app_tokenizer.py`: