deps: upgrade to transformers v5 (load HF targets in native dtype) #844
danbraunai-goodfire wants to merge 2 commits into
Pin transformers>=5,<6. v5's from_pretrained defaults to dtype="auto", so HF checkpoints load in their native precision instead of being upcast to fp32 (bf16 Llama-3.1-8B halves its weight footprint). This supersedes the manual dtype field proposed in PR #559.

The upgrade (v5 plus the huggingface_hub 1.x it pulls in) brought two breaking API changes the repo hit:
- hub moved requests->httpx and dropped configure_http_backend. Rewrote the #557 retry shim (infra/hf_http.py) onto the new set_client_factory API with a custom retrying httpx transport (same timeout/network/5xx/429 policy, idempotent methods only).
- PreTrainedTokenizerBase.decode is now typed str | list[str]. Routed AppTokenizer's decode calls through a _decode helper that asserts str.

Verified: full test suite (421 passed), slow gpt2 end-to-end decomposition, basedpyright + ruff clean, native-dtype load confirmed empirically (bf16 stays bf16, fp32 byte-identical).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@claude can you review this PR?
Claude finished @danbraunai-goodfire's task.

PR Review: Transformers v5 Upgrade
Summary
This is a well-executed upgrade that addresses a real pain point (memory usage with bf16 models) and properly handles the breaking changes introduced by transformers v5. The implementation is solid, with good attention to detail.

✅ Strengths
🔍 Detailed Analysis
Dependency Changes (
@claude can you fix the issues brought up in your last review?
Claude finished @danbraunai-goodfire's task. I've implemented the defensive programming improvements from my review.
Summary of Changes:
1. Enhanced import error handling in
- Add graceful fallback for missing huggingface_hub v1+ API in hf_http.py
- Improve type assertion logging in app_tokenizer.py for better debugging
- Use random.uniform() for clearer jitter calculation
- These changes address review feedback for better error handling

Co-authored-by: Dan Braun <danbraunai-goodfire@users.noreply.github.com>
Description
Pin `transformers>=5,<6`. This is the alternative to #559 discussed in Slack: rather than add a manual `dtype` field targeting old transformers, just move to v5, whose `from_pretrained` defaults to `dtype="auto"` and loads HF checkpoints in their native precision instead of upcasting them to fp32. For a bf16 model like Llama-3.1-8B, that halves the weight footprint (~16 GB/GPU), with no code change at the call site.
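As a back-of-the-envelope check on the memory figure above, the sketch below computes weight-only memory from a parameter count and bytes per element. The 8.0e9 parameter count is a round-number assumption for Llama-3.1-8B, and the calculation ignores activations, gradients, optimizer state, and non-weight buffers.

```python
# Bytes per element for the dtypes discussed above.
BYTES_PER_ELEMENT = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_footprint_gb(n_params: float, dtype: str) -> float:
    """Weight-only memory in GB (1 GB = 1e9 bytes) for n_params stored in dtype."""
    return n_params * BYTES_PER_ELEMENT[dtype] / 1e9


# Assumption: ~8.0e9 parameters as a round figure for Llama-3.1-8B.
N_PARAMS = 8.0e9
fp32 = weight_footprint_gb(N_PARAMS, "float32")  # upcast to fp32 (old default load)
bf16 = weight_footprint_gb(N_PARAMS, "bfloat16")  # native dtype (dtype="auto")
print(f"fp32: {fp32:.0f} GB, bf16: {bf16:.0f} GB, saved: {fp32 - bf16:.0f} GB")
# prints: fp32: 32 GB, bf16: 16 GB, saved: 16 GB
```

Halving bytes per element halves the footprint regardless of the exact parameter count, which is why the saving and the resulting bf16 footprint are both ~16 GB here.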
Verified empirically: a bf16 checkpoint reloads as bf16; fp32 checkpoints (gpt2) are
byte-identical to before.
Issues found while upgrading
transformers v5 pulls `huggingface_hub` 1.x (0.36 → 1.19). Between them, the upgrade brought two breaking changes the repo hit. Both are fixed here:
- `huggingface_hub` moved `requests` → `httpx` and dropped `configure_http_backend`. That import runs at module load in `infra/hf_http.py` (the #557 Hub-retry shim), so the whole `experiments.lm.data` chain failed to import. Rewrote the shim onto the new `set_client_factory` API with a small retrying `httpx` transport that keeps the same policy as before (retry connect/read timeouts, network errors, and 429/5xx responses with jittered backoff; idempotent methods only). Confirmed that metadata calls (`HfApi.repo_info`, the call `datasets` makes at startup) still bypass hub's built-in `http_backoff`, so the shim is still needed.
- `PreTrainedTokenizerBase.decode` is now typed `str | list[str]` (was `str`), which produced six basedpyright errors in `app/backend/app_tokenizer.py`. Routed all decode calls through a `_decode` helper that asserts `str` (a flat list of ids always decodes to one `str`).

Other usages checked and OK:
- `transformers.pytorch_utils.Conv1D` (core decomposition) is unchanged.
- `AutoTokenizer` / model-class imports are unchanged.
- The pretrain Llama→custom conversion uses canonical module names (`q_proj`, `embed_tokens`, …) that are stable across v4→v5 (not exercised here, since it needs real Llama weights).
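For the first breaking change above, here is a stdlib-only sketch of the retry decision logic the Hub shim is described as keeping: retry only idempotent methods, on timeouts, network errors, 429, or 5xx, with jittered exponential backoff. It is not the code in `infra/hf_http.py`. The names `should_retry_status`, `backoff_delay`, and `send_with_retries`, the attempt limit, the backoff constants, and the choice of "full jitter" are all hypothetical. `TimeoutError`/`ConnectionError` stand in for `httpx.TimeoutException`/`httpx.NetworkError`, and the wiring into an `httpx` transport registered via `set_client_factory` is omitted.

```python
import random
import time
from typing import Callable

# Hypothetical policy constants; the real shim's values may differ.
IDEMPOTENT_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "PUT", "DELETE"})
MAX_ATTEMPTS = 5
BASE_DELAY_S = 0.5
MAX_DELAY_S = 30.0


def should_retry_status(status: int) -> bool:
    """Retry on rate limiting (429) and server errors (5xx)."""
    return status == 429 or 500 <= status < 600


def backoff_delay(attempt: int) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(MAX_DELAY_S, BASE_DELAY_S * 2**attempt))


def send_with_retries(
    method: str,
    send: Callable[[], int],
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() (returns an HTTP status or raises) under the retry policy.

    Non-idempotent methods get exactly one attempt. Timeouts and connection
    errors are retried like 429/5xx; the last outcome is returned or re-raised.
    """
    attempts = MAX_ATTEMPTS if method.upper() in IDEMPOTENT_METHODS else 1
    for attempt in range(attempts):
        is_last = attempt == attempts - 1
        try:
            status = send()
        except (TimeoutError, ConnectionError):  # stand-ins for httpx errors
            if is_last:
                raise
        else:
            if is_last or not should_retry_status(status):
                return status
        sleep(backoff_delay(attempt))
    raise RuntimeError("unreachable: the loop always returns or raises")


if __name__ == "__main__":
    # Fake server: two 503s, then a 200. Sleep is stubbed out.
    responses = iter([503, 503, 200])
    print(send_with_retries("GET", lambda: next(responses), sleep=lambda s: None))  # 200
```

Restricting retries to idempotent methods matters because a retried `POST` whose first attempt actually reached the server could apply its side effect twice; injecting `sleep` keeps the policy testable without real delays.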
How Has This Been Tested?
--runslow): passed.
`basedpyright` + `ruff check`/`format`: clean.

Does this PR introduce a breaking change?
Behavioral: targets loaded from bf16/fp16 checkpoints now stay at native precision instead of being upcast to fp32 (the same point as in #559: autocast already made matmuls bf16, so the frozen-target reference shifts only slightly). fp32 checkpoints are unchanged.
Supersedes #559.
🤖 Generated with Claude Code