Fail loud on save_pretrained() for unsharded LoRA tensors #3251
Open
akshansh47 wants to merge 1 commit into
Refuse to write LoRA adapters whose lora_A / lora_B tensors look like ungathered shards (1-D or zero-sized). This is the canonical signature of an export that ran without gathering DeepSpeed ZeRO-3 / FSDP shards: the on-disk artifact looks structurally valid (filenames + adapter_config.json present), but downstream loaders fail with confusing index errors at first use (e.g. vLLM hot-swap's slice_lora_b in vllm-project/vllm#28640, transformers ZeRO-3 load report failures in huggingface/transformers#45313).

The new _validate_lora_adapter_state_dict helper surfaces the failure at write time with an actionable hint pointing at deepspeed.zero.GatheredParameters / FullyShardedDataParallel.summon_full_params, instead of corrupting the artifact and deferring the crash.

Scope: only .lora_A / .lora_B keys are inspected. Legitimately 1-D parameters (DoRA's lora_magnitude_vector, AdaLoRA's lora_E) and non-LoRA adapter types are unaffected.

Tests in tests/test_initialization.py::TestSaveValidatesLoraShapes cover the happy path, 1-D shards, 0-sized tensors, error message content, and the end-to-end save_pretrained() integration. Adjacent test classes (TestLoraInitialization, TestNoInfiniteRecursionDeepspeed) verified passing locally, with no regressions.

Co-authored-by: Cursor <cursoragent@cursor.com>
What
Refuse to write LoRA adapters whose lora_A / lora_B tensors look like ungathered shards (1-D or zero-sized). This is the canonical signature of an export that ran without gathering DeepSpeed ZeRO-3 / FSDP shards: the on-disk artifact looks structurally valid (filenames + adapter_config.json present), but downstream loaders fail with confusing index errors at the first attempted use, e.g. IndexError: too many indices for tensor of dimension 1 in vLLM's slice_lora_b during hot-swap.

The new _validate_lora_adapter_state_dict helper surfaces the failure at save_pretrained() time with an actionable hint pointing at deepspeed.zero.GatheredParameters / FullyShardedDataParallel.summon_full_params, instead of corrupting the artifact and deferring the crash.

Why
This failure mode is the most common cause of "my LoRA loads fine in HF/Transformers but breaks in vLLM" reports. It's currently silent. Examples:
- AssertionError in lora_shrink_op

The cost of the current behavior is hours of debugging pointed at the wrong layer (usually vLLM, sometimes the model). The validator turns it into a 30-second fix.
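The check described above can be sketched roughly as follows. This is not the PR's code: the function name check_lora_shapes, the key-matching rule, and the error wording are assumptions made for illustration. It is duck-typed on .shape so it runs without torch installed.

```python
from math import prod
from types import SimpleNamespace


def check_lora_shapes(state_dict):
    """Raise ValueError if any lora_A / lora_B entry is 1-D or zero-sized.

    Hypothetical stand-in for the PR's _validate_lora_adapter_state_dict.
    """
    bad = []
    for key, tensor in state_dict.items():
        parts = key.split(".")
        # Assumed matching rule: only exact lora_A / lora_B path components.
        # lora_magnitude_vector (DoRA) and lora_E (AdaLoRA) are skipped.
        if "lora_A" not in parts and "lora_B" not in parts:
            continue
        shape = tuple(tensor.shape)
        if len(shape) == 1 or prod(shape) == 0:
            bad.append(f"{key}: shape {shape}")
    if bad:
        raise ValueError(
            "LoRA tensors look like ungathered ZeRO-3 / FSDP shards:\n  "
            + "\n  ".join(bad)
            + "\nGather the full parameters before saving, e.g. with "
            "deepspeed.zero.GatheredParameters or "
            "FullyShardedDataParallel.summon_full_params."
        )


# Stand-ins for tensors: anything exposing .shape works here.
good = {
    "model.q_proj.lora_A.weight": SimpleNamespace(shape=(8, 4096)),
    "model.q_proj.lora_B.weight": SimpleNamespace(shape=(4096, 8)),
    "model.q_proj.lora_magnitude_vector.weight": SimpleNamespace(shape=(4096,)),
}
check_lora_shapes(good)  # passes: 2-D LoRA weights, 1-D DoRA vector ignored
```

Collecting every offending key before raising, rather than stopping at the first, is a design choice of this sketch: it lets the message show whether one module or the whole adapter was exported without gathering.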
Scope
The validator only inspects .lora_A / .lora_B tensors. Legitimately 1-D parameters such as DoRA's lora_magnitude_vector and AdaLoRA's lora_E are not affected. Non-LoRA adapter types (BOFT, OFT, P-tuning, prefix tuning, prompt tuning, etc.) are not touched.

Tests
tests/test_initialization.py::TestSaveValidatesLoraShapes: 7 cases:
- lora_A raises (parameterized: shape (0,) and shape (rank,))
- lora_B raises
- save_pretrained(state_dict=...) raises
- adapter_config.json

Adjacent test classes (TestLoraInitialization, TestNoInfiniteRecursionDeepspeed) verified passing locally: 121/121 + 7/7 new = 128/128. make quality clean.

Backwards compatibility
Pure addition. No public API changes; the helper is _-prefixed and only invoked from inside save_pretrained. The only behavior change for existing code is that what was previously silent corruption now raises with an actionable message, which is the intent of the patch.

Made with Cursor
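For callers who hit the new error, the remedy the message points at is to gather the full parameters before exporting. A rough sketch of that pattern, under stated assumptions: snapshot_lora_weights and its gather argument are hypothetical names, and the default gather is a no-op so the snippet runs without DeepSpeed installed.

```python
from contextlib import nullcontext


def snapshot_lora_weights(named_params, gather=nullcontext):
    """Return {name: copied tensor} for lora_A / lora_B parameters.

    `gather` is any callable that takes the list of parameters and returns
    a context manager inside which those parameters hold their full values.
    Hypothetical helper, not part of PEFT or of this PR.
    """
    lora = [
        (name, param)
        for name, param in named_params
        if {"lora_A", "lora_B"} & set(name.split("."))
    ]
    with gather([param for _, param in lora]):
        # Copy inside the context: full values are only guaranteed here.
        return {name: param.detach().cpu().clone() for name, param in lora}
```

Under DeepSpeed ZeRO-3 this would be called on every rank as snapshot_lora_weights(model.named_parameters(), gather=deepspeed.zero.GatheredParameters), since gathering is a collective operation. FullyShardedDataParallel.summon_full_params takes the module rather than a parameter list, so FSDP would need a different wrapper that this sketch does not cover.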