Fail loud on save_pretrained() for unsharded LoRA tensors #3251

Open
akshansh47 wants to merge 1 commit into huggingface:main from akshansh47:fix/validate-lora-shapes-on-save
@akshansh47


What

Refuse to write LoRA adapters whose lora_A / lora_B tensors are 1-D or zero-sized instead of full 2-D matrices. This is the canonical signature of an export that ran without gathering DeepSpeed ZeRO-3 / FSDP shards: the on-disk artifact looks structurally valid (the expected files, including adapter_config.json, are present), but downstream loaders fail with confusing index errors on first use, e.g. IndexError: too many indices for tensor of dimension 1 in vLLM's slice_lora_b during hot-swap.
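To see why "1-D or zero-sized" is the tell, here is a deliberately simplified model of what a single rank might expose for a partitioned (8, 4096) lora_A when nothing has been gathered. Both behaviours are assumptions for illustration, not the libraries' exact internals: for ZeRO-3 it assumes the module-visible tensor is an empty placeholder while the real partition is stored elsewhere; for FSDP it treats the parameter as if it were flattened, padded, and split evenly on its own (real FSDP packs several parameters into one flat buffer, so a given parameter's local view can be any slice, including an empty one). The function `local_shard_shape` is hypothetical.

```python
import math


def local_shard_shape(
    full_shape: tuple[int, ...], world_size: int, backend: str
) -> tuple[int, ...]:
    """Hypothetical, simplified per-rank shape of an un-gathered parameter."""
    if world_size < 1:
        raise ValueError("world_size must be >= 1")
    numel = math.prod(full_shape)
    if backend == "zero3":
        # Assumed: outside a gather context the visible tensor is an empty
        # placeholder, so it reads as shape (0,).
        return (0,)
    if backend == "fsdp":
        # Assumed: flatten, pad up to a multiple of world_size, split evenly.
        # Each rank then holds a 1-D chunk of ceil(numel / world_size).
        return (-(-numel // world_size),)
    raise ValueError(f"unknown backend: {backend!r}")


full = (8, 4096)  # illustrative rank-8 lora_A for a 4096-wide projection
for backend in ("zero3", "fsdp"):
    print(backend, local_shard_shape(full, world_size=4, backend=backend))
# zero3 (0,)
# fsdp (8192,)
```

Either way the tensor that reaches the writer has lost its 2-D (rank, in_features) layout, which is exactly what slice-by-rank code in a loader trips over.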

The new _validate_lora_adapter_state_dict helper surfaces the failure at save_pretrained() time with an actionable hint pointing at deepspeed.zero.GatheredParameters / FullyShardedDataParallel.summon_full_params, instead of corrupting the artifact and deferring the crash.

Why

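This failure mode is the most common cause of "my LoRA loads fine in HF/Transformers but breaks in vLLM" reports, and it is currently silent. Examples: vllm-project/vllm#28640 (vLLM hot-swap failing in slice_lora_b) and huggingface/transformers#45313 (transformers ZeRO-3 load report failures).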

The current behavior costs hours of debugging aimed at the wrong layer (usually vLLM, sometimes the model). The validator turns that into a 30-second fix.

Scope

The validator only inspects .lora_A / .lora_B tensors. Legitimately 1-D parameters such as DoRA's lora_magnitude_vector and AdaLoRA's lora_E are not affected. Non-LoRA adapter types (BOFT, OFT, P-tuning, prefix tuning, prompt tuning, etc.) are not touched.
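To make the scope concrete, here is a small key filter under a literal reading of the rule above: a key is inspected only if one of its dot-separated components is exactly `lora_A` or `lora_B`. The regex, the `is_inspected` name, and the example keys are illustrative assumptions; real key layouts vary with the PEFT version and with whether the adapter name is still embedded in the key.

```python
import re

# Hypothetical scope rule: a path component that is exactly "lora_A"/"lora_B".
_LORA_AB_KEY = re.compile(r"(?:^|\.)lora_[AB](?:\.|$)")


def is_inspected(key: str) -> bool:
    """Return True if a state-dict key falls inside the validator's scope."""
    return _LORA_AB_KEY.search(key) is not None


# Illustrative keys only; the exact layout depends on PEFT version and save path.
EXAMPLES = [
    "base_model.model.layers.0.self_attn.q_proj.lora_A.weight",
    "base_model.model.layers.0.self_attn.q_proj.lora_B.default.weight",
    "base_model.model.layers.0.self_attn.q_proj.lora_magnitude_vector",  # DoRA
    "base_model.model.layers.0.self_attn.q_proj.lora_E",  # AdaLoRA
    "base_model.model.layers.0.self_attn.q_proj.bias",
]

for key in EXAMPLES:
    print("inspect" if is_inspected(key) else "skip   ", key)
```

Matching whole path components, rather than a bare substring, keeps names that merely start with `lora_A`/`lora_B` out of scope.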

Tests

tests/test_initialization.py::TestSaveValidatesLoraShapes (7 cases):

  • happy path (well-formed state dict + DoRA magnitude + non-LoRA bias)
  • 1-D lora_A raises (parameterized: shape (0,) and shape (rank,))
  • 0-sized lora_B raises
  • error message includes adapter name
  • end-to-end via save_pretrained(state_dict=...) raises
  • end-to-end happy path still writes adapter_config.json

Adjacent test classes (TestLoraInitialization, TestNoInfiniteRecursionDeepspeed) pass locally: 121/121 existing + 7/7 new = 128/128. make quality is clean.

Backwards compatibility

Pure addition. No public API changes; the helper is _-prefixed and only invoked from inside save_pretrained. The only behavior change for existing code is that what was previously silent corruption now raises with an actionable message, which is the intent of the patch.

Made with Cursor

Refuse to write LoRA adapters whose lora_A / lora_B tensors look
unsharded (1-D or zero-sized). This is the canonical signature of an
export that ran without gathering DeepSpeed ZeRO-3 / FSDP shards: the
on-disk artifact looks structurally valid (filenames +
adapter_config.json present), but downstream loaders fail with confusing
index errors at first use (e.g. vLLM hot-swap's slice_lora_b in
vllm-project/vllm#28640, transformers ZeRO-3 load report failures in
huggingface/transformers#45313).

The new _validate_lora_adapter_state_dict helper surfaces the failure at
write time with an actionable hint pointing at
deepspeed.zero.GatheredParameters /
FullyShardedDataParallel.summon_full_params, instead of corrupting the
artifact and deferring the crash.

Scope: only .lora_A / .lora_B keys are inspected. Legitimately 1-D
parameters (DoRA's lora_magnitude_vector, AdaLoRA's lora_E) and non-LoRA
adapter types are unaffected.

Tests in tests/test_initialization.py::TestSaveValidatesLoraShapes
cover happy path, 1-D shards, 0-sized tensors, error message content,
and the end-to-end save_pretrained() integration. Adjacent test classes
(TestLoraInitialization, TestNoInfiniteRecursionDeepspeed) verified
passing locally - no regressions.

Co-authored-by: Cursor <cursoragent@cursor.com>