FIX BEFT merge on bias-less layers; RandLoRA save_projection default #3227

Open
kashif wants to merge 3 commits into huggingface:main from kashif:fix-beft-randlora

@kashif commented May 12, 2026 (Contributor)

BEFT: add a zero bias to base layers that have no bias at adapter init time so that merge()/unmerge() work correctly on models like Qwen3.5 whose linear projections have no bias term (attention_bias=False). Previously, this raised a ValueError during vLLM weight sync.

RandLoRA: change save_projection default from True to False. The random basis (randlora_A / randlora_B) is fully deterministic given projection_prng_key and can always be regenerated on load, so saving it by default inflated checkpoints by ~42 GB on 4B-parameter models. Users who need portability across PyTorch RNG versions can opt in with save_projection=True.
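The regeneration argument can be illustrated with a minimal sketch. This is not PEFT's implementation: `make_basis` is a hypothetical helper, and the standard-library `random.Random` stands in for the PyTorch generator that would be seeded with `projection_prng_key`. It only shows that a basis derived from a seed can be rebuilt from that seed alone.

```python
import random


def make_basis(prng_key, rows, cols):
    """Hypothetical stand-in for building randlora_A / randlora_B from a seed."""
    # A private generator keeps the result independent of global RNG state.
    rng = random.Random(prng_key)
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# "Checkpoint" that stores only the seed, not the basis itself.
checkpoint = {"projection_prng_key": 0}

basis_at_train_time = make_basis(checkpoint["projection_prng_key"], 4, 3)
basis_at_load_time = make_basis(checkpoint["projection_prng_key"], 4, 3)

# Same seed and same RNG algorithm give the same basis.
assert basis_at_train_time == basis_at_load_time
```

The equality holds only while the RNG algorithm itself stays the same, which is the portability concern that `save_projection=True` addresses.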

@kashif kashif requested a review from BenjaminBossan May 12, 2026 09:30
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Track whether BEFT added a zero bias to a bias-less base layer
(_beft_added_bias flag). On unload (merge=False), remove it so
num_params is unchanged from the original model. Update the two
test_beft_initialization/merge tests to reflect that merge now
works on bias-less layers instead of raising.

Co-Authored-By: Kashif Rasul <kashif@huggingface.co>
@BenjaminBossan (Member)

@kashif When we worked on this PR, we explicitly decided not to add a bias when there was none beforehand. Generally in PEFT, after merge_and_unload, we want the model to have exactly the same architecture as the base model. With this change, it would have a different architecture, right?

…ations

Revert the approach of adding a zero bias to bias-less base layers at
adapter init time. BEFT's forward() adds beft_bias directly to the
output and does not need base_layer.bias to exist; only merge() does.
Adding a bias at init changed the model architecture after unload,
breaking test_unload_adapter for bias-less models (Llama, Gemma3).

Instead, warn at init when the base layer has no bias (merge will not
work), and let merge() raise a clear ValueError. unload() is then
correct with no special handling needed.

Co-Authored-By: Kashif Rasul <kashif@huggingface.co>
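The distinction drawn in this commit message (forward does not need a base bias, merge does) can be sketched with toy classes. This is a plain-Python illustration under assumptions, not the PEFT code: `ToyLinear` and `ToyBeftLayer` are hypothetical names, and only `beft_bias`, `merge()`, the init-time warning, and the `ValueError` come from the discussion above.

```python
import warnings


class ToyLinear:
    """Hypothetical linear base layer; bias may be None (e.g. attention_bias=False)."""

    def __init__(self, weight, bias=None):
        self.weight = weight
        self.bias = bias

    def forward(self, x):
        out = [sum(w * xi for w, xi in zip(row, x)) for row in self.weight]
        if self.bias is not None:
            out = [o + b for o, b in zip(out, self.bias)]
        return out


class ToyBeftLayer:
    """Hypothetical adapter wrapper that trains only an additive bias."""

    def __init__(self, base_layer, beft_bias):
        self.base_layer = base_layer
        self.beft_bias = beft_bias
        self.merged = False
        if base_layer.bias is None:
            # Warn at init: the adapter is usable, but merge will not work.
            warnings.warn("base layer has no bias; merge() will raise")

    def forward(self, x):
        out = self.base_layer.forward(x)
        if self.merged:
            return out  # beft_bias already lives in base_layer.bias
        # Unmerged: add beft_bias to the output; no base bias is required.
        return [o + b for o, b in zip(out, self.beft_bias)]

    def merge(self):
        if self.base_layer.bias is None:
            # The base layer is left untouched, so unload needs no special handling.
            raise ValueError("cannot merge: base layer has no bias")
        self.base_layer.bias = [
            b + d for b, d in zip(self.base_layer.bias, self.beft_bias)
        ]
        self.merged = True


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    layer = ToyBeftLayer(ToyLinear(identity, bias=[0.0, 0.0]), [0.5, -0.5])
    before = layer.forward([1.0, 2.0])
    layer.merge()
    assert layer.forward([1.0, 2.0]) == before  # merge preserves the output
```

Because the bias-less case never modifies the base layer, the unloaded model keeps the same parameters as the original, which is the property the earlier zero-bias approach broke.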
@BenjaminBossan (Member)

> RandLoRA: change save_projection default from True to False. The random basis (randlora_A / randlora_B) is fully deterministic given projection_prng_key and can always be regenerated on load, so saving it by default inflated checkpoints by ~42 GB on 4B-parameter models.

42 GB on a 4B model sounds like a bug honestly. Do you have a reproducer?
