FIX BEFT merge on bias-less layers; RandLoRA save_projection default #3227
kashif wants to merge 3 commits into
BEFT: add a zero bias to base layers that have no bias at adapter init time so that merge()/unmerge() work correctly on models like Qwen3.5 whose linear projections have no bias term (attention_bias=False). Previously, this raised a ValueError during vLLM weight sync.

RandLoRA: change save_projection default from True to False. The random basis (randlora_A / randlora_B) is fully deterministic given projection_prng_key and can always be regenerated on load, so saving it by default inflated checkpoints by ~42 GB on 4B-parameter models. Users who need portability across PyTorch RNG versions can opt in with save_projection=True.
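To illustrate the determinism argument for RandLoRA, here is a minimal sketch in plain Python. It is not PEFT's implementation: `make_basis` and its arguments are hypothetical, and a seeded `random.Random` stands in for the PyTorch generator that would be seeded with projection_prng_key.

```python
import random


def make_basis(prng_key, rows, cols):
    """Build a frozen random basis from a seed.

    Hypothetical stand-in for how randlora_A / randlora_B could be derived
    from projection_prng_key; the real code uses PyTorch, not `random`.
    """
    rng = random.Random(prng_key)  # private generator, unaffected by global RNG state
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# "Training" run: the basis is generated from the key and never updated.
basis_at_train = make_basis(prng_key=0, rows=4, cols=3)

# "Load" run with save_projection=False: only the key is stored,
# so the basis is rebuilt from it instead of being read from the checkpoint.
basis_at_load = make_basis(prng_key=0, rows=4, cols=3)

same_key_matches = basis_at_train == basis_at_load
other_key_differs = basis_at_train != make_basis(prng_key=1, rows=4, cols=3)
print(same_key_matches, other_key_differs)
```

The rebuilt basis only matches if the generator produces the same stream for the same seed on both sides, which is why save_projection=True remains available for portability across PyTorch RNG versions.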
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Track whether BEFT added a zero bias to a bias-less base layer (_beft_added_bias flag). On unload (merge=False), remove it so num_params is unchanged from the original model. Update the two test_beft_initialization/merge tests to reflect that merge now works on bias-less layers instead of raising.

Co-Authored-By: Kashif Rasul <kashif@huggingface.co>
@kashif When we worked on this PR, we explicitly decided not to add a bias when there was none beforehand. Generally in PEFT, after …
…ations

Revert the approach of adding a zero bias to bias-less base layers at adapter init time. BEFT's forward() adds beft_bias directly to the output and does not need base_layer.bias to exist; only merge() does. Adding a bias at init changed the model architecture after unload, breaking test_unload_adapter for bias-less models (Llama, Gemma3).

Instead, warn at init when the base layer has no bias (merge will not work), and let merge() raise a clear ValueError. unload() is then correct with no special handling needed.

Co-Authored-By: Kashif Rasul <kashif@huggingface.co>
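The behaviour described in this commit can be sketched with a toy example in plain Python. This is not the BEFT code: `ToyLinear` and `ToyBiasAdapter` are hypothetical stand-ins, and the warning and error messages are made up for illustration. The sketch only shows why forward() works without a base bias while merge() needs one.

```python
import warnings


class ToyLinear:
    """Toy elementwise layer: out[i] = weight[i] * x[i] (+ bias[i] if present)."""

    def __init__(self, weight, bias=None):
        self.weight = weight
        self.bias = bias  # None models a bias-less projection

    def forward(self, x):
        out = [w * v for w, v in zip(self.weight, x)]
        if self.bias is not None:
            out = [o + b for o, b in zip(out, self.bias)]
        return out


class ToyBiasAdapter:
    """Hypothetical bias-only adapter wrapped around a base layer."""

    def __init__(self, base_layer, adapter_bias):
        self.base_layer = base_layer
        self.adapter_bias = adapter_bias
        self.merged = False
        if base_layer.bias is None:
            # Warn at init instead of silently changing the architecture.
            warnings.warn("Base layer has no bias; merge() will not work.")

    def forward(self, x):
        out = self.base_layer.forward(x)
        if self.merged:
            return out  # adapter bias already folded into the base bias
        # The adapter bias is added to the output; no base bias is required.
        return [o + b for o, b in zip(out, self.adapter_bias)]

    def merge(self):
        if self.base_layer.bias is None:
            raise ValueError("Cannot merge: base layer has no bias to merge into.")
        self.base_layer.bias = [
            b + a for b, a in zip(self.base_layer.bias, self.adapter_bias)
        ]
        self.merged = True


# Bias-less base layer: forward works, merge raises, the base layer is untouched.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    adapter = ToyBiasAdapter(ToyLinear([2.0, 3.0]), adapter_bias=[0.5, -0.5])
output = adapter.forward([1.0, 1.0])
try:
    adapter.merge()
    merge_error = None
except ValueError as exc:
    merge_error = str(exc)

# Base layer with a bias: merging leaves the output unchanged.
adapter_with_bias = ToyBiasAdapter(ToyLinear([2.0, 3.0], [1.0, 1.0]), [0.5, -0.5])
before = adapter_with_bias.forward([1.0, 1.0])
adapter_with_bias.merge()
after = adapter_with_bias.forward([1.0, 1.0])
```

Because the wrapper never creates a bias on the base layer, removing the adapter leaves the base layer exactly as it was, which is the property the unload test checks.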
42 GB on a 4B model sounds like a bug, honestly. Do you have a reproducer?