Add UniLoRA tuner to PEFT#2968
Conversation
githubnemo
left a comment
There was a problem hiding this comment.
Hey @KaiyangLi1992, thanks for the PR :)
- Most general: let's rename
UniLoRA*toUniLorawhich makes it easier to remember how the method is typed in code (to be consistent withLoraModeland friends). - I noticed that the copyright notice says 2024, let's use the correct starting year in all newly introduced files: 2025.
- Before pushing changes it is always good to run
make styleto correct any style issues automatically, otherwise the CI will not be happy
It is good to see that you've already added some tests. Let's extend those by adding UniLoRA to the TEST_CASES list in tests/test_custom_models.py - this will already give quite a bit of coverage. You can check the results by running:
pytest tests/test_custom_models.py
If those tests run we can extend the tests to test_decoder_models.py and test_encoder_decoder_models.py in a similar fashion.
For this to be mergable we also need documentation in docs/source/package_reference/unilora.md (added to the _toctree.md).
| @@ -0,0 +1,23 @@ | |||
| # Copyright 2024-present the HuggingFace Inc. team. | |||
There was a problem hiding this comment.
| # Copyright 2024-present the HuggingFace Inc. team. | |
| # Copyright 2025-present the HuggingFace Inc. team. |
| "help": ( | ||
| "Names or patterns of modules to apply UniLoRA to. Accepts a list of " | ||
| "module name suffixes, a regex string, or the special value " | ||
| "'all-linear' to match all Linear/Conv1D layers except the output layer." | ||
| ) |
There was a problem hiding this comment.
You can just copy the documentation from the docstring above for the help string of the config values.
| def get_nb_savable_parameters(self, adapter="default") -> tuple[int, int]: | ||
| """ | ||
| Returns the number of savable Uni-LoRA parameters and other savable parameters. | ||
| """ | ||
| theta_d_params = 0 | ||
| other_params = 0 | ||
| for name, param in self.named_parameters(): | ||
| if "unilora_theta_d" in name: | ||
| theta_d_params += param.numel() | ||
| elif "unilora_indices" in name: | ||
| other_params += param.numel() | ||
| elif "unilora_scales" in name: | ||
| other_params += param.numel() | ||
|
|
||
| unilora_params = theta_d_params | ||
| return unilora_params, other_params | ||
|
|
||
| def print_savable_parameters(self) -> None: | ||
| """ | ||
| Prints the number of savable Uni-LoRA parameters and total savable parameters. | ||
| """ | ||
| unilora_params, other_params = self.get_nb_savable_parameters() | ||
| print( | ||
| f"Uni-LoRA params to-be-saved (float32-equivalent): {unilora_params:,d} " | ||
| f"|| total params to-be-saved: {(unilora_params + other_params):,d}" | ||
| ) No newline at end of file |
There was a problem hiding this comment.
Do these two functions ( get_nb_savable_parameters, print_savable_parameters) have a particular use or are they for debugging? If it is the latter, let's remove them.
There was a problem hiding this comment.
This is still open (get_nb... is removed but print_saveable_parameters still uses it).
| for module, (scale_a, scale_b) in zip(uni_modules, zip(*[iter(norm_factors)] * 2)): | ||
| module.update_norm(adapter_name, scale_a, scale_b) |
There was a problem hiding this comment.
doesn't this mean that scale_a == scale_b? Let's simplify the zip() statement then and only pass scale for both scale params. If the user wants to experiment with this setting they can set the scale attribute on the layers manually.
There was a problem hiding this comment.
Thanks for the suggestion! For this part specifically, scale_a and scale_b are not necessarily equal — the current design allows them to differ. So we can’t simplify it to a single scale here.
All other suggestions are accepted, thanks!
There was a problem hiding this comment.
Sorry, I don't understand. How can scale_a and scale_b differ? They're duplicated from the [iter(norm_factors)]*2 statement.
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
|
Thanks for the helpful suggestion! I have updated the relevant code accordingly. I ran: All UniLoRA-related tests passed successfully. Please let me know if further adjustments are needed! |
|
Can some reviewers review this merge request? It has been in pending status for 3 weeks. |
githubnemo
left a comment
There was a problem hiding this comment.
Hey @KaiyangLi1992,
thanks for nicely integrating the requested changes. I've flagged some comments where it seemed that they've passed under the radar and added some new ones.
Please make sure that the branch is up-to-date with main and to run make style to run the code formatter.
Once all changes are integrated it would make sense to add an experiment for Uni-LoRA in the MetaMathQA test suite, maybe using method_comparison/MetaMathQA/experiments/lora/* as a starting point. Let's also add a quick example on how to use UniLoRA, e.g. using the examples/miss_finetuning as an example.
| @@ -0,0 +1,153 @@ | |||
| # Copyright 2024-present the HuggingFace Inc. team. | |||
There was a problem hiding this comment.
| # Copyright 2024-present the HuggingFace Inc. team. | |
| # Copyright 2025-present the HuggingFace Inc. team. |
| @@ -0,0 +1,293 @@ | |||
| # Copyright 2024-present the HuggingFace Inc. team. | |||
There was a problem hiding this comment.
| # Copyright 2024-present the HuggingFace Inc. team. | |
| # Copyright 2025-present the HuggingFace Inc. team. |
| # List all names of layers that may contain adapter weights | ||
| # unilora_theta_d is a shared parameter. | ||
| #But it is referenced within individual layers. | ||
| adapter_layer_names = ("unilora_theta_d",) |
There was a problem hiding this comment.
This is still open as far as I can see
| base_layer = self.get_base_layer() | ||
| if isinstance(base_layer, nn.Linear): | ||
| in_features, out_features = base_layer.in_features, base_layer.out_features | ||
| elif isinstance(base_layer, Conv1D): | ||
| in_features, out_features = ( | ||
| base_layer.weight.ds_shape if hasattr(base_layer.weight, "ds_shape") else base_layer.weight.shape | ||
| ) | ||
|
|
||
| self.in_features = in_features | ||
| self.out_features = out_features |
| """ | ||
| Updates the scaling factors. |
There was a problem hiding this comment.
The docstring currently has no explanatory value. Let's remove or improve it.
| import numpy as np | ||
| total_length = lora_param_count | ||
| num_unique = theta_d_length | ||
| base_count = total_length // num_unique | ||
| remaining = total_length % num_unique | ||
| rng = np.random.default_rng(proj_seed) | ||
| data = np.repeat(np.arange(num_unique), base_count) | ||
| if remaining > 0: | ||
| extras = rng.choice(num_unique, size=remaining, replace=False) | ||
| data = np.concatenate([data, extras]) | ||
| rng.shuffle(data) | ||
| return torch.tensor(data) |
There was a problem hiding this comment.
I wonder, isn't generate_index basically np.random.choice(np.arange(theta_d_length), size=lora_param_count)?
I can see that results might differ for low lora_param_count values (<10k) but asymptotically it should be equal?
If we decide not to use choice(), let's document why we sample the values this way.
| for module, (scale_a, scale_b) in zip(uni_modules, zip(*[iter(norm_factors)] * 2)): | ||
| module.update_norm(adapter_name, scale_a, scale_b) |
There was a problem hiding this comment.
Sorry, I don't understand. How can scale_a and scale_b differ? They're duplicated from the [iter(norm_factors)]*2 statement.
| def _ensure_device(self, adapter): | ||
| """ | ||
| Ensure all UniLoRA buffers/params for the given adapter are on the same device as base_layer. | ||
| This is lazy-migration (only happens if device mismatch is detected). | ||
| """ | ||
| # get target device from base_layer | ||
| device = next(self.base_layer.parameters()).device | ||
|
|
||
| # ---- indices ---- | ||
| if adapter in self.unilora_indices_A: | ||
| t = self.unilora_indices_A[adapter] | ||
| if t.device != device: | ||
| self.unilora_indices_A[adapter] = t.to(device) | ||
|
|
||
| if adapter in self.unilora_indices_B: | ||
| t = self.unilora_indices_B[adapter] | ||
| if t.device != device: | ||
| self.unilora_indices_B[adapter] = t.to(device) | ||
|
|
||
| # ---- scales ---- | ||
| if adapter in self.unilora_scales_A: | ||
| t = self.unilora_scales_A[adapter] | ||
| if t.device != device: | ||
| self.unilora_scales_A[adapter] = t.to(device) | ||
|
|
||
| if adapter in self.unilora_scales_B: | ||
| t = self.unilora_scales_B[adapter] | ||
| if t.device != device: | ||
| self.unilora_scales_B[adapter] = t.to(device) | ||
|
|
||
| # ---- theta_d ---- (ParameterDict, but ensure consistency) | ||
| if adapter in self.unilora_theta_d: | ||
| p = self.unilora_theta_d[adapter] | ||
| if p.device != device: | ||
| # Parameter migration: need .data to avoid creating new graph edges | ||
| self.unilora_theta_d[adapter].data = p.data.to(device) |
There was a problem hiding this comment.
I think _ensure_device can be replaced with self._move_adapter_to_device_of_base_layer(adapter) if all params are registered in adapter_layer_names or other_param_names in the layer class.
| def _init_unilora_theta_d(self, config: UniLoraConfig, adapter_name: str) -> None: | ||
| unilora_theta_d = torch.zeros(config.theta_d_length) | ||
| torch.nn.init.uniform_(unilora_theta_d, -config.init_theta_d_bound, config.init_theta_d_bound) | ||
| self.unilora_theta_d[adapter_name] = unilora_theta_d |
There was a problem hiding this comment.
_init_unilora_theta_d should respect the config.init_weights flag and leave the weights random if init_weights is False.
| elif config.peft_type == PeftType.UNILORA: | ||
| to_return = {} | ||
| to_return["base_model.unilora_theta_d." + adapter_name] = state_dict["base_model.unilora_theta_d." + adapter_name] | ||
|
|
There was a problem hiding this comment.
Let's give the user the option to save the indices alongside the theta weights. This will allow the user to use a saved adapter even though the index sampling has changed for some reason.
VeRA (a few lines below this) can be used as inspiration.
|
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. |
|
gentle ping @KaiyangLi1992 |
|
|
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. |
|
Hi @githubnemo , apologies again for the delay. I now have time to continue this PR. I have updated the branch against the latest main, addressed the remaining review comments, and ran make style / make quality plus the UniLora-related tests. I also added the requested documentation/example/benchmark pieces. Could you please reopen the PR when you have a chance? If you prefer a fresh PR instead, I can open a new one and reference this PR for review history. Thank you! |
|
Hi @KaiyangLi1992, unfortunately I can't reopen this PR, github doesn't let me - it seems you force-pushed or re-created the branch when the PR was already closed. The usual workaround for this is to reset your local branch to the commit that was last used in the PR (879e58d), push and then I can re-open the PR (as described here). This would let us keep the PR's history which would make reviewing easier. Alternatively you can open a new PR with an updated description and link it here. |
Hi @githubnemo, thanks for the suggestion. I opened a fresh PR here: #3257 . The new PR is rebased on the latest Thanks again for the review! |
Motivation
This PR adds UniLoRA, a LoRA-style parameter-efficient fine-tuning method
that introduces a unified parameterization for low-rank adaptations, enabling
further reductions in the number of trainable parameters while preserving
the standard PEFT workflow.
What's included
Checklist