Mica by sr-networks · Pull Request #3260 · huggingface/peft

sr-networks · 2026-05-24T20:20:39Z

Added support for MiCA as proposed in

What does this PR do?

This PR adds MiCA (Minor Component Adaptation) as a LoRA initialization variant, exposed via:

LoraConfig(init_lora_weights="mica")

MiCA initializes LoRA from the minor singular subspace of the base weight matrix. For a target linear layer with weight matrix $W = U \Sigma V^T$, with singular values sorted in descending order, MiCA initializes:

B = U[:, -r:]
A = 0

where B contains the r left singular vectors corresponding to the smallest singular values.

Since A is initialized to zero, the adapter contribution is zero at initialization and the base model output is preserved.
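
A minimal sketch of this initialization in plain PyTorch, not the PR's code. It assumes the usual PEFT LoRA shapes: W is (out_features, in_features), lora_A is (r, in_features) and lora_B is (out_features, r). `mica_init_sketch` is a hypothetical name and is not the PR's `LoraLayer.mica_init`.

```python
import torch


def mica_init_sketch(weight: torch.Tensor, r: int) -> tuple[torch.Tensor, torch.Tensor]:
    """Hypothetical MiCA-style init: B = U[:, -r:], A = 0.

    weight has shape (out_features, in_features), as in nn.Linear.
    """
    out_features, in_features = weight.shape
    max_r = min(out_features, in_features)
    if r > max_r:
        raise ValueError(f"r={r} exceeds min(out_features, in_features)={max_r}")
    # torch.linalg.svd returns singular values in descending order,
    # so the last r columns of U belong to the r smallest singular values.
    U, S, Vh = torch.linalg.svd(weight.float(), full_matrices=False)
    lora_B = U[:, -r:].clone().to(weight.dtype)                # (out_features, r), kept frozen
    lora_A = torch.zeros(r, in_features, dtype=weight.dtype)  # (r, in_features), trainable
    return lora_A, lora_B


torch.manual_seed(0)
layer = torch.nn.Linear(16, 12)
lora_A, lora_B = mica_init_sketch(layer.weight.detach(), r=4)

x = torch.randn(3, 16)
# LoRA output: base(x) + scaling * (x @ A^T) @ B^T; with A = 0 the extra term vanishes.
scaling = 2.0
adapted = layer(x) + scaling * (x @ lora_A.T) @ lora_B.T
print(torch.equal(adapted, layer(x)))  # True
```

The SVD is computed in float32 so the sketch would also accept half-precision weights on CPU; the cast back keeps the adapter in the base weight's dtype.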

Why freeze lora_B?

MiCA treats lora_B as the fixed minor-component subspace and trains only lora_A.

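Freezing lora_B enforces the intended MiCA constraint during training: the weight update, which is proportional to B A, then always has its columns in the span of the selected minor left singular vectors. If lora_B were trainable, the adapter would no longer be constrained to that subspace and would behave like an unconstrained LoRA update.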
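
To illustrate the constraint, here is a small PyTorch sketch (not PEFT code) that trains only A while B stays frozen, then checks that the learned update ΔW = B A has no component outside span(B). The toy regression target, learning rate, and step count are arbitrary assumptions.

```python
import torch

torch.manual_seed(0)
out_features, in_features, r = 8, 10, 3
W = torch.randn(out_features, in_features)

# MiCA-style init: B spans the minor left singular subspace and is frozen; A starts at zero.
U, S, Vh = torch.linalg.svd(W, full_matrices=False)
B = U[:, -r:].clone()                                 # requires_grad is False, so B never changes
A = torch.zeros(r, in_features, requires_grad=True)  # the only trainable tensor

# Arbitrary toy objective: fit a random target with the adapted layer.
x = torch.randn(64, in_features)
target = torch.randn(64, out_features)
opt = torch.optim.SGD([A], lr=0.1)
for _ in range(50):
    opt.zero_grad()
    y = x @ (W + B @ A).T
    loss = torch.nn.functional.mse_loss(y, target)
    loss.backward()
    opt.step()

delta_W = (B @ A).detach()
# Projector onto the orthogonal complement of span(B); B has orthonormal columns.
P_perp = torch.eye(out_features) - B @ B.T
print(delta_W.norm().item() > 0)  # the update is non-trivial
print(torch.allclose(P_perp @ delta_W, torch.zeros_like(delta_W), atol=1e-5))
```

Because B is never updated, (I - B Bᵀ) ΔW stays at zero after any number of steps; letting the optimizer also update B would remove this guarantee.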

This is also why the PR adds a way for LoRA variants to control adapter trainability: BaseTunerLayer.frozen_peft_weight_names, which keeps MiCA's lora_B frozen.

Recommended usage

MiCA is primarily intended for continued pretraining / domain-adaptive pretraining, not for instruction fine-tuning.

The recommended workflow is:

Start from the base model, not the instruct/chat model.
Train the MiCA adapter on continued-pretraining data.
Merge the trained adapter into the model weights.
Use the resulting merged model as the adapted base for subsequent instruction/chat tuning, or merge/apply it before using the corresponding instruct/chat model setup.
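
Merging the trained adapter folds it into the base weight. The sketch below shows the standard LoRA merge, W_merged = W + scaling · B A with scaling = lora_alpha / r (no rsLoRA), in plain PyTorch. In PEFT this step is normally done with `merge_and_unload()`, but this code does not call PEFT, and A is a random stand-in for a trained adapter.

```python
import torch

torch.manual_seed(0)
in_features, out_features, r, lora_alpha = 10, 6, 2, 16
scaling = lora_alpha / r  # standard LoRA scaling (not rsLoRA)

base = torch.nn.Linear(in_features, out_features)
# Stand-ins for a trained MiCA adapter: B from the minor subspace, A pretend-trained.
U, S, Vh = torch.linalg.svd(base.weight.detach(), full_matrices=False)
B = U[:, -r:].clone()
A = 0.01 * torch.randn(r, in_features)

x = torch.randn(4, in_features)
with torch.no_grad():
    # Unmerged forward: base output plus the low-rank adapter branch.
    unmerged = base(x) + scaling * (x @ A.T) @ B.T

    # Merge: build a plain nn.Linear whose weight is W + scaling * B @ A.
    merged = torch.nn.Linear(in_features, out_features)
    merged.weight.copy_(base.weight + scaling * (B @ A))
    merged.bias.copy_(base.bias)
    merged_out = merged(x)

print(torch.allclose(unmerged, merged_out, atol=1e-5))
```

After merging, the layer is an ordinary nn.Linear with no adapter branch, which is what makes it usable as the adapted base for later instruction or chat tuning.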

This recommendation follows the intended use of MiCA as a method for injecting domain knowledge into pretrained representations while constraining the update to the selected minor-component subspace.

Main changes

Adds "mica" as a valid value for LoraConfig(init_lora_weights=...).
Adds SVD-based MiCA initialization for nn.Linear and nn.Embedding layers.
Initializes B from the minor left singular vectors of the base weight.
Initializes A to zero so the adapter is a no-op at initialization.
Uses BaseTunerLayer.frozen_peft_weight_names to keep MiCA B frozen across get_peft_model, add_adapter, set_adapter, and set_requires_grad.
Keeps standard LoRA forward, merge, and unmerge behavior.
Adds MiCA tests, documentation, and a runnable fine-tuning example.

Tests

This PR adds tests covering:

zero adapter contribution at initialization
use of the minor rather than major singular subspace
B being frozen
adapter switching with MiCA
reusing an adapter name after deleting a MiCA adapter
r > max_r error handling
embedding initialization
custom model behavior with MiCA
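
One way to test "minor rather than major singular subspace" without touching PEFT internals: projecting W onto span(B) should keep exactly the r smallest singular values of W. This is a hedged sketch in plain PyTorch; `minor_subspace_check` is a hypothetical helper and not the PR's test code.

```python
import torch


def minor_subspace_check(weight: torch.Tensor, B: torch.Tensor, atol: float = 1e-4) -> bool:
    """Hypothetical check: the singular values of B^T W equal the r smallest of W."""
    r = B.shape[1]
    S_full = torch.linalg.svdvals(weight)        # descending order
    S_proj = torch.linalg.svdvals(B.T @ weight)  # r values, descending order
    return torch.allclose(S_proj, S_full[-r:], atol=atol)


torch.manual_seed(0)
W = torch.randn(12, 20)
U, S, Vh = torch.linalg.svd(W, full_matrices=False)
r = 3
B_minor = U[:, -r:]  # what MiCA selects
B_major = U[:, :r]   # what a PiSSA-like "principal" init would select
print(minor_subspace_check(W, B_minor))
print(minor_subspace_check(W, B_major))
```

The check works because U_rᵀ W = Σ_r V_rᵀ for any block U_r of left singular vectors, so the projected matrix carries exactly the singular values paired with those vectors.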

Limitations

Supports nn.Linear and nn.Embedding target modules.
Requires r <= min(in_features, out_features) for linear layers.
Requires r <= min(num_embeddings, embedding_dim) for embedding layers.
Performs a full SVD during adapter initialization.
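
The rank limits above follow from the SVD: a weight of shape (m, n) has only min(m, n) singular values, so at most that many singular directions can be selected. A sketch of such a check, assuming the embedding weight is laid out as nn.Embedding stores it (num_embeddings × embedding_dim); `mica_max_rank` and `check_mica_rank` are hypothetical helpers, and the PR's actual error message may differ.

```python
from torch import nn


def mica_max_rank(module: nn.Module) -> int:
    """Hypothetical helper: largest r for which a MiCA init is possible."""
    if isinstance(module, nn.Linear):
        return min(module.in_features, module.out_features)
    if isinstance(module, nn.Embedding):
        return min(module.num_embeddings, module.embedding_dim)
    raise TypeError(f"MiCA sketch only handles nn.Linear and nn.Embedding, got {type(module).__name__}")


def check_mica_rank(module: nn.Module, r: int) -> None:
    """Raise before running the (expensive) SVD if r cannot be satisfied."""
    max_r = mica_max_rank(module)
    if r > max_r:
        raise ValueError(f"MiCA requires r <= {max_r} for {type(module).__name__}, got r={r}")


check_mica_rank(nn.Linear(64, 16), r=16)    # OK: r == min(64, 16)
check_mica_rank(nn.Embedding(100, 8), r=8)  # OK: r == min(100, 8)
try:
    check_mica_rank(nn.Linear(64, 16), r=17)
except ValueError as err:
    print(err)
```

Validating r up front is cheap, whereas the full SVD is the costly part of initialization for large layers.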

Adds Minor Component Adaptation (https://arxiv.org/abs/2604.01694) as a new init scheme for LoraConfig, triggered by `init_lora_weights="mica"`. Resolves huggingface#3142.

MiCA initializes `B = U[:, -r:]` (the r left singular vectors of the base weight associated with the smallest singular values) and `A = 0`. During training only `A` is updated; `B` is frozen. Because `A == 0` at init, the adapter contribution `B @ A` is zero and the forward output is preserved exactly, with no need to mutate the base weight.

Implementation:

* `LoraConfig.init_lora_weights` accepts `"mica"`.
* `LoraLayer.mica_init` performs the SVD-based init for Linear targets and validates `r <= min(in_features, out_features)`. The init is skipped when the adapter parameters are on the meta device (low_cpu_mem_usage path).
* `MiCALinearVariant` is a `LoraVariant` that resolves for the MiCA init scheme. Forward and merge semantics are vanilla LoRA; the only override of substance is the new `update_requires_grad` hook.
* `LoraVariant.update_requires_grad(module, adapter_name)` is a new entry point on the variant base class. Default is a no-op so existing variants are unaffected. `LoraModel._mark_only_adapters_as_trainable` invokes it for every adapter after the base trainability marking, which is where MiCA freezes `lora_B`.

MiCA is currently restricted to `nn.Linear`. Passing `init_lora_weights="mica"` on a non-Linear target raises `ValueError: Unknown initialization` via the existing `reset_lora_parameters` fallback.

Tests:

* `tests/test_initialization.py` adds 6 MiCA-specific tests covering init correctness, that B is the minor (not major) subspace, B-freeze, train step behavior, save/load round-trip, and the unsupported-layer error.
* `tests/test_custom_models.py` adds two parametrized MiCA entries to `TEST_CASES` for broader coverage (save/load, merge/unmerge, autocast).
* `tests/testing_common.py` and `tests/test_custom_models.py` relax two assertions that previously required *every* `lora_*` parameter to be trainable / receive gradients, to accommodate variants like MiCA that intentionally freeze a subset.

Docs and example:

* `docs/source/developer_guides/lora.md` adds a MiCA section.
* `examples/mica_finetuning/` provides a runnable example and README.
* `method_comparison/MetaMathQA/experiments/lora/llama-3.2-3B-rank32-mica/` registers a benchmark config.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

BenjaminBossan

Thanks for this PR to add MiCA. For now, I have focused on the implementation; I haven't checked the example and documentation yet.

I have a couple of smaller comments, but there is also a larger issue, which is that the way that requires_grad is set is not sufficient yet. Right now, it only covers the get_peft_model path, but that's not the only one that can modify requires_grad. Take this example:

from pprint import pprint
import torch
from torch import nn
from peft import LoraConfig, get_peft_model

class SimpleMlp(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 10)
        self.fc2 = nn.Linear(10, 10)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

model = SimpleMlp()
config0 = LoraConfig(target_modules=["fc1"], init_lora_weights="mica")
model = get_peft_model(model, config0)

layers_with_requires_grad = [name for name, param in model.named_parameters() if param.requires_grad]
print("Layers with requires_grad=True after first LoRA")
pprint(layers_with_requires_grad)
# correct, should be ['base_model.model.fc1.lora_A.default.weight']

config1 = LoraConfig(target_modules=["fc1", "fc2"], init_lora_weights="mica")
model.add_adapter("other", config1)
model.set_adapter("other")
layers_with_requires_grad = [name for name, param in model.named_parameters() if param.requires_grad]
print("\nLayers with requires_grad=True after switching to other adapter")
pprint(layers_with_requires_grad)
# incorrect, should be ['base_model.model.fc1.lora_A.other.weight', 'base_model.model.fc2.lora_A.other.weight']

model.set_adapter("default")
layers_with_requires_grad = [name for name, param in model.named_parameters() if param.requires_grad]
print("\nLayers with requires_grad=True after switching back to default adapter")
pprint(layers_with_requires_grad)
# incorrect, should be ['base_model.model.fc1.lora_A.default.weight']

If you run this, you'll see that the add_adapter and set_adapter paths are not covered.

Therefore, I have a different suggestion which implements this feature in a more declarative way, LMK what you think about that:

First, let's remove update_requires_grad completely and instead add a class attribute frozen_peft_weight_names: dict[str, tuple[str, ...]] = {} on peft.tuners.tuners_utils.BaseTunerLayer. This will contain a mapping from adapter name to the keys of the PEFT weights that should be frozen.

Second, in MiCALinearVariant.init, we add an entry for LoRA B of the MiCA adapter to frozen_peft_weight_names. Let's make sure not to mutate frozen_peft_weight_names in place, as it's a class attribute; instead, re-assign a copy of the updated dict.
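
The reason for re-assigning a copy is ordinary Python class-attribute semantics: a mutable dict defined on the class is shared by every instance until an instance binds its own attribute. A self-contained illustration with hypothetical toy classes, not PEFT's BaseTunerLayer:

```python
class ToyLayer:
    # Class attribute: shared by all instances until an instance rebinds the name.
    frozen_peft_weight_names: dict[str, tuple[str, ...]] = {}


class ToyLayerOK:
    frozen_peft_weight_names: dict[str, tuple[str, ...]] = {}


# Wrong: mutating in place writes into the shared class-level dict.
a, b = ToyLayer(), ToyLayer()
a.frozen_peft_weight_names["mica"] = ("lora_B",)
print(b.frozen_peft_weight_names)  # {'mica': ('lora_B',)}: the entry leaked to b

# Right: re-assigning a copy creates an instance attribute on c only.
c, d = ToyLayerOK(), ToyLayerOK()
c.frozen_peft_weight_names = {**c.frozen_peft_weight_names, "mica": ("lora_B",)}
print(d.frozen_peft_weight_names)  # {}
```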

Finally, in _mark_only_adapters_as_trainable and in peft.tuners.tuners_utils.set_adapter, we can check if, for a given PEFT layer, there is an entry in frozen_peft_weight_names, and if we find it, set requires_grad = False.
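
A rough, self-contained sketch of how such a mapping could be consulted every time trainability is reset. `ToyLoraLayer` and `set_active_adapter` are hypothetical stand-ins for PEFT's LoRA layer and set_adapter; only the idea of a frozen_peft_weight_names lookup comes from the proposal above, and storing lora_A/lora_B in ModuleDicts keyed by adapter name mirrors PEFT's layout.

```python
from torch import nn


class ToyLoraLayer(nn.Module):
    """Hypothetical LoRA layer with per-adapter A/B stored in ModuleDicts."""

    frozen_peft_weight_names: dict[str, tuple[str, ...]] = {}

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.base_layer = nn.Linear(in_features, out_features)
        self.lora_A = nn.ModuleDict()
        self.lora_B = nn.ModuleDict()

    def add_adapter(self, name: str, r: int, mica: bool = False) -> None:
        self.lora_A[name] = nn.Linear(self.base_layer.in_features, r, bias=False)
        self.lora_B[name] = nn.Linear(r, self.base_layer.out_features, bias=False)
        if mica:
            # Re-assign a copy instead of mutating the class-level dict.
            self.frozen_peft_weight_names = {**self.frozen_peft_weight_names, name: ("lora_B",)}


def set_active_adapter(layer: ToyLoraLayer, active: str) -> None:
    """Hypothetical set_adapter: only the active adapter trains, minus its frozen weights."""
    layer.base_layer.requires_grad_(False)
    for key in ("lora_A", "lora_B"):
        for name, module in getattr(layer, key).items():
            frozen = layer.frozen_peft_weight_names.get(name, ())
            module.requires_grad_(name == active and key not in frozen)


def trainable_names(module: nn.Module) -> list[str]:
    return [n for n, p in module.named_parameters() if p.requires_grad]


layer = ToyLoraLayer(10, 10)
layer.add_adapter("default", r=4, mica=True)
layer.add_adapter("other", r=4, mica=True)

set_active_adapter(layer, "other")
print(trainable_names(layer))  # ['lora_A.other.weight']
set_active_adapter(layer, "default")
print(trainable_names(layer))  # ['lora_A.default.weight']
```

Because the freeze is looked up on every call instead of being applied once at creation, switching adapters back and forth cannot accidentally re-enable gradients for a MiCA lora_B.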

Let's also add unit tests to ensure this works as expected, my example could serve as a template.

Moreover, MiCA currently wouldn't work for LoRA applied to embedding layers, right? I think it should be easy enough to add support for those.


sr-networks · 2026-06-04T17:29:30Z

Thanks for the review. I pushed an update addressing the MiCA trainability issue with a declarative frozen_peft_weight_names mapping on BaseTunerLayer, removing update_requires_grad. The freeze is enforced through initial setup, adapter switching, and set_requires_grad.

I also added nn.Embedding support, removed redundant tests, added the r > max_r error test, and added a regression test for switching/reusing adapters.

sr-networks and others added 3 commits April 27, 2026 13:27

Merge branch 'main' into mica

1ab020e

changes for ruff

50ecb0d

BenjaminBossan requested changes May 26, 2026


sr-networks and others added 2 commits June 4, 2026 18:13

Update src/peft/tuners/lora/layer.py

23622a3

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

Address MiCA trainability and embedding support

cd6bcae

sr-networks marked this pull request as ready for review June 4, 2026 17:28

sr-networks wants to merge 5 commits into huggingface:main from sr-networks:mica
