Mica #3260

Open
sr-networks wants to merge 5 commits into huggingface:main from sr-networks:mica

@sr-networks commented May 24, 2026

Added support for MiCA as proposed in #3142.

What does this PR do?

This PR adds MiCA (Minor Component Adaptation) as a LoRA initialization variant, exposed via:

LoraConfig(init_lora_weights="mica")

MiCA initializes LoRA from the minor singular subspace of the base weight matrix. For a target linear layer whose weight matrix has the singular value decomposition $W = U \Sigma V^T$, MiCA initializes:

B = U[:, -r:]
A = 0

where B contains the r left singular vectors corresponding to the smallest singular values.

Since A is initialized to zero, the adapter contribution is zero at initialization and the base model output is preserved.
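
The initialization described above can be sketched in plain PyTorch. This is a minimal illustration rather than the code from this PR: the helper name `mica_init_sketch` is hypothetical, and it assumes the weight is laid out as `(out_features, in_features)`, as in `nn.Linear`.

```python
import torch


def mica_init_sketch(weight: torch.Tensor, r: int):
    """Hypothetical helper: build (A, B) from the minor left singular vectors."""
    out_features, in_features = weight.shape
    if r > min(out_features, in_features):
        raise ValueError("r must not exceed min(in_features, out_features)")
    # torch.linalg.svd returns singular values in descending order, so the
    # last r columns of U belong to the r smallest singular values.
    U, S, Vh = torch.linalg.svd(weight.float(), full_matrices=False)
    B = U[:, -r:].contiguous()           # shape (out_features, r)
    A = torch.zeros(r, in_features)      # shape (r, in_features)
    return A, B


torch.manual_seed(0)
W = torch.randn(6, 4)
A, B = mica_init_sketch(W, r=2)

# The adapter update B @ A is exactly zero at initialization.
delta = B @ A
# The columns of B are orthonormal.
gram = B.T @ B
# The rows of B^T W have norms equal to the two smallest singular values of W.
minor_norms = (B.T @ W).norm(dim=1)
smallest = torch.linalg.svdvals(W)[-2:]
```

Because `A` is zero, `delta` vanishes regardless of `B`, which is why the base weight does not need to be modified to preserve the model output.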

Why freeze lora_B?

MiCA treats lora_B as the fixed minor-component subspace and trains only lora_A.

Freezing lora_B preserves the intended MiCA constraint during training. Without this, the adapter would no longer remain constrained to the selected minor subspace and would behave more like an unconstrained LoRA update.

This is why this PR also updates the LoRA variant interface so that variants can customize adapter trainability.
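
A small numerical sketch of this constraint, in plain PyTorch and independent of the PEFT implementation: with `B` frozen and only `A` trained, the update `B @ A` stays inside the column span of `B`. The toy data, learning rate, and step count below are arbitrary assumptions chosen for illustration.

```python
import torch

torch.manual_seed(0)
out_features, in_features, r = 8, 5, 2

W = torch.randn(out_features, in_features)
U, _, _ = torch.linalg.svd(W, full_matrices=False)
B = U[:, -r:].clone()                                   # frozen minor subspace
A = torch.zeros(r, in_features, requires_grad=True)     # the only trainable tensor

x = torch.randn(16, in_features)
target = torch.randn(16, out_features)
optimizer = torch.optim.SGD([A], lr=0.1)

for _ in range(5):
    optimizer.zero_grad()
    out = x @ (W + B @ A).T
    loss = torch.nn.functional.mse_loss(out, target)
    loss.backward()
    optimizer.step()

delta = (B @ A).detach()
# Project the update onto span(B); nothing should be left outside of it.
residual = delta - (B @ B.T) @ delta
```

If `B` were also trained, its columns would drift away from the minor singular vectors and `residual` would no longer be guaranteed to vanish with respect to the original subspace.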

Recommended usage

MiCA is primarily intended for continued pretraining / domain-adaptive pretraining, not for instruction fine-tuning.

The recommended workflow is:

  1. Start from the base model, not the instruct/chat model.

  2. Train the MiCA adapter on continued-pretraining data.

  3. Merge the trained adapter into the model weights.

  4. Use the resulting merged model as the adapted base for subsequent instruction/chat tuning, or merge/apply it before using the corresponding instruct/chat model setup.

This recommendation follows the intended use of MiCA as a method for injecting domain knowledge into pretrained representations while constraining the update to the selected minor-component subspace.
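
Step 3 of the workflow above (merging) can be illustrated without any PEFT-specific code. The sketch below assumes the standard LoRA scaling `lora_alpha / r` and uses a random `A` as a stand-in for a trained `lora_A`; in PEFT itself, merging is normally done through `merge_and_unload()` rather than by hand.

```python
import torch
from torch import nn

torch.manual_seed(0)
r, lora_alpha = 2, 4
scaling = lora_alpha / r  # assumption: default LoRA scaling

base = nn.Linear(5, 8, bias=False)
# B: minor left singular vectors of the base weight, shape (8, r).
B = torch.linalg.svd(base.weight.detach(), full_matrices=False)[0][:, -r:]
# Stand-in for a trained lora_A, shape (r, 5).
A = torch.randn(r, 5) * 0.01

x = torch.randn(3, 5)
with torch.no_grad():
    # Output with the adapter applied on the side.
    adapted_out = base(x) + scaling * (x @ A.T @ B.T)

    # Output after folding the adapter into the weight.
    merged = nn.Linear(5, 8, bias=False)
    merged.weight.copy_(base.weight + scaling * (B @ A))
    merged_out = merged(x)
```

The two outputs agree up to floating-point error, so the merged layer can serve as the adapted base for later tuning stages.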

Main changes

  • Adds "mica" as a valid value for LoraConfig(init_lora_weights=...).
  • Adds SVD-based MiCA initialization for nn.Linear and nn.Embedding layers.
  • Initializes B from the minor left singular vectors of the base weight.
  • Initializes A to zero so the adapter is a no-op at initialization.
  • Uses BaseTunerLayer.frozen_peft_weight_names to keep MiCA B frozen across get_peft_model, add_adapter, set_adapter, and set_requires_grad.
  • Keeps standard LoRA forward, merge, and unmerge behavior.
  • Adds MiCA tests, documentation, and a runnable fine-tuning example.

Tests

This PR adds tests covering:

  • zero adapter contribution at initialization
  • use of the minor rather than major singular subspace
  • B being frozen
  • adapter switching with MiCA
  • reusing an adapter name after deleting a MiCA adapter
  • r > max_r error handling
  • embedding initialization
  • custom model behavior with MiCA

Limitations

  • Supports nn.Linear and nn.Embedding target modules.
  • Requires r <= min(in_features, out_features) for linear layers.
  • Requires r <= min(num_embeddings, embedding_dim) for embedding layers.
  • Performs a full SVD during adapter initialization.
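
The rank limits above follow from the fact that a matrix of shape `(m, n)` has only `min(m, n)` singular vectors to choose from. A minimal sketch of such a check is shown below; the helper names are hypothetical and are not taken from this PR.

```python
def max_mica_rank(shape):
    """Hypothetical helper: largest usable rank for a 2-D weight of this shape."""
    rows, cols = shape
    return min(rows, cols)


def check_mica_rank(r, shape):
    """Hypothetical helper: raise if r exceeds the number of singular vectors."""
    max_r = max_mica_rank(shape)
    if r > max_r:
        raise ValueError(
            f"MiCA requires r <= {max_r} for a weight of shape {tuple(shape)}, got r={r}."
        )
    return r


# Linear layer weight, shape (out_features, in_features).
linear_rank = check_mica_rank(16, (4096, 4096))

# Embedding weight, shape (num_embeddings, embedding_dim): the bound is
# usually set by embedding_dim.
try:
    check_mica_rank(65, (32000, 64))
    embedding_error = None
except ValueError as exc:
    embedding_error = str(exc)
```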

sr-networks and others added 3 commits April 27, 2026 13:27
Adds Minor Component Adaptation (https://arxiv.org/abs/2604.01694) as a
new init scheme for LoraConfig, triggered by `init_lora_weights="mica"`.
Resolves huggingface#3142.

MiCA initializes `B = U[:, -r:]` (the r left singular vectors of the base
weight associated with the smallest singular values) and `A = 0`. During
training only `A` is updated; `B` is frozen. Because `A == 0` at init, the
adapter contribution `B @ A` is zero and the forward output is preserved
exactly, with no need to mutate the base weight.

Implementation:

* `LoraConfig.init_lora_weights` accepts `"mica"`.
* `LoraLayer.mica_init` performs the SVD-based init for Linear targets and
  validates `r <= min(in_features, out_features)`. The init is skipped when
  the adapter parameters are on the meta device (low_cpu_mem_usage path).
* `MiCALinearVariant` is a `LoraVariant` that resolves for the MiCA init
  scheme. Forward and merge semantics are vanilla LoRA; the only override
  of substance is the new `update_requires_grad` hook.
* `LoraVariant.update_requires_grad(module, adapter_name)` is a new entry
  point on the variant base class. Default is a no-op so existing variants
  are unaffected. `LoraModel._mark_only_adapters_as_trainable` invokes it
  for every adapter after the base trainability marking, which is where
  MiCA freezes `lora_B`.

MiCA is currently restricted to `nn.Linear`. Passing `init_lora_weights="mica"`
on a non-Linear target raises `ValueError: Unknown initialization` via the
existing `reset_lora_parameters` fallback.

Tests:

* `tests/test_initialization.py` adds 6 MiCA-specific tests covering init
  correctness, that B is the minor (not major) subspace, B-freeze, train
  step behavior, save/load round-trip, and the unsupported-layer error.
* `tests/test_custom_models.py` adds two parametrized MiCA entries to
  `TEST_CASES` for broader coverage (save/load, merge/unmerge, autocast).
* `tests/testing_common.py` and `tests/test_custom_models.py` relax two
  assertions that previously required *every* `lora_*` parameter to be
  trainable / receive gradients, to accommodate variants like MiCA that
  intentionally freeze a subset.

Docs and example:

* `docs/source/developer_guides/lora.md` adds a MiCA section.
* `examples/mica_finetuning/` provides a runnable example and README.
* `method_comparison/MetaMathQA/experiments/lora/llama-3.2-3B-rank32-mica/`
  registers a benchmark config.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@BenjaminBossan BenjaminBossan left a comment

Member


Thanks for this PR to add MiCA. So far, I have focused on the implementation; I haven't checked the example and documentation yet.

I have a couple of smaller comments, but there is also a larger issue, which is that the way that requires_grad is set is not sufficient yet. Right now, it only covers the get_peft_model path, but that's not the only one that can modify requires_grad. Take this example:

from pprint import pprint
import torch
from torch import nn
from peft import LoraConfig, get_peft_model

class SimpleMlp(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 10)
        self.fc2 = nn.Linear(10, 10)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

model = SimpleMlp()
config0 = LoraConfig(target_modules=["fc1"], init_lora_weights="mica")
model = get_peft_model(model, config0)

layers_with_requires_grad = [name for name, param in model.named_parameters() if param.requires_grad]
print("Layers with requires_grad=True after first LoRA")
pprint(layers_with_requires_grad)
# correct, should be ['base_model.model.fc1.lora_A.default.weight']

config1 = LoraConfig(target_modules=["fc1", "fc2"], init_lora_weights="mica")
model.add_adapter("other", config1)
model.set_adapter("other")
layers_with_requires_grad = [name for name, param in model.named_parameters() if param.requires_grad]
print("\nLayers with requires_grad=True after switching to other adapter")
pprint(layers_with_requires_grad)
# incorrect, should be ['base_model.model.fc1.lora_A.other.weight', 'base_model.model.fc2.lora_A.other.weight']

model.set_adapter("default")
layers_with_requires_grad = [name for name, param in model.named_parameters() if param.requires_grad]
print("\nLayers with requires_grad=True after switching back to default adapter")
pprint(layers_with_requires_grad)
# incorrect, should be ['base_model.model.fc1.lora_A.default.weight']

If you run this, you'll see that the add_adapter and set_adapter paths are not covered.

Therefore, I have a different suggestion which implements this feature in a more declarative way, LMK what you think about that:

First, let's remove update_requires_grad completely. Next, on peft.tuners.tuners_utils.BaseTunerLayer, let's add a class attribute frozen_peft_weight_names: dict[str, tuple[str, ...]] = {}. This will contain a mapping from adapter name to the keys of the PEFT weights that should be frozen.

Second, in MiCALinearVariant.init, for the MiCA adapter, we add an entry to frozen_peft_weight_names for LoRA B. Let's ensure not to simply mutate frozen_peft_weight_names, as it's a class attribute. Instead, re-assign a copy of the mutated dict.

Finally, in _mark_only_adapters_as_trainable and in peft.tuners.tuners_utils.set_adapter, we can check if, for a given PEFT layer, there is an entry in frozen_peft_weight_names, and if we find it, set requires_grad = False.
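
To make the proposal above concrete, here is a toy sketch of the declarative approach. `ToyTunerLayer` and its methods are hypothetical stand-ins, not PEFT classes; only the attribute name `frozen_peft_weight_names` and the copy-then-reassign rule come from the suggestion itself.

```python
import torch
from torch import nn


class ToyTunerLayer(nn.Module):
    # Class-level default shared by all instances; it must not be mutated in place.
    frozen_peft_weight_names: dict[str, tuple[str, ...]] = {}

    def __init__(self):
        super().__init__()
        self.lora_A = nn.ModuleDict()
        self.lora_B = nn.ModuleDict()

    def add_adapter(self, name, r=2, freeze_b=False):
        self.lora_A[name] = nn.Linear(4, r, bias=False)
        self.lora_B[name] = nn.Linear(r, 4, bias=False)
        if freeze_b:
            # Copy, update, re-assign: this creates an instance attribute and
            # leaves the shared class-level dict untouched.
            frozen = dict(self.frozen_peft_weight_names)
            frozen[name] = ("lora_B",)
            self.frozen_peft_weight_names = frozen

    def set_adapter(self, active):
        for key in ("lora_A", "lora_B"):
            for name, layer in getattr(self, key).items():
                is_frozen = key in self.frozen_peft_weight_names.get(name, ())
                layer.requires_grad_(name == active and not is_frozen)


layer = ToyTunerLayer()
layer.add_adapter("default", freeze_b=True)
layer.add_adapter("other", freeze_b=True)

layer.set_adapter("other")
trainable_other = sorted(n for n, p in layer.named_parameters() if p.requires_grad)

layer.set_adapter("default")
trainable_default = sorted(n for n, p in layer.named_parameters() if p.requires_grad)
```

Because every path that toggles `requires_grad` consults the same mapping, switching adapters back and forth cannot accidentally unfreeze `lora_B`.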

Let's also add unit tests to ensure this works as expected, my example could serve as a template.

Moreover, MiCA currently wouldn't work for LoRA applied to embedding layers, right? I think it should be easy enough to add support for those.

sr-networks and others added 2 commits June 4, 2026 18:13
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
@sr-networks sr-networks marked this pull request as ready for review June 4, 2026 17:28
@sr-networks

Author

Thanks for the review. I pushed an update addressing the MiCA trainability issue with a declarative frozen_peft_weight_names mapping on BaseTunerLayer, removing update_requires_grad. The freeze is enforced through initial setup, adapter switching, and set_requires_grad.

I also added nn.Embedding support, removed redundant tests, added the r > max_r error test, and added a regression test for switching/reusing adapters.
