huggingface · sr-networks · Apr 27, 2026 · May 21, 2026 · May 21, 2026 · Jun 4, 2026
diff --git a/docs/source/developer_guides/lora.md b/docs/source/developer_guides/lora.md
@@ -54,6 +54,19 @@ lora_config = LoraConfig(init_lora_weights="pissa_niter_[number of iters]", ...)
 ```
 For detailed instruction on using PiSSA, please follow [these instructions](https://github.com/huggingface/peft/tree/main/examples/pissa_finetuning).
 
+### MiCA
+
+[MiCA](https://arxiv.org/abs/2604.01694) (Minor Component Adaptation) is a complement to PiSSA: instead of initializing from the *principal* singular components, MiCA uses the *minor* ones. Concretely, with `W = U Σ V^T`, MiCA sets `B = U[:, -r:]` (the `r` left singular vectors associated with the smallest singular values) and `A = 0`. During training, only `A` is updated; `B` is frozen. The intuition is that the minor singular directions are largely unused by the pretrained task and therefore offer a more "plastic" subspace for injecting new knowledge while preserving pretrained capabilities.
+
+Because `A == 0` at init, the adapter contribution `B · A == 0` and the model output is preserved exactly at step 0, so no residual subtraction on the base weight is needed (unlike PiSSA). Since only `A` is trainable, each adapted layer trains `r · in_features` parameters instead of LoRA's `r · (in_features + out_features)`, which is exactly half for square weights.
+
+```python
+from peft import LoraConfig
+config = LoraConfig(init_lora_weights="mica", r=16, target_modules=["q_proj", "v_proj"], ...)
+```
+
+MiCA currently supports `nn.Linear` and `nn.Embedding` target modules. The chosen rank must satisfy `r <= min(in_features, out_features)` for linear layers and `r <= min(num_embeddings, embedding_dim)` for embedding layers. For detailed usage, see [these instructions](https://github.com/huggingface/peft/tree/main/examples/mica_finetuning).
+
 ### CorDA
 
 [CorDA](https://huggingface.co/papers/2406.05223) builds task-aware LoRA adapters from weight decomposition oriented by the context of downstream task to learn (instruction-previewed mode, IPM) or world knowledge to maintain (knowledge-preserved mode, KPM).
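The MiCA initialization described in the `lora.md` section above can be reproduced outside PEFT in a few lines of plain PyTorch. The sketch below is not PEFT's implementation: `mica_init_sketch` is a hypothetical helper that takes a single `(out_features, in_features)` weight and returns `(B, A)` following the description, then checks three properties the docs rely on (orthonormal `B`, a zero adapter contribution at step 0, and `B` spanning the minor singular subspace).

```python
import torch


def mica_init_sketch(weight: torch.Tensor, r: int) -> tuple[torch.Tensor, torch.Tensor]:
    """Hypothetical helper: MiCA-style (B, A) for a weight of shape (out_features, in_features)."""
    out_features, in_features = weight.shape
    if r > min(out_features, in_features):
        raise ValueError(f"r={r} exceeds min(out_features, in_features)={min(out_features, in_features)}")
    # torch.linalg.svd returns singular values in descending order, so the
    # last r columns of U pair with the r smallest singular values.
    U, _, _ = torch.linalg.svd(weight.float(), full_matrices=False)
    B = U[:, -r:].contiguous()                                        # (out, r): frozen
    A = torch.zeros(r, in_features, dtype=B.dtype, device=B.device)   # (r, in): trainable
    return B, A


torch.manual_seed(0)
W = torch.randn(64, 48)
B, A = mica_init_sketch(W, r=8)

# 1. B has orthonormal columns.
print(torch.allclose(B.T @ B, torch.eye(8), atol=1e-4))  # True
# 2. B @ A is exactly zero, so W + scaling * B @ A == W at step 0.
print(torch.count_nonzero(B @ A).item())  # 0
# 3. span(B) is the minor subspace: projecting W onto it keeps exactly
#    the 8 smallest singular values of W.
S = torch.linalg.svdvals(W)
print(torch.allclose(torch.linalg.svdvals(B.T @ W), S[-8:], atol=1e-3))  # True
```

Check 3 holds because `B^T U` selects the last `r` rows of the identity, so `B^T W` reduces to the minor singular values times the corresponding right singular vectors.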

diff --git a/examples/mica_finetuning/README.md b/examples/mica_finetuning/README.md
@@ -0,0 +1,80 @@
+# MiCA: Minor Component Adaptation
+
+## Introduction ([Paper](https://arxiv.org/abs/2604.01694))
+
+Minor Component Adaptation (MiCA) is a parameter-efficient fine-tuning method closely related to LoRA. Like LoRA, MiCA inserts a low-rank update `ΔW = (α/r) · B · A` into a pretrained weight `W ∈ R^{out×in}`. Unlike LoRA, MiCA initializes the matrices from the singular value decomposition of `W` and trains only one of them:
+
+- Compute the SVD `W = U Σ V^T`.
+- Initialize `B = U[:, -r:]` — the `r` left singular vectors associated with the **smallest** singular values.
+- Initialize `A = 0`.
+- During training, optimize only `A`; `W` and `B` remain frozen.
+
+The motivation is that the *minor* singular directions of a pretrained weight encode subspaces that are largely unused by the original task. Restricting adaptation to these directions provides a more "plastic" subspace for knowledge injection, with less risk of overwriting capabilities encoded in the dominant subspace. Empirically, the paper reports that MiCA improves knowledge acquisition while using fewer trainable parameters than LoRA at the same rank: only `A` (`r × in`) is trained instead of both `A` and `B` (`r × in` plus `out × r`), which halves the count for square weights.
+
+Because `A == 0` at initialization, the adapter contribution `B · A == 0` and the model's forward output is preserved exactly at step 0 — no residual subtraction is needed on the base weight.
+
+## Quick Start
+
+```python
+import torch
+from peft import LoraConfig, get_peft_model
+from transformers import AutoTokenizer, AutoModelForCausalLM
+from trl import SFTConfig, SFTTrainer
+from datasets import load_dataset
+
+model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", dtype=torch.bfloat16, device_map="auto")
+tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
+tokenizer.pad_token_id = tokenizer.eos_token_id
+
+lora_config = LoraConfig(
+    init_lora_weights="mica",
+    r=16,
+    lora_alpha=16,
+    target_modules=["q_proj", "v_proj"],
+    task_type="CAUSAL_LM",
+)
+peft_model = get_peft_model(model, lora_config)
+peft_model.print_trainable_parameters()
+
+dataset = load_dataset("imdb", split="train[:1%]")
+training_args = SFTConfig(dataset_text_field="text", max_length=128)
+trainer = SFTTrainer(
+    model=peft_model,
+    args=training_args,
+    train_dataset=dataset,
+    processing_class=tokenizer,
+)
+trainer.train()
+peft_model.save_pretrained("mica-llama-2-7b")
+```
+
+To reload the trained adapter:
+
+```python
+import torch
+from peft import PeftModel
+from transformers import AutoModelForCausalLM
+
+model = AutoModelForCausalLM.from_pretrained(
+    "meta-llama/Llama-2-7b-hf", dtype=torch.bfloat16, device_map="auto"
+)
+peft_model = PeftModel.from_pretrained(model, "mica-llama-2-7b")
+```
+
+## Notes and limitations
+
+- MiCA currently supports `nn.Linear` and `nn.Embedding` target modules.
+- The chosen rank must satisfy `r <= min(in_features, out_features)` for linear layers and `r <= min(num_embeddings, embedding_dim)` for embedding layers; otherwise initialization raises `ValueError`.
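+- MiCA computes an exact SVD (`torch.linalg.svd` with `full_matrices=False`) of each target weight at initialization. For 7B-scale models this is a one-time cost of seconds; since the SVD of an `m × n` weight with `m ≥ n` costs `O(m · n²)`, it grows quickly for the larger weight matrices of 70B-scale models.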
+- Combining MiCA with `use_dora=True` or other LoRA variants is not supported in this initial integration.
+
+## Citation
+
+```
+@article{rudiger2026mica,
+  title={MiCA Learns More Knowledge Than LoRA and Full Fine-Tuning},
+  author={R{\"u}diger, Sten and Raschka, Sebastian},
+  journal={arXiv preprint arXiv:2604.01694},
+  year={2026}
+}
+```
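The README's recipe (`ΔW = (α/r) · B · A`, train only `A`) can be exercised end to end on a toy layer without PEFT. In this sketch `ToyMiCALinear` is a hypothetical stand-alone module, not PEFT's `MiCALinearVariant`. It stores `B` as a buffer so that it is never handed to the optimizer; that is a design choice of the sketch, not necessarily how PEFT freezes `lora_B`. After a few SGD steps it checks that `W` and `B` are untouched, that `A` has moved, and that the learned update stays inside the column span of `B`.

```python
import torch
import torch.nn as nn


class ToyMiCALinear(nn.Module):
    """Hypothetical toy layer: y = base(x) + (alpha / r) * x A^T B^T, with base and B frozen."""

    def __init__(self, base: nn.Linear, r: int, alpha: float):
        super().__init__()
        self.base = base.requires_grad_(False)
        U, _, _ = torch.linalg.svd(base.weight.detach(), full_matrices=False)
        # A buffer is not returned by .parameters(), so no optimizer can update B.
        self.register_buffer("B", U[:, -r:].clone())               # (out, r)
        self.A = nn.Parameter(torch.zeros(r, base.in_features))    # (r, in)
        self.scaling = alpha / r

    def delta_weight(self) -> torch.Tensor:
        return self.scaling * self.B @ self.A                      # (out, in)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.A.T) @ self.B.T


torch.manual_seed(0)
base = nn.Linear(32, 24)
layer = ToyMiCALinear(base, r=4, alpha=4)
x, target = torch.randn(128, 32), torch.randn(128, 24)

with torch.no_grad():
    assert torch.equal(layer(x), base(x))  # A == 0, so the adapter is a no-op at step 0

w_before, b_before = base.weight.clone(), layer.B.clone()
trainable = [p for p in layer.parameters() if p.requires_grad]
opt = torch.optim.SGD(trainable, lr=0.5)
for _ in range(20):
    opt.zero_grad()
    loss = nn.functional.mse_loss(layer(x), target)
    loss.backward()
    opt.step()

print(sum(p.numel() for p in trainable))  # 128, i.e. r * in_features = 4 * 32
print(torch.equal(base.weight, w_before), torch.equal(layer.B, b_before))  # True True
# The update lies in span(B): its component orthogonal to span(B) is ~0.
dW = layer.delta_weight().detach()
P_perp = torch.eye(24) - layer.B @ layer.B.T
print((P_perp @ dW).abs().max().item() < 1e-4, layer.A.abs().sum().item() > 0)  # True True
```

A LoRA layer of the same shape and rank would train `4 * (32 + 24) = 224` parameters here, compared with MiCA's 128.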
diff --git a/examples/mica_finetuning/mica_finetuning.py b/examples/mica_finetuning/mica_finetuning.py
@@ -0,0 +1,80 @@
+# Copyright 2023-present the HuggingFace Inc. team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Minimal MiCA fine-tuning example.
+
+Mirrors `examples/pissa_finetuning/pissa_finetuning.py` in spirit but with the MiCA-specific knobs only. MiCA
+initializes `B` from the bottom-r left singular vectors of the base weight and freezes it during training; only
+`A` is updated. Because `A == 0` at init, the adapter is a no-op on initialization and no residual subtraction
+on the base weight is needed.
+"""
+
+from dataclasses import dataclass, field
+from typing import Optional
+
+import torch
+from datasets import load_dataset
+from transformers import AutoModelForCausalLM, AutoTokenizer, HfArgumentParser
+from trl import SFTConfig, SFTTrainer
+
+from peft import LoraConfig, get_peft_model
+
+
+@dataclass
+class ScriptArguments(SFTConfig):
+    base_model_name_or_path: Optional[str] = field(default=None, metadata={"help": "Name or path of the base model."})
+    lora_r: int = field(default=16)
+    lora_alpha: int = field(default=16)
+    lora_dropout: float = field(default=0.0)
+    target_modules: Optional[str] = field(
+        default="q_proj,v_proj",
+        metadata={"help": "Comma-separated module names to adapt with MiCA."},
+    )
+    data_path: str = field(default="imdb", metadata={"help": "HF dataset path."})
+    dataset_split: str = field(default="train[:1%]")
+    dataset_text_field: str = field(default="text")
+
+
+def train():
+    parser = HfArgumentParser(ScriptArguments)
+    args = parser.parse_args_into_dataclasses()[0]
+
+    model = AutoModelForCausalLM.from_pretrained(args.base_model_name_or_path, dtype=torch.bfloat16, device_map="auto")
+    tokenizer = AutoTokenizer.from_pretrained(args.base_model_name_or_path)
+    if tokenizer.pad_token_id is None:
+        tokenizer.pad_token_id = tokenizer.eos_token_id
+
+    lora_config = LoraConfig(
+        init_lora_weights="mica",
+        r=args.lora_r,
+        lora_alpha=args.lora_alpha,
+        lora_dropout=args.lora_dropout,
+        target_modules=[m.strip() for m in args.target_modules.split(",")],
+        task_type="CAUSAL_LM",
+    )
+    peft_model = get_peft_model(model, lora_config)
+    peft_model.print_trainable_parameters()
+
+    dataset = load_dataset(args.data_path, split=args.dataset_split)
+    trainer = SFTTrainer(
+        model=peft_model,
+        args=args,
+        train_dataset=dataset,
+        processing_class=tokenizer,
+    )
+    trainer.train()
+    peft_model.save_pretrained(args.output_dir)
+
+
+if __name__ == "__main__":
+    train()
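Because `mica_init` raises `ValueError` when `r` exceeds `min(in_features, out_features)`, it can help to check the rank before calling `get_peft_model` in a script like the one above. The helper below, `max_mica_rank`, is hypothetical and not part of PEFT. It parses the same comma-separated `--target_modules` string as the script, approximates PEFT's name matching with a simple suffix rule (`name == target` or `name.endswith("." + target)`), and only considers `nn.Linear` and `nn.Embedding`, the two module types MiCA supports. `TinyModel` is an illustrative stand-in for a real checkpoint.

```python
import torch.nn as nn


def max_mica_rank(model: nn.Module, target_modules: str) -> dict[str, int]:
    """Hypothetical pre-flight check: largest usable MiCA rank per matched module."""
    targets = [m.strip() for m in target_modules.split(",")]  # same parsing as the script
    limits = {}
    for name, module in model.named_modules():
        if not any(name == t or name.endswith("." + t) for t in targets):
            continue
        if isinstance(module, nn.Linear):
            limits[name] = min(module.in_features, module.out_features)
        elif isinstance(module, nn.Embedding):
            limits[name] = min(module.num_embeddings, module.embedding_dim)
    return limits


class TinyAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.q_proj = nn.Linear(64, 64)
        self.v_proj = nn.Linear(64, 16)  # narrower output, as with grouped-query attention


class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed_tokens = nn.Embedding(100, 64)
        self.layers = nn.ModuleList([TinyAttention() for _ in range(2)])


limits = max_mica_rank(TinyModel(), "q_proj, v_proj")
print(limits)
print("max usable r:", min(limits.values()))  # max usable r: 16
```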
diff --git a/method_comparison/MetaMathQA/experiments/lora/llama-3.2-3B-rank32-mica/adapter_config.json b/method_comparison/MetaMathQA/experiments/lora/llama-3.2-3B-rank32-mica/adapter_config.json
@@ -0,0 +1,30 @@
+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": null,
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": false,
+  "init_lora_weights": "mica",
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 64,
+  "lora_bias": false,
+  "lora_dropout": 0.0,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": null,
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
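The `r=32` config above leaves `"target_modules": null`. Assuming this resolves to PEFT's default Llama targets (`q_proj`, `v_proj`) and assuming the published Llama-3.2-3B dimensions noted in the code comments (neither is stated in this PR), the arithmetic below shows how the LoRA-to-MiCA parameter ratio depends on each weight's shape. `lora_vs_mica_params` is a hypothetical helper.

```python
def lora_vs_mica_params(shapes: dict[str, tuple[int, int]], r: int, num_layers: int) -> tuple[int, int]:
    """Trainable parameters for LoRA (A and B) versus MiCA (A only).

    `shapes` maps a target module name to its (in_features, out_features).
    """
    lora = sum(r * (fin + fout) for fin, fout in shapes.values()) * num_layers
    mica = sum(r * fin for fin, _ in shapes.values()) * num_layers  # only A: (r, in_features)
    return lora, mica


# Assumed Llama-3.2-3B dimensions: hidden size 3072, 28 decoder layers,
# 8 key/value heads of dim 128, so v_proj maps 3072 -> 1024.
shapes = {"q_proj": (3072, 3072), "v_proj": (3072, 1024)}
lora, mica = lora_vs_mica_params(shapes, r=32, num_layers=28)
print(f"LoRA: {lora:,}  MiCA: {mica:,}  ratio: {mica / lora:.2f}")
# LoRA: 9,175,040  MiCA: 5,505,024  ratio: 0.60
```

For the square `q_proj` MiCA trains exactly half as many parameters as LoRA; for the narrower `v_proj` it keeps 75% of LoRA's count (`3072 / (3072 + 1024)`), which is why the total lands at 60% rather than 50%.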
diff --git a/src/peft/tuners/lora/config.py b/src/peft/tuners/lora/config.py
@@ -408,7 +408,7 @@ class LoraConfig(PeftConfig):
             use the original default value of `lora_alpha/r`.
         modules_to_save (`List[str]`):
             List of modules apart from adapter layers to be set as trainable and saved in the final checkpoint.
-        init_lora_weights (`bool` | `Literal["gaussian", "eva", "olora", "pissa", "pissa_niter_[number of iters]", "corda", "loftq", "orthogonal"]`):
+        init_lora_weights (`bool` | `Literal["gaussian", "eva", "olora", "pissa", "pissa_niter_[number of iters]", "corda", "loftq", "orthogonal", "mica"]`):
             How to initialize the weights of the adapter layers. Passing True (default) results in the default
             initialization from the reference implementation from Microsoft, with the LoRA B weight being set to 0.
             This means that without further training, the LoRA adapter will be a no-op. Setting the initialization to
@@ -430,7 +430,10 @@ class LoraConfig(PeftConfig):
             converges even more rapidly than PiSSA in Instruction-Previewed Mode, and preserves world knowledge better
             than LoRA in Knowledge-Preserved Mode. Passing `"orthogonal"` results in LoRA A and B being intialized
             orthogonally; in this, it resembles `"olora"`, but the base weights are left untouched (requires `r` to be
-            even, only supported for linear layers for now).
+            even, only supported for linear layers for now). Passing `"mica"` results in the initialization of <a
+            href='https://arxiv.org/abs/2604.01694' >Minor Component Adaptation (MiCA)</a>, which initializes B from
+            the r left singular vectors of the base weight associated with the smallest singular values, sets A to
+            zero, and freezes B during training; only A is updated. Currently supported for linear and embedding layers.
         layers_to_transform (`Union[List[int], int]`):
             The layer indices to transform. If a list of ints is passed, it will apply the adapter to the layer indices
             that are specified in this list. If a single integer is passed, it will apply the transformations on the
@@ -566,7 +569,17 @@ class LoraConfig(PeftConfig):
     )
     init_lora_weights: (
         bool
-        | Literal["gaussian", "eva", "olora", "pissa", "pissa_niter_[number of iters]", "corda", "loftq", "orthogonal"]
+        | Literal[
+            "gaussian",
+            "eva",
+            "olora",
+            "pissa",
+            "pissa_niter_[number of iters]",
+            "corda",
+            "loftq",
+            "orthogonal",
+            "mica",
+        ]
     ) = field(
         default=True,
         metadata={
@@ -586,7 +599,10 @@ class LoraConfig(PeftConfig):
                 "nonnegative integer. "
                 "Passing `'corda'` results in CorDA initialization. "
                 "Pass `'loftq'` to use LoftQ initialization. "
-                "Pass `'orthogonal'` for orthogonal initialization of LoRA A and B."
+                "Pass `'orthogonal'` for orthogonal initialization of LoRA A and B. "
+                "Pass `'mica'` to use MiCA initialization, where B is set to the r left singular vectors of the "
+                "base weight associated with the smallest singular values, A is set to zero, and B is frozen during "
+                "training (only A is updated)."
             ),
         },
     )
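The docstring above says MiCA freezes `B` and updates only `A`. The variant that does this (`MiCALinearVariant`) is not shown in this diff, so the following only sketches the effect, assuming a PEFT-like layout in which `lora_A[adapter]` and `lora_B[adapter]` are `nn.Linear` modules. `TinyLoraLayer` and `freeze_mica_b` are hypothetical; the helper mirrors the case-insensitive check used in the diff (`init_lora_weights.lower() == "mica"`) and then turns off gradients for `lora_B`. `B` keeps its default random init here because only the freezing behavior is being shown.

```python
import torch
import torch.nn as nn


class TinyLoraLayer(nn.Module):
    """Hypothetical stand-in with PEFT-like lora_A / lora_B ModuleDicts."""

    def __init__(self, in_features: int, out_features: int, r: int, adapter: str = "default"):
        super().__init__()
        self.lora_A = nn.ModuleDict({adapter: nn.Linear(in_features, r, bias=False)})
        self.lora_B = nn.ModuleDict({adapter: nn.Linear(r, out_features, bias=False)})


def freeze_mica_b(layer: TinyLoraLayer, adapter: str, init_lora_weights) -> None:
    """Hypothetical helper: freeze lora_B when the adapter was initialized with MiCA."""
    if isinstance(init_lora_weights, str) and init_lora_weights.lower() == "mica":
        layer.lora_B[adapter].weight.requires_grad_(False)


torch.manual_seed(0)
layer = TinyLoraLayer(16, 12, r=4)
with torch.no_grad():
    layer.lora_A["default"].weight.zero_()  # MiCA starts from A == 0
freeze_mica_b(layer, "default", "MiCA")

trainable = [n for n, p in layer.named_parameters() if p.requires_grad]
print(trainable)  # ['lora_A.default.weight']

b_before = layer.lora_B["default"].weight.clone()
# Passing every parameter is fine: SGD skips tensors whose .grad is None.
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
x, target = torch.randn(8, 16), torch.randn(8, 12)
out = layer.lora_B["default"](layer.lora_A["default"](x))
nn.functional.mse_loss(out, target).backward()
opt.step()
print(torch.equal(layer.lora_B["default"].weight, b_before))  # True
print(layer.lora_A["default"].weight.abs().sum().item() > 0)  # True
```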

diff --git a/src/peft/tuners/lora/layer.py b/src/peft/tuners/lora/layer.py
@@ -136,6 +136,10 @@ def __init__(self, base_layer: nn.Module, ephemeral_gpu_offload: bool = False, *
         self.in_features = in_features
         self.out_features = out_features
 
+    def delete_adapter(self, adapter_name: str) -> None:
+        super().delete_adapter(adapter_name)
+        self.lora_variant.pop(adapter_name, None)
+
     def _get_in_out_features(self, module: nn.Module) -> tuple[int, int] | tuple[None, None]:
         return _get_in_out_features(module)
 
@@ -231,6 +235,9 @@ def update_layer(
         elif isinstance(init_lora_weights, str) and init_lora_weights.lower() == "olora":
             with gather_params_ctx(self.get_base_layer().weight):
                 self.olora_init(adapter_name)
+        elif isinstance(init_lora_weights, str) and init_lora_weights.lower() == "mica":
+            with gather_params_ctx(self.get_base_layer().weight):
+                self.mica_init(adapter_name)
         elif init_lora_weights == "loftq":
             with gather_params_ctx(self.get_base_layer().weight):
                 self.loftq_init(adapter_name, config)
@@ -395,6 +402,41 @@ def pissa_init(self, adapter_name, init_lora_weights):
         weight = transpose(weight.to(dtype), self.fan_in_fan_out)
         self.get_base_layer().weight.data = weight
 
+    def mica_init(self, adapter_name):
+        """Minor Component Adaptation (MiCA) initialization (https://arxiv.org/abs/2604.01694).
+
+        Initializes `lora_B` from the `r` left singular vectors of the base weight associated with the smallest
+        singular values, and sets `lora_A` to zero. The `lora_B` matrix is frozen during training (see
+        `MiCALinearVariant.init`); only `lora_A` is updated. Because `lora_A == 0` at init, the adapter
+        contribution `B @ A == 0` and the base weight does not need to be modified to preserve the forward output.
+        """
+        # When the adapter is being created under `init_empty_weights` (e.g. low_cpu_mem_usage=True), its parameters
+        # live on the meta device and will be filled in from a checkpoint after creation. Skip the SVD in that case.
+        if self.lora_B[adapter_name].weight.device.type == "meta":
+            return
+
+        weight = self.get_base_layer().weight
+        dtype = weight.dtype
+        if dtype not in [torch.float32, torch.float16, torch.bfloat16]:
+            raise TypeError("Please initialize MiCA under float32, float16, or bfloat16.")
+
+        weight = transpose(weight.to(torch.float32), self.fan_in_fan_out)
+        # weight has shape (out_features, in_features) once transposed for fan_in_fan_out, matching nn.Linear.weight.
+        # SVD: weight = U @ diag(S) @ Vh, with U: (out, k), Vh: (k, in), S sorted descending.
+        # MiCA selects the LAST r left singular vectors (smallest singular values) for B and zeroes A.
+        r = self.r[adapter_name]
+        max_r = min(weight.shape)
+        if r > max_r:
+            raise ValueError(
+                f"MiCA requires `r` <= min(in_features, out_features) but got r={r} for a layer with "
+                f"weight shape {tuple(weight.shape)} (max usable r is {max_r})."
+            )
+        U, _, _ = torch.linalg.svd(weight.data, full_matrices=False)
+        lora_B = U[:, -r:].contiguous()
+        lora_A = torch.zeros(r, weight.shape[1], device=weight.device)
+        self.lora_B[adapter_name].weight.data = lora_B.to(dtype)
+        self.lora_A[adapter_name].weight.data = lora_A.to(dtype)
+
     def corda_init(self, adapter_name, init_lora_weights):
         linear = self.get_base_layer()
         weight = linear.weight
@@ -815,6 +857,11 @@ def resolve_lora_variant(self, config: LoraConfig, **kwargs) -> Optional[LoraVar
 
             return BdLoraLinearVariant()
 
+        if isinstance(config.init_lora_weights, str) and config.init_lora_weights.lower() == "mica":
+            from .variants import MiCALinearVariant
+
+            return MiCALinearVariant()
+
         use_alora = config.alora_invocation_tokens is not None
         if not config.use_dora and not use_alora:
             return None
@@ -1064,6 +1111,10 @@ def __init__(
     def resolve_lora_variant(self, *, config: LoraConfig, **kwargs) -> Optional[LoraVariant]:
         if config.velora_config is not None:
             raise ValueError("VeLoRA does not support adapting embedding layers.")
+        if isinstance(config.init_lora_weights, str) and config.init_lora_weights.lower() == "mica":
+            from .variants import MiCAEmbeddingVariant
+
+            return MiCAEmbeddingVariant()
         if not config.use_dora:
             return None
 
@@ -1116,7 +1167,10 @@ def update_layer(
 
         self.use_dora[adapter_name] = config.use_dora
 
-        if init_lora_weights == "loftq":
+        if isinstance(init_lora_weights, str) and init_lora_weights.lower() == "mica":
+            with gather_params_ctx(self.get_base_layer().weight):
+                self.mica_init(adapter_name)
+        elif init_lora_weights == "loftq":
             self.loftq_init(adapter_name)
         elif init_lora_weights == "lora_ga":
             # Embedding layers don't support LoRA-GA, fall back to standard initialization
@@ -1145,6 +1199,36 @@ def output_fn(outputs):
             self.input_fns[adapter_name] = input_fn
             self.output_fns[adapter_name] = output_fn
 
+    def mica_init(self, adapter_name):
+        """Minor Component Adaptation (MiCA) initialization for embedding layers.
+
+        The effective embedding projection has shape `(embedding_dim, num_embeddings)`, so MiCA initializes
+        `lora_embedding_B` from the minor left singular vectors of `base_layer.weight.T` and sets
+        `lora_embedding_A` to zero.
+        """
+        if self.lora_embedding_B[adapter_name].device.type == "meta":
+            return
+
+        weight = self.get_base_layer().weight
+        dtype = weight.dtype
+        if dtype not in [torch.float32, torch.float16, torch.bfloat16]:
+            raise TypeError("Please initialize MiCA under float32, float16, or bfloat16.")
+
+        weight = weight.to(torch.float32).T
+        r = self.r[adapter_name]
+        max_r = min(weight.shape)
+        if r > max_r:
+            raise ValueError(
+                f"MiCA requires `r` <= min(num_embeddings, embedding_dim) but got r={r} for an embedding layer with "
+                f"weight shape {tuple(self.get_base_layer().weight.shape)} (max usable r is {max_r})."
+            )
+
+        U, _, _ = torch.linalg.svd(weight.data, full_matrices=False)
+        lora_embedding_B = U[:, -r:].contiguous()
+        lora_embedding_A = torch.zeros(r, weight.shape[1], device=weight.device)
+        self.lora_embedding_B[adapter_name].data = lora_embedding_B.to(dtype)
+        self.lora_embedding_A[adapter_name].data = lora_embedding_A.to(dtype)
+
     def merge(self, safe_merge: bool = False, adapter_names: Optional[list[str]] = None) -> None:
         """
         Merge the active adapter weights into the base weights
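The embedding `mica_init` decomposes `base_layer.weight.T` rather than `base_layer.weight` because of how the LoRA embedding factors are laid out. The sketch below assumes the layout implied by the diff, `lora_embedding_A` of shape `(r, num_embeddings)` and `lora_embedding_B` of shape `(embedding_dim, r)`, with the adapter output computed as `F.embedding(ids, A.T) @ B.T`; that forward is an assumption for illustration, not a quote of PEFT's `Embedding.forward`. It checks that this output equals a lookup into `(B @ A).T`, so `B @ A` acts as an `(embedding_dim, num_embeddings)` projection, and that MiCA's `A == 0` makes the adapter a no-op.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_embeddings, embedding_dim, r = 50, 16, 4
W = torch.randn(num_embeddings, embedding_dim)   # nn.Embedding.weight layout
ids = torch.tensor([[3, 7, 7, 41]])

# The effective projection has shape (embedding_dim, num_embeddings), i.e. W.T.
U, _, _ = torch.linalg.svd(W.T, full_matrices=False)
B = U[:, -r:].contiguous()                       # (embedding_dim, r)

# The layout identity holds for any A; use a random one to show it.
A = torch.randn(r, num_embeddings)               # (r, num_embeddings)
adapter_out = F.embedding(ids, A.T) @ B.T        # (1, 4, embedding_dim)
lookup_out = F.embedding(ids, (B @ A).T)         # rows of the (num_embeddings, embedding_dim) delta
print(torch.allclose(adapter_out, lookup_out, atol=1e-5))  # True

# MiCA init: A == 0, so the adapter adds nothing to the base embedding.
A_mica = torch.zeros(r, num_embeddings)
print(torch.count_nonzero(F.embedding(ids, A_mica.T) @ B.T).item())  # 0
```

This is also why the rank limit for embeddings is `min(num_embeddings, embedding_dim)`: the thin SVD of `W.T` yields at most that many left singular vectors.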