Official implementation of CoDyRA.
Haodong Lu, Chongyang Zhao, Jason Xue, Lina Yao, Kristen Moore, Dong Gong
Low-rank adaptation serves as an implicit forgetting regularizer in continual learning.
The central tension in continual learning (CL) is the trade-off between plasticity (acquiring new knowledge) and stability (retaining prior knowledge). We study how a pre-trained backbone can be continually updated to absorb new knowledge while preserving existing capabilities, via capacity control: regulating the effective rank of each parameter update, a per-step quantity directly controllable inside a LoRA update.
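The rank of a LoRA update is bounded by the number of rank-1 components that are actually active. Below is a minimal NumPy sketch. It assumes the update is written as ΔW = B diag(λ) A, with one importance weight λ_i per rank-1 component. This is an illustrative parameterization, not necessarily the paper's exact one. Zeroing entries of λ lowers the effective rank of ΔW.

```python
import numpy as np

# Hypothetical shapes: a d_out x d_in weight adapted with a rank-r LoRA update.
d_out, d_in, r = 64, 48, 8
rng = np.random.default_rng(0)

B = rng.standard_normal((d_out, r))  # LoRA "up" projection
A = rng.standard_normal((r, d_in))   # LoRA "down" projection

# Assumed per-component importance weights: a zero entry removes the
# rank-1 component B[:, i] @ A[i, :] from the update entirely.
lam = np.array([1.2, 0.0, 0.7, 0.0, 0.0, 0.3, 0.0, 0.9])

delta_W = B @ np.diag(lam) @ A

active = int(np.count_nonzero(lam))            # components still in use
eff_rank = int(np.linalg.matrix_rank(delta_W)) # numerical rank of the update
print(active, eff_rank)  # → 4 4
```

For generic B and A, the rank of ΔW equals the number of nonzero λ_i. Driving importance weights to zero is therefore a direct handle on the per-update capacity.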
A controlled probe of LoRA rank and placement across modules and tasks reveals a consistent trade-off with a moderate-rank sweet spot. Because that sweet spot varies by placement and task, no fixed rank is universally optimal. A formal bound further shows that forgetting grows with rank.
Building on these findings, we propose Continual Dynamic Rank-Selective LoRA (CoDyRA), which jointly trains each LoRA update with adaptive rank minimization via sparsity-promoting regularization on per-component importance weights. The supervised objective drives …
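To make "sparsity-promoting regularization on per-component importance weights" concrete, here is a minimal NumPy sketch under stated assumptions:

- The update is ΔW = B diag(λ) A, with one weight per rank-1 component.
- The regularizer is an L1 penalty α‖λ‖₁, applied with a proximal (soft-thresholding) step.
- A least-squares fit to a toy target update T stands in for the supervised loss.
- B and A are frozen so that the effect on λ is easy to see.

All of these are illustrative choices, not necessarily CoDyRA's exact formulation or optimizer.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1: shrink each entry toward 0 by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 48, 8
B = rng.standard_normal((d_out, r))  # frozen here for illustration only
A = rng.standard_normal((r, d_in))

# Toy stand-in for the supervised objective: the "ideal" update only
# needs two of the eight rank-1 components.
lam_true = np.array([1.5, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0])
T = B @ np.diag(lam_true) @ A

# Gram matrix of the rank-1 components: <b_i a_i^T, b_j a_j^T>_F.
G = (B.T @ B) * (A @ A.T)
step = 1.0 / np.linalg.eigvalsh(G).max()  # 1/L for the smooth least-squares term
alpha = 0.05 * np.mean(np.diag(G))        # hypothetical L1 strength

lam = np.ones(r)  # start with every rank component active
for _ in range(1000):
    R = B @ np.diag(lam) @ A - T          # residual of 0.5 * ||dW - T||_F^2
    grad = np.diag(B.T @ R @ A.T)         # d/d lam_i = b_i^T R a_i
    lam = soft_threshold(lam - step * grad, step * alpha)

print(np.count_nonzero(lam), np.round(lam, 2))
```

The soft-thresholding step sets the unneeded importance weights exactly to zero. Only the components that the (toy) objective actually uses survive, slightly shrunk by the penalty. In a real training loop, B, A, and λ would be optimized together on the task loss.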
We show that adaptive rank minimization serves as a forgetting regularizer in the CL regime, protecting general capability and prior-task knowledge simultaneously by controlling forgetting against the current model state. Across MTIL, X-TAIL, and TRACE (CLIP, LLaMA, Gemma), CoDyRA matches or exceeds prior CL methods on learning accuracy while achieving the lowest forgetting, balancing plasticity and stability.
- Takeaway 1: LoRA placement is itself a $\color{purple}{\text{plasticity}}$–$\color{green}{\text{stability}}$ lever; no single fixed choice dominates.
- Takeaway 2: The $\color{purple}{\text{plasticity}}$–$\color{green}{\text{stability}}$ balance is governed by LoRA rank: $\color{purple}{\text{high rank}}$ favors plasticity, $\color{green}{\text{low rank}}$ favors stability, with a sweet spot at moderate rank.
- Takeaway 3: The sweet-spot rank is not universal: its location varies systematically by module and by downstream task.
- (See more details in the paper.)
CoDyRA introduces a dynamic rank-selective LoRA that lets each pre-trained weight matrix adaptively retain only the ranks needed for downstream adaptation while preserving pre-trained capabilities. After each task, the rank-pruned LoRA updates are merged into the backbone, so they add no inference overhead.
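Merging is why there is no inference overhead. Once the pruned components are folded into the frozen weight, the forward pass is a single dense matmul again. Below is a minimal sketch assuming the ΔW = B diag(λ) A parameterization. `merge_pruned_lora` is a hypothetical helper, not a function from this repo.

```python
import numpy as np

def merge_pruned_lora(W0, B, A, lam, tol=0.0):
    """Fold a rank-pruned LoRA update into the frozen weight (hypothetical helper)."""
    keep = np.abs(lam) > tol                  # components that survived pruning
    B_k, A_k, lam_k = B[:, keep], A[keep, :], lam[keep]
    W_merged = W0 + B_k @ np.diag(lam_k) @ A_k
    return W_merged, int(keep.sum())

rng = np.random.default_rng(1)
W0 = rng.standard_normal((64, 48))            # frozen pre-trained weight
B = rng.standard_normal((64, 8))
A = rng.standard_normal((8, 48))
lam = np.array([0.8, 0.0, 0.0, 0.4, 0.0, 0.0, 0.0, 0.0])

W, kept = merge_pruned_lora(W0, B, A, lam)

# The merged weight reproduces the adapter branch exactly, with no extra matmuls.
x = rng.standard_normal(48)
y_merged = W @ x
y_adapter = W0 @ x + B @ (lam * (A @ x))
print(kept, np.allclose(y_merged, y_adapter))  # → 2 True
```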
Key properties:
- ✅ No past data, task IDs, or per-task modules — operates under a strict CL regime
- ✅ No inference overhead — updates merge into the backbone
- ✅ A single rank-based criterion protects both $\color{green}{\text{general (pretrained) capability}}$ and $\color{green}{\text{prior-task knowledge}}$
- ✅ Fewest trainable parameters among baselines (4.4M vs. 60M–130M)
- ✅ Lowest Backward Transfer (BWT 1.87%) across replay, modular, orthogonal-subspace, and fixed-rank LoRA baselines
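For intuition about the parameter budget: a rank-r LoRA adapter on a d_out × d_in matrix trains r(d_out + d_in) parameters. That cost grows linearly in the rank rather than with d_out·d_in. Below is a small sketch of the arithmetic. It assumes one extra importance weight per rank-1 component. The layer count, width, and rank are hypothetical and are not the configuration behind the 4.4M figure.

```python
def lora_params(d_out: int, d_in: int, rank: int, importance: bool = True) -> int:
    """Trainable parameters of one LoRA adapter on a d_out x d_in weight.

    B is d_out x rank and A is rank x d_in; `importance` adds one scalar
    weight per rank-1 component (an assumed parameterization).
    """
    return rank * (d_out + d_in) + (rank if importance else 0)

# Hypothetical configuration: 12 blocks, 4 square 768 x 768 projections each, rank 16.
per_matrix = lora_params(768, 768, 16)
total = 12 * 4 * per_matrix
print(per_matrix, total)  # → 24592 1180416
```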
conda create -n codyra python=3.12 -y
conda activate codyra
# Install PyTorch that matches your CUDA setup, e.g.
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
# Install project dependencies
pip install -r requirements.txt
- Set `--data_dir` to the root directory that should hold all benchmarks (Aircraft, Caltech101, DTD, EuroSAT, Oxford Flowers, Food-101, MNIST, Oxford Pets, Stanford Cars, SUN397).
- Please refer to the following guide for setting up datasets: CoOp
bash runner_codyra.sh

CoDyRA is released under the Apache License 2.0. See LICENSE for details.
@article{lu2024adaptive,
title = {Take Only What You Need: Adaptive Rank Minimization as Forgetting Regularizer in Continual Learning},
author = {Lu, Haodong and Zhao, Chongyang and Xue, Jason and Yao, Lina and Moore, Kristen and Gong, Dong},
journal = {arXiv preprint arXiv:2412.01004},
year = {2024}
}

Our repo benefits from MoE-Adapters and RAIL. We thank the authors for their excellent work.
