jeff024/codyra

Take Only What You Need: Adaptive Rank Minimization as Forgetting Regularizer in Continual Learning

Official implementation of CoDyRA.

Haodong Lu, Chongyang Zhao, Jason Xue, Lina Yao, Kristen Moore, Dong Gong


TL;DR

Low-rank adaptation serves as an implicit forgetting regularizer in continual learning.

Abstract

The central tension in continual learning (CL) is the trade-off between plasticity (acquiring new knowledge) and stability (retaining prior knowledge). We study how a pre-trained backbone can be continually updated to absorb new knowledge while preserving existing capabilities, via capacity control: regulating the effective rank of each parameter update, a per-step quantity directly controllable inside a LoRA update.

A controlled probe of LoRA rank and placement across modules and tasks reveals a consistent trade-off, with a moderate-rank sweet spot that varies by placement and task, leaving no universally optimal fixed rank; a formal bound shows forgetting grows with rank.

Building on these findings, we propose Continual Dynamic Rank-Selective LoRA (CoDyRA), which jointly trains each LoRA update with adaptive rank minimization via sparsity-promoting regularization on per-component importance weights. The supervised objective drives $\color{purple}{\text{plasticity}}$; rank minimization regularizes $\color{green}{\text{forgetting}}$.
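As a rough illustration of the idea above, the sketch below treats a LoRA update as a weighted sum of rank-1 components, $\Delta W = \sum_i w_i \, b_i a_i^\top$, and prunes components by shrinking their importance weights $w_i$. Soft-thresholding (the proximal step of an L1 penalty) is an assumption here, chosen as one common sparsity-promoting mechanism; this README does not spell out the exact regularizer. All names (`soft_threshold`, `l1_penalty`, `lora_delta`, `importance`) are hypothetical and are not the repository's API. Plain Python lists stand in for tensors to keep the example dependency-free.

```python
def soft_threshold(weights, threshold):
    """Proximal step for an L1 penalty: shrink each weight toward zero."""
    shrunk = []
    for w in weights:
        magnitude = max(abs(w) - threshold, 0.0)
        if magnitude == 0.0:
            shrunk.append(0.0)  # component is pruned
        else:
            shrunk.append(magnitude if w > 0 else -magnitude)
    return shrunk


def l1_penalty(weights, strength):
    """Sparsity-promoting term that would be added to the supervised loss."""
    return strength * sum(abs(w) for w in weights)


def lora_delta(B, A, weights):
    """Return sum_i weights[i] * outer(B[:, i], A[i, :]).

    B is d_out x r and A is r x d_in, both given as lists of rows.
    """
    d_out, d_in = len(B), len(A[0])
    delta = [[0.0] * d_in for _ in range(d_out)]
    for i, w in enumerate(weights):
        if w == 0.0:
            continue  # pruned components contribute nothing
        for p in range(d_out):
            for q in range(d_in):
                delta[p][q] += w * B[p][i] * A[i][q]
    return delta


def active_rank(weights):
    """Number of rank-1 components that survive pruning."""
    return sum(1 for w in weights if w != 0.0)


# Toy example (illustrative values only): d_out = 2, d_in = 2, r = 3.
B = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0]]
A = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
importance = [1.0, 0.125, -0.5]

penalty = l1_penalty(importance, strength=0.5)
pruned = soft_threshold(importance, threshold=0.25)
delta = lora_delta(B, A, pruned)
print("pruned weights:", pruned, "active rank:", active_rank(pruned))
print("delta W:", delta)
```

In this toy setting the component with the smallest importance weight is zeroed out, so the update keeps two of its three rank-1 components. A larger threshold would trade plasticity for stability by pruning more of them.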

We show that adaptive rank minimization serves as a forgetting regularizer in the CL regime, protecting general capability and prior-task knowledge simultaneously by controlling forgetting against the current model state. Across MTIL, X-TAIL, and TRACE (CLIP, LLaMA, Gemma), CoDyRA matches or exceeds prior CL methods on learning accuracy while achieving the lowest forgetting, balancing plasticity and stability.

Key Takeaways from Analyses

  • Takeaway 1: LoRA placement is itself a $\color{purple}{\text{plasticity}}$–$\color{green}{\text{stability}}$ lever; no single fixed choice dominates.
  • Takeaway 2: The $\color{purple}{\text{plasticity}}$–$\color{green}{\text{stability}}$ balance is governed by LoRA rank: $\color{purple}{\text{high rank}}$ favors plasticity, $\color{green}{\text{low rank}}$ favors stability, with a sweet spot at moderate rank.
  • Takeaway 3: The sweet-spot rank is not universal: its location varies systematically by module and by downstream task.
  • (See more details in the paper.)

Overview of CoDyRA Methodology

CoDyRA overview diagram

CoDyRA introduces a dynamic rank-selective LoRA that lets each pre-trained weight matrix adaptively retain only the ranks needed for downstream adaptation while preserving pre-trained capabilities. After each task, the rank-pruned LoRA updates are merged into the backbone, adding no inference overhead.
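The merge step can be sketched as follows, assuming the update is a sum of rank-1 components scaled by importance weights and that pruned components have weight exactly zero. The function name `merge_pruned_lora` and the toy matrices are hypothetical; the repository's actual merge code may be organized differently.

```python
def merge_pruned_lora(W, B, A, weights):
    """Return W + sum over non-zero weights[i] of weights[i] * outer(B[:, i], A[i, :]).

    W is d_out x d_in, B is d_out x r, A is r x d_in (lists of rows).
    """
    merged = [row[:] for row in W]  # copy so the original W is untouched
    for i, w in enumerate(weights):
        if w == 0.0:
            continue  # pruned rank: nothing to merge
        for p in range(len(W)):
            for q in range(len(W[0])):
                merged[p][q] += w * B[p][i] * A[i][q]
    return merged


def matvec(M, x):
    """Plain matrix-vector product."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


# Toy backbone weight and a LoRA update with one pruned component.
W = [[1.0, 0.0],
     [0.0, 1.0]]
B = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0]]
A = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
weights = [0.75, 0.0, -0.25]

W_merged = merge_pruned_lora(W, B, A, weights)
x = [1.0, 2.0]
y = matvec(W_merged, x)  # inference uses a single matrix, no LoRA branch
print("merged W:", W_merged)
print("output:", y)
```

Because the update is folded into `W_merged`, inference costs one matrix multiply per layer, the same as the original backbone.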

Key properties:

  • ✅ No past data, task IDs, or per-task modules — operates under a strict CL regime
  • ✅ No inference overhead — updates merge into the backbone
  • ✅ A single rank-based criterion protects $\color{green}{\text{general (pretrained) capability}}$ and $\color{green}{\text{prior-task knowledge}}$
  • ✅ Fewest trainable parameters among baselines (4.4M vs 60M–130M)
  • ✅ Lowest Backward Transfer (BWT 1.87%) across replay, modular, orthogonal-subspace, and fixed-rank LoRA baselines
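For readers unfamiliar with the metric, here is a minimal sketch of Backward Transfer under its commonly used definition: the average, over all tasks except the last, of final accuracy minus the accuracy measured right after that task was learned. Under this convention negative values indicate forgetting. The figure quoted above is positive, so the paper may report a magnitude or use a different sign convention; treat this as an assumption. The accuracy matrix below is made-up illustrative data, not a result from the paper.

```python
def backward_transfer(acc):
    """Average change in accuracy on earlier tasks after training on all tasks.

    acc[i][j] is the accuracy on task j measured after training on task i.
    Only entries with j <= i are used.
    """
    num_tasks = len(acc)
    if num_tasks < 2:
        raise ValueError("BWT needs at least two tasks")
    final = acc[num_tasks - 1]
    diffs = [final[j] - acc[j][j] for j in range(num_tasks - 1)]
    return sum(diffs) / (num_tasks - 1)


# Hypothetical accuracies (%) for three sequential tasks.
acc = [
    [90.0, 0.0, 0.0],
    [88.0, 80.0, 0.0],
    [87.0, 78.0, 70.0],
]
bwt = backward_transfer(acc)
print("BWT:", bwt)
```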

Quick Start

1. Environment

conda create -n codyra python=3.12 -y
conda activate codyra
# Install PyTorch that matches your CUDA setup, e.g.
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
# Install project dependencies
pip install -r requirements.txt

2. Data

  • Set --data_dir to the root directory that should hold all benchmarks (Aircraft, Caltech101, DTD, EuroSAT, Oxford Flowers, Food-101, MNIST, Oxford Pets, Stanford Cars, SUN397).
  • To prepare the datasets, follow the dataset setup guide from CoOp.

Running CoDyRA

bash runner_codyra.sh

License

CoDyRA is released under the Apache License 2.0. See LICENSE for details.

Citation

@article{lu2024adaptive,
  title   = {Take Only What You Need: Adaptive Rank Minimization as Forgetting Regularizer in Continual Learning},
  author  = {Lu, Haodong and Zhao, Chongyang and Xue, Jason and Yao, Lina and Moore, Kristen and Gong, Dong},
  journal = {arXiv preprint arXiv:2412.01004},
  year    = {2024}
}

Acknowledgement

Our repo builds on MoE-Adapters and RAIL. We thank the authors for their excellent work.

About

Official implementation of "Adaptive Rank, Reduced Forgetting: Knowledge Retention in Continual Learning Vision-Language Models with Dynamic Rank-Selective LoRA"
