TPPT: Continual Learning on CLIP via Incremental Prompt Tuning with Intrinsic Textual Anchors

Official implementation of Continual Learning on CLIP via Incremental Prompt Tuning with Intrinsic Textual Anchors (TPPT), accepted at TMLR.

Authors: Haodong Lu, Xinyu Zhang, Kristen Moore, Jason Xue, Lina Yao, Anton van den Hengel, Dong Gong


Abstract

Continual learning (CL) enables deep neural networks to acquire new knowledge over time while mitigating catastrophic forgetting of previously learned information. The powerful generalization ability of pre-trained models (PTMs), such as the Contrastive Language-Image Pre-training (CLIP) model, has inspired a range of CL methods targeting new and specialized tasks, further bridging the gap between PTMs and continual adaptation.

Leveraging its multi-modal visual and textual representations, CLIP offers a natural paradigm for CL, where new tasks can be accommodated by incrementally learning lightweight parameters, particularly prompts. However, existing prompt-based CL methods for PTMs often rely on complex designs built upon specific assumptions, such as intricate regularization schemes for prompt pools, specialized routing mechanisms, or multi-stage incrementation processes. While these approaches improve performance, they frequently introduce additional—and possibly unnecessary—complexity, underutilizing CLIP's intrinsic capabilities.

We propose a concise CL approach for CLIP based on incremental prompt tuning that fully exploits its multi-modal structure and the stability of textual representations. Our method, Textual Prototype-guided Prompt Tuning (TPPT), introduces textual prototypes not merely as static classifiers, as in existing methods, but as stable anchors to guide the learning of visual prompts, thereby shaping the embedding space (i.e., TPPT-V). We show that our bidirectional supervision strategy enables more effective learning of new knowledge while reducing forgetting. To further close the vision-language gap during CL, we activate the language branch and extend our approach to jointly optimize both visual and textual prompts (i.e., TPPT-VT). We also introduce a relational diversity regularization on the textual anchors to prevent embedding space collapse and mitigate correlated forgetting. Extensive experiments and analyses demonstrate the effectiveness of our proposed approach, highlighting the benefits of leveraging CLIP's intrinsic guidance for continual adaptation.


Design of TPPT

Conceptual Design

Conceptual illustrations of (a) standard Cross-Entropy (CE), (b) our proposed TPPT-V, (c) a naïve multi-modal extension of TPPT-V, and (d) our proposed TPPT-VT.

  • (a) Standard CE: Prior methods adapt PTMs with a CE loss but suffer from representation drift, which leads to forgetting.
  • (b) TPPT-V: Introduces a textual prototypical contrastive loss that anchors visual features and mitigates drift.
  • (c) Naïve Extension: Also tuning textual prompts may improve textual prototype quality but risks collapse to trivial solutions.
  • (d) TPPT-VT: Avoids this collapse by regularizing multi-modal prompt learning with diversity constraints on the textual prototypes.
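
As a rough illustration of the idea in (b), the sketch below computes a prototype-anchored contrastive loss in plain Python. It is not the repository's implementation: the function names, the temperature value, and the exact form of the two directions (image-to-text and text-to-image) are assumptions made for illustration, and the paper should be consulted for the actual objective.

```python
import math


def normalize(v):
    """Scale a non-zero vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def image_to_text_loss(image_feats, labels, text_protos, temperature=0.07):
    """Each image feature should be closest to its own class prototype."""
    protos = [normalize(t) for t in text_protos]
    total = 0.0
    for feat, label in zip(image_feats, labels):
        f = normalize(feat)
        logits = [dot(f, p) / temperature for p in protos]
        total -= log_softmax(logits)[label]
    return total / len(image_feats)


def text_to_image_loss(image_feats, labels, text_protos, temperature=0.07):
    """Each prototype should be closest to the batch images of its class."""
    feats = [normalize(f) for f in image_feats]
    classes = sorted(set(labels))
    total = 0.0
    for c in classes:
        p = normalize(text_protos[c])
        log_probs = log_softmax([dot(p, f) / temperature for f in feats])
        positives = [lp for lp, y in zip(log_probs, labels) if y == c]
        total -= sum(positives) / len(positives)
    return total / len(classes)


def prototype_contrastive_loss(image_feats, labels, text_protos, temperature=0.07):
    """Sum of both directions (an assumed reading of 'bidirectional')."""
    return (image_to_text_loss(image_feats, labels, text_protos, temperature)
            + text_to_image_loss(image_feats, labels, text_protos, temperature))


# Toy data (hypothetical): two classes, two images, 2-D embeddings.
protos = [[1.0, 0.0], [0.0, 1.0]]
images = [[1.0, 0.1], [0.1, 1.0]]
aligned = prototype_contrastive_loss(images, [0, 1], protos)
swapped = prototype_contrastive_loss(images, [1, 0], protos)
```

In this toy setup the loss is near zero when each image sits next to its own class prototype and grows large when the labels are swapped, which is the anchoring behaviour the figure describes.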

Framework Overview

Overall framework of our two proposed methods.

  1. TPPT-V: Static textual prototypes guide the learned visual representations. Because the prototypes stay fixed, they keep the visual representations from drifting in the embedding space, which alleviates forgetting.

  2. TPPT-VT: To improve upon the static textual prototypes, we learn textual prompts for the prototypes and regularize the learning process by encouraging diversity.

  3. Key Advantage: Thanks to the textual prototype anchors, both methods remain simple yet effective, unlike previous methods that rely on delicate, complex designs.
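
One simple way to encourage diversity among textual prototypes is to penalize their pairwise cosine similarity. The sketch below does this in plain Python. It is only an assumption about what such a regularizer could look like: the function name `diversity_penalty` and the squared-cosine form are hypothetical, and the relational diversity regularization used by TPPT-VT may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def diversity_penalty(text_protos):
    """Mean squared cosine similarity over all distinct prototype pairs.

    Returns 0.0 when prototypes are mutually orthogonal and approaches 1.0
    when they collapse onto a single direction.
    """
    n = len(text_protos)
    if n < 2:
        return 0.0
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += cosine(text_protos[i], text_protos[j]) ** 2
            pairs += 1
    return total / pairs


# Toy prototypes (hypothetical values).
spread = diversity_penalty([[1.0, 0.0], [0.0, 1.0]])
collapsed = diversity_penalty([[1.0, 1.0], [1.0, 1.0]])
```

Added to the training loss with a small weight, a term like this pushes learnable prototypes apart, so tuning the textual prompts cannot trivially map every class to the same embedding.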


Installation

Setup

# Clone the repository
git clone https://github.com/jeff024/tppt.git
cd tppt

# Create conda environment
conda create -n tppt python=3.8
conda activate tppt

# Install other dependencies
pip install -r requirements.txt

Datasets

We provide processed datasets for download:


Quick Start

How to Run

General usage:

python main.py --config=./configs/{method}/{dataset}.yaml

Where:

  • {method} can be tppt_v or tppt_vt
  • {dataset} can be aircraft, cars, cf100, cub, or inr

Example: Training TPPT-VT on fine-grained aircraft classification:

python main.py --config=./configs/tppt_vt/aircraft.yaml
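
To run several method/dataset combinations in a row, the command pattern above can be generated programmatically. The helper below is a small sketch, not part of the repository: `build_command` and `all_commands` are hypothetical names, and it assumes the config files follow the `./configs/{method}/{dataset}.yaml` layout with the method and dataset names listed above.

```python
from itertools import product

# Values taken from the "Where:" list above.
METHODS = ("tppt_v", "tppt_vt")
DATASETS = ("aircraft", "cars", "cf100", "cub", "inr")


def build_command(method, dataset):
    """Return the argv list for one training run, validating both names."""
    if method not in METHODS:
        raise ValueError(f"unknown method {method!r}; expected one of {METHODS}")
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset {dataset!r}; expected one of {DATASETS}")
    return ["python", "main.py", f"--config=./configs/{method}/{dataset}.yaml"]


def all_commands():
    """Return one command per (method, dataset) pair."""
    return [build_command(m, d) for m, d in product(METHODS, DATASETS)]


if __name__ == "__main__":
    # Print the commands instead of launching them.
    for cmd in all_commands():
        print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to launch the run.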

Configuration

Edit the YAML configuration files in configs/ to customize:

  • Dataset settings: dataset name
  • Model settings: backbone type, pretrained weights
  • Training hyperparameters: batch size, learning rate, epochs
  • Prompt settings: prompt depth, prompt length, number of prompts

Citation

@article{lu2025tppt,
  title={Continual Learning on CLIP via Incremental Prompt Tuning with Intrinsic Textual Anchors},
  author={Lu, Haodong and Zhang, Xinyu and Moore, Kristen and Xue, Jason and Yao, Lina and van den Hengel, Anton and Gong, Dong},
  journal={Transactions on Machine Learning Research},
  year={2025},
  url={https://openreview.net/forum?id=YJnjkzKq5Y}
}

License

TPPT is released under the Apache License 2.0. See LICENSE for details.


Acknowledgments

Our repository benefits from LAMDA-PILOT and open-clip. We thank them for their wonderful work.
