TPPT: Continual Learning on CLIP via Incremental Prompt Tuning with Intrinsic Textual Anchors

Official implementation of Continual Learning on CLIP via Incremental Prompt Tuning with Intrinsic Textual Anchors (TPPT), accepted at TMLR.

Authors: Haodong Lu, Xinyu Zhang, Kristen Moore, Jason Xue, Lina Yao, Anton van den Hengel, Dong Gong

Abstract

Continual learning (CL) enables deep neural networks to acquire new knowledge over time while mitigating catastrophic forgetting of previously learned information. The powerful generalization ability of pre-trained models (PTMs), such as the Contrastive Language-Image Pre-training (CLIP) model, has inspired a range of CL methods targeting new and specialized tasks, further bridging the gap between PTMs and continual adaptation.

Leveraging its multi-modal visual and textual representations, CLIP offers a natural paradigm for CL, where new tasks can be accommodated by incrementally learning lightweight parameters, particularly prompts. However, existing prompt-based CL methods for PTMs often rely on complex designs built upon specific assumptions, such as intricate regularization schemes for prompt pools, specialized routing mechanisms, or multi-stage incrementation processes. While these approaches improve performance, they frequently introduce additional—and possibly unnecessary—complexity, underutilizing CLIP's intrinsic capabilities.

We propose a concise CL approach for CLIP based on incremental prompt tuning that fully exploits its multi-modal structure and the stability of textual representations. Our method, Textual Prototype-guided Prompt Tuning (TPPT), introduces textual prototypes not merely as static classifiers, as in existing methods, but as stable anchors to guide the learning of visual prompts, thereby shaping the embedding space (i.e., TPPT-V). We show that our bidirectional supervision strategy enables more effective learning of new knowledge while reducing forgetting. To further close the vision-language gap during CL, we activate the language branch and extend our approach to jointly optimize both visual and textual prompts (i.e., TPPT-VT). We also introduce a relational diversity regularization on the textual anchors to prevent embedding space collapse and mitigate correlated forgetting. Extensive experiments and analyses demonstrate the effectiveness of our proposed approach, highlighting the benefits of leveraging CLIP's intrinsic guidance for continual adaptation.

Design of TPPT

Conceptual Design

Conceptual illustrations of (a) standard Cross-Entropy (CE), (b) our proposed TPPT-V, (c) a naïve multi-modal extension of TPPT-V, and (d) our proposed TPPT-VT.

(a) Standard CE: Prior methods use CE loss to adapt PTMs, but suffer from representation drift, leading to forgetting.
(b) TPPT-V: Introduces a textual prototypical contrastive loss to anchor visual features and mitigate drift.
(c) Naïve Extension: Also tuning textual prompts may improve textual prototype quality, but risks collapse to trivial solutions.
(d) TPPT-VT: Addresses this collapse risk by regularizing multi-modal prompt learning with diversity constraints on textual prototypes.
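To make the anchoring idea in (b) concrete, here is a minimal, dependency-free sketch of a contrastive loss between visual features and fixed textual prototypes. It is an illustration under stated assumptions, not the loss implemented in this repository: the function names, the temperature value of 0.07, and the exact form of the two directions (image-to-text and text-to-image) are all hypothetical choices made for the example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(scores):
    """Numerically stable log-softmax over a list of scores."""
    peak = max(scores)
    log_sum = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return [s - log_sum for s in scores]


def similarity_logits(visual_feats, text_prototypes, temperature):
    """Cosine similarity of every visual feature to every prototype, divided by temperature."""
    vis = [l2_normalize(v) for v in visual_feats]
    txt = [l2_normalize(t) for t in text_prototypes]
    return [[sum(a * b for a, b in zip(v, t)) / temperature for t in txt] for v in vis]


def prototype_contrastive_loss(visual_feats, labels, text_prototypes, temperature=0.07):
    """Hypothetical two-direction loss; prototypes are treated as fixed anchors."""
    logits = similarity_logits(visual_feats, text_prototypes, temperature)

    # Image-to-text: each image should score highest on its own class prototype.
    i2t = [-log_softmax(row)[y] for row, y in zip(logits, labels)]

    # Text-to-image: each class prototype present in the batch should score
    # highest on the images of its own class (softmax taken over the batch).
    t2i = []
    for c in sorted(set(labels)):
        column = log_softmax([row[c] for row in logits])
        positives = [column[i] for i, y in enumerate(labels) if y == c]
        t2i.append(-sum(positives) / len(positives))

    return sum(i2t) / len(i2t) + sum(t2i) / len(t2i)
```

In this sketch, visual features that sit close to their class prototype yield a small loss, while features that have drifted toward another class's prototype yield a large one, which is the pull-back effect described in (b).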

Framework Overview

Overall framework of our two proposed methods.

TPPT-V: Static textual prototypes guide the learned visual representations. Because the prototypes stay consistent, they keep the visual representations from drifting in the embedding space, which alleviates forgetting.
TPPT-VT: To improve upon the static textual prototypes, we also learn textual prompts for the prototypes and regularize this learning by encouraging diversity among them.
Key Advantage: Anchored by textual prototypes, both methods remain simple yet effective, in contrast to previous methods that rely on delicate, complex designs.
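As a rough illustration of what "encouraging diversity" among textual prototypes can mean, the sketch below penalizes pairwise cosine similarity between prototypes. This is one common form of diversity regularizer and is only an assumption here; the relational diversity regularization used by TPPT-VT may be formulated differently. The function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def diversity_penalty(prototypes):
    """Mean squared cosine similarity over all distinct prototype pairs.

    Returns 0.0 when prototypes are mutually orthogonal and approaches 1.0
    when they collapse onto the same direction.
    """
    n = len(prototypes)
    if n < 2:
        return 0.0
    total = 0.0
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += cosine(prototypes[i], prototypes[j]) ** 2
            pairs += 1
    return total / pairs
```

Adding a term like this to the training objective discourages the trivial solution in which tuned textual prototypes all drift toward a single direction.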

Installation

Setup

# Clone the repository
git clone https://github.com/jeff024/tppt.git
cd tppt

# Create conda environment
conda create -n tppt python=3.8
conda activate tppt

# Install dependencies
pip install -r requirements.txt

Datasets

We provide processed datasets for download:

Aircraft:
- Google Drive
- OneDrive
CIFAR-100: Automatically downloaded by the code
Cars:
- Google Drive
- OneDrive
CUB-200:
- Google Drive
- OneDrive
ImageNet-R:
- Google Drive
- OneDrive

Quick Start

How to Run

General usage:

python main.py --config=./configs/{method}/{dataset}.yaml

Where:

{method} can be tppt_v or tppt_vt
{dataset} can be aircraft, cars, cf100, cub, or inr

Example: Training TPPT-VT on fine-grained aircraft classification:

python main.py --config=./configs/tppt_vt/aircraft.yaml
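To run several method and dataset combinations in a row, a small helper can build the commands from the pattern above. The sketch below only uses the method and dataset names listed in this section; the helper names (`build_command`, `all_commands`) are hypothetical and not part of the repository, and it assumes a config file exists for every combination.

```python
from itertools import product

# Values taken from the "How to Run" section above.
METHODS = ("tppt_v", "tppt_vt")
DATASETS = ("aircraft", "cars", "cf100", "cub", "inr")


def build_command(method, dataset):
    """Return the argument list for one training run, validating both names."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method!r}")
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    return ["python", "main.py", f"--config=./configs/{method}/{dataset}.yaml"]


def all_commands():
    """Commands for every method/dataset pair, in a fixed order."""
    return [build_command(m, d) for m, d in product(METHODS, DATASETS)]


if __name__ == "__main__":
    # Print the commands; nothing is executed here.
    for cmd in all_commands():
        print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to launch the runs one after another.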

Configuration

Edit the YAML configuration files in configs/ to customize:

Dataset settings: dataset name
Model settings: backbone type, pretrained weights
Training hyperparameters: batch size, learning rate, epochs
Prompt settings: prompt depth, prompt length, number of prompts

Citation

@article{lu2025tppt,
  title={Continual Learning on CLIP via Incremental Prompt Tuning with Intrinsic Textual Anchors},
  author={Lu, Haodong and Zhang, Xinyu and Moore, Kristen and Xue, Jason and Yao, Lina and van den Hengel, Anton and Gong, Dong},
  journal={Transactions on Machine Learning Research},
  year={2025},
  url={https://openreview.net/forum?id=YJnjkzKq5Y}
}

License

TPPT is released under the Apache License 2.0. See LICENSE for details.

Acknowledgments

Our repository builds on LAMDA-PILOT and open-clip. We thank their authors for their wonderful work.
