End-to-End Knowledge Distillation for Unsupervised Domain Adaptation with Large Vision-language Models
Abstract: Knowledge distillation based on large vision-language models (VLMs) has recently emerged as a significant solution for transferring knowledge from the source domain to the target domain in unsupervised domain adaptation (UDA) tasks. However, existing methods employ a two-stage training pipeline, which not only complicates the training procedure but also lacks interaction between the source and target domains, severely hindering real-time cross-domain knowledge transfer. To address these challenges, we propose End-to-End Knowledge Distillation for UDA with large VLMs (termed EKDA). (1) EKDA employs a lightweight prompt learning mechanism to first embed the knowledge from the source domain into VLMs, and then simultaneously utilizes the image encoder and text encoder of VLMs to perform knowledge distillation on the target domain, significantly reducing the domain gap. (2) EKDA designs a teacher-student alternating training strategy to implement real-time collaborative interactions across domains, enabling an end-to-end paradigm that provides accurate source domain-aware supervision for the target domain. We conduct extensive experiments on 4 widely recognized benchmark datasets: Office-31, Office-Home, VisDA-2017, and Mini-DomainNet. Experimental results demonstrate that EKDA achieves significant performance improvement over the state-of-the-art UDA approaches, while maintaining a much lower model complexity. On Office-Home, for example, EKDA gains at least 2.7% in performance while reducing the learnable parameters by over 80% compared with the strongest baselines.
- New paradigm: To the best of our knowledge, this is the first attempt to propose an end-to-end knowledge distillation paradigm that facilitates real-time collaborative interactions between the source domain and target domain, breaking new ground for UDA with VLMs-based cross-domain knowledge transfer.
- Novel method: We introduce a lightweight cross-domain distillation method EKDA that (1) leverages text features from the source domain as shared semantic centers, and (2) extracts source domain-aware image features and pseudo labels for the target domain to implement real-time knowledge distillation, significantly reducing the domain gap.
- High performance: We conduct extensive experiments on UDA benchmark datasets including Office-31, Office-Home, VisDA-2017, and Mini-DomainNet. Experimental results demonstrate that EKDA achieves significant performance improvement over the state-of-the-art approaches, while maintaining a much lower model complexity.
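The ideas in the list above (text features as shared semantic centers, pseudo labels, and distillation on the target domain) can be illustrated with a toy sketch. This is not the EKDA implementation: the 2-D feature vectors, the `temperature` value, and all function names are hypothetical and chosen only for illustration, and the real method operates on VLM encoder outputs.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_probs(image_feat, text_centers, temperature=0.1):
    """Cosine similarity of an image feature to each class text feature,
    divided by a temperature (assumed value) and passed through softmax."""
    img = l2_normalize(image_feat)
    logits = []
    for center in text_centers:
        unit_center = l2_normalize(center)
        cosine = sum(a * b for a, b in zip(img, unit_center))
        logits.append(cosine / temperature)
    return softmax(logits)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


# Toy data: two classes, 2-D features (hypothetical values).
text_centers = [[1.0, 0.0], [0.0, 1.0]]  # one text feature per class
teacher_feat = [0.9, 0.1]                # teacher image feature, target sample
student_feat = [0.6, 0.4]                # student image feature, same sample

teacher_p = class_probs(teacher_feat, text_centers)
student_p = class_probs(student_feat, text_centers)

# The teacher's most likely class serves as the pseudo label.
pseudo_label = max(range(len(teacher_p)), key=teacher_p.__getitem__)

# Distillation loss: pull the student's distribution toward the teacher's.
distill_loss = kl_divergence(teacher_p, student_p)

print("pseudo label:", pseudo_label)
print("distillation loss:", distill_loss)
```

Both teacher and student score images against the same text centers, so their predictions live in one shared label space; that shared space is what makes the KL term meaningful in this sketch.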
- Set up the conda environment.

```bash
# Create a conda environment
conda create -y -n ekda python=3.8

# Activate the environment
conda activate ekda

# Install torch (requires version >= 1.8.1) and torchvision
# Please refer to https://pytorch.org/get-started/previous-versions/ if your cuda version is different
conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 pytorch-cuda=11.8 -c pytorch -c nvidia
```

- Clone the EKDA code repository and install requirements.
```bash
# Clone EKDA code base
git clone https://github.com/1d1x1w/EKDA.git
cd EKDA

# Install requirements
pip install -r requirements.txt
```

Please prepare all datasets before training. Datasets list: Office-31, Office-Home, VisDA-2017, and Mini-DomainNet.
Please follow the instructions below for training, evaluating, and reproducing the results. First, set the data directory to your local dataset path in scripts/ekda/main_ekda.sh and scripts/ekda/eval_ekda.sh.
```bash
# Example: train on the Office-Home dataset, with Art as the source domain and Clipart as the target domain (a-c)
bash scripts/ekda/main_ekda.sh officehome b32_ep20_officehome EKDA ViT-B/16 2 a-c 0

# Evaluate on the Office-Home dataset, with Art as the source domain and Clipart as the target domain (a-c)
bash scripts/ekda/eval_ekda.sh officehome b32_ep20_officehome EKDA ViT-B/16 2 a-c 0
```
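To cover every source-target pair on Office-Home rather than a single one, the training command can be generated in a loop. The helper below is a hypothetical convenience script, not part of the repository. Only the `a-c` task code appears in the example commands; the codes `p` (Product) and `r` (Real World) are assumptions, so check them against the scripts before running. All other arguments are copied verbatim from the example.

```python
import itertools
import shlex

# "a" (Art) and "c" (Clipart) come from the example command;
# "p" (Product) and "r" (Real World) are ASSUMED codes, verify in the scripts.
OFFICE_HOME_DOMAINS = ["a", "c", "p", "r"]


def build_commands(script="scripts/ekda/main_ekda.sh", domains=OFFICE_HOME_DOMAINS):
    """Return one shell command per ordered (source, target) domain pair."""
    commands = []
    for source, target in itertools.permutations(domains, 2):
        args = [
            "bash", script,
            "officehome", "b32_ep20_officehome", "EKDA", "ViT-B/16", "2",
            f"{source}-{target}",  # task code, e.g. a-c
            "0",
        ]
        # Quote each argument so the line is safe to paste into a shell.
        commands.append(" ".join(shlex.quote(arg) for arg in args))
    return commands


if __name__ == "__main__":
    for command in build_commands():
        print(command)
```

Printing the commands instead of executing them keeps the helper side-effect free; the output can be reviewed and then piped to a shell. Passing `script="scripts/ekda/eval_ekda.sh"` yields the matching evaluation commands.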
Details for each method are in the corresponding subfolder of the scripts folder in the 1d1x1w/EKDA repository on GitHub.
The style of this README follows PDA. Our code is based on the CoOp and CoCoOp, DAPL, MaPLe, PromptSRC, PromptKD, and PDA repositories, among others. We thank the authors for releasing their code. If you use their models and code, please consider citing these works as well. Supported methods are as follows:
| Method | Paper | Code |
|---|---|---|
| CoOp | IJCV 2022 | link |
| CoCoOp | CVPR 2022 | link |
| VPT | ECCV 2022 | link |
| IVLP & MaPLe | CVPR 2023 | link |
| DAPL | TNNLS 2023 | link |
| PromptSRC | ICCV 2023 | link |
| PromptKD | CVPR 2024 | link |
| PDA | AAAI 2024 | link |
