With multimodal models gaining wide popularity, we attempt to apply CLIP variants to traditional person re-identification (ReID) tasks.
This work has been accepted at ISVC '24 and will be published in Advances in Visual Computing by the end of 2024.
Slides are now available.
Installation
pip install -r requirements.txt
Quick Start
This repo does not include the code for generating concrete prompts with GPT-4o (mini).
ImageNet-pretrained ViT-B/16 IVLP model, courtesy of multimodal-prompt-learning: IVLP
python3 zero_shot_learning.py --model ViT-B/16 --augmented_template --height 256 --mm --clip_weights xxx
Training Examples with Prompt Engineering
python3 prompt_learning.py --model ViT-B/16 --height 256 --bs 64 --amp --epochs_stage1 120 --epochs_stage2 60 --training_mode ivlp --test_dataset dukemtmc
python3 prompt_learning.py --model ViT-B/16 --height 256 --bs 64 --amp --epochs_stage1 120 --epochs_stage2 60 --training_mode ivlp --train_dataset dukemtmc --test_dataset market1501 --vpt_ctx 2
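The commands above evaluate on a ReID test set (the training examples select it with --test_dataset, e.g. dukemtmc or market1501). As a rough illustration of how ReID retrieval is conventionally scored, the sketch below ranks every gallery feature against each query by cosine similarity and then computes rank-1 accuracy and mean average precision (mAP). It is a generic sketch under stated assumptions, not this repo's evaluation code: rank1_and_map and cosine_similarity are hypothetical helpers, the feature vectors are assumed to come from an image encoder such as CLIP's, and the same-camera filtering used by the standard Market-1501/DukeMTMC protocol is omitted.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank1_and_map(
    query_feats: List[Sequence[float]],
    query_ids: List[int],
    gallery_feats: List[Sequence[float]],
    gallery_ids: List[int],
) -> Tuple[float, float]:
    """Rank-1 accuracy and mAP for identity retrieval (hypothetical helper).

    Assumption: the standard protocol's removal of gallery images that share
    both identity and camera with the query is NOT applied here. Queries whose
    identity never appears in the gallery are skipped.
    """
    hits = 0
    ap_sum = 0.0
    valid = 0
    for q_feat, q_id in zip(query_feats, query_ids):
        # Rank gallery indices by descending similarity to this query.
        scores = [cosine_similarity(q_feat, g) for g in gallery_feats]
        order = sorted(range(len(gallery_ids)), key=lambda i: scores[i], reverse=True)
        matches = [gallery_ids[i] == q_id for i in order]
        if not any(matches):
            continue  # no correct gallery match: query cannot be scored
        valid += 1
        if matches[0]:
            hits += 1  # top-ranked gallery image has the query's identity
        # Average precision: mean precision at the rank of each correct match.
        num_correct = 0
        precisions = []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                num_correct += 1
                precisions.append(num_correct / rank)
        ap_sum += sum(precisions) / len(precisions)
    if valid == 0:
        return 0.0, 0.0
    return hits / valid, ap_sum / valid


if __name__ == "__main__":
    queries = [[1.0, 0.0], [0.0, 1.0]]
    gallery = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.3]]
    # The first query retrieves identity 7 at rank 1; the second ranks it last.
    print(rank1_and_map(queries, [7, 7], gallery, [7, 9, 9]))
```

Computing each query's full ranking keeps the sketch readable; a real evaluator would stack features into matrices and compute all similarities in one batched product.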