Accepted by ICML 2026.
D²O is a strictly training-free inference-time operator for CLIP-style vision-language models. For each test sample, it builds:
- f_cnt: a retrieval-oriented feature with nuisance-sensitive directions suppressed.
- z_sty: a low-dimensional routing coordinate for environment-aware bias tracking.
- s_deb: debiased logits obtained by subtracting the cluster-wise centered logit bias.
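To make the debiased-logit idea concrete, the sketch below subtracts a per-cluster, class-centered mean logit from every sample in that cluster. It is a minimal illustration under stated assumptions, not the repository's implementation: the function name `debias_logits`, the hard cluster assignment, and the choice to center the bias across classes are all hypothetical, and plain Python lists are used only to keep the example dependency-free.

```python
from collections import defaultdict


def debias_logits(logits, cluster_ids):
    """Subtract a cluster-wise, class-centered mean logit from each sample.

    logits:      list of N rows, each a list of C class logits.
    cluster_ids: list of N hashable cluster labels (hard assignment is an
                 ASSUMPTION made for this sketch).
    """
    # Group sample indices by cluster label.
    groups = defaultdict(list)
    for i, k in enumerate(cluster_ids):
        groups[k].append(i)

    debiased = [None] * len(logits)
    for idxs in groups.values():
        n_classes = len(logits[idxs[0]])
        # Mean logit per class within this cluster.
        mean = [sum(logits[i][c] for i in idxs) / len(idxs)
                for c in range(n_classes)]
        # Center across classes so the bias sums to zero (ASSUMPTION).
        offset = sum(mean) / n_classes
        bias = [m - offset for m in mean]
        for i in idxs:
            debiased[i] = [x - b for x, b in zip(logits[i], bias)]
    return debiased
```

Under this reading, the average debiased logit within a cluster is the same for every class, so a cluster-wide preference for one class is removed while differences between samples are preserved.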
This repository contains reproducible code for the strongest host-adapter combination used in the paper:
- D²O+ADAPT: online and transductive Gaussian-posterior adaptation.
Generated results, pre-extracted features, probe CSVs, checkpoints, and run logs are intentionally excluded. They can be regenerated from the code.
.
├── ADAPT_online_ecw_adapt_probe.py          # D²O+ADAPT, online setting
├── ADAPT_transductive_ecw_adapt_probe.py    # D²O+ADAPT, transductive setting
├── Pre_extract_class_emb_default.py         # CLIP text/class embedding pre-extraction
├── asset/                                   # motivation figure and README assets
├── clip/                                    # CLIP implementation used by ADAPT host
├── configs/
│   └── d2o_paper_hparams.yaml               # dataset-specific paper preset groups
├── data/                                    # dataset loaders and class-name utilities
├── descriptions/                            # GPT-assisted class descriptions
├── scripts/
│   ├── pre_extract_class_emb.sh
│   ├── run_d2o_adapt_online.sh
│   ├── run_d2o_adapt_online_paper_presets.sh
│   ├── run_d2o_adapt_transductive.sh
│   └── run_d2o_adapt_transductive_paper_presets.sh
├── utils/                                   # shared utilities
├── requirements_adapt.txt                   # ADAPT/D²O+ADAPT environment deps
D²O+ADAPT follows the ADAPT environment:
conda create -n d2o_adapt python=3.10
conda activate d2o_adapt
pip install -r requirements_adapt.txt
Set DATA_ROOT to the root folder containing the benchmark datasets. The ADAPT-side loader expects names such as:
DATA_ROOT/
├── imagenet-adversarial/imagenet-a
├── imagenet-rendition/imagenet-r
├── imagenet-sketch/ImageNet-Sketch
├── imagenetv2/imagenetv2-matched-frequency-format-val
├── imagenet-c/<corruption>/<level>
├── PUG_ImageNet/<variant>
├── dtd
├── oxford_flowers
└── ...
For the fine-grained datasets, keep the CoOp/TPT split JSON files inside each dataset folder, for example split_zhou_OxfordPets.json.
For ImageNetV2, use the WordNet-folder layout under imagenetv2/:
DATA_ROOT/
└── imagenetv2/
    ├── classnames.txt
    └── imagenetv2-matched-frequency-format-val/
        ├── n01440764/
        ├── n01443537/
        └── ...
For ImageNet variants and PUG, also keep the corresponding classnames.txt file in the dataset family folder, for example imagenet-adversarial/classnames.txt, imagenetv2/classnames.txt, and PUG_ImageNet/classnames.txt.
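Before launching a run, it can help to confirm that the folders described above exist. The following is a small sketch, not part of the repository: the helper `missing_paths` and the `EXPECTED` list are hypothetical names, and the list covers only a few of the paths named in this section, so extend it for the datasets you actually use.

```python
from pathlib import Path

# Relative paths copied from the layout described in this README.
# This is a partial list for illustration, not the full benchmark suite.
EXPECTED = [
    "imagenet-adversarial/imagenet-a",
    "imagenet-adversarial/classnames.txt",
    "imagenet-rendition/imagenet-r",
    "imagenet-sketch/ImageNet-Sketch",
    "imagenetv2/imagenetv2-matched-frequency-format-val",
    "imagenetv2/classnames.txt",
]


def missing_paths(data_root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in expected if not (root / rel).exists()]
```

Calling `missing_paths("/path/to/data")` returns an empty list when every listed folder and file is in place; any returned entry names a path that still needs to be prepared.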
D²O+ADAPT uses CLIP text/class embeddings stored under pre_extracted_class_feat/. They are generated, not committed. Generate them once for the TESTSETS you plan to evaluate before running the ADAPT preset scripts.
DATA_ROOT=/path/to/data \
TESTSETS=imagenetv2/imagenet_a/imagenet_r/imagenet_sketch \
MODEL_ARCH=ViT-B/16 \
bash scripts/pre_extract_class_emb.sh
For PUG, the script stores the shared embedding as pug_imagenet.pth:
DATA_ROOT=/path/to/data TESTSETS=pug_cpitch bash scripts/pre_extract_class_emb.sh
For the full ADAPT paper preset suite, pass the same slash-separated dataset names used by the preset scripts, or run the pre-extraction command separately for each group.
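The slash-separated TESTSETS convention can be illustrated with a short sketch that splits the string and guesses which embedding file each dataset needs. Only two facts here come from this README: names are separated by `/`, and PUG uses the shared file pug_imagenet.pth. The helper names and the `<name>.pth` pattern for all other datasets are assumptions made for illustration, so check the pre-extraction script for the actual file names.

```python
def split_testsets(testsets):
    """Split a slash-separated TESTSETS string into dataset names."""
    return [name for name in testsets.split("/") if name]


def embedding_filename(name):
    """Guess the embedding file name for one dataset.

    PUG variants share pug_imagenet.pth (stated in the README, assuming all
    variant names start with "pug_"). The "<name>.pth" fallback is an
    ASSUMPTION and may not match the real script.
    """
    if name.startswith("pug_"):
        return "pug_imagenet.pth"
    return f"{name}.pth"


names = split_testsets("imagenetv2/imagenet_a/imagenet_r/imagenet_sketch")
files = sorted({embedding_filename(n) for n in names})
```

Collecting the file names in a set avoids listing the shared PUG embedding more than once when several PUG variants appear in the same TESTSETS string.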
Online:
DATA_ROOT=/path/to/data bash scripts/run_d2o_adapt_online_paper_presets.sh
Transductive:
DATA_ROOT=/path/to/data bash scripts/run_d2o_adapt_transductive_paper_presets.sh
The preset scripts apply the paper hyperparameter settings and write logs under outputs/d2o_adapt_online_paper_presets/ and outputs/d2o_adapt_transductive_paper_presets/.
- This release does not include result folders, generated class embeddings, cached visual features, or model checkpoints.
- CLIP weights are downloaded by the CLIP loader if they are not already cached.
- Set CUDA_VISIBLE_DEVICES outside the scripts if you want to choose a specific GPU.
- The PUG-to-ImageNet class mapping metadata is included at data/PUG_ImageNet-Class_Ref_ImageNet_class.xlsx.
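If you prefer to launch the preset scripts from Python, one way to set DATA_ROOT and CUDA_VISIBLE_DEVICES outside the scripts is to pass them through the subprocess environment. This is a minimal sketch; `build_env` and `run_preset` are hypothetical helpers, not part of the repository.

```python
import os
import subprocess


def build_env(data_root, gpu=None, base_env=None):
    """Copy the environment and add DATA_ROOT and, optionally, a GPU index."""
    env = dict(os.environ if base_env is None else base_env)
    env["DATA_ROOT"] = str(data_root)
    if gpu is not None:
        # Restrict the child process to one GPU, as the notes above suggest.
        env["CUDA_VISIBLE_DEVICES"] = str(gpu)
    return env


def run_preset(script, data_root, gpu=None):
    """Run one preset script with bash; raises CalledProcessError on failure."""
    return subprocess.run(["bash", script],
                          env=build_env(data_root, gpu),
                          check=True)
```

For example, `run_preset("scripts/run_d2o_adapt_online_paper_presets.sh", "/path/to/data", gpu=0)` would run the online presets on the first GPU, leaving the parent process environment unchanged.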
This code builds on the public implementations and dataset preparation conventions from:
We thank the authors for releasing their code and dataset preparation instructions.
If you find this repository useful, please cite:
@inproceedings{luo2026d2o,
title={{D²O}: A Dual Debiasing Operator for Training-Free Test-Time Adaptation of Vision-Language Models},
author={Luo, Yihong and He, Wenwu and Liang, Dong and Zhou, Yihang and Cui, Zhuo-Xu},
booktitle={Proceedings of the 43rd International Conference on Machine Learning},
year={2026}
}