This repository contains the official code for the paper "Robust Watermarks Meet Backdoored Models: Evading Diffusion Semantic Watermarks via Stealthy Backdoor", accepted by USENIX Security Symposium 2026.
@inproceedings{LCLWHWW26,
author = {Jinyuan Liu and Tianshuo Cong and Pei Li and Tianrui Wang and Xinlei He and Anyu Wang and Xiaoyun Wang},
title = {Robust Watermarks Meet Backdoored Models: Evading Diffusion Semantic Watermarks via Stealthy Backdoor},
booktitle = {USENIX Security Symposium},
year = {2026}
}
In this paper, we propose GhostVAE, a stealthy VAE encoder backdoor that evades watermark detection when the trigger is activated while preserving detection behavior on benign inputs. You can reproduce the results in the paper by following the instructions below. The Diffusion_watermark directory provides the core utilities for image generation and watermark detection. The GhostVAE_pipeline directory contains the implementation of GhostVAE, including backdoor training and defenses against GhostVAE.
conda create -n dif python=3.12
conda activate dif
pip install -r requirements_dif.txt
- Edit script/gpu_run.sh to set your CONDA_PATH.
- Update the local model path in src/consts.py at Line 36. To use a custom VAE, such as GhostVAE, update the VAE path in src/consts.py at Line 51 and add its name to ModelEnum at Line 17.
- We use 24 cores for CPU tasks by default. Use export DIFF_WATERMARK_PROCESSES=xx to change it.
- We use 8 cards for GPU tasks by default. Change it in script/*.sh as you wish.
- To run a diffusion model, specify the target model in main/setup/generate_requests.py at Line 34 and the corresponding scheduler at Line 39. In addition, when running SD-2.1 or SD-XL, comment out Line 66 in src/consts.py.
- script/generate_images_without_watermark.sh: Pure Stable Diffusion, no watermark
- script/*tr.sh: Pure Tree-ring Watermark, including generation and verification
- script/*gs.sh: Pure Gaussian shading Watermark, including generation and verification
- script/*prc.sh: Pure PRC Watermark, including generation and verification
Result statistics are computed by Detect/detectgs.py, Detect/detecttr.py and Detect/detectprc.py.
- src: shared watermark libraries
- main: the individual steps of the pipeline
- script: scripts that run all steps of the pipeline as a whole
- Detect: scripts that read the results and compute statistics
- result: generated results
- gen: generated samples and generation parameters
- pert: perturbed samples
- veri/request: images to verify. Put your images here.
- veri/witness: temporary files produced during generation, such as z0, zt, msg...
- veri/result: results
After generating the dataset required by GhostVAE, you can use script/rename_images.py to rename the images to 0.png, 1.png, ... and place them in Dataset/finetune_img/xxx, which simplifies subsequent training in GhostVAE_pipeline.
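To make the renaming step concrete, here is a minimal sketch of copying PNG files into a target folder under sequential names. It is not the code of script/rename_images.py: the function name `copy_with_sequential_names`, the choice to copy rather than move, and the restriction to .png files are assumptions for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def copy_with_sequential_names(src_dir, dst_dir):
    """Copy every .png in src_dir to dst_dir as 0.png, 1.png, ...

    Hypothetical helper: files are processed in sorted name order and
    copied, so the source folder is left untouched.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    images = sorted(
        p for p in src.iterdir() if p.is_file() and p.suffix.lower() == ".png"
    )
    written = []
    for index, image in enumerate(images):
        target = dst / f"{index}.png"
        shutil.copy2(image, target)
        written.append(target)
    return written


if __name__ == "__main__":
    # Demo on throwaway files so no real dataset is modified.
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "raw"
        raw.mkdir()
        for name in ("cat.png", "dog.png"):
            (raw / name).write_bytes(b"placeholder")
        out = copy_with_sequential_names(raw, Path(tmp) / "finetune_img")
        print([p.name for p in out])
```

Copying instead of renaming in place keeps the generated results available for the detection scripts while the training folder gets the flat numbering.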
conda create -n vae-backdoor python=3.11.9
conda activate vae-backdoor
pip install -r requirements_ghostvae.txt
- --data_dir: folder containing training images, which can be generated via Diffusion_watermark
- --save_noise_path: where to save the trigger checkpoint (.pt)
python Backdoor_train/robust_universal_noise_finding.py \
--data_dir Dataset/finetune_img/SD2.1 \
--save_noise_path outputs/trigger/trigger_ckpt.pt \
--img_size 512 \
--batch_size 16 \
--epochs 10 \
--lr 1e-4
This stage produces two files:
- outputs/trigger/trigger_ckpt.pt (checkpoint dict)
- outputs/trigger/trigger_ckpt_delta_only.pt (delta-only tensor; recommended for downstream use)
In the following sections, <PATH_TO_TRIGGER_NOISE_PT> refers to the path of outputs/trigger/trigger_ckpt_delta_only.pt.
- --clean_dir: folder containing training images, which can be generated via Diffusion_watermark
- --trigger_noise_path: delta-only trigger tensor from Stage 1 (*_delta_only.pt)
- --save_full_vae_dir: directory to save the full VAE (save_pretrained format)
- --save_encoder_only_path: path to save encoder-only weights (.pth)
python Backdoor_train/backdoor_train.py \
--clean_dir Dataset/finetune_img/SD2.1 \
--trigger_noise_path outputs/trigger/trigger_ckpt_delta_only.pt \
--save_full_vae_dir outputs/trigger_vae_encoder \
--save_encoder_only_path outputs/encoder_only_final.pth \
--img_size 512 \
--batch_size 8 \
--epochs 4 \
--lr 1e-4 \
--weight_decay 0 \
--lambda_clean 1.0 \
--lambda_trig 1.0 \
--lambda_mmd 50.0
The outputs are organized as follows:
outputs/
├── trigger_vae_encoder/
│ ├── config.json
│ ├── diffusion_pytorch_model.safetensors
│ ├── train_config.json
│ └── train_log.json
└── encoder_only_final.pth
In the following sections, <PATH_TO_VAE_DIR> refers to the path of the full backdoored VAE directory, i.e., outputs/trigger_vae_encoder.
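Before running the defenses, it can help to confirm that the files listed in the output tree above were actually written. The following is a minimal sketch of such a check; the function name `missing_outputs` is hypothetical, and the file names are copied from the tree shown in this README.

```python
from pathlib import Path

# File names taken from the output tree shown in this README.
EXPECTED_VAE_FILES = (
    "config.json",
    "diffusion_pytorch_model.safetensors",
    "train_config.json",
    "train_log.json",
)


def missing_outputs(output_root="outputs"):
    """Return the expected backdoor-training outputs that are not on disk."""
    root = Path(output_root)
    expected = [root / "trigger_vae_encoder" / name for name in EXPECTED_VAE_FILES]
    expected.append(root / "encoder_only_final.pth")
    return [str(path) for path in expected if not path.is_file()]


if __name__ == "__main__":
    missing = missing_outputs()
    if missing:
        print("Missing files:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected outputs are present.")
```

An empty list means the directory can be passed as <PATH_TO_VAE_DIR> in the commands that follow.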
For each defense level, we provide the usage of one representative defense method. For the other defense methods, the default parameters are specified in the corresponding Python files; you only need to complete the required paths before running them.
We provide DiffPure as a representative input-level defense. When using DiffPure, first download the model checkpoint from Download link and place it under GhostVAE_pipeline/Defense_method/input_level. Then run:
python Defense_method/input_level/diffpure_defense.py \
--image_root <PATH_TO_WATERMARK_IMAGE_DIR> \
--keys_pkl Defense_method/input_level/PRC_decode/keys.pkl \
--universal_noise_path <PATH_TO_TRIGGER_NOISE_PT> \
--model_id <PATH_TO_SD_MODEL> \
--vae_dir <PATH_TO_VAE_DIR> \
--diffpure_config Defense_method/input_level/imagenet.yml \
--diffpure_model_dir Defense_method/input_level \
--out_dir <OUTPUT_DIR>
We use CLP as a representative parameter-level defense:
python Defense_method/parameter_level/CLP.py \
--input_vae_dir <PATH_TO_VAE_DIR> \
--output_dir <OUTPUT_DIR>
We use the Kolmogorov-Smirnov (KS) test as a representative latent-level detection defense. It compares the latent distribution of clean inputs against that of trigger-perturbed inputs to detect anomalous behavior in the VAE:
python Defense_method/latent_level/Kolmogorov_Smirnov_test.py \
--ref_dir <PATH_TO_WATERMARK_IMAGE_DIR> \
--test_dir <PATH_TO_WATERMARK_IMAGE_DIR> \
--backdoor_vae_dir <PATH_TO_VAE_DIR> \
--noise_path <PATH_TO_TRIGGER_NOISE_PT>
You can also use the single-image mode:
python Defense_method/latent_level/Kolmogorov_Smirnov_test.py \
--ref_dir <PATH_TO_WATERMARK_IMAGE_DIR> \
--test_image <PATH_TO_SINGLE_WATERMARK_IMAGE> \
--backdoor_vae_dir <PATH_TO_VAE_DIR> \
--noise_path <PATH_TO_TRIGGER_NOISE_PT> \
--query_num_images 1 \
--query_num_images_end 1
To evaluate GhostVAE after applying defenses, follow the instructions in Diffusion_watermark to run the corresponding watermark detection and performance evaluation used in the paper.
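For intuition about the latent-level KS test described above, the sketch below computes the two-sample Kolmogorov-Smirnov statistic, i.e. the largest gap between two empirical CDFs, in plain Python. It is not the implementation in Kolmogorov_Smirnov_test.py: the function name `ks_statistic` is hypothetical, and the inputs are assumed to be flat lists of scalar values (for example, flattened latent entries) rather than whatever features the script actually uses.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the maximum gap between two empirical CDFs."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    # Both empirical CDFs are step functions that only change at observed
    # values, so it is enough to compare them at every observed value.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


if __name__ == "__main__":
    reference = [0.1, 0.4, 0.5, 0.9]  # illustrative values only
    query = [0.2, 0.6, 1.3, 1.7]      # illustrative values only
    print(ks_statistic(reference, query))
```

A statistic near 0 means the two samples look alike, while a value near 1 means they barely overlap; a detector would flag inputs whose statistic exceeds a chosen threshold.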
The code for generating PRC watermarked images is adapted from https://github.com/XuandongZhao/PRC-Watermark.