GhostVAE

This repository contains the official code for the paper "Robust Watermarks Meet Backdoored Models: Evading Diffusion Semantic Watermarks via Stealthy Backdoor", accepted at the USENIX Security Symposium 2026.

@inproceedings{LCLWHWW26,
    author = {Jinyuan Liu and Tianshuo Cong and Pei Li and Tianrui Wang and Xinlei He and Anyu Wang and Xiaoyun Wang},
    title = {Robust Watermarks Meet Backdoored Models: Evading Diffusion Semantic Watermarks via Stealthy Backdoor},
    booktitle = {USENIX Security Symposium},
    year = {2026}
}

In this paper, we propose GhostVAE, a stealthy VAE encoder backdoor that evades watermark detection when the trigger is present while preserving normal detection behavior on benign inputs. You can reproduce the results in the paper by following the instructions below. The Diffusion_watermark directory provides the core utilities for image generation and watermark detection. The GhostVAE_pipeline directory contains the implementation of GhostVAE, including backdoor training and defenses against GhostVAE.

In Diffusion_watermark

Install

conda create -n dif python=3.12 
conda activate dif
pip install -r requirements_dif.txt

Config

Set CONDA_PATH in script/gpu_run.sh to match your conda installation.
Update the local model path in src/consts.py at Line 36. To use a custom VAE, such as GhostVAE, update the VAE path in src/consts.py at Line 51 and add its name to ModelEnum at Line 17.
CPU tasks use 24 cores by default. To change this, run export DIFF_WATERMARK_PROCESSES=xx before launching the scripts.
GPU tasks use 8 GPUs by default. Adjust this in script/*.sh as needed.
To run a diffusion model, specify the target model in main/setup/generate_requests.py at Line 34 and the corresponding scheduler at Line 39. In addition, when running SD-2.1 or SD-XL, comment out Line 66 in src/consts.py.
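
For the CPU setting above, the sketch below shows one common way a worker count is resolved from DIFF_WATERMARK_PROCESSES with a fallback of 24. The function name resolve_process_count and the validation rules are assumptions made for illustration; the repository's own parsing may differ.

```python
import os

DEFAULT_PROCESSES = 24  # default stated in this README


def resolve_process_count(env=None):
    """Return the worker count from DIFF_WATERMARK_PROCESSES, or the default.

    `env` is any mapping of environment variables; it defaults to os.environ.
    This helper is hypothetical and is not part of the repository.
    """
    env = os.environ if env is None else env
    raw = env.get("DIFF_WATERMARK_PROCESSES", "").strip()
    if not raw:
        return DEFAULT_PROCESSES
    value = int(raw)  # raises ValueError for non-numeric input
    if value < 1:
        raise ValueError("DIFF_WATERMARK_PROCESSES must be a positive integer")
    return value
```

Passing the environment as a mapping keeps the helper easy to check without touching the real process environment.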

Run

script/generate_images_without_watermark.sh: plain Stable Diffusion generation, no watermark
script/*tr.sh: Tree-Ring watermark only, including generation and verification
script/*gs.sh: Gaussian Shading watermark only, including generation and verification
script/*prc.sh: PRC watermark only, including generation and verification

Show watermark detection results

Run Detect/detectgs.py, Detect/detecttr.py, or Detect/detectprc.py to report the detection results for the corresponding watermark.

Project Structure

src: shared watermark libraries
main: the individual steps of the pipeline
script: scripts that run all pipeline steps end to end
Detect: scripts that read the results and compute statistics
result: generated results

Result Structure

gen: generated samples and generation parameters
pert: perturbed samples
veri/request: images to verify. Put your images here.
veri/witness: temporary files produced during generation, such as z0, zt, msg...
veri/result: verification results

After generating the dataset required by GhostVAE, you can use script/rename_images.py to rename the images to 0.png, 1.png, ... and place them in Dataset/finetune_img/xxx, which simplifies subsequent training in GhostVAE_pipeline.
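
The renaming step can be pictured with the sketch below. It is a stand-in for illustration, not the contents of script/rename_images.py: the function name, the ordering by file name, and the choice to copy rather than move files are all assumptions.

```python
import shutil
from pathlib import Path


def copy_with_sequential_names(src_dir, dst_dir, suffix=".png"):
    """Copy images from src_dir to dst_dir as 0.png, 1.png, ...

    Hypothetical helper: files are ordered by name, and the originals are
    left in place. Returns the number of images copied.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    images = sorted(
        p for p in src.iterdir() if p.is_file() and p.suffix.lower() == suffix
    )
    for index, path in enumerate(images):
        shutil.copy2(path, dst / f"{index}{suffix}")
    return len(images)
```

Copying instead of renaming in place avoids collisions when a source file is already called, say, 1.png.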

In GhostVAE_pipeline

Install

conda create -n vae-backdoor python=3.11.9
conda activate vae-backdoor
pip install -r requirements_ghostvae.txt

Generate the Universal Trigger Noise

--data_dir: folder containing training images, which can be generated via Diffusion_watermark
--save_noise_path: where to save the trigger checkpoint (.pt)

Example command

python Backdoor_train/robust_universal_noise_finding.py \
  --data_dir Dataset/finetune_img/SD2.1 \
  --save_noise_path outputs/trigger/trigger_ckpt.pt \
  --img_size 512 \
  --batch_size 16 \
  --epochs 10 \
  --lr 1e-4

Expected outputs

outputs/trigger/trigger_ckpt.pt (checkpoint dict)
outputs/trigger/trigger_ckpt_delta_only.pt (delta-only tensor; recommended for downstream use)

In the following sections, <PATH_TO_TRIGGER_NOISE_PT> refers to the path of outputs/trigger/trigger_ckpt_delta_only.pt.

Train the Backdoored VAE Encoder Using the Trigger

--clean_dir: folder containing training images, which can be generated via Diffusion_watermark
--trigger_noise_path: delta-only trigger tensor produced by the trigger-generation step above (*_delta_only.pt)
--save_full_vae_dir: directory to save the full VAE (save_pretrained format)
--save_encoder_only_path: path to save encoder-only weights (.pth)

Example command

python Backdoor_train/backdoor_train.py \
  --clean_dir Dataset/finetune_img/SD2.1 \
  --trigger_noise_path outputs/trigger/trigger_ckpt_delta_only.pt \
  --save_full_vae_dir outputs/trigger_vae_encoder \
  --save_encoder_only_path outputs/encoder_only_final.pth \
  --img_size 512 \
  --batch_size 8 \
  --epochs 4 \
  --lr 1e-4 \
  --weight_decay 0 \
  --lambda_clean 1.0 \
  --lambda_trig 1.0 \
  --lambda_mmd 50.0

Expected outputs

outputs/
├── trigger_vae_encoder/
│   ├── config.json
│   ├── diffusion_pytorch_model.safetensors
│   ├── train_config.json
│   └── train_log.json
└── encoder_only_final.pth
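
Before moving on to the defenses, it can help to confirm that training produced every file in the tree above. The following is a small sketch using only the file names listed there; the function name missing_outputs and the default root "outputs" (taken from the example command) are assumptions.

```python
from pathlib import Path

# Relative paths copied from the expected-output tree in this README.
EXPECTED_FILES = (
    "trigger_vae_encoder/config.json",
    "trigger_vae_encoder/diffusion_pytorch_model.safetensors",
    "trigger_vae_encoder/train_config.json",
    "trigger_vae_encoder/train_log.json",
    "encoder_only_final.pth",
)


def missing_outputs(output_root="outputs"):
    """Return the expected files that are absent under output_root."""
    root = Path(output_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

An empty list means all five files are present; otherwise the returned paths show which part of the training run did not finish writing.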

In the following sections, <PATH_TO_VAE_DIR> refers to the path of the full backdoored VAE directory, i.e., outputs/trigger_vae_encoder.

Apply Defenses on the Backdoored VAE

For each defense level, we show how to run one representative defense method. For the other defense methods, the default parameters are specified in the corresponding Python files; you only need to fill in the required paths before running them.

Input-level Defense

We provide DiffPure as a representative input-level defense. When using DiffPure, first download the model checkpoint from Download link and place it under GhostVAE_pipeline/Defense_method/input_level. Then run:

python Defense_method/input_level/diffpure_defense.py \
  --image_root <PATH_TO_WATERMARK_IMAGE_DIR> \
  --keys_pkl Defense_method/input_level/PRC_decode/keys.pkl \
  --universal_noise_path <PATH_TO_TRIGGER_NOISE_PT> \
  --model_id <PATH_TO_SD_MODEL> \
  --vae_dir <PATH_TO_VAE_DIR> \
  --diffpure_config Defense_method/input_level/imagenet.yml \
  --diffpure_model_dir Defense_method/input_level \
  --out_dir <OUTPUT_DIR>

Parameter-level Defense

We use CLP as a representative parameter-level defense:

python Defense_method/parameter_level/CLP.py \
  --input_vae_dir <PATH_TO_VAE_DIR> \
  --output_dir <OUTPUT_DIR>

Latent-level Defense

We use the Kolmogorov-Smirnov (KS) test as a representative latent-level detection defense. It compares the latent distribution of clean inputs against that of trigger-perturbed inputs to detect anomalous behavior in the VAE:

python Defense_method/latent_level/Kolmogorov_Smirnov_test.py \
  --ref_dir <PATH_TO_WATERMARK_IMAGE_DIR> \
  --test_dir <PATH_TO_WATERMARK_IMAGE_DIR> \
  --backdoor_vae_dir <PATH_TO_VAE_DIR> \
  --noise_path <PATH_TO_TRIGGER_NOISE_PT>

You can also use the single-image mode:

python Defense_method/latent_level/Kolmogorov_Smirnov_test.py \
  --ref_dir <PATH_TO_WATERMARK_IMAGE_DIR> \
  --test_image <PATH_TO_SINGLE_WATERMARK_IMAGE> \
  --backdoor_vae_dir <PATH_TO_VAE_DIR> \
  --noise_path <PATH_TO_TRIGGER_NOISE_PT> \
  --query_num_images 1 \
  --query_num_images_end 1
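
To make the idea behind the KS test concrete, the sketch below computes the two-sample KS statistic, i.e. the largest gap between the empirical CDFs of a reference sample and a test sample. It is a generic, standard-library illustration and not the code in Kolmogorov_Smirnov_test.py; how that script flattens latents, and which threshold or p-value it applies, is not covered here.

```python
from bisect import bisect_right


def ks_statistic(ref_values, test_values):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_ref(x) - F_test(x)|.

    Both inputs are flat sequences of numbers, for example flattened latent
    values (an assumption made for this illustration).
    """
    ref_sorted = sorted(ref_values)
    test_sorted = sorted(test_values)
    if not ref_sorted or not test_sorted:
        raise ValueError("both samples must be non-empty")
    n, m = len(ref_sorted), len(test_sorted)
    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so it is enough
    # to compare them at every data point from either sample.
    for x in ref_sorted + test_sorted:
        f_ref = bisect_right(ref_sorted, x) / n
        f_test = bisect_right(test_sorted, x) / m
        largest_gap = max(largest_gap, abs(f_ref - f_test))
    return largest_gap
```

A statistic near 0 means the two samples look alike, while a value near 1 means their distributions barely overlap. A backdoor that stays close to the clean latent distribution would be expected to keep this value small.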

To evaluate GhostVAE after applying defenses, follow the instructions in Diffusion_watermark to run the corresponding watermark detection and performance evaluation used in the paper.

Acknowledgements

The code for generating PRC watermarked images is adapted from https://github.com/XuandongZhao/PRC-Watermark.
