CryptoAILab/GhostVAE

GhostVAE

This repository contains the official code for the paper "Robust Watermarks Meet Backdoored Models: Evading Diffusion Semantic Watermarks via Stealthy Backdoor", accepted by USENIX Security Symposium 2026.

@inproceedings{LCLWHWW26,
    author = {Jinyuan Liu and Tianshuo Cong and Pei Li and Tianrui Wang and Xinlei He and Anyu Wang and Xiaoyun Wang},
    title = {Robust Watermarks Meet Backdoored Models: Evading Diffusion Semantic Watermarks via Stealthy Backdoor},
    booktitle = {USENIX Security Symposium},
    year = {2026}
}

In this paper, we propose GhostVAE, a stealthy VAE encoder backdoor that achieves watermark evasion under trigger activation while preserving detection behavior on benign inputs. You can reproduce the results in the paper by following the instructions below. The Diffusion_watermark directory provides the core utilities for image generation and watermark detection. The GhostVAE_pipeline directory contains the implementation of GhostVAE, including backdoor training and defenses against GhostVAE.

In Diffusion_watermark

Install

conda create -n dif python=3.12 
conda activate dif
pip install -r requirements_dif.txt

Config

  1. Update CONDA_PATH in script/gpu_run.sh for your environment.
  2. Update the local model path in src/consts.py at Line 36. To use a custom VAE, such as GhostVAE, update the VAE path in src/consts.py at Line 51 and add its name to ModelEnum at Line 17.
  3. CPU tasks use 24 cores by default. To change this, run export DIFF_WATERMARK_PROCESSES=xx.
  4. GPU tasks use 8 GPUs by default. To change this, edit script/*.sh.
  5. To run a diffusion model, specify the target model in main/setup/generate_requests.py at Line 34 and the corresponding scheduler at Line 39. In addition, when running SD-2.1 or SD-XL, comment out Line 66 in src/consts.py.
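
As a rough illustration of item 3, a worker count could be resolved from the environment as sketched below. Only the variable name DIFF_WATERMARK_PROCESSES and the default of 24 come from the list above; the helper name resolve_num_processes is hypothetical, and how the repository actually parses the variable is an assumption.

```python
import os

DEFAULT_PROCESSES = 24  # default stated in the Config list


def resolve_num_processes(env=None):
    """Return the CPU worker count, honoring DIFF_WATERMARK_PROCESSES if set.

    Hypothetical helper: the repository's own parsing may differ.
    """
    env = os.environ if env is None else env
    raw = env.get("DIFF_WATERMARK_PROCESSES", "").strip()
    if not raw:
        return DEFAULT_PROCESSES
    value = int(raw)  # raises ValueError on non-numeric input
    if value < 1:
        raise ValueError("DIFF_WATERMARK_PROCESSES must be a positive integer")
    return value
```

Passing a plain dict instead of os.environ makes the behavior easy to check without touching the real environment.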

Run

  • script/generate_images_without_watermark.sh: Pure Stable Diffusion, no watermark
  • script/*tr.sh: Pure Tree-ring Watermark, including generation and verification
  • script/*gs.sh: Pure Gaussian shading Watermark, including generation and verification
  • script/*prc.sh: Pure PRC Watermark, including generation and verification

Show watermark detection results

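Run Detect/detectgs.py, Detect/detecttr.py, and Detect/detectprc.py to show the detection results for the Gaussian Shading, Tree-Ring, and PRC watermarks, respectively.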

Project Structure

  • src, shared watermark libraries
  • main, the individual steps of the pipeline
  • script, scripts that run all pipeline steps end to end
  • Detect, scripts that read the results and compute statistics
  • result, generated results

Result Structure

  • gen, generated samples and generation parameters
  • pert, perturbed samples
  • veri/request, images to verify; put your images here
  • veri/witness, temporary files produced during generation, such as z0, zt, and msg
  • veri/result, verification results

After generating the dataset required by GhostVAE, use script/rename_images.py to rename the images to 0.png, 1.png, ... and place them in Dataset/finetune_img/xxx. This layout simplifies the subsequent training in GhostVAE_pipeline.
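
The renaming step can be pictured with the minimal sketch below. It is not the contents of script/rename_images.py, whose arguments and behavior are not documented here; the function name copy_as_numbered_pngs and the choice to copy rather than move files are assumptions made for illustration.

```python
import shutil
from pathlib import Path


def copy_as_numbered_pngs(src_dir, dst_dir):
    """Copy every .png in src_dir into dst_dir as 0.png, 1.png, ...

    Hypothetical stand-in for script/rename_images.py.
    Returns the number of images copied.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    # Sort by file name so the numbering is reproducible across runs.
    images = sorted(
        p for p in src.iterdir() if p.is_file() and p.suffix.lower() == ".png"
    )
    for index, image in enumerate(images):
        shutil.copy2(image, dst / f"{index}.png")
    return len(images)
```

For example, copy_as_numbered_pngs("result/gen", "Dataset/finetune_img/SD2.1") would fill the training folder used in the later commands, assuming the generated PNGs sit directly in the source folder.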

In GhostVAE_pipeline

Install

conda create -n vae-backdoor python=3.11.9
conda activate vae-backdoor
pip install -r requirements_ghostvae.txt

Generate the Universal Trigger Noise

  • --data_dir: folder containing training images, which can be generated via Diffusion_watermark
  • --save_noise_path: where to save the trigger checkpoint (.pt)

Example command

python Backdoor_train/robust_universal_noise_finding.py \
  --data_dir Dataset/finetune_img/SD2.1 \
  --save_noise_path outputs/trigger/trigger_ckpt.pt \
  --img_size 512 \
  --batch_size 16 \
  --epochs 10 \
  --lr 1e-4 

Expected outputs

  • outputs/trigger/trigger_ckpt.pt (checkpoint dict)
  • outputs/trigger/trigger_ckpt_delta_only.pt (delta-only tensor; recommended for downstream use)

In the following sections, <PATH_TO_TRIGGER_NOISE_PT> refers to the path of the delta-only trigger file, e.g., outputs/trigger/trigger_ckpt_delta_only.pt.
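
If you script the pipeline, the delta-only path can be derived from the value passed to --save_noise_path. The sketch below assumes the naming rule suggested by the single example above (insert _delta_only before the file extension); the helper name delta_only_path is hypothetical, and the training script may name files differently for other inputs.

```python
from pathlib import Path


def delta_only_path(save_noise_path):
    """Map e.g. trigger_ckpt.pt to trigger_ckpt_delta_only.pt in the same folder.

    Assumed naming rule, inferred from the expected outputs listed above.
    """
    path = Path(save_noise_path)
    return path.with_name(f"{path.stem}_delta_only{path.suffix}")
```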

Train the Backdoored VAE Encoder Using the Trigger

  • --clean_dir: folder containing training images, which can be generated via Diffusion_watermark
  • --trigger_noise_path: delta-only trigger tensor produced by the trigger-generation step above (*_delta_only.pt)
  • --save_full_vae_dir: directory to save the full VAE (save_pretrained format)
  • --save_encoder_only_path: path to save encoder-only weights (.pth)

Example command

python Backdoor_train/backdoor_train.py \
  --clean_dir Dataset/finetune_img/SD2.1 \
  --trigger_noise_path outputs/trigger/trigger_ckpt_delta_only.pt \
  --save_full_vae_dir outputs/trigger_vae_encoder \
  --save_encoder_only_path outputs/encoder_only_final.pth \
  --img_size 512 \
  --batch_size 8 \
  --epochs 4 \
  --lr 1e-4 \
  --weight_decay 0 \
  --lambda_clean 1.0 \
  --lambda_trig 1.0 \
  --lambda_mmd 50.0 

Expected outputs

outputs/
├── trigger_vae_encoder/
│   ├── config.json
│   ├── diffusion_pytorch_model.safetensors
│   ├── train_config.json
│   └── train_log.json
└── encoder_only_final.pth

In the following sections, <PATH_TO_VAE_DIR> refers to the path of the full backdoored VAE directory, i.e., outputs/trigger_vae_encoder.
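
Before moving on to the defenses, it can help to confirm that training produced the files listed above. The check below is a minimal sketch: the file names are taken from the expected-output tree, while the helper name missing_vae_files is hypothetical and not part of the repository.

```python
from pathlib import Path

# File names taken from the expected-output tree.
EXPECTED_VAE_FILES = (
    "config.json",
    "diffusion_pytorch_model.safetensors",
    "train_config.json",
    "train_log.json",
)


def missing_vae_files(vae_dir):
    """Return the expected files absent from vae_dir (empty list if complete)."""
    root = Path(vae_dir)
    return [name for name in EXPECTED_VAE_FILES if not (root / name).is_file()]
```

An empty list from missing_vae_files("outputs/trigger_vae_encoder") indicates that the directory is ready to be used as <PATH_TO_VAE_DIR>.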

Apply Defenses on the Backdoored VAE

For each defense level, we provide the usage of one representative defense method. For the other defense methods, the default parameters are specified in the corresponding Python files; you only need to complete the required paths before running them.

Input-level Defense

We provide DiffPure as a representative input-level defense. When using DiffPure, first download the model checkpoint from Download link and place it under GhostVAE_pipeline/Defense_method/input_level. Then run:

python Defense_method/input_level/diffpure_defense.py \
  --image_root <PATH_TO_WATERMARK_IMAGE_DIR> \
  --keys_pkl Defense_method/input_level/PRC_decode/keys.pkl \
  --universal_noise_path <PATH_TO_TRIGGER_NOISE_PT> \
  --model_id <PATH_TO_SD_MODEL> \
  --vae_dir <PATH_TO_VAE_DIR> \
  --diffpure_config Defense_method/input_level/imagenet.yml \
  --diffpure_model_dir Defense_method/input_level \
  --out_dir <OUTPUT_DIR>

Parameter-level Defense

We use CLP as a representative parameter-level defense:

python Defense_method/parameter_level/CLP.py \
  --input_vae_dir <PATH_TO_VAE_DIR> \
  --output_dir <OUTPUT_DIR>

Latent-level Defense

We use the Kolmogorov-Smirnov (KS) test as a representative latent-level detection defense. It compares the latent distribution of clean inputs against that of trigger-perturbed inputs to detect anomalous behavior in the VAE:

python Defense_method/latent_level/Kolmogorov_Smirnov_test.py \
  --ref_dir <PATH_TO_WATERMARK_IMAGE_DIR> \
  --test_dir <PATH_TO_WATERMARK_IMAGE_DIR> \
  --backdoor_vae_dir <PATH_TO_VAE_DIR> \
  --noise_path <PATH_TO_TRIGGER_NOISE_PT>

You can also use the single-image mode:

python Defense_method/latent_level/Kolmogorov_Smirnov_test.py \
  --ref_dir <PATH_TO_WATERMARK_IMAGE_DIR> \
  --test_image <PATH_TO_SINGLE_WATERMARK_IMAGE> \
  --backdoor_vae_dir <PATH_TO_VAE_DIR> \
  --noise_path <PATH_TO_TRIGGER_NOISE_PT> \
  --query_num_images 1 \
  --query_num_images_end 1
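
To make the idea behind the latent-level check concrete, the sketch below computes the two-sample KS statistic, i.e., the largest gap between two empirical CDFs, in plain Python. It assumes the latents have already been flattened into lists of numbers. It is not the code in Kolmogorov_Smirnov_test.py, which may add p-values, thresholds, or per-channel handling; the function name ks_statistic is chosen for illustration.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    if len(sample_a) == 0 or len(sample_b) == 0:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    # The gap only changes at observed values, so checking those is enough.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap
```

The statistic is 0 for identical samples and 1 for samples whose value ranges do not overlap. A stealthy backdoor aims to keep this value small, so that latents of trigger-perturbed inputs are hard to tell apart from clean ones.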

To evaluate GhostVAE after applying defenses, follow the instructions in Diffusion_watermark to run the corresponding watermark detection and performance evaluation used in the paper.

Acknowledgements

The code for generating PRC watermarked images is adapted from https://github.com/XuandongZhao/PRC-Watermark.
