
Cryptanalysis of LDPC-Based Pseudorandom Error-Correcting Codes

This is the official artifact repository for the paper "Cryptanalysis of LDPC-Based Pseudorandom Error-Correcting Codes" (accepted to USENIX Security 2026, Cycle 2, Paper ID #1773).

In this paper, we analyze the security of pseudorandom error-correcting codes (PRCs), and propose three attacks against their undetectability and robustness properties. Since PRCs are used for watermarking generative content, our attacks can be used to detect and remove those watermarks, revealing the concrete security limits of PRCs in real-world watermarking applications.
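
To make the objects concrete, here is a toy sketch of an LDPC-style PRC in the spirit of the LDPC-based construction of Christ and Gunn. The secret key is a sparse parity-check matrix with `t` ones per row, codewords are drawn from its kernel over GF(2), and detection counts how many parity checks a string satisfies. This is heavily simplified (no one-time pad, no message embedding, toy sizes), all function names are hypothetical, and we assume that `t` plays the role of the `--prc_t` option used by the scripts below. It is not code from this repository.

```python
import numpy as np


def keygen(n: int, r: int, t: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical secret key: an r x n binary parity-check matrix with t ones per row."""
    H = np.zeros((r, n), dtype=np.uint8)
    for i in range(r):
        H[i, rng.choice(n, size=t, replace=False)] = 1
    return H


def gf2_nullspace(H: np.ndarray) -> np.ndarray:
    """Basis of {x : Hx = 0 (mod 2)}, one vector per row, via Gauss-Jordan elimination."""
    A = H.copy() % 2
    r, n = A.shape
    pivots = []
    row = 0
    for col in range(n):
        if row == r:
            break
        nz = np.nonzero(A[row:, col])[0]
        if nz.size == 0:
            continue  # no pivot in this column: it stays a free variable
        p = row + nz[0]
        A[[row, p]] = A[[p, row]]  # move the pivot row into position
        for i in range(r):  # clear this column in every other row
            if i != row and A[i, col]:
                A[i] ^= A[row]
        pivots.append(col)
        row += 1
    pivot_set = set(pivots)
    basis = []
    for f in range(n):
        if f in pivot_set:
            continue
        v = np.zeros(n, dtype=np.uint8)
        v[f] = 1  # set one free variable to 1, then solve for the pivot variables
        for i, pc in enumerate(pivots):
            v[pc] = A[i, f]
        basis.append(v)
    return np.array(basis)


def encode(basis: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Uniformly random codeword: a random GF(2) combination of the kernel basis."""
    s = rng.integers(0, 2, size=basis.shape[0])
    return (s @ basis.astype(np.int64)) % 2


def flip_bits(x: np.ndarray, p: float, rng: np.random.Generator) -> np.ndarray:
    """Binary symmetric channel: flip each bit independently with probability p."""
    return x ^ (rng.random(x.shape) < p).astype(np.int64)


def satisfied_fraction(H: np.ndarray, y: np.ndarray) -> float:
    """Fraction of parity checks satisfied by y."""
    return float(np.mean((H.astype(np.int64) @ y) % 2 == 0))


rng = np.random.default_rng(0)
H = keygen(n=256, r=64, t=3, rng=rng)
x = encode(gf2_nullspace(H), rng)
print(satisfied_fraction(H, x))  # 1.0: a noiseless codeword satisfies every check
print(satisfied_fraction(H, flip_bits(x, 0.1, rng)))  # expected value 0.5 + 0.5 * 0.8**3 = 0.756
print(satisfied_fraction(H, rng.integers(0, 2, size=256)))  # expected value 0.5
```

A uniformly random string satisfies each check with probability 1/2, while a codeword sent through a binary symmetric channel with flip rate p satisfies it with probability (1 + (1 - 2p)^t) / 2, so the detection signal shrinks as t grows.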

This artifact contains code for both the theoretical analysis of our attacks and the attacks against real-world PRC watermarking schemes for Large Language Models (LLMs) and Generative Image Models (GIMs). We also provide example data for reproducing the results in our paper.

System Requirements

  • Software: a Linux-based environment with conda, git, and unzip installed.
  • Hardware: The GPU experiments (generating PRC-watermarked content and running Attack III) require up to 80GB of VRAM, depending on the experiment. However, the complexity estimation, Attacks I and II, and the result analysis can be reproduced on a CPU-only machine with the provided data artifacts.
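
The per-command VRAM figures quoted in this README can be turned into a quick pre-flight check. The sketch below parses the output of `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits` (one MiB value per GPU) and lists which GPU steps would fit. The task labels are hypothetical names, and treating the README's "GB" as decimal gigabytes is our assumption.

```python
import subprocess

# VRAM needs in GB, copied from the comments on the commands in this README.
# The task labels are hypothetical names used only in this sketch.
VRAM_GB = {
    "llm_generation": 64,
    "gim_generation": 16,
    "gim_attack3_extraction": 16,
    "gim_attack3_removal": 80,
}


def parse_memory_total_mib(csv_text: str) -> list[int]:
    """Parse one integer (MiB) per non-empty line of nvidia-smi CSV output."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def feasible_tasks(total_mib: int) -> list[str]:
    """Tasks whose VRAM need fits, treating the README's GB as decimal (1e9 bytes)."""
    total_gb = total_mib * 2**20 / 1e9
    return [task for task, need in VRAM_GB.items() if total_gb >= need]


def query_gpus() -> list[int]:
    """Total memory (MiB) per GPU, or [] if nvidia-smi is missing or fails."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
        return parse_memory_total_mib(out)
    except (FileNotFoundError, subprocess.CalledProcessError, ValueError):
        return []


if __name__ == "__main__":
    gpus = query_gpus()
    if not gpus:
        print("No NVIDIA GPU found: use the provided data for CPU-only reproduction.")
    for idx, mib in enumerate(gpus):
        print(f"GPU {idx}: {mib} MiB -> {feasible_tasks(mib)}")
```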

Setup Instructions

  • Download the artifacts from Zenodo or directly clone the repository from GitHub.

    git clone https://github.com/1234wangtr/PRC_estimator
  • Unzip the example data artifacts

    cd PRC_estimator  # directory created by git clone; adjust if you downloaded from Zenodo
    unzip gim/data/SD21_t3.zip -d gim/data/SD21_t3_example
    unzip llm/data/Deepseek_t_3_temp_1.8.zip -d llm/data/Deepseek_t_3_temp_1.8_example
    unzip llm/data/Deepseek_t_3_temp_all.zip -d llm/data
  • Create the conda environments for the LLM and GIM experiments.

    conda env create -f setup/environment-llm.yaml
    conda env create -f setup/environment-gim.yaml
  • Download the publicly available models and datasets into .models by running:

    setup/get_gim.sh # For GIM main experiments
    setup/get_llm.sh # For LLM main experiments
    setup/get_gim_ablation_models.sh # For GIM ablation experiments
    setup/get_llm_ablation_models.sh # For LLM ablation experiments
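
If `unzip` is not available, the extraction step can be done with Python's standard `zipfile` module instead. A minimal sketch, assuming it is run from the repository root; the archive/destination pairs are copied from the `unzip` commands above, and `extract_examples` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path

# (archive, destination) pairs copied from the unzip commands above.
EXAMPLE_ARCHIVES = [
    ("gim/data/SD21_t3.zip", "gim/data/SD21_t3_example"),
    ("llm/data/Deepseek_t_3_temp_1.8.zip", "llm/data/Deepseek_t_3_temp_1.8_example"),
    ("llm/data/Deepseek_t_3_temp_all.zip", "llm/data"),
]


def extract_examples(repo_root=".", archives=EXAMPLE_ARCHIVES) -> list[str]:
    """Equivalent of `unzip <archive> -d <dest>` per pair; returns the destinations extracted."""
    root = Path(repo_root)
    done = []
    for archive, dest in archives:
        src = root / archive
        if not src.is_file():
            print(f"skipping missing archive: {src}")
            continue
        with zipfile.ZipFile(src) as zf:
            zf.extractall(root / dest)  # creates dest (and parents) if needed
        done.append(dest)
    return done


if __name__ == "__main__":
    print(extract_examples())
```

Unlike `unzip`, `extractall` silently overwrites existing files instead of prompting.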

Project Structure

The artifacts are separated into two main folders: llm for the LLM experiments and gim for the GIM experiments. Each folder is organized as follows:

llm / gim
├── requirements.txt # Python dependencies for the experiments 
├── security_estim # Code for estimating the time complexities of our attacks
├── generation # Code for generating PRC watermarked content
├── data # Example watermarked content and necessary data for reproducing the attack results
└── attack # Code for the concrete attacks

Concrete Time Complexity Analysis (Section 5, Figure 4, Table 6 and Table 7)

The scripts for estimating the complexities of our attacks in Section 5 are located at <gim or llm>/security_estim/main.py.

The estimation can be reproduced by running the following commands:

conda run -n prc-estimator-llm python llm/security_estim/main.py # Figure 4(a), Table 6
conda run -n prc-estimator-gim python gim/security_estim/main.py # Figure 4(b), Table 7

This will reproduce the following results:

  • llm/data/security_estim.pdf (Figure 4(a))
  • gim/data/security_estim.pdf (Figure 4(b))
  • llm/data/security_estim.csv (Table 6)
  • gim/data/security_estim.csv (Table 7)
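
After both runs, a quick check that every expected output was written is sketched below. The helpers are hypothetical (not part of the repository); the paths and labels are the ones listed above. `preview_csv` only prints the first rows, since we do not assume any particular column layout for Tables 6 and 7.

```python
import csv
from pathlib import Path

# Output path -> result it reproduces, as listed above.
EXPECTED_OUTPUTS = {
    "llm/data/security_estim.pdf": "Figure 4(a)",
    "gim/data/security_estim.pdf": "Figure 4(b)",
    "llm/data/security_estim.csv": "Table 6",
    "gim/data/security_estim.csv": "Table 7",
}


def missing_outputs(repo_root=".") -> list[str]:
    """Expected outputs that do not exist yet, with the result each one reproduces."""
    root = Path(repo_root)
    return [f"{path} ({label})" for path, label in EXPECTED_OUTPUTS.items()
            if not (root / path).is_file()]


def preview_csv(path, n_rows=5) -> list[list[str]]:
    """Header plus up to n_rows data rows, without assuming any column names."""
    rows = []
    with open(path, newline="", encoding="utf-8") as f:
        for i, row in enumerate(csv.reader(f)):
            if i > n_rows:
                break
            rows.append(row)
    return rows


if __name__ == "__main__":
    missing = missing_outputs()
    print("missing:", missing if missing else "none")
    for path in EXPECTED_OUTPUTS:
        if path.endswith(".csv") and Path(path).is_file():
            for row in preview_csv(path):
                print(row)
```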

Attack Real-World PRC Watermark Schemes (Section 6)

PRC Watermark for LLMs

Watermarked Content Generation (Section 6.1)

To generate the watermarked texts, you can run the following scripts:

conda activate prc-estimator-llm
CUDA_VISIBLE_DEVICES=0 python llm/generation/main.py --prc_t 3 --model_name Deepseek --temperature 1.8 --start 0 --end 16 # 64GB VRAM, 30 min/file * 10 files
CUDA_VISIBLE_DEVICES=0 python llm/generation/main.py --prc_t 4 --model_name Deepseek --temperature 1.8 --start 0 --end 16 # 64GB VRAM, 30 min/file * 10 files

This will generate watermarked texts and the associated PRC data in the llm/data/<model_name>_t_<prc_t>_temp_<temperature> directory.
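
The two generation commands differ only in `--prc_t`, so a small driver can loop over them. The sketch below uses hypothetical helper names, reuses only the flags shown above, and runs the current interpreter (`sys.executable`) in place of `python`. It also derives the output directory from the same naming convention as the path passed to the attack script below (e.g. `llm/data/Deepseek_t_3_temp_1.8`).

```python
import os
import subprocess
import sys


def llm_output_dir(model_name: str, prc_t: int, temperature: float) -> str:
    """Directory naming of the LLM experiments: <model>_t_<prc_t>_temp_<temperature>."""
    return f"llm/data/{model_name}_t_{prc_t}_temp_{temperature}"


def llm_generation_cmd(prc_t: int, model_name: str = "Deepseek", temperature: float = 1.8,
                       start: int = 0, end: int = 16) -> list[str]:
    """Argument list equivalent to the generation commands shown above."""
    return [sys.executable, "llm/generation/main.py",
            "--prc_t", str(prc_t), "--model_name", model_name,
            "--temperature", str(temperature),
            "--start", str(start), "--end", str(end)]


def run_llm_generation(prc_ts=(3, 4), dry_run: bool = True) -> None:
    """Print (and, if dry_run is False, execute) one generation run per prc_t on GPU 0."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES="0")
    for t in prc_ts:
        cmd = llm_generation_cmd(t)
        print(" ".join(cmd), "->", llm_output_dir("Deepseek", t, 1.8))
        if not dry_run:
            subprocess.run(cmd, env=env, check=True)
```

Call `run_llm_generation(dry_run=False)` from the repository root inside the `prc-estimator-llm` environment; the default dry run only prints the commands.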

Concrete Attacks (Section 6.2, Table 3)

Attacks I and II are implemented in llm/attack.

To run the attacks and reproduce Table 3, you can run the following scripts:

conda activate prc-estimator-llm
python llm/attack/attack1_2_main.py llm/data/Deepseek_t_3_temp_1.8

This will output the corresponding data for Table 3 in the terminal.
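
Because Table 3 is printed to the terminal rather than written to a file, it can be convenient to keep a copy of the output. A minimal standard-library sketch; `run_and_log` and the log path are hypothetical, and the same helper should work for the GIM attack scripts. Unlike `tee`, it buffers the output and writes the log only when the script finishes.

```python
import subprocess
import sys
from pathlib import Path


def run_and_log(script: str, *args: str, log_path: str) -> str:
    """Run `python <script> <args...>`, save stdout+stderr to log_path, return stdout."""
    proc = subprocess.run([sys.executable, script, *args], capture_output=True, text=True)
    log = Path(log_path)
    log.parent.mkdir(parents=True, exist_ok=True)
    log.write_text(proc.stdout + proc.stderr, encoding="utf-8")
    proc.check_returncode()  # raise on failure, but only after the log is written
    return proc.stdout


# Example (from the repository root, in the prc-estimator-llm environment):
# run_and_log("llm/attack/attack1_2_main.py", "llm/data/Deepseek_t_3_temp_1.8",
#             log_path="logs/table3.txt")
```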

Watermarked Content Quality Analysis (Section 6.2.1, Figure 5, and Table 9)

The LLM-generated watermarked and non-watermarked content for different temperatures is unzipped to llm/data/gen_result.

To reproduce Figure 5, you can run the following command:

conda run -n prc-estimator-llm \
  python llm/generation/plot_entropy.py

This will generate the following figure:

  • llm/data/avg_entropy_vs_correct_rate.pdf (Figure 5)
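
Figure 5 relates average entropy to the correct rate across sampling temperatures. As background only, the sketch below computes the Shannon entropy of a single next-token distribution, softmax(logits / T), at the temperatures used in this section. The logits are made up, and we do not claim that `plot_entropy.py` measures entropy in exactly this way. For non-constant logits z, entropy increases with T (dH/dT = Var_p(z) / T^3 > 0), which is one intuition for why higher temperatures leave more randomness for a watermark to embed into.

```python
import numpy as np


def token_entropy(logits, temperature: float) -> float:
    """Shannon entropy (in nats) of softmax(logits / temperature)."""
    z = np.asarray(logits, dtype=np.float64) / temperature
    z -= z.max()  # subtract the max for numerical stability; softmax is unchanged
    p = np.exp(z)
    p /= p.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


logits = [4.0, 2.0, 1.0, 0.5, 0.0]  # made-up next-token logits
for T in (1.0, 1.2, 1.4, 1.6, 1.8):
    print(f"T={T}: H={token_entropy(logits, T):.3f} nats")
```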

To reproduce Table 9, manually inspect the following JSON files:

  • llm/data/gen_result/temperature_1.0/1748357173909375233.json
  • llm/data/gen_result/temperature_1.2/1748402150078932357.json
  • llm/data/gen_result/temperature_1.4/1748402173162096394.json
  • llm/data/gen_result/temperature_1.6/1748402254743413050.json
  • llm/data/gen_result/temperature_1.8/1748402215940443270.json
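
Since Table 9 is checked by hand, a small helper that reports what each of these files contains can speed up the inspection. The sketch assumes nothing about the JSON schema beyond it being valid JSON; the mapping simply repeats the paths listed above, and `summarize_json` is a hypothetical name.

```python
import json
from pathlib import Path

# Temperature -> generation result file, as listed above.
TABLE9_FILES = {
    1.0: "llm/data/gen_result/temperature_1.0/1748357173909375233.json",
    1.2: "llm/data/gen_result/temperature_1.2/1748402150078932357.json",
    1.4: "llm/data/gen_result/temperature_1.4/1748402173162096394.json",
    1.6: "llm/data/gen_result/temperature_1.6/1748402254743413050.json",
    1.8: "llm/data/gen_result/temperature_1.8/1748402215940443270.json",
}


def summarize_json(path) -> dict:
    """Top-level shape of a JSON file: sorted keys for objects, length for arrays."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, dict):
        return {"type": "object", "keys": sorted(data)}
    if isinstance(data, list):
        return {"type": "array", "length": len(data)}
    return {"type": type(data).__name__}


if __name__ == "__main__":
    for temp, path in TABLE9_FILES.items():
        if Path(path).is_file():
            print(temp, summarize_json(path))
        else:
            print(temp, "not found (unzip the data first):", path)
```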

PRC Watermark for GIMs

Watermarked Content Generation (Section 6.1)

To generate the watermarked images, you can run the following scripts:

conda activate prc-estimator-gim
CUDA_VISIBLE_DEVICES=0 python gim/generation/main.py --gen_model_id SD21 --inv_model_ids SD21 --prc_t 3 --start 0 --end 10 # 16GB VRAM, 15 min/file * 10 files
CUDA_VISIBLE_DEVICES=0 python gim/generation/main.py --gen_model_id SD21 --inv_model_ids SD15,SD2,SD21 --prc_t 4 --start 0 --end 10 # 16GB VRAM, 15 min/file * 10 files

This will generate watermarked images and the associated PRC data in the gim/data/<model_id>_t<prc_t> directory.
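
The GIM data directories follow a different naming pattern (`<model_id>_t<prc_t>`, e.g. `gim/data/SD21_t3`) from the LLM ones, and several inversion models are passed as a single comma-separated `--inv_model_ids` value. A minimal command builder under those conventions; the helper names are hypothetical, `sys.executable` stands in for `python`, and we assume the directory is named after the generation model.

```python
import sys


def gim_output_dir(model_id: str, prc_t: int) -> str:
    """Directory naming of the GIM experiments: gim/data/<model_id>_t<prc_t>."""
    return f"gim/data/{model_id}_t{prc_t}"


def gim_generation_cmd(prc_t: int, gen_model_id: str = "SD21",
                       inv_model_ids: tuple[str, ...] = ("SD21",),
                       start: int = 0, end: int = 10) -> list[str]:
    """Argument list equivalent to the GIM generation commands shown above."""
    return [sys.executable, "gim/generation/main.py",
            "--gen_model_id", gen_model_id,
            "--inv_model_ids", ",".join(inv_model_ids),  # e.g. "SD15,SD2,SD21"
            "--prc_t", str(prc_t),
            "--start", str(start), "--end", str(end)]


for t, inv in ((3, ("SD21",)), (4, ("SD15", "SD2", "SD21"))):
    print(" ".join(gim_generation_cmd(t, inv_model_ids=inv)), "->", gim_output_dir("SD21", t))
```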

Concrete Attacks (Section 6.3, Table 4, Table 8)

Attacks I and II are implemented in gim/attack. To run the attacks and reproduce Table 4, run the following commands:

conda activate prc-estimator-gim
python gim/attack/attack1_2_main.py gim/data/SD21_t3

This will output the corresponding data for Table 4 in the terminal.

To reproduce Table 8, you can run the following scripts:

conda activate prc-estimator-gim
python gim/attack/attack1_2_diff_inv_main.py

This will output the corresponding data for Table 8 in the terminal.

Watermark Removal Content Quality Analysis (Added in Rebuttal)

Since Attack III requires significantly more computational resources, we directly extract the intermediate results of PRC watermark generation and only test whether Attack III degrades image quality. The data extraction and watermark removal can be reproduced by running the following commands:

conda activate prc-estimator-gim
CUDA_VISIBLE_DEVICES=0 python gim/generation/main-for-attack3.py --prc_t 3 --model_id SD21 --start 0 --end 10  # Extract the intermediate data for Attack III, 16GB VRAM, 2 min for 10 files
CUDA_VISIBLE_DEVICES=0 python gim/attack/attack3_main.py gim/data/attack3_SD21_t3 --start 0 --end 10 --model_id SD21 --eps 16  # Remove the watermark using results from Attack III, 80GB VRAM, 6 min/file * 10 files
python gim/attack/attack3_stati.py gim/data/attack3_SD21_t3/inv_lat_16.0  # Calculate the attack statistics

This will output the success rate of watermark removal under different image quality budgets.
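
The actual statistics are computed by `attack3_stati.py`. As an illustration only, the sketch below shows one way a removal success rate can be tabulated per quality budget: an image counts as a success at budget b if the watermark was removed and its PSNR with respect to the original is at least b dB. PSNR as the quality metric, the `RemovalResult` record, and the budget values are all our assumptions, not the script's definitions.

```python
from dataclasses import dataclass

import numpy as np


def psnr(original: np.ndarray, attacked: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB (inf for identical images)."""
    mse = np.mean((original.astype(np.float64) - attacked.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else float(10 * np.log10(max_value ** 2 / mse))


@dataclass
class RemovalResult:
    """Hypothetical per-image outcome of a removal attempt."""
    removed: bool   # the detector no longer finds the watermark
    psnr_db: float  # quality of the attacked image vs. the original


def success_rate_by_budget(results: list[RemovalResult],
                           budgets_db: list[float]) -> dict[float, float]:
    """For each minimum-PSNR budget, the fraction of images removed within that budget."""
    if not results:
        return {b: 0.0 for b in budgets_db}
    return {b: sum(r.removed and r.psnr_db >= b for r in results) / len(results)
            for b in budgets_db}
```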

Acknowledgements

The code for generating PRC watermarked images is adapted from https://github.com/XuandongZhao/PRC-Watermark.

@inproceedings{WWCRLW26,
    author = {Tianrui Wang and Anyu Wang and Tianshuo Cong and Delong Ran and Jinyuan Liu and Xiaoyun Wang},
    title = {Cryptanalysis of LDPC-Based Pseudorandom Error-Correcting Codes},
    booktitle = {USENIX Security},
    year = {2026}
}
