
InferPilot

This is the official repository of the ACL 2026 Findings paper InferPilot: Autonomous Inference Attacks Against ML Services With LLM-Based Agents.

InferPilot is an agentic benchmark for evaluating LLM-driven machine learning privacy attacks. It provides a structured environment where an LLM-based agent autonomously plans and executes inference-time attacks against black-box ML models.
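At a high level, such an agent runs a plan-act-observe loop: the LLM proposes the next action, the environment executes it, and the observation is fed back into the context. The sketch below is generic and illustrative only; InferPilot's real agent classes live under agents/ and are more elaborate, and the action names here are made up.

```python
# Generic plan-act-observe agent loop (illustrative sketch, not InferPilot's
# actual agent code; action names are hypothetical).

def run_agent(propose_action, execute, max_steps=5):
    """Drive an agent: ask the LLM for an action, execute it, feed back the result."""
    history = []
    for _ in range(max_steps):
        action = propose_action(history)   # LLM picks the next tool call
        if action == "finish":
            break
        observation = execute(action)      # environment runs the action
        history.append((action, observation))
    return history

# Toy stand-ins for the LLM policy and the environment:
script = iter(["query_target", "train_shadow", "finish"])
trace = run_agent(lambda h: next(script), lambda a: f"ok:{a}")
print([a for a, _ in trace])  # ['query_target', 'train_shadow']
```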

Overview

InferPilot covers five attack tasks:

  • Membership Inference (MIA): Determine whether a data sample was used to train the target model
  • Attribute Inference: Infer sensitive attributes of input data from model predictions
  • Data Reconstruction: Reconstruct training data from model outputs via inversion attacks
  • Model Stealing: Extract a functionally equivalent surrogate model from query access
  • All-in-One: A meta-task where a ControllerAgent autonomously selects and coordinates multiple attacks
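As a concrete illustration of the first task, the simplest membership inference baseline thresholds the target model's top predicted probability: models tend to be more confident on samples they were trained on. This is a generic sketch with made-up scores and threshold, not InferPilot's implementation.

```python
# Minimal confidence-threshold membership inference baseline (illustrative,
# not InferPilot's attack code). Intuition: models are usually more confident
# on their own training samples.

def mia_predict(confidences, threshold=0.9):
    """Label each sample as member (1) or non-member (0) by top confidence."""
    return [1 if c > threshold else 0 for c in confidences]

# Hypothetical top-class probabilities returned by a target model:
member_scores = [0.99, 0.97, 0.95]      # training samples: high confidence
nonmember_scores = [0.62, 0.85, 0.71]   # unseen samples: lower confidence

preds = mia_predict(member_scores + nonmember_scores)
accuracy = sum(p == t for p, t in zip(preds, [1, 1, 1, 0, 0, 0])) / 6
print(accuracy)  # 1.0 on this toy data
```

In practice the threshold is calibrated on shadow models trained from the shadow splits described below, rather than fixed by hand.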

Repository Structure

inferpilot/
├── runner.py                        # Main entry point
├── run_exp.sh                       # Experiment runner script
├── env.py                           # Benchmark environment
├── LLM.py                           # LLM API wrappers (OpenAI, Claude, etc.)
├── schema.py                        # Data schemas (Action, Step, Trace, ...)
├── low_level_actions.py             # Primitive environment actions (read, write, execute)
├── high_level_actions.py            # High-level agent tools (edit script, understand file, ...)
├── logger.py                        # Logging utility for target model training
├── prepare_dataset.py               # Prepare raw datasets into target/shadow .pt splits
├── agents/
│   ├── agent.py                     # Base agent class
│   ├── attack_agent.py              # AttackAgent for individual attack tasks
│   └── controller_agent.py          # ControllerAgent for all-in-one coordination
├── targets/
│   ├── target_service.py            # Flask server exposing the target model API
│   ├── train_target_model.py        # Script to train target models
│   ├── prepare_target_dataset.py    # Script to prepare target dataset splits
│   ├── target_dataset.py            # Dataset utilities
│   ├── custom_datasets/             # Dataset loaders for AFAD, CelebA, UTKFace
│   ├── model_pool/                  # Model architecture definitions
│   └── assets/                      # (generated) trained models and dataset splits
├── benchmarks/
│   ├── mia/                         # Membership inference task
│   │   ├── env/                     # Attack scripts and resources
│   │   └── scripts/                 # Prompt template and config
│   ├── attr_infer/                  # Attribute inference task
│   ├── data_recon/                  # Data reconstruction task
│   ├── model_steal/                 # Model stealing task
│   └── all_in_one/                  # Combined multi-attack task
└── task_configs/                    # Task configuration JSON files
    ├── mia/
    ├── attr_infer/
    ├── data_recon/
    ├── model_steal/
    └── all_in_one/

Requirements

pip install anthropic openai tiktoken torch==2.5.1 torchvision==0.20.1 dacite flask tqdm timm pandas pillow nltk rouge-score bert-score numpy matplotlib scikit-learn

Set up your API keys:

echo "YOUR_OPENAI_KEY" > openai_api_key.txt        # format: org_id:api_key
echo "YOUR_ANTHROPIC_KEY" > claude_api_key.txt
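The comment above notes that the OpenAI key file uses an org_id:api_key format. A small helper to split it might look like the sketch below; how LLM.py actually parses the file may differ.

```python
# Sketch of parsing openai_api_key.txt in the "org_id:api_key" format noted
# above (illustrative; LLM.py's real parsing may differ).

def parse_openai_key(text):
    """Split 'org_id:api_key' into its two parts; tolerate plain keys too."""
    text = text.strip()
    if ":" in text:
        org_id, api_key = text.split(":", 1)
        return org_id, api_key
    return None, text  # no org id present

org, key = parse_openai_key("org-abc123:sk-xyz789")
print(org, key)  # org-abc123 sk-xyz789
```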

Target Service Setup

Each experiment requires a running target model service. Setup involves two steps.

Step 1 — Prepare datasets

CIFAR-10 and STL-10 are downloaded automatically by prepare_dataset.py. The three face datasets must be downloaded manually first:

CelebA

  1. Download from the official page (or Kaggle mirror):
    • img_align_celeba.zip — aligned face images
  2. Unzip and place the image folder at data/celeba/img_align_celeba/
  3. Place attribute/partition CSV files (list_attr_celeba.csv, list_eval_partition.csv) at data/celeba/

UTKFace

  1. Download from the official page — get the aligned image archive
  2. Place all .jpg images at data/utkface/UTKFace/
  3. Place utkface_attr.csv file at data/utkface/

AFAD

  1. Download from GitHub — get AFAD-Full
  2. Place the image folder at data/afad/AFAD-Full/
  3. Place afad_attr.csv at data/afad/

Once the face datasets are in place, run:

python prepare_dataset.py

This splits raw data into target .pt files under data/target/ and shadow .pt files under data/shadow/. The shadow datasets are used by the agent to train shadow models during attacks.
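The target/shadow split follows the standard shadow-model setup: the two pools must be disjoint so that shadow models imitate the target's behavior without sharing any of its training data. A generic sketch of such a split is below; the exact sizes, seeding, and file layout in prepare_dataset.py may differ.

```python
# Sketch of a deterministic, disjoint target/shadow split (illustrative;
# prepare_dataset.py's actual sizes and seeding may differ).
import random

def split_target_shadow(indices, target_frac=0.5, seed=0):
    """Deterministically split sample indices into disjoint target/shadow pools."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)   # fixed seed -> reproducible split
    cut = int(len(idx) * target_frac)
    return idx[:cut], idx[cut:]

target_idx, shadow_idx = split_target_shadow(range(10))
assert set(target_idx).isdisjoint(shadow_idx)  # no overlap between pools
```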

Step 2 — Train a target model

python targets/train_target_model.py \
    --dataset_name utkface \
    --model_name resnet18 \
    --size 5000 \
    --save_dir targets/assets \
    --num_epochs 300

This produces:

  • targets/assets/models/<dataset>_<model>_<size>_target_model_final.pth
  • targets/assets/datasets/<dataset>_<size>_target_train.pt
  • targets/assets/datasets/<dataset>_<size>_target_test.pt
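The naming scheme above can be captured with a small path-building helper; the fields correspond to the <dataset>, <model>, and <size> placeholders in the listed paths. This helper is illustrative, not code from the repository.

```python
# Illustrative helper mirroring the asset naming scheme listed above
# (not code from the repository).

def asset_paths(dataset, model, size, root="targets/assets"):
    """Build the asset file paths produced by train_target_model.py."""
    return {
        "model": f"{root}/models/{dataset}_{model}_{size}_target_model_final.pth",
        "train": f"{root}/datasets/{dataset}_{size}_target_train.pt",
        "test":  f"{root}/datasets/{dataset}_{size}_target_test.pt",
    }

paths = asset_paths("utkface", "resnet18", 5000)
print(paths["model"])
# targets/assets/models/utkface_resnet18_5000_target_model_final.pth
```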

Once assets are ready, run_exp.sh will copy them into the workspace and start the target service automatically.

Supported datasets: cifar10, stl10, utkface, celeba, afad
Supported models: cnn, resnet18, resnet50, xception

Notes:

  • attr_infer is only supported for face datasets (utkface, celeba, afad).
  • The model_steal task does not use the model argument (the target architecture is unknown in black-box stealing), but run_exp.sh still expects a placeholder in that position. Run it as: bash run_exp.sh <dataset> cnn model_steal.

Running Experiments

# Single task experiment
bash run_exp.sh <dataset> <model> <task>

# Examples
bash run_exp.sh cifar10 cnn mia
bash run_exp.sh celeba resnet18 attr_infer
bash run_exp.sh cifar10 resnet50 data_recon
bash run_exp.sh cifar10 cnn model_steal
bash run_exp.sh cifar10 resnet18 all_in_one
