This is the official repository of the ACL 2026 Findings paper InferPilot: Autonomous Inference Attacks Against ML Services With LLM-Based Agents.
InferPilot is an agentic benchmark for evaluating LLM-driven machine learning privacy attacks. It provides a structured environment where an LLM-based agent autonomously plans and executes inference-time attacks against black-box ML models.
InferPilot covers five attack tasks:
| Task | Description |
|---|---|
| Membership Inference (MIA) | Determine whether a data sample was used to train the target model |
| Attribute Inference | Infer sensitive attributes of input data from model predictions |
| Data Reconstruction | Reconstruct training data from model outputs via inversion attacks |
| Model Stealing | Extract a functionally equivalent surrogate model from query access |
| All-in-One | A meta-task where a ControllerAgent autonomously selects and coordinates multiple attacks |
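As a concrete illustration of the first task, a minimal confidence-threshold membership inference heuristic can be sketched as below. This is only an illustrative sketch, not the benchmark's actual attack code: the function name, threshold value, and input shape are all assumptions.

```python
import numpy as np

def confidence_mia(probs: np.ndarray, threshold: float = 0.9) -> np.ndarray:
    """Guess membership from prediction confidence: models tend to be more
    confident on samples they were trained on (illustrative heuristic only).

    probs: (n_samples, n_classes) softmax outputs from the target model.
    Returns a boolean array, True = predicted training member.
    """
    return probs.max(axis=1) >= threshold

# Two hypothetical prediction vectors returned by a target model
preds = np.array([
    [0.98, 0.01, 0.01],  # very confident -> likely a training member
    [0.40, 0.35, 0.25],  # uncertain -> likely a non-member
])
print(confidence_mia(preds).tolist())  # -> [True, False]
```

Real attacks in the benchmark involve shadow models and more elaborate signals, but they share this basic structure: query the black-box model, then decide from its outputs alone.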
```
inferpilot/
├── runner.py                 # Main entry point
├── run_exp.sh                # Experiment runner script
├── env.py                    # Benchmark environment
├── LLM.py                    # LLM API wrappers (OpenAI, Claude, etc.)
├── schema.py                 # Data schemas (Action, Step, Trace, ...)
├── low_level_actions.py      # Primitive environment actions (read, write, execute)
├── high_level_actions.py     # High-level agent tools (edit script, understand file, ...)
├── logger.py                 # Logging utility for target model training
├── prepare_dataset.py        # Prepare raw datasets into target/shadow .pt splits
├── agents/
│   ├── agent.py              # Base agent class
│   ├── attack_agent.py       # AttackAgent for individual attack tasks
│   └── controller_agent.py   # ControllerAgent for all-in-one coordination
├── targets/
│   ├── target_service.py     # Flask server exposing the target model API
│   ├── train_target_model.py # Script to train target models
│   ├── prepare_target_dataset.py # Script to prepare target dataset splits
│   ├── target_dataset.py     # Dataset utilities
│   ├── custom_datasets/      # Dataset loaders for AFAD, CelebA, UTKFace
│   ├── model_pool/           # Model architecture definitions
│   └── assets/               # (generated) trained models and dataset splits
├── benchmarks/
│   ├── mia/                  # Membership inference task
│   │   ├── env/              # Attack scripts and resources
│   │   └── scripts/          # Prompt template and config
│   ├── attr_infer/           # Attribute inference task
│   ├── data_recon/           # Data reconstruction task
│   ├── model_steal/          # Model stealing task
│   └── all_in_one/           # Combined multi-attack task
└── task_configs/             # Task configuration JSON files
    ├── mia/
    ├── attr_infer/
    ├── data_recon/
    ├── model_steal/
    └── all_in_one/
```
```bash
pip install anthropic openai tiktoken dacite flask tqdm timm pandas pillow nltk rouge-score bert-score numpy matplotlib scikit-learn torch==2.5.1 torchvision==0.20.1
```

Set up your API keys:
```bash
echo "YOUR_OPENAI_KEY" > openai_api_key.txt     # format: org_id:api_key
echo "YOUR_ANTHROPIC_KEY" > claude_api_key.txt
```

Each experiment requires a running target model service. Setup involves two steps.
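The OpenAI key file uses an `org_id:api_key` format, which can be split like the sketch below. This is only an illustration of the expected file format; the repo's `LLM.py` may parse it differently, and `load_openai_key` is a hypothetical helper name.

```python
from pathlib import Path

def load_openai_key(path: str = "openai_api_key.txt") -> tuple[str, str]:
    """Split a single 'org_id:api_key' line into its two parts.
    Hypothetical helper -- shown only to document the file format."""
    org_id, _, api_key = Path(path).read_text().strip().partition(":")
    return org_id, api_key
```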
Step 1 — Prepare datasets
CIFAR-10 and STL-10 are downloaded automatically by `prepare_dataset.py`. The three face datasets must be downloaded manually first:
CelebA
- Download `img_align_celeba.zip` (aligned face images) from the official page or the Kaggle mirror
- Unzip and place the image folder at `data/celeba/img_align_celeba/`
- Place the attribute/partition CSV files (`list_attr_celeba.csv`, `list_eval_partition.csv`) at `data/celeba/`
UTKFace
- Download the aligned image archive from the official page
- Place all `.jpg` images at `data/utkface/UTKFace/`
- Place the `utkface_attr.csv` file at `data/utkface/`
AFAD
- Download `AFAD-Full` from GitHub
- Place the image folder at `data/afad/AFAD-Full/`
- Place `afad_attr.csv` at `data/afad/`
Once face datasets are placed correctly, run:
```bash
python prepare_dataset.py
```

This splits the raw data into target `.pt` files under `data/target/` and shadow `.pt` files under `data/shadow/`. The shadow datasets are used by the agent to train shadow models during attacks.
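The `.pt` splits are ordinary PyTorch serialized objects. The sketch below shows the save/load round-trip such a split relies on; the file name, tensor shapes, and the choice of `TensorDataset` as the stored object are assumptions for illustration, not the repo's exact format.

```python
import torch
from torch.utils.data import TensorDataset

# Dummy CIFAR-10-shaped batch (shapes are assumptions for illustration)
images = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, 10, (8,))
torch.save(TensorDataset(images, labels), "demo_split.pt")

# weights_only=False is needed to deserialize a full Python object
# (it is the default in torch 2.5.x but not in newer releases)
split = torch.load("demo_split.pt", weights_only=False)
print(len(split), split[0][0].shape)  # 8 samples, each 3x32x32
```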
Step 2 — Train a target model
```bash
python targets/train_target_model.py \
    --dataset_name utkface \
    --model_name resnet18 \
    --size 5000 \
    --save_dir targets/assets \
    --num_epochs 300
```

This produces:
- `targets/assets/models/<dataset>_<model>_<size>_target_model_final.pth`
- `targets/assets/datasets/<dataset>_<size>_target_train.pt`
- `targets/assets/datasets/<dataset>_<size>_target_test.pt`
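The output locations above follow a fixed naming pattern, sketched by the hypothetical helper below (the path templates come from the list above; the function itself is not part of the repo).

```python
def asset_paths(dataset: str, model: str, size: int,
                save_dir: str = "targets/assets") -> dict[str, str]:
    """Build the asset paths train_target_model.py is documented to produce.
    Hypothetical helper -- shown only to make the naming scheme explicit."""
    return {
        "model": f"{save_dir}/models/{dataset}_{model}_{size}_target_model_final.pth",
        "train": f"{save_dir}/datasets/{dataset}_{size}_target_train.pt",
        "test":  f"{save_dir}/datasets/{dataset}_{size}_target_test.pt",
    }

print(asset_paths("utkface", "resnet18", 5000)["model"])
# -> targets/assets/models/utkface_resnet18_5000_target_model_final.pth
```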
Once the assets are ready, `run_exp.sh` will copy them into the workspace and start the target service automatically.

Supported datasets: `cifar10`, `stl10`, `utkface`, `celeba`, `afad`
Supported models: `cnn`, `resnet18`, `resnet50`, `xception`
Notes:
- `attr_infer` is only supported for the face datasets (`utkface`, `celeba`, `afad`).
- `model_steal` task configs do not require a `--model` argument (the model architecture is not needed for black-box stealing). Run as: `bash run_exp.sh <dataset> cnn model_steal`.
```bash
# Single-task experiment
bash run_exp.sh <dataset> <model> <task>

# Examples
bash run_exp.sh cifar10 cnn mia
bash run_exp.sh celeba resnet18 attr_infer
bash run_exp.sh cifar10 resnet50 data_recon
bash run_exp.sh cifar10 cnn model_steal
bash run_exp.sh cifar10 resnet18 all_in_one
```