zuojr/Delegate_UCB

Delegate-UCB Controlled Simulation

This repository contains code and reproducibility artifacts for a controlled LLM-assisted human-machine delegation simulation. The experiment studies selective audit as a way to learn a residual correction to a biased base predictor b0(x) under audit costs.

The main experiment is controlled: LLMs may be used to create realistic task contexts and base predicted quality values, but the latent reward model is synthetic and known to the experiment runner. The main experiment does not use an LLM judge.

Repository Structure

configs/       Experiment configs
prompts/       Optional API prompt templates
scripts/       Data, experiment, plotting, and utility scripts
src/           Python package implementation
tests/         Unit tests
docs/          Reproducibility notes, test report, and result notes
artifacts/     Curated small formal figures and summaries for GitHub upload

Large generated round-level logs are intentionally not part of the curated artifact library.

Setup

python -m venv .venv
source .venv/bin/activate  # Linux/macOS; Windows PowerShell users can run: .venv\Scripts\Activate.ps1
pip install -r requirements.txt

Tests

python -m pytest

Current local test report: docs/test_report.md.

Local Fallback Smoke Run

The fallback path is deterministic and does not require an API key:

python scripts/run_all_local.py --config configs/pilot.yaml

This writes fallback tasks, a controlled dataset, CSV results, and PDF plots under outputs/.

Formal Experiment Artifacts

The preserved formal run is:

formal_strict_1000_twobranch_lambda02

Key settings:

  • T = 1000
  • domains: billing, refund, technical, compliance
  • feature normalization: x_t = phi_raw / sqrt(5)
  • normalized-coordinate ridge lambda: 0.2
  • raw-equivalent lambda: 1
  • beta_0^emp = 0.25
  • beta_h^emp = 0.35
  • sigma = 0.03
  • use_llm_judge = false
  • b0 = clip(predicted_quality_raw, 0.2, 0.9)
  • Delegate-UCB uses the two-branch audit rule from Algorithm 1 with an empirical hard cap
  • final methods: Delegate-UCB, No-audit, Random-audit UCB
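The relationship between the normalized-coordinate ridge lambda and its raw-equivalent value can be checked numerically. The sketch below is a one-dimensional illustration, not the repository's implementation. It assumes the usual ridge confidence width sqrt(x^2 / V) with V = lambda + sum of squared features; the function names and sample feature values are hypothetical. It also shows the b0 clipping listed above.

```python
import math

SCALE = math.sqrt(5)          # x_t = phi_raw / sqrt(5), from the key settings
LAMBDA_NORM = 0.2             # ridge lambda in normalized coordinates
LAMBDA_RAW = LAMBDA_NORM * 5  # raw-equivalent lambda (scale squared times 0.2)


def clip(value, low, high):
    """Clamp value to the closed interval [low, high]."""
    return max(low, min(high, value))


def base_score(predicted_quality_raw):
    """b0 = clip(predicted_quality_raw, 0.2, 0.9)."""
    return clip(predicted_quality_raw, 0.2, 0.9)


def ridge_width(features, query, lam):
    """One-dimensional ridge width: sqrt(query^2 / (lam + sum of f^2))."""
    gram = lam + sum(f * f for f in features)
    return math.sqrt(query * query / gram)


# Hypothetical raw feature values, for illustration only.
phi_history = [0.4, 1.3, 2.1]
phi_query = 1.7

x_history = [p / SCALE for p in phi_history]
x_query = phi_query / SCALE

width_norm = ridge_width(x_history, x_query, LAMBDA_NORM)
width_raw = ridge_width(phi_history, phi_query, LAMBDA_RAW)

print(math.isclose(width_norm, width_raw))  # widths agree in both coordinate systems
print(base_score(1.2), base_score(0.05))    # clipped to the [0.2, 0.9] range
```

Dividing features by sqrt(5) divides the Gram term by 5, so a penalty of 0.2 in normalized coordinates plays the same role as a penalty of 1 in raw coordinates.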

Main paper figures are copied to:

artifacts/formal_strict_1000_twobranch_lambda02/figures/

The result summary is documented in:

docs/results/formal_strict_1000_twobranch_lambda02.md

High-level conclusions are in:

docs/results/experiment_conclusions.md

Regenerate Formal Figures Without API Calls

If the existing saved results are present, regenerate plots only with:

python scripts/plot_results.py \
  --paper \
  --config configs/formal_strict_1000_normalized.yaml \
  --results results/formal_strict_1000_twobranch_lambda02/results.csv \
  --dataset data/formal_strict_1000_normalized_lambda02/controlled_dataset.csv \
  --outdir figures/formal_strict_1000_twobranch_lambda02 \
  --main-budget 200 \
  --main-methods Delegate-UCB No-audit "Random-audit UCB" \
  --appendix-methods Delegate-UCB No-audit "Random-audit UCB"

This command reads existing CSVs only. It does not call APIs, rebuild datasets, or rerun the simulation. See docs/reproducibility/reproduce_formal_figures.md.

Optional Full Formal Regeneration

Full regeneration is optional and should be done only when explicitly needed. If cached LLM task and base-score files exist, the full experiment can be run offline. If they are missing, regeneration is API-dependent.

See:

docs/reproducibility/reproduce_full_formal_experiment.md

Optional API Generation

Create a local .env only when intentionally running API scripts. Never commit .env.

cp .env.example .env
# edit .env and add OPENAI_API_KEY=...

The Qwen/OpenAI-compatible scripts read OPENAI_API_KEY from the environment or .env, cache outputs, support dry runs, and refuse to overwrite existing outputs unless --overwrite is passed.
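The key-loading and overwrite behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's code: the helper names (read_dotenv, load_api_key, ensure_writable) are hypothetical, and giving the environment precedence over .env is an assumption.

```python
import os
from pathlib import Path


def read_dotenv(path):
    """Parse simple KEY=VALUE lines from a .env file; return {} if it is missing."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_api_key(dotenv_path=".env", environ=None):
    """Return OPENAI_API_KEY from the environment, falling back to .env.

    Assumption: the environment wins when both are set. The key is returned
    to the caller and never printed.
    """
    environ = os.environ if environ is None else environ
    key = environ.get("OPENAI_API_KEY") or read_dotenv(dotenv_path).get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set in the environment or .env")
    return key


def ensure_writable(output_path, overwrite=False):
    """Refuse to replace an existing output unless overwrite is requested."""
    if Path(output_path).exists() and not overwrite:
        raise FileExistsError(f"{output_path} exists; pass --overwrite to replace it")
```

A script following this pattern would call ensure_writable before any API request, so a cached output is never replaced by accident.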

Implementation Notes

  • Empirical UCB uses fixed exploration multipliers, not determinant-based theoretical confidence radii.
  • sigma = 0.03 is used for simulated machine audit noise and human feedback noise, but not to set beta.
  • Random-audit UCB uses the same UCB routing indices and ridge updates as Delegate-UCB.
  • Random-audit UCB decides whether to audit online: after routing is computed, it audits with probability remaining_budget / remaining_rounds.
  • Hard audit caps are empirical budgeted variants; theoretical guarantees are stated for uncapped tau-admissible audit rules.
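The Random-audit UCB sampling rule in the notes above can be sketched in isolation. This is a hedged illustration, not the repository's implementation: the function name is hypothetical, every round is assumed to be eligible for audit, and the budget of 200 is taken from the --main-budget value in the plotting command.

```python
import random


def random_audit_decisions(total_rounds, audit_budget, seed=0):
    """Draw one audit decision per round with p = remaining_budget / remaining_rounds."""
    rng = random.Random(seed)
    remaining_budget = audit_budget
    decisions = []
    for t in range(total_rounds):
        remaining_rounds = total_rounds - t
        p_audit = remaining_budget / remaining_rounds
        # rng.random() lies in [0, 1), so p_audit == 1 always audits
        # and p_audit == 0 never audits.
        audit = rng.random() < p_audit
        if audit:
            remaining_budget -= 1
        decisions.append(audit)
    return decisions


decisions = random_audit_decisions(total_rounds=1000, audit_budget=200)
print(sum(decisions))  # number of audited rounds
```

Under the every-round-eligible assumption, the probability reaches 1 whenever the remaining budget equals the remaining rounds and 0 once the budget is spent, so the rule uses the budget exactly and never exceeds it.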

Security

  • .env and .env.* are ignored.
  • .env.example is safe to track.
  • API keys are never hard-coded.
  • API scripts do not print keys or request headers.
