LPI-B² — Reproducibility Package

Code and pre-computed data to reproduce all experiments and figures in:

"Latent Performance Indicator: a Bootstrap-Bayesian framework for
architecture-relative evaluation under label noise"
A. Ehrenfeld, A. Egaña, G. Díaz, F. Navarro, J. F. Sánchez-Pérez, G. García-Ros

Two modes of use

Mode	What it does	Time
Quick	Reads pre-computed CSVs in `data/` and regenerates every figure	~5 minutes
Full	Runs Stan sampling from scratch and regenerates everything	~6–10 hours (CPU)

For most readers, quick mode is sufficient to verify all paper figures.

Installation

# 1. Clone the repository
git clone https://github.com/<org>/lpi-b2-reproducibility.git
cd lpi-b2-reproducibility

# 2a. Install with pip
pip install -r requirements.txt
pip install -e .          # makes the lpi_b2 package importable

# 2b. Alternative: conda / mamba
conda env create -f environment.yml
conda activate lpi-b2
pip install -e .

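CmdStan (the C++ backend for Stan) is downloaded and built automatically on first use via cmdstanpy, so no manual compilation step is needed. Because CmdStan is built from source, the first run takes noticeably longer and a working C++ toolchain (compiler and make) must be available on the system.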
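
Before starting a long run, it can help to confirm that cmdstanpy can locate CmdStan. The check below is a minimal sketch: `cmdstan_status` is a hypothetical helper that is not part of `lpi_b2`, and it relies only on `cmdstanpy.cmdstan_path()`.

```python
def cmdstan_status() -> str:
    """Describe whether cmdstanpy and a CmdStan installation are available."""
    try:
        import cmdstanpy
    except ImportError:
        return "cmdstanpy is not installed"
    try:
        # cmdstan_path() raises if no CmdStan installation can be found.
        return f"CmdStan found at {cmdstanpy.cmdstan_path()}"
    except Exception as exc:
        return f"CmdStan not found: {exc}"


if __name__ == "__main__":
    print(cmdstan_status())
```

If the report says CmdStan was not found, the first call into the package should trigger the automatic installation described above.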

Quick mode — regenerate all figures

Each script in figures/ reads pre-computed CSVs from data/ and writes PDFs to outputs/. Run them individually or all at once:

# All figures at once
for f in figures/fig_*.py; do python "$f"; done

# Or individually (examples)
python figures/fig_plate_diagram.py
python figures/fig_two_phase_protocol.py
python figures/fig_noise_capacity_trajectories.py
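
On systems without a POSIX shell, the loop above can be approximated in Python. This is a sketch under stated assumptions: `list_figure_scripts` and `run_figure_scripts` are hypothetical names, and the scripts are assumed to be run from the repository root so that their relative paths to data/ and outputs/ resolve.

```python
import subprocess
import sys
from pathlib import Path


def list_figure_scripts(figures_dir="figures"):
    """Return fig_*.py scripts in alphabetical order, like the shell glob."""
    return sorted(Path(figures_dir).glob("fig_*.py"))


def run_figure_scripts(figures_dir="figures"):
    """Run every figure script; return the names of the scripts that failed."""
    failed = []
    for script in list_figure_scripts(figures_dir):
        # Use the current interpreter so the active environment is reused.
        result = subprocess.run([sys.executable, str(script)])
        if result.returncode != 0:
            failed.append(script.name)
    return failed
```

Calling `run_figure_scripts()` from the repository root runs all scripts and keeps going after a failure, so a single broken figure does not block the others.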

Figure → script → data mapping

Figure in paper	Script	Input CSVs
Fig S0 — Plate diagram (Methods)	`fig_plate_diagram.py`	(none — pure matplotlib)
Fig 4 — Noise taxonomy	`fig_noise_taxonomy.py`	`paper5_comparison_agg.csv`
Fig 6 — Two-phase LPI-B² protocol	`fig_two_phase_protocol.py`	`paper6_two_phase_raw1.5.csv`
Fig 7 — Full vs Weak capacity	`fig_strong_vs_weak_capacity.py`	`table_multiarch_def.csv`, `table_multiarch_def_sep1_W.csv`, `sample_posteriors.csv`, `sample_posteriors_sep1_W.csv`
Fig 8 — Capacity & noise trajectories	`fig_noise_capacity_trajectories.py`	`table_multiarch_def_yc.csv`, `sample_posteriors_yc.csv`
Fig 11 — Ranking recovery	`fig_ranking_recovery.py`	`postproc_02_classical_vs_lbi.csv`, `table_multiarch_def_*.csv`
Supp Fig S1 — Estimator correlations	`fig_estimator_correlations.py`	`sample_posteriors_yc.csv`, `table_multiarch_def_yc.csv`, `arch_summary.csv`
Supp Fig S2 — Threshold ablation	`fig_threshold_ablation.py`	`table1_noise_robustness.csv`, `table_multiarch_def_yc.csv`
Supp Fig S3 — Posterior geometry	`fig_posterior_geometry.py`	`sample_posteriors.csv`
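
The mapping above can also be checked mechanically before running the figure scripts. The sketch below transcribes the table into a dictionary and reports which inputs are absent from data/. `FIGURE_INPUTS` and `missing_inputs` are hypothetical names introduced here for illustration; they are not part of the package.

```python
from pathlib import Path

# Script -> input CSV patterns, transcribed from the mapping table above.
FIGURE_INPUTS = {
    "fig_plate_diagram.py": [],
    "fig_noise_taxonomy.py": ["paper5_comparison_agg.csv"],
    "fig_two_phase_protocol.py": ["paper6_two_phase_raw1.5.csv"],
    "fig_strong_vs_weak_capacity.py": [
        "table_multiarch_def.csv",
        "table_multiarch_def_sep1_W.csv",
        "sample_posteriors.csv",
        "sample_posteriors_sep1_W.csv",
    ],
    "fig_noise_capacity_trajectories.py": [
        "table_multiarch_def_yc.csv",
        "sample_posteriors_yc.csv",
    ],
    "fig_ranking_recovery.py": [
        "postproc_02_classical_vs_lbi.csv",
        "table_multiarch_def_*.csv",
    ],
    "fig_estimator_correlations.py": [
        "sample_posteriors_yc.csv",
        "table_multiarch_def_yc.csv",
        "arch_summary.csv",
    ],
    "fig_threshold_ablation.py": [
        "table1_noise_robustness.csv",
        "table_multiarch_def_yc.csv",
    ],
    "fig_posterior_geometry.py": ["sample_posteriors.csv"],
}


def missing_inputs(data_dir="data"):
    """Map each script to the input patterns with no matching file in data_dir."""
    data = Path(data_dir)
    report = {}
    for script, patterns in FIGURE_INPUTS.items():
        # glob handles both plain file names and the wildcard entry.
        missing = [p for p in patterns if not any(data.glob(p))]
        if missing:
            report[script] = missing
    return report
```

An empty dictionary from `missing_inputs()` means every listed input has at least one matching file in data/.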

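Note on fig_ranking_recovery.py: this script requires data/postproc_02_classical_vs_lbi.csv. If that file is missing, run figures/fig_estimator_correlations.py first, which writes that intermediate CSV as a side effect. The "all figures at once" loop already satisfies this ordering, because the shell expands figures/fig_*.py alphabetically.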
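
The dependency described in this note can be made explicit with a small planning step. The following is a sketch, not part of the repository: `ranking_recovery_plan` and `run_plan` are hypothetical helpers, while the file and script names come from the note itself.

```python
import subprocess
import sys
from pathlib import Path

INTERMEDIATE = "postproc_02_classical_vs_lbi.csv"


def ranking_recovery_plan(data_dir="data", figures_dir="figures"):
    """Return the scripts to run, in order, for the ranking-recovery figure."""
    plan = []
    if not (Path(data_dir) / INTERMEDIATE).exists():
        # The intermediate CSV is written by the correlations script.
        plan.append(Path(figures_dir) / "fig_estimator_correlations.py")
    plan.append(Path(figures_dir) / "fig_ranking_recovery.py")
    return plan


def run_plan(plan):
    """Run each script in order, stopping at the first failure."""
    for script in plan:
        subprocess.run([sys.executable, str(script)], check=True)
```

From the repository root, `run_plan(ranking_recovery_plan())` runs the correlations script only when the intermediate file is absent.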

Full mode — run experiments from scratch

Scripts in experiments/ run the full Stan sampling pipeline and write results to data/ and outputs/. After they finish, run the figures/ scripts as above.

Experiment → figure mapping

Experiment	Script	Stan calls	Est. time	Produces
Multi-arch capacity grid	`exp_multiarch_capacity.py`	~20–30	3–5 h	`table_multiarch_def.csv`, `sample_posteriors.csv`
Two-phase auditing protocol	`exp_two_phase_protocol.py`	~18	2–3 h	`paper6_two_phase_raw*.csv`
Comparison with Confident Learning	`exp_cleanlab_comparison.py`	~72	~2 h	`exp_cleanlab_results.csv`
Prior sensitivity	`exp_prior_sensitivity.py`	12	~1 h	`exp_prior_sensitivity.csv`
Asymmetric noise channels	`exp_asymmetric_noise.py`	~90	2–3 h	`exp_asymmetric_noise.csv`
Real dataset validation	`exp_real_datasets.py`	~20	~1 h	`exp_real_datasets.csv`

All experiment scripts accept --dry-run, which runs a single condition as a quick smoke test:

python experiments/exp_multiarch_capacity.py --dry-run
python experiments/exp_cleanlab_comparison.py --dry-run
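
To smoke-test every experiment in one pass, the commands can be generated from the script names in the table above. This is a minimal sketch: `build_commands` is a hypothetical helper, and only the `--dry-run` flag and the script names are taken from this README.

```python
import sys

# Script names transcribed from the experiment table above.
EXPERIMENT_SCRIPTS = [
    "exp_multiarch_capacity.py",
    "exp_two_phase_protocol.py",
    "exp_cleanlab_comparison.py",
    "exp_prior_sensitivity.py",
    "exp_asymmetric_noise.py",
    "exp_real_datasets.py",
]


def build_commands(dry_run=True, experiments_dir="experiments"):
    """Return one argument list per experiment script."""
    commands = []
    for name in EXPERIMENT_SCRIPTS:
        cmd = [sys.executable, f"{experiments_dir}/{name}"]
        if dry_run:
            cmd.append("--dry-run")
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Print the commands; pass each list to subprocess.run to execute it.
    for cmd in build_commands():
        print(" ".join(cmd))
```

Passing `dry_run=False` yields the commands for the full runs, which take the hours listed in the table.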

Repository structure

lpi-b2-reproducibility/
├── lpi_b2/                  # installable Python package
│   ├── __init__.py
│   ├── estimator.py         # LPIB2Evaluator — main sklearn-compatible class
│   ├── stan_utils.py        # CmdStan auto-install helper
│   └── models/              # Stan model files
│       ├── lpi_b2_bootstrap.stan              # main model (baseline priors)
│       ├── lpi_b2_bootstrap_asym_priors.stan
│       ├── lpi_b2_bootstrap_conc_priors.stan
│       └── lpi_b2_bootstrap_flat_priors.stan
│
├── experiments/             # Stan sampling runs (full mode)
├── figures/                 # figure generation from pre-computed CSVs (quick mode)
├── data/                    # pre-computed CSVs (28 files, ~30 MB)
├── outputs/                 # generated figures land here (git-ignored)
│
├── requirements.txt
├── environment.yml
└── pyproject.toml
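
After cloning, a quick sanity check can confirm that the checkout matches the layout above. The sketch below is illustrative only: `EXPECTED_ENTRIES` and `missing_entries` are hypothetical names, and outputs/ is left out because it is git-ignored and may not exist in a fresh clone.

```python
from pathlib import Path

# Paths transcribed from the tree above (outputs/ omitted: it is git-ignored).
EXPECTED_ENTRIES = [
    "lpi_b2/__init__.py",
    "lpi_b2/estimator.py",
    "lpi_b2/stan_utils.py",
    "lpi_b2/models/lpi_b2_bootstrap.stan",
    "lpi_b2/models/lpi_b2_bootstrap_asym_priors.stan",
    "lpi_b2/models/lpi_b2_bootstrap_conc_priors.stan",
    "lpi_b2/models/lpi_b2_bootstrap_flat_priors.stan",
    "experiments",
    "figures",
    "data",
    "requirements.txt",
    "environment.yml",
    "pyproject.toml",
]


def missing_entries(root="."):
    """Return the expected files and folders that are absent under root."""
    return [e for e in EXPECTED_ENTRIES if not (Path(root) / e).exists()]
```

An empty list from `missing_entries()` indicates that the checkout contains every file and folder named in the tree.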

Using the `LPIB2Evaluator` in your own project

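import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from lpi_b2 import LPIB2Evaluator

# Illustrative data (not from the paper): a synthetic binary problem
# with 20% of the labels flipped to simulate label noise.
rng = np.random.default_rng(0)
X_train, y_clean = make_classification(n_samples=500, random_state=0)
y_noisy = np.where(rng.random(y_clean.size) < 0.2, 1 - y_clean, y_clean)

evaluator = LPIB2Evaluator(
    base_estimator=RandomForestClassifier(n_estimators=100),
    n_bootstrap=15,
    anchor_rate=0.5,
)
evaluator.evaluate(X_train, y_noisy)

# Global performance summary, returned as a dict
perf = evaluator.get_global_performance()
# e.g. {'mu_Se': 0.87, 'mu_Sp': 0.83, 'kappa_Se': 42.1, ...}

# Per-sample audit
audit = evaluator.get_sample_audit()
# DataFrame with columns: posterior_truth_prob, ambiguity_score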
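
One possible use of the audit table is to shortlist samples for manual review. The sketch below works on plain records (for example from `audit.reset_index().to_dict("records")`) and assumes that a higher `ambiguity_score` means a more ambiguous sample; this README does not define the score's direction, so treat that as an assumption. `rank_for_review` is a hypothetical helper and the example values are made up.

```python
def rank_for_review(records, top_k=10):
    """Return the top_k records with the highest ambiguity_score.

    Assumes higher ambiguity_score = more ambiguous (not confirmed by the README).
    """
    ranked = sorted(records, key=lambda r: r["ambiguity_score"], reverse=True)
    return ranked[:top_k]


# Made-up records with the two documented audit columns plus a row index.
example = [
    {"index": 0, "posterior_truth_prob": 0.97, "ambiguity_score": 0.06},
    {"index": 1, "posterior_truth_prob": 0.52, "ambiguity_score": 0.96},
    {"index": 2, "posterior_truth_prob": 0.15, "ambiguity_score": 0.30},
]

if __name__ == "__main__":
    for row in rank_for_review(example, top_k=2):
        print(row["index"], row["ambiguity_score"])
```

If the score turns out to run in the opposite direction, dropping `reverse=True` gives the intended ordering.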

Citation

@article{ehrenfeld2025lpi,
  title   = {Latent Performance Indicator: a Bootstrap-Bayesian framework for
             architecture-relative evaluation under label noise},
  author  = {Ehrenfeld, Alejandro and Ega{\~n}a, Alvaro and D{\'i}az, Gonzalo
             and Navarro, Felipe and S{\'a}nchez-P{\'e}rez, J. F.
             and Garc{\'i}a-Ros, Gonzalo},
  journal = {[journal]},
  year    = {2025},
}

License

MIT
