LoopBench

The public scoreboard for loop engineering.

Fixed tasks. Fixed seeds. Observed LES. Submissions anyone can audit.

No hand-waved demos — bring an LSS spec, get a number, climb the leaderboard.

pip install loopbench loopgym
loopbench list

Run your first score · Leaderboard · Suite overview

LoopBench: install, list tasks, run, validate, rank

What LoopBench measures

You submit a loop specification (LSS YAML). LoopBench:

1. Runs it through LoopGym on fixed task instances
2. Computes Success@k and LES_obs across eight categories
3. Validates your results.json against a published schema
4. Ranks you on the public leaderboard

loopbench run --task LB-CR-1 --spec your-loop.yaml --seeds 0,1,2,3,4 -o results.json
loopbench validate results.json
loopbench rank leaderboard/entries.json
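If you want to drive the three commands above from a script, a minimal Python sketch could look like the following. It only assembles the exact CLI invocations shown in this README and runs them in order. The helper names `build_pipeline` and `run_pipeline` are hypothetical (not part of LoopBench), and the sketch assumes each command exits non-zero on failure.

```python
import subprocess


def build_pipeline(task, spec, seeds, results="results.json",
                   entries="leaderboard/entries.json"):
    """Return the run, validate, and rank invocations as argv lists, in order."""
    seed_arg = ",".join(str(s) for s in seeds)
    return [
        ["loopbench", "run", "--task", task, "--spec", spec,
         "--seeds", seed_arg, "-o", results],
        ["loopbench", "validate", results],
        ["loopbench", "rank", entries],
    ]


def run_pipeline(commands, runner=subprocess.run):
    """Run each command in order and stop at the first failure."""
    for argv in commands:
        # check=True makes subprocess.run raise CalledProcessError on a
        # non-zero exit code (assumed here to signal a failed step).
        runner(argv, check=True)


# Usage (requires loopbench and loopgym to be installed):
# run_pipeline(build_pipeline("LB-CR-1", "your-loop.yaml", range(5)))
```

Passing argv lists rather than a shell string avoids quoting problems when a spec path contains spaces.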

The measurement stack

flowchart LR
  YOU["Your LSS spec"]
  LB["LoopBench<br/>tasks · scoring · conformance"]
  LG["LoopGym<br/>SimEnv execution"]
  OUT["results.json → leaderboard"]

  YOU --> LB
  LB --> LG
  LG --> LB
  LB --> OUT

Layer	Owns	Repo
Spec	LSS schema, LES formulas	Loop Core Engineering
Data	Trajectories (holdout v0.2)	LoopNet
Runtime	`env.run_episode()`	LoopGym
Observability	LTF traces, iteration metrics	loop-observability
Measurement	Tasks, LES_obs, anti-gaming	LoopBench

LoopBench defines and scores. LoopGym runs. Never the other way around.

New to the stack? Start with the LoopNet end-to-end tutorial.

Tasks (v0.1)

ID	Name	What it exposes
`LB-CR-1`	Code repair	Can your loop fix broken code under verify pressure?
`LB-RS-1`	Research synthesis	Quality vs. cost on structured briefs
`LB-MA-1`	Multi-agent debate	Autonomy + coordination under evaluator scrutiny

Five seeds per task. Details in tasks/.
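To see how much work a full v0.1 pass involves, the sketch below enumerates every task and seed pair. The task IDs come from the table above. Using seeds 0 through 4 for all three tasks is an assumption carried over from the `LB-CR-1` example command, and `episode_grid` is a hypothetical helper, not a LoopBench function.

```python
from itertools import product

TASKS = ["LB-CR-1", "LB-RS-1", "LB-MA-1"]  # task IDs from the v0.1 table
SEEDS = [0, 1, 2, 3, 4]                    # assumed: same five seeds for every task


def episode_grid(tasks=TASKS, seeds=SEEDS):
    """Return every (task, seed) pair a full pass would cover."""
    return list(product(tasks, seeds))


grid = episode_grid()
print(len(grid))  # 15
print(grid[0])    # ('LB-CR-1', 0)
```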

Score in 2 minutes

pip install loopbench loopgym

loopbench list

loopbench run \
  --task LB-CR-1 \
  --spec submissions/examples/spec-fast-loop.yaml \
  --seeds 0,1,2,3,4 \
  -o results.json

loopbench validate results.json

Submit to the leaderboard: open a PR adding your entry to leaderboard/entries.json.

v0.1 accepts SimEnv submissions only (fully reproducible, no API keys). A LiveEnv tier is planned for v0.2.
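For intuition about how entries in leaderboard/entries.json might be ordered, here is a small Python sketch. It is not the `loopbench rank` implementation. The field names `name`, `les_obs`, and `cost_usd` are hypothetical, the sample values are made up, and breaking ties by lower cost is an assumption made purely for illustration. Check the published schema before relying on any of it.

```python
def rank_entries(entries):
    """Order entries by descending LES_obs, breaking ties by lower cost.

    Field names and the tie-break rule are illustrative assumptions.
    """
    ordered = sorted(entries, key=lambda e: (-e["les_obs"], e["cost_usd"]))
    # Attach a 1-based rank without mutating the input dicts.
    return [dict(e, rank=i) for i, e in enumerate(ordered, start=1)]


sample = [
    {"name": "a", "les_obs": 0.5, "cost_usd": 2.0},
    {"name": "b", "les_obs": 0.75, "cost_usd": 3.0},
    {"name": "c", "les_obs": 0.5, "cost_usd": 1.0},
]
ranked = rank_entries(sample)
print([e["name"] for e in ranked])  # ['b', 'c', 'a']
```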

Metrics explained

Metric	Meaning
Success@k	Fraction of instances reaching goal threshold
LES_obs	Observed composite ∈ `[0, 1]` — eight categories
Cost	Estimated USD from LSS cost limits
Robustness	Quality retention across seeds

Scores can optionally be displayed on a 0–100 scale (les × 100).
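Two of these definitions are simple enough to sketch in Python. The code below reads "fraction of instances reaching goal threshold" as a plain ratio over per-instance scores and applies the × 100 display conversion. The sample scores and threshold are made up, the role of k is not modeled, and the authoritative formulas live in the spec layer, so treat this as an illustration rather than LoopBench's scoring code.

```python
def success_rate(scores, threshold):
    """Fraction of instances whose score reaches the goal threshold."""
    if not scores:
        raise ValueError("need at least one instance score")
    return sum(s >= threshold for s in scores) / len(scores)


def to_display(les):
    """Convert a composite in [0, 1] to the optional 0-100 display scale."""
    if not 0.0 <= les <= 1.0:
        raise ValueError("les must lie in [0, 1]")
    return les * 100


# Five illustrative per-seed scores with a hypothetical threshold of 0.75.
print(success_rate([0.9, 0.4, 0.8, 0.7, 0.95], 0.75))  # 0.6
print(to_display(0.5))                                  # 50.0
```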

Who this is for

You are…	LoopBench gives you…
Loop designer	A number you can improve release-over-release
Framework author	A neutral arena — not your own benchmark
Researcher	Reproducible tasks + published submission schema
Team lead	Comparable scores across designs and vendors

Citation

@software{loopbench2026,
  title={LoopBench: Benchmark Suite for Loop Engineering},
  author={Malpani, Kanak},
  year={2026},
  url={https://pypi.org/project/loopbench/}
}

MIT · v0.1 · Contributing · Security · Status
