Add opt-in concurrent evaluation (overlap eval with next solve) by robdmac · Pull Request #11 · SprocketLab/slop-code-bench

robdmac · 2026-05-27T07:29:01Z

Currently, evaluation runs serially between checkpoints: a checkpoint's tests must finish before the agent starts the next checkpoint, even though eval results never feed back into the (blind) agent. This PR adds an opt-in `--concurrent-evaluation` flag that instead runs each checkpoint's eval concurrently with the next checkpoint's solve, so eval no longer blocks progress. On subprocess- and compile-heavy problems (such as `test_translator`), evaluation accounts for ~57% of per-run wall-clock time. Hiding it behind solving can therefore cut run-time by up to ~43% (the solve share), close to half.
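A back-of-the-envelope model of where the saving comes from. It assumes per-checkpoint solve and eval durations are known and that eval of checkpoint i can start only once solve i has finished and eval i-1 has been joined. Both functions are hypothetical and not part of the PR:

```python
from typing import Sequence


def serial_wall_clock(solve: Sequence[float], evals: Sequence[float]) -> float:
    # Serial mode: every solve and every eval sits on the critical path.
    return float(sum(solve) + sum(evals))


def overlapped_wall_clock(solve: Sequence[float], evals: Sequence[float]) -> float:
    # Overlapped mode (hypothetical model): eval i starts once solve i is done
    # AND eval i-1 has been joined; solve i+1 starts right after eval i launches.
    t = 0.0              # main-thread (solve) clock
    prev_eval_end = 0.0  # when the previous checkpoint's background eval ends
    for s, e in zip(solve, evals):
        t += s                        # finish solving this checkpoint
        start = max(t, prev_eval_end)  # join the previous eval first
        prev_eval_end = start + e      # launch this checkpoint's eval
        t = start                      # next solve begins immediately
    return max(t, prev_eval_end)
```

With five checkpoints at 43 s solve / 57 s eval, the model gives 500 s serial versus 328 s overlapped (~34% less). As the number of checkpoints grows, the overlapped time tends toward 57% of serial, so the solve share (~43%) is the ceiling on the saving.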

A single rolling background thread evaluates checkpoint N while checkpoint N+1 solves. The previous eval is joined before the next one starts, so solve stays at most one checkpoint ahead of eval, and at most one solve and one eval run at a time (in-flight containers are bounded to 2). Results are recorded per checkpoint as each eval completes, not batched at the end. The flag is off by default.
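A minimal sketch of this scheduling pattern, not the PR's actual code. `run_with_concurrent_eval`, `solve`, `evaluate` and `on_result` are hypothetical stand-ins for the harness's solve step, eval step and result writer. A `ThreadPoolExecutor` with one worker plays the role of the single rolling background thread:

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Optional, Sequence


def run_with_concurrent_eval(
    checkpoints: Sequence[str],
    solve: Callable[[str], Any],
    evaluate: Callable[[str, Any], Any],
    on_result: Callable[[str, Any], None],
) -> None:
    """Hypothetical sketch: overlap eval of checkpoint N with solve of N+1."""

    def eval_and_record(cp: str, snapshot: Any) -> None:
        # Runs on the background thread; the result is recorded as soon as
        # this checkpoint's eval finishes, not batched at the end.
        on_result(cp, evaluate(cp, snapshot))

    pending: Optional[Future] = None
    # One worker thread = one rolling background evaluator.
    # Leaving the `with` block waits for any in-flight eval, even on error.
    with ThreadPoolExecutor(max_workers=1) as pool:
        for cp in checkpoints:
            snapshot = solve(cp)  # main thread; overlaps with the pending eval
            if pending is not None:
                # Join the previous eval before starting the next one, so solve
                # is never more than one checkpoint ahead of eval.
                # Future.result() re-raises any exception raised by the eval.
                pending.result()
            pending = pool.submit(eval_and_record, cp, snapshot)
        if pending is not None:
            pending.result()  # join the final checkpoint's eval
```

Joining via `Future.result()` is what bounds the pipeline to one solve plus one eval. It also means a failed eval surfaces on the main thread instead of dying silently in the worker. If `solve` raises, the `with` block still waits for the in-flight eval before the exception propagates.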

Because eval never feeds back into the agent, scores are unchanged under the `ANY_CASE` pass policy. This was verified by re-evaluating recorded snapshots: pass counts and `test_collection_hash` were bit-for-bit identical at each checkpoint checked.
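One way to express that check, as a sketch only. The per-checkpoint record layout, the `pass_count` field name and the `diff_eval_results` helper are hypothetical; only `test_collection_hash` comes from the PR. It compares a serial and a concurrent run checkpoint by checkpoint with exact equality:

```python
from typing import Any, List, Mapping, Sequence

Record = Mapping[str, Any]  # hypothetical per-checkpoint result record


def diff_eval_results(
    serial: Mapping[str, Record],
    concurrent: Mapping[str, Record],
    keys: Sequence[str] = ("pass_count", "test_collection_hash"),
) -> List[str]:
    """Return human-readable mismatches between two runs' per-checkpoint results."""
    mismatches: List[str] = []
    for cp in sorted(set(serial) | set(concurrent)):
        if cp not in serial or cp not in concurrent:
            mismatches.append(f"{cp}: present in only one run")
            continue
        for key in keys:
            a, b = serial[cp].get(key), concurrent[cp].get(key)
            if a != b:  # exact equality, i.e. "bit-for-bit identical"
                mismatches.append(f"{cp}.{key}: {a!r} != {b!r}")
    return mismatches
```

An empty list means the two modes agree on every compared field at every checkpoint.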

Trade-off: the run can't early-stop on test failures, because a checkpoint's eval finishes during the next solve. Agent errors and rate limits still stop the run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

robdmac · 2026-05-28T15:29:16Z

This can be hardened. Converting to draft.

Add opt-in concurrent evaluation (overlap eval with next solve)

7391b61

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

robdmac mentioned this pull request May 28, 2026

Add opt-in parallel test execution via pytest-xdist #10

Closed

robdmac marked this pull request as draft May 28, 2026 15:29

robdmac added 2 commits May 28, 2026 10:35

Trim comments and docstrings in concurrent-eval impl

b15c537

Surface concurrent-eval failures and guarantee thread cleanup

1d8992c

robdmac marked this pull request as ready for review May 28, 2026 17:43

robdmac wants to merge 3 commits into SprocketLab:main from robdmac:concurrent-eval