fix: snapshot recording fails for multi-arm experiments by jjroelofs · Pull Request #54 · dxpr/rl

jjroelofs · 2026-06-30T11:07:16Z

Summary

Fixes snapshot sampling for experiments where recordTurns() is called with many arm IDs (e.g. 208 arms for ai_sorting views). The old ($total_turns % $interval) === 0 check assumed single-step increments and never aligned with multi-arm jumps, producing zero snapshots regardless of traffic volume.
Replaces exact modulo with range-crossing detection: floor(total_turns / interval) != floor(previous_turns / interval). This correctly detects when a multi-step jump crosses a sampling boundary.
Passes the actual $step_size through recordSnapshot() and maybeRecordSnapshots() so the sampler knows the jump width.

Root cause

SnapshotStorage::shouldRecordSnapshot() uses a modulo check to sample snapshots at regular intervals. When ExperimentDataStorage::recordTurns() is called with N arm IDs, total_turns jumps by N per request. Two problems:

First window skipped entirely. With 208 arms the first call sets total_turns to 208, instantly exceeding the first_window of 19 (floor(10000/208) * 0.4).
Middle interval never aligns. total_turns only ever takes multiples of 208 while the interval grows with it, so in practice the interval does not divide total_turns exactly and the modulo (total_turns % interval) never hits zero. Simulation across all 82 page views (17,160 total turns) confirms zero hits.

The same bug affects rl_sorting (which batches all visible arms via a JS IntersectionObserver into a single action=turns POST) and any experiment with more than ~50 arms.
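The misalignment described above can be reproduced with a small model. This is a sketch in Python rather than the module's PHP, and it assumes a fixed interval of 500, whereas the real code uses a growing interval from calculateMiddleInterval(). The function names are illustrative only.

```python
def modulo_hit(total_turns: int, interval: int) -> bool:
    """Old check: fires only when total_turns lands exactly on a multiple."""
    return total_turns % interval == 0


def crossing_hit(total_turns: int, previous_turns: int, interval: int) -> bool:
    """Range-crossing check: fires when the step passes a multiple of interval."""
    return total_turns // interval != previous_turns // interval


def count_hits(step: int, requests: int, interval: int) -> tuple[int, int]:
    """Return (modulo hits, crossing hits) over a run of equal-sized steps."""
    modulo_hits = crossing_hits = 0
    total = 0
    for _ in range(requests):
        previous, total = total, total + step
        modulo_hits += modulo_hit(total, interval)
        crossing_hits += crossing_hit(total, previous, interval)
    return modulo_hits, crossing_hits


# 82 requests of 208 arms each, against an assumed fixed interval of 500.
batched = count_hits(step=208, requests=82, interval=500)
# Single-step traffic covering the same 82 * 208 turns.
single = count_hits(step=1, requests=82 * 208, interval=500)
print(batched, single)
```

With single-step traffic the two checks agree. With 208-step jumps the modulo check records nothing, while the crossing check fires once per boundary passed.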

Changes

| File | Change |
| --- | --- |
| `SnapshotStorageInterface.php` | Add optional `$step_size` param to `recordSnapshot()` |
| `SnapshotStorage.php` | `shouldRecordSnapshot()` and `isMilestone()` use range-crossing instead of modulo |
| `ExperimentDataStorage.php` | `recordTurns()` passes `$arm_count` as step_size; `maybeRecordSnapshots()` propagates it |

Test plan

Verify fix with simulation: run shouldRecordSnapshot across a 208-arm experiment's traffic; confirm hits > 0
Verify existing behaviour preserved for small experiments (step_size=1 degenerates to the old modulo check)
Rsync to DDEV site and confirm the experiment detail page no longer shows "No data yet"
Run docker compose --profile lint run --rm drupal-lint (passed locally)

Fixes #53

When recordTurns() is called with many arm IDs (e.g. 208 for ai_sorting), total_turns jumps by the arm count per request. The shouldRecordSnapshot() modulo check ($total_turns % $interval === 0) assumes increments of 1 and never aligns with these large jumps, producing zero snapshots regardless of traffic volume.

Replace exact modulo with range-crossing detection: check whether the step [previous_turns, total_turns] crosses a sampling boundary via floor(total_turns / interval) != floor(previous_turns / interval). Pass the actual step_size through the recording chain so the snapshot sampler knows the jump width.

Fixes #53

GuzzleHttp\ClientInterface does not define post(); that method only exists on the concrete Client class. PHPStan correctly flags this as method.notFound. Use request('POST', ...) which is part of the interface contract.

jjroelofs · 2026-06-30T11:17:21Z

+    // For middle section, check if step crossed an interval boundary.
    $interval = $this->calculateMiddleInterval($snapshots_per_arm, $total_turns);
-    return ($total_turns % $interval) === 0;
+    return (int) floor($total_turns / $interval) !== (int) floor($previous_turns / $interval);


[P1] This moving interval makes every large batch a milestone. calculateMiddleInterval() derives the interval from the current $total_turns, so the boundary moves forward on every request. For 208 arms at request 82, total=17056, previous=16848, and interval=1701, yielding buckets 10 and 9; the same comparison succeeds on every preceding request too. Because recordTurns() then writes one row per arm and isMilestone() repeats this predicate, 82 requests create 17,056 permanent rows against the 9,984-row experiment budget, and cleanup cannot remove them because it only deletes is_milestone = 0. Please use a stable sampling threshold/schedule and add a regression test asserting batched traffic does not snapshot every request.
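The effect is easy to reproduce in a small model. The sketch below is Python, not the module's PHP. The body of calculateMiddleInterval() is not shown in this PR, so `moving_interval` is a hypothetical stand-in that sets the interval to about a tenth of the current total. It lands close to the 1701 quoted above but is not the real formula.

```python
ARMS = 208
REQUESTS = 82
ROW_BUDGET = 9_984  # per-experiment row budget quoted in the review


def moving_interval(total_turns: int) -> int:
    """Hypothetical stand-in: interval is about a tenth of the current total."""
    return max(1, total_turns // 10)


def crossed(total_turns: int, previous_turns: int) -> bool:
    """Range-crossing predicate with the interval derived from the current total."""
    interval = moving_interval(total_turns)
    return total_turns // interval != previous_turns // interval


hits = 0
total = 0
for _ in range(REQUESTS):
    previous, total = total, total + ARMS
    hits += crossed(total, previous)

rows_written = hits * ARMS  # one row per arm on every hit
print(hits, rows_written, rows_written > ROW_BUDGET)
```

Because the interval is recomputed from the current total, the current total always falls in bucket 10 and the previous total in a lower bucket. Under this assumption every one of the 82 requests therefore counts as a crossing.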

jjroelofs · 2026-06-30T11:17:46Z

+   *   interval boundary crossings.
   */
-  public function recordSnapshot(string $experiment_id, string $arm_id, int $turns, int $rewards, int $total_experiment_turns): void;
+  public function recordSnapshot(string $experiment_id, string $arm_id, int $turns, int $rewards, int $total_experiment_turns, int $step_size = 1): void;


[P2] Adding this optional parameter breaks existing interface implementers. It is backward-compatible for callers, but a custom class implementing the previous five-argument signature now fails at class loading with an incompatible declaration fatal error. Since this targets the stable 1.1.x line, please preserve the existing contract or introduce a secondary range-aware capability with a fallback for current implementations.

Address PR #54 review feedback:

P1: Sampling based on total_experiment_turns made every large batch cross an interval boundary, marking all snapshots as permanent milestones that cleanup could not remove. Switch to per-arm turns which always increment by 1 regardless of batch size, making the modulo check reliable. Milestones now use a coarser multiple of the sampling interval so they always land on recorded snapshots.

P2: Revert the step_size parameter added to SnapshotStorageInterface, preserving the original 5-argument contract for existing implementers.

Also:
- Raise MAX_ROWS_PER_EXPERIMENT from 10k to 100k for better chart resolution in many-arm experiments (208 arms: 48 -> 250 per arm).
- Lower calculateSnapshotsPerArm floor from 20 to 2 so large arm counts stay within the per-experiment row budget.
- Cleanup now enforces per-arm budgets by removing oldest rows (including milestones) when arm count grows and quotas shrink.
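A minimal sketch of the per-arm approach, in Python rather than the module's PHP. The interval of 10 and the milestone factor of 4 are assumptions chosen for illustration. The real values come from the module's own interval calculation, which is not shown in this PR.

```python
from collections import defaultdict

SAMPLE_INTERVAL = 10  # assumed per-arm sampling interval
MILESTONE_FACTOR = 4  # assumed: every 4th sample is a milestone


def should_sample(arm_turns: int) -> bool:
    """Per-arm turns grow by exactly 1 per request, so exact modulo is reliable."""
    return arm_turns % SAMPLE_INTERVAL == 0


def is_milestone(arm_turns: int) -> bool:
    """Milestones use a multiple of the sampling interval, so each one is also a sample."""
    return arm_turns % (SAMPLE_INTERVAL * MILESTONE_FACTOR) == 0


def record_turns(arm_turns: dict[str, int], arm_ids: list[str]) -> list[tuple[str, int, bool]]:
    """Increment each arm once and return (arm_id, turns, is_milestone) rows to store."""
    rows = []
    for arm_id in arm_ids:
        arm_turns[arm_id] += 1
        turns = arm_turns[arm_id]
        if should_sample(turns):
            rows.append((arm_id, turns, is_milestone(turns)))
    return rows


arm_turns = defaultdict(int)
arm_ids = [f"arm_{i}" for i in range(208)]
snapshots = []
for _ in range(82):  # 82 batched requests, each touching all 208 arms
    snapshots.extend(record_turns(arm_turns, arm_ids))

milestones = [row for row in snapshots if row[2]]
print(len(snapshots), len(milestones))
```

Under these assumptions each arm is sampled at turns 10, 20, ..., 80, and only turns 40 and 80 are milestones. Batch size no longer affects how many rows are written per arm.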

jjroelofs · 2026-06-30T12:01:51Z

Critical follow-up before merge

The original batching and interface issues are addressed, but the revised cleanup introduces two critical retention problems:

The configured global 100k row limit is not enforceable. MAX_ROWS_PER_EXPERIMENT now permits each experiment to retain 100k rows, while the global cleanup still deletes only is_milestone = 0 rows. In the 1,000-arm / 1,000-turn simulation, each experiment reaches the 100k per-arm-budget cap with roughly 60k milestones and 40k non-milestones. With only two such experiments, global cleanup can delete all 80k non-milestones but still leaves roughly 120k milestone rows, permanently above the configured 100k limit. Additional experiments make this grow linearly.
The per-arm hard-cap fallback destroys the early history it promises to preserve. When an arm remains over budget, cleanup orders by total_experiment_turns ASC and deletes the oldest rows regardless of milestone status. Those are precisely the first-window snapshots documented as permanent. For 1,000 arms at 1,000 turns, the current policy deletes the earliest six snapshots from every arm. The fallback should compact middle history first while explicitly preserving the allocated early and recent windows.

These need to be resolved before merge: global cleanup requires a final hard-cap path that can actually enforce the configured limit, and per-arm cleanup needs an explicit keep-set for early, middle, and recent history rather than deleting oldest-first. Please also add regression tests covering multiple max-size experiments and preservation of early snapshots during quota shrinkage.
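The first retention problem above can be checked with a short model. It also shows one possible shape for a hard-cap pass. This is a Python sketch, not the module's PHP. The `Row` type, the `global_cleanup` name and the use of total_experiment_turns as the age ordering are assumptions made for illustration.

```python
from typing import NamedTuple

GLOBAL_LIMIT = 100_000  # configured global row limit from the review


class Row(NamedTuple):
    experiment: str
    total_experiment_turns: int
    is_milestone: bool


def global_cleanup(rows: list[Row], limit: int) -> list[Row]:
    """Trim to `limit`: oldest non-milestones first, then oldest rows of any kind."""
    excess = len(rows) - limit
    if excess <= 0:
        return list(rows)
    oldest_first = sorted(range(len(rows)), key=lambda i: rows[i].total_experiment_turns)
    # Pass 1: only non-milestone rows are eligible.
    drop = [i for i in oldest_first if not rows[i].is_milestone][:excess]
    # Pass 2: if that was not enough, remove the oldest remaining rows regardless of status.
    if len(drop) < excess:
        already = set(drop)
        survivors = [i for i in oldest_first if i not in already]
        drop += survivors[: excess - len(drop)]
    dropped = set(drop)
    return [row for i, row in enumerate(rows) if i not in dropped]


# Two max-size experiments, each with 60k milestones and 40k non-milestones.
rows = [
    Row(experiment, turn, turn % 5 < 3)
    for experiment in ("exp_a", "exp_b")
    for turn in range(100_000)
]
milestone_rows = sum(row.is_milestone for row in rows)
kept = global_cleanup(rows, GLOBAL_LIMIT)
print(milestone_rows, len(kept))
```

The milestone count alone is 120k, so a cleanup that stops after pass 1 cannot get under the 100k limit. Pass 2 is what brings the total down to the limit.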

Address follow-up review on PR #54:

1. Per-arm cleanup now compacts middle history first, explicitly preserving the first-window (early learning) and recent-window snapshots. Only falls back to trimming early/recent rows when the middle section is fully exhausted.
2. Global cleanup now enforces the configured row limit even when milestone rows alone exceed it: a second pass removes oldest rows regardless of milestone status after the non-milestone pass is insufficient.
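The middle-first compaction in item 1 can be sketched as follows. This is Python, not the module's PHP. The `rows_to_delete` name, the even thinning of the middle and the fallback order (oldest recent-window rows first, then newest early-window rows) are assumptions. The commit message does not specify them.

```python
def rows_to_delete(snapshot_turns: list[int], budget: int, early: int, recent: int) -> list[int]:
    """Pick snapshots to delete so that at most `budget` remain.

    snapshot_turns is sorted oldest-first. The first `early` and last `recent`
    entries are protected and only touched once the middle is exhausted.
    """
    count = len(snapshot_turns)
    excess = count - budget
    if excess <= 0:
        return []
    split = min(early, count)
    recent_start = max(split, count - recent)
    early_rows = snapshot_turns[:split]
    middle = snapshot_turns[split:recent_start]
    recent_rows = snapshot_turns[recent_start:]

    # Thin the middle evenly so the remaining points stay spread out.
    take = min(excess, len(middle))
    deleted = [middle[(j * len(middle)) // take] for j in range(take)]

    # Fallback (assumed order): trim protected rows only when the middle is gone.
    remaining = excess - take
    if remaining > 0:
        fallback = recent_rows + early_rows[::-1]
        deleted += fallback[:remaining]
    return deleted


turns = list(range(1, 21))  # 20 snapshots for one arm
normal = rows_to_delete(turns, budget=12, early=5, recent=5)
squeezed = rows_to_delete(turns, budget=8, early=5, recent=5)
print(normal)
print(squeezed)
```

In the first call only middle snapshots are removed, and the first five and last five survive. In the second call the budget is smaller than the two protected windows combined, so two recent-window rows are trimmed after the whole middle has been deleted.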

jjroelofs added 3 commits June 30, 2026 13:06

fix: use ClientInterface::request() instead of shorthand post()

f31d25b

Merge branch 'fix/endpoint-checker-phpstan' into fix/snapshot-many-arms

f653187

jjroelofs merged commit 35e3477 into 1.x Jun 30, 2026
3 checks passed

jjroelofs deleted the fix/snapshot-many-arms branch June 30, 2026 12:14

jjroelofs merged 5 commits into 1.x from fix/snapshot-many-arms
