fix(beta): capture HTTP status + error into sweep_runs.notes #107

Merged
Augustas11 merged 1 commit into spike/provider-model-autotune from fix/sweep-capture-error-string
Jun 18, 2026


Problem

When a sweep cell fails, sweep_runs records n_err > 0 but the notes column is NULL: the per-request HTTP status and error string already captured by harness.fire_stream are discarded in aggregate_cell. This caused a real misdiagnosis last week: a ctx=2000 failure through the production gateway was misread as a "gateway streaming read-idle timeout bug", even though gateway-path timeouts are all 280–360s. The actual cause was almost certainly a transient 503 provider_unavailable on a marginal single-slot provider, but the harness gave no way to confirm that without re-running.

Base is spike/provider-model-autotune because that is where beta/sweep.py lives today; it is not yet on main.

Fix

  • In aggregate_cell, collect up to 3 distinct (http_status, error[:80]) pairs while iterating per-request results, join them into a ~200-char summary, and expose it as notes (NULL when every request succeeds). The cell-row builder uses agg["notes"] instead of the hardcoded None. ≤20-line change; no schema migration needed (notes column already exists).
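
For readers without the diff open, the collection step described above could look roughly like the sketch below. This is not the code in beta/sweep.py. The helper name summarize_errors, the per-request dict keys (ok, http_status, error), the "; " separator, and the hard 200-character cap are all assumptions made for illustration. Only the "up to 3 distinct pairs", the error[:80] truncation, and the "HTTP <status>: <error>" shape seen in the verification output come from this PR.

```python
from typing import Iterable, Optional

MAX_PAIRS = 3          # up to 3 distinct (status, error) pairs, per the PR text
MAX_ERR_CHARS = 80     # each error string is truncated to 80 characters
MAX_NOTES_CHARS = 200  # assumed hard cap standing in for the "~200-char" summary


def summarize_errors(results: Iterable[dict]) -> Optional[str]:
    """Return a short notes string, or None when every request succeeded.

    Each result is assumed (hypothetically) to be a dict with the keys
    "ok", "http_status" and "error"; the real harness may differ.
    """
    pairs = []
    for r in results:
        if r.get("ok"):
            continue  # successful requests contribute nothing to notes
        pair = (r.get("http_status"), (r.get("error") or "")[:MAX_ERR_CHARS])
        # Keep only distinct pairs, and stop collecting after MAX_PAIRS.
        if pair not in pairs and len(pairs) < MAX_PAIRS:
            pairs.append(pair)
    if not pairs:
        return None  # stored as NULL in sweep_runs.notes
    summary = "; ".join(f"HTTP {status}: {error}" for status, error in pairs)
    return summary[:MAX_NOTES_CHARS]
```

Under these assumptions the cell-row builder would store the return value directly, so a fully green cell keeps notes as NULL without any special-casing.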

Verification

Local smoke against beta/mock_llm_server.py.

100% error mock (per the verification plan in the task prompt) — notes populated:

context_target  n_err  notes
--------------  -----  ---------------------------------------------
1000            1      HTTP 500: {"error": "simulated server error"}

Green mock (0% errors) — notes stays NULL:

context_target  n_ok  n_err  notes
--------------  ----  -----  -----
1000            1     0

  • python3 -m py_compile beta/sweep.py beta/harness.py — OK
  • Sweep against erroring mock exits 1; sweep against green mock exits 0.
  • No changes outside beta/sweep.py. harness.py, schema, and other phases untouched.
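
A follow-up check along the same lines, as a sketch only: listing any failed cells that still have an empty notes column. It assumes sweep_runs is reachable through Python's sqlite3 module, which this PR does not state. The column names come from the tables above. The function name and the in-memory demo rows, including the ctx=2000 row with NULL notes, are invented for illustration and are not real sweep data.

```python
import sqlite3


def failed_cells_without_notes(conn: sqlite3.Connection) -> list:
    """Return (context_target, n_err) for failed cells whose notes is NULL."""
    cur = conn.execute(
        "SELECT context_target, n_err FROM sweep_runs "
        "WHERE n_err > 0 AND notes IS NULL "
        "ORDER BY context_target"
    )
    return cur.fetchall()


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory table with made-up rows (not real sweep results)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sweep_runs ("
        "context_target INTEGER, n_ok INTEGER, n_err INTEGER, notes TEXT)"
    )
    conn.executemany(
        "INSERT INTO sweep_runs VALUES (?, ?, ?, ?)",
        [
            (1000, 0, 1, 'HTTP 500: {"error": "simulated server error"}'),
            (1000, 1, 0, None),  # green cell: notes stays NULL
            (2000, 0, 1, None),  # pre-fix style row: failed, but no notes
        ],
    )
    return conn
```

Calling failed_cells_without_notes(demo_connection()) returns only the failed row that lacks notes. Against a database written after this change, the expectation would be an empty list.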

🤖 Generated with Claude Code

When a sweep cell failed, sweep_runs recorded n_err > 0 but notes was
NULL — the per-request HTTP status and error string from harness.py
were dropped on the floor. Caused a ctx=2000 production-gateway
misdiagnosis: a transient 503 provider_unavailable was misread as a
gateway streaming read-idle bug, with no way to confirm without
re-running.

aggregate_cell now collects up to 3 distinct (status, error[:80])
pairs from the per-request results, joins them into a ~200-char
summary, and exposes it via the existing notes column. notes stays
NULL on cells where every request succeeded.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@Augustas11 Augustas11 marked this pull request as ready for review June 18, 2026 07:27
@Augustas11 Augustas11 merged commit d3782be into spike/provider-model-autotune Jun 18, 2026
5 checks passed
@Augustas11 Augustas11 deleted the fix/sweep-capture-error-string branch June 18, 2026 07:28