docs(fix-flaky): de-overfit diagnosis playbook to generic classes by skipi · Pull Request #58 · semaphoreio/sem-ai

skipi · 2026-06-17T09:33:20Z

Why

The step-4 diagnosis table in the fix-flaky skill had cells reverse-engineered from a few specific tests (e.g. `count_children` → `%{active: N, workers: N}`, `compare == :lt` → fix `in [:lt, :eq]`). On those exact tests the cells match perfectly; on any other repo they read as answer keys, and a clean run tends to parrot the canned fix instead of diagnosing the test in front of it.

What

- Reframe step 4: pull the real flaky failure first. The table now names only the class of nondeterminism and the fix direction, never the fix itself.
- Genericize the table cells (drop test-specific shapes and language-specific naming).

Net: diagnosis is driven by the real failure evidence, so the skill still nails the cases it used to without going canned on unfamiliar repos.

Validation

Ran the skill from a clean Claude instance (only sem-ai installed) against the hardest case: the exact test that one of the old answer-key cells was mined from. It led with `flaky failure`, diagnosed from the actual code (a shared named supervisor with transient restart-loop workers leaking across tests), and wrote a one-place setup drain using the manager's real `children/0` + `finish_schedule_task/1`. No parroting, and it never needed a worked-example crutch.
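For readers who don't know the repo, the class of failure diagnosed above (shared named state leaking across tests, fixed by draining it in one setup hook) can be sketched roughly as follows. This is a Python analogy, not the Elixir code the skill wrote: `WorkerRegistry`, `REGISTRY`, and `drain` are hypothetical stand-ins, and only the shape of the fix is meant to mirror the `children/0` + `finish_schedule_task/1` drain.

```python
import unittest


class WorkerRegistry:
    """Hypothetical stand-in for a process-wide, named supervisor."""

    def __init__(self):
        self._workers = set()

    def start(self, name):
        self._workers.add(name)

    def children(self):
        # Sorted so assertions do not depend on set iteration order.
        return sorted(self._workers)

    def finish(self, name):
        self._workers.discard(name)


# One shared instance, like a supervisor registered under a fixed name.
REGISTRY = WorkerRegistry()


def drain(registry):
    """Stop every worker left behind by an earlier test."""
    for name in registry.children():
        registry.finish(name)


class SchedulerTest(unittest.TestCase):
    def setUp(self):
        # The single place where leaked state is cleared before each test.
        drain(REGISTRY)

    def test_starts_one_worker(self):
        REGISTRY.start("a")
        self.assertEqual(REGISTRY.children(), ["a"])

    def test_starts_two_workers(self):
        REGISTRY.start("b")
        REGISTRY.start("c")
        self.assertEqual(REGISTRY.children(), ["b", "c"])
```

Without the `setUp` drain, whichever test runs second would also see the workers started by the first, so the result would depend on test order. Draining in one place keeps individual tests free of cleanup code.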

No version bump (bundled at release).

The step-4 table cells were reverse-engineered from semaphore's top flakes (`count_children` → `%{active:N,workers:N}`, `in [:lt,:eq]`), so they read as answer-keys a plain run parrots on unrelated repos.

- reframe step 4: pull the real `flaky failure` first; the table names the class + fix direction only, never the fix itself
- genericize the cells (drop semaphore-specific shapes / Elixir-only naming)

Diagnosis now driven by real failure evidence: still nails semaphore, no longer canned on other projects.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

skipi requested review from DamjanBecirovic, dexyk and loadez as code owners June 17, 2026 09:33

github-actions Bot added the documentation Improvements or additions to documentation label Jun 17, 2026

skipi force-pushed the mk/sem-ai/fix-flaky-generic branch from d45f8e1 to 3a6b1bb Compare June 17, 2026 09:38

skipi force-pushed the mk/sem-ai/fix-flaky-generic branch from 3a6b1bb to f899a25 Compare June 17, 2026 09:47

loadez approved these changes Jun 17, 2026

skipi merged commit 9c18e30 into main Jun 17, 2026
1 check passed

skipi deleted the mk/sem-ai/fix-flaky-generic branch June 17, 2026 10:06

skipi mentioned this pull request Jun 17, 2026

chore(release): v0.1.24 #61

Merged

skipi merged 1 commit into main from mk/sem-ai/fix-flaky-generic
