docs(fix-flaky): de-overfit diagnosis playbook to generic classes #58

Merged
skipi merged 1 commit into main from mk/sem-ai/fix-flaky-generic
Jun 17, 2026

skipi (Collaborator) commented Jun 17, 2026

Why

The step-4 diagnosis table in the fix-flaky skill had cells reverse-engineered from a few specific tests (e.g. `count_children` → `%{active: N, workers: N}`, `compare == :lt` → fix `in [:lt, :eq]`). On those exact tests the cells are a perfect match; on any other repo they read as answer keys, and a clean run tends to parrot the canned fix instead of diagnosing the test in front of it.
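For readers who have not seen the second cell before, the class of flake behind it can be sketched in a few lines. This is an illustrative Python sketch, not code from the skill or from the tests it was mined from; it assumes the comparison is between two values captured close together in time, and `compare`, `created`, and the `updated_*` names are hypothetical.

```python
from datetime import datetime, timedelta


def compare(a, b):
    """Three-way comparison, mirroring an lt / eq / gt result shape."""
    if a < b:
        return "lt"
    if a > b:
        return "gt"
    return "eq"


# Hypothetical timestamps: two events recorded within the same clock tick
# share a value, while events one tick apart do not.
created = datetime(2026, 6, 17, 9, 38, 0)
updated_same_tick = created
updated_next_tick = created + timedelta(microseconds=1)
candidates = (updated_same_tick, updated_next_tick)

# Strict assertion: holds only when the clock happened to advance.
strict_ok = [compare(created, u) == "lt" for u in candidates]

# Tolerant assertion: also accepts equal timestamps.
tolerant_ok = [compare(created, u) in ("lt", "eq") for u in candidates]

print(strict_ok)
print(tolerant_ok)
```

The sketch also shows why such a cell reads as an answer key: the tolerant assertion is only correct when the failure really is a clock-resolution tie, which has to be established from the failing test, not from the table.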

What

  • Reframe step 4: pull the real `flaky failure` first. The table now names the class of nondeterminism and the fix direction only, never the fix itself.
  • Genericize the table cells (drop test-specific shapes and language-specific naming).

Net: diagnosis is driven by the real failure evidence, so the skill still nails the cases it used to, without falling back to canned fixes on unfamiliar repos.

Validation

Ran the skill from a clean Claude instance (only sem-ai installed) against the hardest case: the exact test one of the old answer-key cells was mined from. It led with `flaky failure`, diagnosed from the actual code (a shared named supervisor whose transient restart-loop workers leak across tests), and wrote a one-place setup drain using the manager's real `children/0` + `finish_schedule_task/1`. No parroting, and it never needed a worked-example crutch.

No version bump (bundled at release).

github-actions bot added the documentation label (Improvements or additions to documentation) Jun 17, 2026
skipi force-pushed the mk/sem-ai/fix-flaky-generic branch from d45f8e1 to 3a6b1bb June 17, 2026 09:38

The step-4 table cells were reverse-engineered from semaphore's top
flakes (count_children → %{active:N,workers:N}, `in [:lt,:eq]`), so they
read as answer-keys a plain run parrots on unrelated repos.

- reframe step 4: pull the real `flaky failure` first; the table names
  the class + fix direction only, never the fix itself
- genericize the cells (drop semaphore-specific shapes / Elixir-only naming)

Diagnosis now driven by real failure evidence — still nails semaphore,
no longer canned on other projects.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

skipi force-pushed the mk/sem-ai/fix-flaky-generic branch from 3a6b1bb to f899a25 June 17, 2026 09:47
skipi merged commit 9c18e30 into main Jun 17, 2026
1 check passed
skipi deleted the mk/sem-ai/fix-flaky-generic branch June 17, 2026 10:06
skipi mentioned this pull request Jun 17, 2026