Skip to content

#250: barrier background-worker teardown against ENOTEMPTY flake#255

Merged
seungpyoson merged 2 commits into
mainfrom
fix/250-claude-smoke-teardown-barrier
Jun 26, 2026
Merged

#250: barrier background-worker teardown against ENOTEMPTY flake#255
seungpyoson merged 2 commits into
mainfrom
fix/250-claude-smoke-teardown-barrier

Conversation

@seungpyoson

@seungpyoson seungpyoson commented Jun 26, 2026

Copy link
Copy Markdown
Owner

What

Fixes the intermittent ENOTEMPTY teardown flake in the claude and gemini companion smoke suites (#250) by applying the proven waitForProcessExit worker-exit barrier to every background-worker test that recursively removes its data dir.

Root cause

Background-worker tests launch a detached --background worker, then await only the job record (waitForJobRecord / terminal-meta poll) — never the worker process exit. The finally then recursively removes dataDir while the worker is still doing its post-terminal tail writes (lease release, sidecar removal, lifecycle-jsonl flush), so cleanup races a live writer and flakes with ENOTEMPTY on rmdir of state/.../jobs (sometimes the top-level data dir).

This is the same class #234 fixed for the concurrent-relay harnesses with a deterministic waitForProcessExit barrier; the companion smoke suites lacked the equivalent.

Fix

Apply the in-file waitForProcessExit(pid[, timeoutMs]) barrier as the last statement of each affected test's try, before cleanup — using the exact proven pattern from PR #242 / the #234 lineage (plain await, fail-loud; a 30s *_SMOKE_POLL_TIMEOUT_MS budget on the cancel / status-poll tests; the redundant setTimeout(250) cushions replaced, not duplicated):

  • claude: 9 background-worker tests barriered (3 cancel/status-poll tests use the 30s budget). The helper already existed in-file.
  • gemini: the same waitForProcessExit helper added, plus 8 background-worker tests barriered (4 cancel/status-poll tests use the 30s budget; the approval test waits on launchEvent.pid).

Test-only; no product/source code touched.

Deterministic by construction: cleanup now provably runs after the worker has exited (process.kill(pid,0)ESRCH), so an ENOTEMPTY from a still-writing worker is structurally impossible.

Single-source / #242 reconciliation

These barriers are byte-identical to the companion-smoke barriers currently carried by the open PR #242 (concurrent relays) — verified: git diff feat/234 -- <both smoke files> shows zero changed waitForProcessExit lines. #242 carried the barrier work bundled with its feature changes; this PR extracts it into the focused flake-fix where it belongs. Once this merges, #242's next git merge main collapses the identical barrier lines automatically, leaving #242 with only its concurrency-feature changes. No duplicative barrier work survives in either PR.

Verification (local)

  • npm run lint clean (incl. all sync checks).
  • Full node --test of both suites: claude + gemini green.
  • Sequential stress: 0 ENOTEMPTY across repeated runs (the flake reproduced ~50%/run on the unfixed code). Parallel whole-file stress is invalid for these suites (shared companion state → functional false-positives), so stress is sequential.

Closes #250

🤖 Generated with Claude Code

…250)

claude-companion smoke intermittently failed with ENOTEMPTY during temp cleanup: a detached --background worker was still writing into dataDir/state/.../jobs while the test's finally rmSync'd dataDir. Tests awaited only the job record (waitForJobRecord), never the worker process exit.

Apply the deterministic waitForProcessExit barrier (mirroring #234) to every background-worker test that rmSyncs dataDir: capture launched.pid, then await waitForProcessExit(launchedPid) as the first finally statement, before cleanup. Cancellation tests use the tolerant .catch(() => {}). 9 tests covered; the waitForProcessExit helper and the one already-barriered test are unchanged.

Validated: lint clean, baseline 138/138, 16x sequential stress 0 ENOTEMPTY (the flake was ~50%/run before the fix).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates several smoke tests in claude-companion.smoke.test.mjs to track the process ID (launchedPid) of spawned background processes and await their exit (waitForProcessExit) in the finally block before performing cleanup. This prevents race conditions and resource leaks during test execution. There are no review comments, and I have no additional feedback to provide.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

…nd to gemini (#250)

The first cut of #250 used a finally-first barrier with .catch() on the cancel
tests -- a variant that diverged from the proven #234/#242 pattern and swallowed
the 5s fail-loud timeout. Rework to the exact #242 form: end-of-try placement,
plain await (fail-loud), a 30s *_SMOKE_POLL_TIMEOUT_MS budget on the
cancel/status-poll tests, and the redundant setTimeout(250) cushions replaced
rather than duplicated. Extend the same barrier to the gemini companion smoke
suite, which shares the identical detached-worker teardown race.

The resulting barriers are byte-identical to the companion-smoke barriers
currently carried by PR #242 (verified: git diff feat/234 shows zero changed
waitForProcessExit lines), so #242 dedups them automatically on its next main
merge. This makes #250 the single source for the companion-smoke barrier class
and removes the duplicative barrier work from the concurrent-relays PR.

Class coverage: claude (9 tests) and gemini (helper + 8 tests) fixed here; kimi
already barriered on main; agy is foreground-only (rejects --background, no
detached worker, no race); identity-resume has no background launches.

Test-only. Validated: lint (incl. sync checks), claude 138/138, gemini 100/100,
sequential stress 10x claude + 5x gemini = 0 ENOTEMPTY (flake reproduced
~50%/run unfixed).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@seungpyoson

Copy link
Copy Markdown
Owner Author

Reworked at e458807b after an adversarial review round (GLM / Claude / GPT) on the original 125ee96b:

Class coverage (MECE): claude + gemini fixed here; kimi already barriered on main; agy is foreground-only (agy-companion.mjs rejects --background, no detached worker → no race); identity-resume has no background launches.

Validation: lint (incl. all sync checks), claude 138/138, gemini 100/100, sequential stress 10× claude + 5× gemini = 0 ENOTEMPTY (flake reproduced ~50%/run unfixed).

@seungpyoson seungpyoson marked this pull request as ready for review June 26, 2026 08:21

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@seungpyoson seungpyoson merged commit dd7b1a2 into main Jun 26, 2026
8 checks passed
@seungpyoson seungpyoson deleted the fix/250-claude-smoke-teardown-barrier branch June 26, 2026 10:47
@sonarqubecloud

Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

claude-companion smoke: ENOTEMPTY teardown flake (missing waitForProcessExit barrier)

1 participant