Skip to content

fix: keep unmerged reviewer commits during reclaim + fix empty-stdin deadlock#63

Merged
nathanwhit merged 2 commits into
mainfrom
incident-fixes-reclaim-and-stdin
Jun 24, 2026
Merged

fix: keep unmerged reviewer commits during reclaim + fix empty-stdin deadlock#63
nathanwhit merged 2 commits into
mainfrom
incident-fixes-reclaim-and-stdin

Conversation

@nathanwhit

Copy link
Copy Markdown
Owner

Two fixes from today's a8641186 incident.

1. fix(orch): keep reviewer checkouts with unmerged commits during reclaim

Reviewers are told to commit any fix they make, but that commit lives only on the review branch in an ephemeral checkout — never pushed or merged. The mid-objective reclaim from #62 tore those down the moment the reviewer went terminal, silently destroying the work. This actually happened: a reviewer's compiled-run follow-up commit (cd91a5a62572d) was lost when I cleaned up disk.

reclaimSpentCheckouts now keeps any non-publishing checkout whose branch advanced past its base (checkoutHasUnmergedCommits). Published-PR checkouts stay exempt (their commits are on the PR). The probe is conservative — only reclaims when there's provably nothing to lose (dir gone / not a repo / HEAD == base), keeps on any uncertain probe.

2. fix(exec): close stdin for empty feed-and-wait writes (manager-wedge deadlock)

An empty Stdin was indistinguishable from an interactive session, so the executor left the stdin pipe open — and over SSH allocated a -tt pty. A feed-and-wait tee with empty content (seeding a blanked memory file) hung forever waiting for an EOF that never came. A single 0-byte repo-memory row deadlocked seedRepoMemory on resume, which wedged every manager's StartRun and took the whole fleet down after a redeploy.

New Command.CloseStdin forces the pipe written-and-closed even when Stdin is empty, and suppresses the SSH pty so EOF reaches the remote reader. writeWorkspaceFile sets it.

Tests

  • TestReclaimWorkspaces_KeepsReviewerCheckoutWithUnmergedCommit — real git repos: reviewer-with-commit kept, reviewer-without reclaimed.
  • TestLocal_EmptyStdinClosedWithCloseStdin — the empty feed-and-wait no longer hangs.

A reviewer is told to commit any fix it makes, but that commit lives ONLY on
its review branch in an ephemeral checkout — never pushed or merged. The
mid-objective reclaim (#62) tore those checkouts down the moment the reviewer
went terminal, silently destroying the work. (This bit us live: a reviewer's
compiled-run follow-up commit was lost when its checkout was reclaimed.)

reclaimSpentCheckouts now keeps any non-publishing checkout whose branch has
advanced past its base (checkoutHasUnmergedCommits): disk is the cheaper loss.
Published-PR checkouts stay exempt — their commits are safely on the PR. The
probe is conservative: it only reclaims when there is provably nothing to lose
(dir gone / not a repo / HEAD == base) and keeps the checkout on any uncertain
probe of a still-present dir.
…deadlock)

An empty Stdin was indistinguishable from an interactive session, so the
executor left the stdin pipe open — and over SSH allocated a -tt pty. A
feed-and-wait write with empty content (tee'ing a blanked memory file) then
hung forever waiting for an EOF that never came. A single 0-byte repo-memory
row wedged seedRepoMemory on resume, which deadlocked EVERY manager's StartRun
and took the whole fleet down after a redeploy.

Add Command.CloseStdin: it forces the pipe written-and-closed even when Stdin
is empty, and suppresses the SSH pty so the EOF reaches the remote reader.
writeWorkspaceFile (the tee that seeds memory) sets it, so an empty memory
file now writes cleanly instead of hanging.
@nathanwhit nathanwhit merged commit d5e1544 into main Jun 24, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant