
fix(workspace): don't let reconcileSubmodulePins detach the superproject via a skipped submodule#71

Merged
nathanwhit merged 1 commit into
mainfrom
fix-reconcile-superproject-detach
Jun 25, 2026

@nathanwhit

Owner

Problem

Objective eed9f6b5 (deno #35516) burned a full extra implement+review cycle (~1h) because the first implementer's commit "vanished": the worker's goal literally read "previous worker reported commit 444923e92019 but Orcha PR creation says there are no commits between main and the push branch."

Root cause is in reconcileSubmodulePins, exposed by the partition-skip optimization (#60) interacting with the reconcile step (#47):

  1. deno's WPT suite (tests/wpt/suite) is partition-skipped as oversized, so prep leaves it as an empty placeholder dir with no .git.
  2. reconcileSubmodulePins iterated all .gitmodules paths, including the skipped WPT suite, and ran git -C tests/wpt/suite checkout --detach --force <pin>.
  3. With no repository in the empty dir, git -C walked up to the superproject and detached the superproject's HEAD onto the WPT commit d85e753… during prep, before the agent had even started.
  4. The agent (codex) booted into detached-HEAD, ran git checkout main to recover, and committed its fix on main instead of orcha/impl-<id>.
  5. Orcha publishes from orcha/impl-<id>, still at base → "no commits between main and the push branch" → manager re-spawned a reapply worker + second reviewer.
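Step 3 is just git's normal repository discovery. A simplified, hypothetical model of it (ignoring GIT_DIR, GIT_CEILING_DIRECTORIES and filesystem-boundary rules) shows why an empty placeholder resolves to the superproject: git -C starts in the given directory and climbs parent by parent until it finds a .git entry (a directory, or the gitfile an initialized submodule has). The function name and paths are illustrative, not Orcha code.

```go
package main

import (
	"fmt"
	"path"
)

// discoverRepoRoot is a simplified model of how `git -C <dir>` picks its
// repository: walk up from dir until a directory containing a .git entry is
// found. It ignores GIT_DIR, GIT_CEILING_DIRECTORIES and filesystem boundaries.
func discoverRepoRoot(dir string, hasDotGit func(string) bool) (string, bool) {
	for d := path.Clean(dir); ; d = path.Dir(d) {
		if hasDotGit(d) {
			return d, true
		}
		if d == "/" || d == "." {
			return "", false // reached the top without finding a repo
		}
	}
}

func main() {
	// Only the superproject root has .git; the partition-skipped WPT
	// placeholder is an empty dir, so discovery keeps climbing.
	hasDotGit := func(d string) bool { return d == "/ws" }
	root, ok := discoverRepoRoot("/ws/tests/wpt/suite", hasDotGit)
	fmt.Println(root, ok)
}
```

Any command run that way, including checkout --detach --force, therefore acts on /ws.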

The old rev-parse HEAD guard couldn't catch this: against the superproject the command succeeds, it just returns the wrong HEAD.

It's warm-cache-only: the pin object must be present locally, and base()'s local 'git clone cache ws' hardlinks it in. That is why it bites prod's long-lived .orcha-cache but not a cold checkout. It is likely behind the recurring "empty PR / no commits" failures on deno.

Proof (impl checkout reflog)

444923e… HEAD@{0}: commit: fix: avoid async emit for sync TypeScript require   ← on main
5a33eed… HEAD@{1}: checkout: moving from d85e753… to main
d85e753… HEAD@{2}: checkout: moving from orcha/impl-3dfac597 to d85e753…        ← during prep

In addition, git rev-parse main:tests/wpt/suite resolves to exactly d85e753…, and git -C tests/wpt/suite rev-parse --show-toplevel returns the superproject root rather than the submodule path.

Fix

Gate the reset on git -C <sub> rev-parse --show-superproject-working-tree being non-empty. Its output is non-empty only when <sub> genuinely is a submodule of a superproject, and empty when git -C escaped to the superproject (or the dir isn't an initialized submodule). The gate also covers a kept submodule that failed to initialize.

Test

TestPrepareIsolated_SkippedSubmoduleKeepsSuperprojectOnBranch reproduces the exact prod symptom. It needs a kept submodule alongside the skipped one (with only an oversized submodule, updateSubmodules early-returns before reconcile runs) and two preps to warm the cache. Without the fix it fails with the superproject HEAD detached; with it, HEAD stays on the work branch. Full internal/workspace suite, go vet, and go build ./... all green.

…ect via a skipped submodule

A partition-skipped oversized submodule (deno's tests/wpt/suite) is left as an
empty placeholder dir with no .git. reconcileSubmodulePins iterated it anyway and
ran 'git -C <sub> checkout --detach --force <pin>', which, finding no repo in the
empty dir — walked UP to the superproject and detached ITS HEAD onto the
submodule's pinned commit. The agent then booted into detached-HEAD, ran 'git
checkout main' to recover, and committed off its orcha work branch, so publish saw
'no commits between main and the push branch' and the whole change had to be redone
(observed on deno #35516: HEAD detached onto WPT commit d85e753). The old
'rev-parse HEAD' guard couldn't catch it: against the superproject the command
succeeds, it just returns the wrong HEAD.

Gate the reset on 'git -C <sub> rev-parse --show-superproject-working-tree' being
non-empty, which holds only when <sub> genuinely is a submodule of a superproject
and is empty when git escaped to the superproject (or the dir isn't an initialized
submodule). Also covers a kept submodule that failed to initialize.

Only fires on a warm cache (the pin object must be present locally; base()'s local
clone hardlinks it in), which is why it hit prod but not cold checkouts.

Regression test needs a kept submodule alongside the skipped one (with only an
oversized one, updateSubmodules early-returns before reconcile) and two preps to
warm the cache; without the fix it fails with HEAD detached.
@nathanwhit nathanwhit merged commit 58f6ce6 into main Jun 25, 2026
1 check passed