
perf(exec,workspace): reuse SSH connections + stop re-warming skipped submodules#70

Merged
nathanwhit merged 1 commit into main from ssh-multiplexing-prep-speedup on Jun 25, 2026


@nathanwhit

Owner

Problem

A fresh objective's first checkout was taking 6–8 minutes in prod (e.g. session e574fde2 prep 6m22s, implementer 3dfac597 8m09s). The agent only launches after prep, so this shows up as a session stuck "queued" for minutes. The cause is neither the scheduler (placement is instant) nor inherent submodule cost.

Where the time actually goes (measured)

Replaying prep's git steps on the warm-cache box shows only ~44s of real git work. The rest of the 6–8 min is orchestration overhead:

  • orcha runs each of a prep's ~50–60 git subcommands through SSHExecutor, which set no ControlMaster/ControlPersist, so every command paid a fresh TCP+crypto handshake plus a remote login shell. Measured quebec→box: 1.15s/connection cold vs 0.19s multiplexed (~6×), i.e. ~60–70s of pure handshake overhead per prep.
  • The WPT submodule (159k files) is warmed every prep (~18.5s) even though it's always size-skipped from checkout, because warmSubmoduleMirror looped all submodules before partitionSubmodulesBySize skipped it.
  • (Box load ~12 on 6 cores amplifies both — separate issue.)

Changes

1. exec/ssh.go: SSH connection multiplexing, scoped to the non-tty path (ControlMaster=auto, ControlPersist=60, ControlPath=<tmpdir>/%C). The long-lived interactive -tt sessions are deliberately excluded: their cancellation relies on a per-connection process-group SIGHUP, so they must never share or outlive a persisted master. Multiplexing fails open if the control dir can't be created. workspace/prepare.go's run() now uses the clean non-tty capture path. Previously it forced a tty per command, which both blocked reuse and dragged the remote login shell in on every call (hence the bind: warning noise).
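The argument construction described in change 1 might look roughly like the sketch below. This is not the actual exec/ssh.go code: `sshArgs`, `controlDir`, and the `orcha-ssh-example` directory name are hypothetical, and only the three `-o` options and the `-tt` exclusion come from the description above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// sshArgs builds the ssh argument list for one remote command.
// Hypothetical helper; the real executor may be structured differently.
func sshArgs(host, command string, tty bool, controlDir string) []string {
	var args []string
	if tty {
		// Interactive sessions keep a dedicated connection and never attach
		// to a shared master, so per-connection SIGHUP cancellation still works.
		args = append(args, "-tt")
	} else if controlDir != "" {
		// Non-tty commands reuse one master connection per destination.
		args = append(args,
			"-o", "ControlMaster=auto",
			"-o", "ControlPersist=60",
			"-o", "ControlPath="+filepath.Join(controlDir, "%C"),
		)
	}
	return append(args, host, command)
}

// controlDir returns a private directory for control sockets, or "" if it
// cannot be created. An empty result means "fail open": callers simply get
// plain, non-multiplexed connections.
func controlDir() string {
	dir := filepath.Join(os.TempDir(), "orcha-ssh-example") // assumed name
	if err := os.MkdirAll(dir, 0o700); err != nil {
		return ""
	}
	return dir
}

func main() {
	dir := controlDir()
	fmt.Println(sshArgs("box", "git status", false, dir)) // multiplexed
	fmt.Println(sshArgs("box", "bash", true, dir))        // dedicated tty
}
```

Keeping the tty check first makes the exclusion structural: a `-tt` invocation can never pick up the control options, whatever the state of the control dir.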

2. workspace/prepare.go: don't re-warm a known-oversized submodule. Warm only submodules that the cache does not already show to be oversized. In steady state WPT is sized from the cache and skipped without a re-fetch; a cold or freshly-bumped pin is still warmed once so it can be measured (the checkout-skip is unchanged).
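A minimal sketch of the warm filter in change 2, under stated assumptions: the `Submodule` type, the `submodulesToWarm` function, the cache being keyed by path and pinned commit, and the example paths and pins are all hypothetical and are not taken from workspace/prepare.go.

```go
package main

import "fmt"

// Submodule is a hypothetical stand-in for whatever prepare.go tracks.
type Submodule struct {
	Path string
	Pin  string // pinned commit
}

// submodulesToWarm returns the submodules that still need warming.
// oversized maps "path@pin" to whether that pin was measured as oversized
// (assumed cache shape). A missing entry means the pin has never been
// sized, so it is warmed once and can then be measured.
func submodulesToWarm(subs []Submodule, oversized map[string]bool) []Submodule {
	var warm []Submodule
	for _, s := range subs {
		if big, known := oversized[s.Path+"@"+s.Pin]; known && big {
			continue // already known oversized: checkout skips it anyway
		}
		warm = append(warm, s)
	}
	return warm
}

func main() {
	cache := map[string]bool{"tests/wpt@abc123": true}
	subs := []Submodule{
		{Path: "tests/wpt", Pin: "abc123"}, // known oversized: skipped
		{Path: "tests/wpt2", Pin: "def456"}, // never sized: warmed once
	}
	fmt.Println(submodulesToWarm(subs, cache))
}
```

Keying on the pin rather than the path alone is what lets a freshly bumped submodule be re-measured instead of inheriting a stale verdict.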

Validation

  • ControlMaster=auto with the exact emitted options, quebec→box: 15 sequential connections 3.14s vs 17.24s cold.
  • New TestSSHArgs_MultiplexesNonTTYOnly (multiplex non-tty, never -tt) and TestPrepareIsolated_SkipsOversizedSubmoduleWarmCache (oversized stays skipped on a warm-cache 2nd prep). Full suite green; go vet clean.

Expected impact

Per prep: ~60–70s off (handshakes) + ~18s off (WPT warm). Helps every SSH op orcha does, not just prep. Complements #68 (managers skip submodules entirely).

https://claude.ai/code/session_018CRsfX79dM42QC2BrW1vtZ

…ed submodules

A fresh objective's checkout was taking 6-8 min in prod, blocking session start. Profiling showed the actual git work is only ~44s on the warm-cache box; the gap was orchestration overhead. orcha runs every prep git subcommand (~50-60 of them) through the SSH executor, which set no ControlMaster/ControlPersist — so each command paid a fresh TCP+crypto handshake plus the remote login shell. Measured quebec->box: 1.15s/connection cold vs 0.19s multiplexed (~6x), i.e. ~60-70s of pure handshake overhead per prep.

Two changes:

1. exec/ssh.go: enable SSH connection multiplexing (ControlMaster=auto, ControlPersist=60, ControlPath=<tmp>/%C) on the NON-tty path only. The long-lived interactive -tt sessions are deliberately excluded: their cancellation relies on a per-connection process-group SIGHUP, so they must never share or outlive a persisted master. Fails open if the control dir can't be created. workspace/prepare.go's run() now uses the clean non-tty capture path (it was forcing a tty per command, which both blocked reuse and dragged the remote login shell in on every call).

2. workspace/prepare.go: stop re-warming a submodule the cache already shows is oversized. warmSubmoduleMirror looped ALL submodules before partitionSubmodulesBySize decided to skip the WPT suite (159k files), so every prep refetched WPT objects (~18.5s) it never checks out. Now warm only submodules not already known-oversized from the cache; a cold/bumped one is still warmed once so it can be sized.

Measured: ControlMaster=auto on the exact emitted options, 15 sequential connections 3.14s vs 17.24s cold. New tests cover both; full suite green.

Claude-Session: https://claude.ai/code/session_018CRsfX79dM42QC2BrW1vtZ
@nathanwhit nathanwhit merged commit baf0f52 into main Jun 25, 2026
1 check passed