The linux/amd64 agent is a *native* build on the CI runner, where Go
defaults CGO_ENABLED to 1. Because the agent imports "net" (and os/user),
the binary then dynamically links the runner's glibc and requires that
exact version at runtime — so it failed on older distros (e.g. Ubuntu
20.04, glibc 2.31) with "version `GLIBC_x.xx' not found", surfacing in
the app as an empty-cause ssh.unknown ("SSH workspace validation failed").
Force CGO_ENABLED=0 and add -tags=netgo,osusergo so every target is a
fully static binary, independent of the build host's glibc. Update the
release.yml comment that incorrectly assumed CGO was already unused.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A child-process close produced a bare createSshError("ssh.unknown") with
no cause, and unclassified stderr lines were dropped — so a fatal loader
error (e.g. glibc mismatch) left only an empty-cause ssh.unknown in the
log, making such failures undiagnosable from main.log alone.
- pipe: retain a bounded ring buffer of recent stderr lines; notifyClose
now returns a stderrTail; classified stderr attaches the offending line
as the SshError cause; raise the logged cause budget 300 -> 600 chars.
- reconnecting channel: capture {code, signal, stderrTail} on close and
pass it to closeError.
- ssh channel: render that context into the ssh.unknown cause string.
- stderr-patterns: classify glibc/loader/arch errors as server.spawn-failed
so the user sees "Remote agent failed to start" instead of the generic
transport error. Add regression tests.
All diagnostics go to the local main.log only; raw stderr is still never
forwarded to the renderer.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
moreih29 added a commit that referenced this pull request on Jun 2, 2026
Follow-up to the idle watchdog (e528ccd). Review surfaced six issues
across correctness, scope, and recovery; this addresses all of them.

#1 Monotonic clock: lastInbound was stored as wall-clock UnixNano and
compared via time.Since on a time.Unix value, which silently falls back
to wall-clock arithmetic. A laptop waking from sleep (local agent) or an
NTP step (remote) made elapsed jump past the limit and reap a live
session. Now anchored to a monotonic startMono via
stampInbound/idleElapsed.

#2 Scope: the watchdog ran for local agents too, where parent death
already arrives as stdin EOF (plus Pdeathsig on Linux), so it was pure
downside there. Now gated on a new --idle-watchdog flag the SSH launch
sets and the local launch omits.

#3 Threshold: 60s limit with 3-ping margin was tight enough that a
stalled Electron main thread (ping is event-loop bound; ssh ServerAlive
is not) could trip it. Widened to 90s limit / 15s ping (6 slots), with
the check interval decoupled to limit/6 so the kill window stays tight.

#4 Contract: client ping was gated on heartbeat advertisement, the agent
watchdog on nothing, which was drift-prone. The agent now advertises
idleWatchdogMs in the Ready frame; the client pings iff it is positive,
at idleWatchdogMs/6.

#5 Orphans: drainAndExit reaches os.Exit, which skips the
`defer pty.Close()`. Linux survived via Pdeathsig; a darwin remote
(supported, shipped) had only SIGHUP-on-fd-close, so SIGHUP-ignoring
children orphaned. PTY cleanup is now a shutdown hook that SIGKILLs each
process group on every OS.

#6 Recovery: the watchdog exited 0, which the client's handleClose
treats as a clean terminal exit (no reconnect). On a false positive
(client alive but stalled) the session died permanently. Now exits 75
(EX_TEMPFAIL) so the client reconnects.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
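Summary
Fixes a regression in which SSH connections to certain remote servers (e.g. older Ubuntu releases) failed with "SSH workspace validation failed", and strengthens logging so that this class of failure can be diagnosed from main.log alone. Emergency patch (v0.5.1).

Included commits
- fix(agent): force CGO_ENABLED=0 so the Linux agent links statically
  net and os/user linked dynamically against the build host's glibc, so the agent failed to start on older distros (e.g. Ubuntu 20.04, glibc 2.31) with "GLIBC_x.xx not found". CGO_ENABLED=0 plus -tags=netgo,osusergo makes every target a static binary.
- fix(agent): preserve exit code/signal/stderr on SSH transport close
  ... classified as server.spawn-failed (the user sees "Remote agent failed to start"). Adds classifier regression tests. Diagnostics go to the local main.log only; raw stderr is not forwarded to the renderer.
- chore: bump version to 0.5.1

Verification
- dev:fresh): confirmed that connecting to the previously failing server now succeeds (the static binary is re-uploaded automatically).

🤖 Generated with Claude Code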