Three related file/image viewing fixes:
- Raise MaxReadableFileSize 5MB -> 50MB (and the renderer mirror
MAX_READABLE_FILE_SIZE) so larger files and images open. Also raise the
agent's inbound NDJSON scanner cap 4MiB -> 64MiB; the previous 4MiB cap
was already below the 5MB limit, silently breaking writeFile for files
4-5MB. Reads are unaffected by the scanner (they travel outbound).
- ImagePreview: replace flex center alignment (items-center/justify-center)
with safe alignment (items-center-safe/justify-center-safe). When a zoomed
image overflows the scroll container, centering pushed the top/left edges
into negative scroll space that scrollLeft/scrollTop can never reach,
leaving part of the image unviewable. Safe alignment falls back to start
on overflow so the whole image is scrollable.
- ImagePreview: on image load failure, stat the file and show a
size-specific message ("too large, max 50 MB") instead of a generic
"could not load image", matching the text editor's TOO_LARGE message.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
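The size-specific failure message from the third item above could come from a small pure helper. This is a minimal sketch, not the actual ImagePreview code: `describeImageLoadFailure` and its parameter are hypothetical names, and "50 MB" is assumed here to mean 50 * 1024 * 1024 bytes, which the commit does not state.

```typescript
// Renderer mirror of the read limit (assumed to be 50 * 1024 * 1024 bytes).
const MAX_READABLE_FILE_SIZE = 50 * 1024 * 1024;

// Hypothetical helper: choose the message shown after an image load error,
// given the size reported by a stat call (undefined if the stat also failed).
function describeImageLoadFailure(sizeBytes: number | undefined): string {
  if (sizeBytes !== undefined && sizeBytes > MAX_READABLE_FILE_SIZE) {
    const maxMb = MAX_READABLE_FILE_SIZE / (1024 * 1024);
    return `too large, max ${maxMb} MB`;
  }
  // Any other failure keeps the generic message.
  return "could not load image";
}
```

Keeping the decision in a pure function means the message can be unit-tested without loading a real image.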
Claude Code sets its window title via OSC 2 while staying in the NORMAL
screen buffer (it never enters the alternate screen). The onTitleChange
handler's alt-screen guard (`buffer.active.type !== "alternate" → return`)
dropped those titles, and since claude emits no alt-screen enter, the
foregroundProcess fallback never fired either — so the tab name never
updated. lazygit/lazydocker/yazi were unaffected (they use the alt screen).
This was transport-independent; SSH was a red herring.
Replace the binary alt-screen guard with classifyOscTitle():
- alternate + non-shell-like -> apply (unchanged)
- alternate + shell-like -> ignore (alt-enter RPC labels these)
- normal + shell-like -> clear (prompt; also resets the tab when an
inline TUI exits)
- normal + non-shell-like -> confirm via foregroundProcess, applying only
when a real (non-login-shell) program holds the foreground, so
starship-style preexec command echoes can't hijack the tab.
Decision logic extracted to pure functions (isLoginShell, classifyOscTitle,
foregroundConfirmsTitle) with unit tests; end-to-end behavior over a live
PTY still needs manual smoke (claude/lazygit/yazi/plain shell).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
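The four-way decision table above can be sketched as a pure function. The signature of `classifyOscTitle` and the `isShellLikeTitle` heuristic below are assumptions made for illustration; the commit names the function but does not show its parameters or how "shell-like" is detected.

```typescript
type BufferType = "normal" | "alternate";
type TitleAction = "apply" | "ignore" | "clear" | "confirm";

// Hypothetical heuristic: a bare shell name (optionally with a login-shell
// dash) or a user@host:path prompt counts as "shell-like".
function isShellLikeTitle(title: string): boolean {
  const t = title.trim();
  return /^-?(ba|z|fi|da)?sh$/.test(t) || /^[^\s@]+@[^\s:]+:/.test(t);
}

// Maps (buffer type, title) to the action listed in the table above.
// "confirm" means: check foregroundProcess before applying the title.
function classifyOscTitle(buffer: BufferType, title: string): TitleAction {
  const shellLike = isShellLikeTitle(title);
  if (buffer === "alternate") {
    return shellLike ? "ignore" : "apply";
  }
  return shellLike ? "clear" : "confirm";
}
```

For example, `classifyOscTitle("normal", "claude")` yields `"confirm"`, which is the case the old binary guard dropped.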
Connecting to an SSH workspace uploads the agent binary / Node runtime /
LSP archives to the remote, which can take seconds. The UI showed only a
static "connecting", so users couldn't tell it was working and sometimes
force-quit mid-upload (leaving a busy binary behind).
The bootstrap already emitted onProgress events, but the agent-bootstrap
path never forwarded them (only the LSP path did). Wire them through,
mirroring the existing lsp.bootstrap.progress pattern:
- New workspaceId-scoped IPC event workspace/connectionProgress
{ workspaceId, name, phase, bytesDone?, bytesTotal? }, reusing the
bootstrap phase enum. Because both connect flows (add-new and
app-startup reconnect) funnel through startSshProvider, one event
covers both.
- WorkspaceManager.startSshProvider passes onProgress →
broadcastConnectionProgress; main/index forwards deps.onProgress into
ensureRemoteAgent.
- Renderer store tracks connectionProgressByWorkspaceId, cleared when the
connection reaches a terminal status.
- Workspace panel placeholder renders phase label + artifact size + a bar
(determinate only when 0<bytesDone<bytesTotal — the transport reports
uploads as 0→total, not incremental, so we don't fake a smooth %;
otherwise an indeterminate animated bar). Sidebar dot pulses while
connecting.
Store reducer covered by unit tests; live connect smoke is manual.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
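The determinate-versus-indeterminate rule for the bar can be written as a small function. This is a sketch under assumptions: the field names follow the event shape listed above, but `determinateFraction` is a hypothetical name and `phase` is simplified to a plain string.

```typescript
// Event shape as described in the commit; phase is simplified to string.
interface ConnectionProgress {
  workspaceId: string;
  name: string;
  phase: string;
  bytesDone?: number;
  bytesTotal?: number;
}

// Returns a fraction strictly between 0 and 1 for a determinate bar, or null
// when the bar should fall back to the indeterminate animation.
function determinateFraction(p: ConnectionProgress): number | null {
  const { bytesDone, bytesTotal } = p;
  if (bytesDone === undefined || bytesTotal === undefined) return null;
  // The transport reports 0 and then total, so both ends are indeterminate.
  if (bytesDone > 0 && bytesDone < bytesTotal) return bytesDone / bytesTotal;
  return null;
}
```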
Two hardening layers for the case where the client dies abnormally
(force-kill, sleep, network drop) while connected to an SSH workspace.
L1 — connection keepalive: add ServerAliveInterval=15 / ServerAliveCountMax=3
to the agent channel, the persistent ControlMaster, and bootstrap transport
commands. Previously there was no keepalive, so a dead client left the remote
agent (holding its binary) alive until the kernel's default TCP timeout
(hours), which then blocked the next launch's re-upload. Now a dead peer is
detected in ~45s, ssh exits, and the remote agent gets stdin EOF and shuts
down (killing its PTY children).
L3 — atomic agent install: upload each artifact to a unique temp path in the
same directory, then `mv -f` it into place. A rename over a file that a
lingering old agent is still executing succeeds (the running process keeps the
old inode), so a stale remote agent can never block reinstall with ETXTBSY
("Text file busy") — the exact failure seen after a force-quit. Same-dir
rename is atomic, so there's no missing-file window either. No `pkill`, so a
co-tenant session's agent on the same host is never killed.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
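The ~45s figure for L1 follows directly from the two keepalive settings. A minimal sketch, assuming the options are passed to the OpenSSH client as `-o Key=Value` arguments; the helper names are hypothetical.

```typescript
const SERVER_ALIVE_INTERVAL_S = 15;
const SERVER_ALIVE_COUNT_MAX = 3;

// OpenSSH client options as repeated "-o Key=Value" arguments.
function keepaliveArgs(): string[] {
  return [
    "-o", `ServerAliveInterval=${SERVER_ALIVE_INTERVAL_S}`,
    "-o", `ServerAliveCountMax=${SERVER_ALIVE_COUNT_MAX}`,
  ];
}

// Approximate time for ssh to give up on an unresponsive peer:
// one probe per interval, abandoned after CountMax unanswered probes.
function deadPeerDetectionSeconds(): number {
  return SERVER_ALIVE_INTERVAL_S * SERVER_ALIVE_COUNT_MAX;
}
```

With the values above, `deadPeerDetectionSeconds()` is 15 * 3 = 45.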
Defense-in-depth (L2) for the case where the client is gone but the SSH connection lingers without delivering stdin EOF: a hung client process, or a connection the kernel hasn't yet torn down. Previously the remote agent would keep running (holding its binary) indefinitely, since it had no liveness check of its own.
Agent: StartIdleWatchdog self-terminates (via drainAndExit, which kills PTY children) when no inbound request line arrives within 60s. Run() stamps lastInbound on every received line, so any real traffic resets it. drainAndExit now calls an injectable `exit` (default os.Exit) so the watchdog and drain paths are unit-testable without killing the runner.
Client: pipe.ts pings the agent every 20s (fire-and-forget `ping`, a no-op handler registered on the agent) once heartbeat is enabled, so a healthy but idle session keeps resetting the watchdog (~3 pings per 60s window). The timer is unref'd and cleared on dispose/fail. Client and remote agent are always the same build (the app uploads its own agent), so the `ping` method is always present and there is no version-skew concern.
Combined with the L1 keepalive (ssh ServerAlive) this closes both paths: L1 reaps the connection at the SSH layer (~45s); L2 reaps the agent even if the connection itself never reports dead.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
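The watchdog's core check, stamp on every inbound line and expire after the limit, can be illustrated with an injectable clock. This is an illustration only: the real watchdog lives in the agent rather than in TypeScript, and the `IdleWatchdog` class and its method names are hypothetical.

```typescript
// Illustrative model of the agent-side idle check, with an injectable clock
// so the logic can be tested without real timers.
class IdleWatchdog {
  private readonly limitMs: number;
  private readonly now: () => number;
  private lastInbound: number;

  constructor(limitMs: number, now: () => number) {
    this.limitMs = limitMs;
    this.now = now;
    this.lastInbound = now();
  }

  // Call for every inbound request line, including the no-op ping.
  stampInbound(): void {
    this.lastInbound = this.now();
  }

  // True once no inbound line has arrived within the limit.
  expired(): boolean {
    return this.now() - this.lastInbound >= this.limitMs;
  }
}
```

With a 60s limit and a 20s ping, a healthy idle session stamps roughly three times per window, so `expired()` only becomes true when pings stop arriving.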
sftp exits 0 even when an individual put silently fails, so a transient upload error left the temp file missing; the subsequent 'mv -f tmp final' then threw 'no such file' and aborted the whole bootstrap (intermittent 'SSH transport failed' that cleared on a manual retry). The atomic-install change (ef5c26c) moved the failure ahead of the sha256 verify, removing the pre-existing retry resilience.
Wrap each upload->rename->verify pass in try/catch so any failure (missing temp, rename error, sha mismatch) retries the full upload instead of propagating, and best-effort rm the orphaned temp between attempts so failed runs never litter .tmp.<rand> beside the installed binary. The sha256 check remains the sole correctness gate.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
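The retry shape described above, one try/catch around the whole upload->rename->verify pass with best-effort cleanup between attempts, might look like the following sketch. The `InstallSteps` callbacks and `installWithRetry` are hypothetical names, and the steps are kept synchronous for brevity; the real transport calls would be asynchronous.

```typescript
// Hypothetical step callbacks standing in for the SSH transport calls.
interface InstallSteps {
  upload(tmpPath: string): void; // sftp put to the temp path
  rename(tmpPath: string): void; // mv -f tmp final
  verify(): boolean; // sha256 of the installed file matches
  removeTemp(tmpPath: string): void; // rm of the orphaned temp
}

// Returns the attempt number that succeeded, or throws the last failure.
function installWithRetry(
  steps: InstallSteps,
  makeTmpPath: () => string,
  maxAttempts: number,
): number {
  let lastError: unknown = new Error("no attempts made");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const tmp = makeTmpPath();
    try {
      steps.upload(tmp);
      steps.rename(tmp);
      if (steps.verify()) return attempt; // sha256 is the only success gate
      lastError = new Error("sha256 mismatch");
    } catch (err) {
      lastError = err; // missing temp, rename error, and similar
    }
    try {
      steps.removeTemp(tmp); // best effort: cleanup must not mask the failure
    } catch {
      // ignored on purpose
    }
  }
  throw lastError;
}
```

Treating every failure mode the same way keeps the sha256 check as the single point that decides success.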
The add-workspace dialog only showed a static '연결 중' ("Connecting") spinner while the agent binary uploaded/verified. The heavy work happens inside openBrowseSession, which (unlike registered workspaces' startSshProvider) never wired an onProgress callback. And it could not: progress events are keyed by workspaceId, but during 'add' no workspaceId (or sessionId) exists until the call returns.
Key progress by a client-minted correlation id instead:
- openBrowseSession accepts an optional progressId; the handler forwards an onProgress that broadcasts ssh.browseProgress keyed by that id.
- The renderer mints the id before calling, subscribes via subscribeSshBrowseProgress, and renders the progress bar while connecting. Both add entry points are covered (new-connection form + saved-connection list).
Extracted BootstrapProgressBar into a shared presentational component so the panel and the dialog render identical progress; the panel keeps its absolute-positioned placement via a className prop.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
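Keying progress by a client-minted id can be sketched with a small in-memory bus. This is an assumption-laden illustration: `BrowseProgressBus`, `mintId`, and the counter-based id scheme are hypothetical, and the real code routes these events over IPC rather than through a local class.

```typescript
interface BrowseProgress {
  phase: string;
  bytesDone?: number;
  bytesTotal?: number;
}
type ProgressListener = (event: BrowseProgress) => void;

class BrowseProgressBus {
  private listeners = new Map<string, Set<ProgressListener>>();
  private counter = 0;

  // The caller mints the id before the request, since no workspaceId exists
  // yet. Any scheme that is unique per call would do.
  mintId(): string {
    this.counter += 1;
    return `browse-${this.counter}`;
  }

  // Returns an unsubscribe function.
  subscribe(progressId: string, listener: ProgressListener): () => void {
    const set = this.listeners.get(progressId) ?? new Set<ProgressListener>();
    this.listeners.set(progressId, set);
    set.add(listener);
    return () => {
      set.delete(listener);
      if (set.size === 0) this.listeners.delete(progressId);
    };
  }

  // Delivers an event only to listeners registered under the same id.
  emit(progressId: string, event: BrowseProgress): void {
    this.listeners.get(progressId)?.forEach((l) => l(event));
  }
}
```

Because the id is minted before the call, the subscriber can be attached first and never misses the earliest events.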
…ng row
The connection-list progress bar sat at the top of the list while the row still showed a static '연결 중…' ("Connecting…"). Move it into the connecting row: the row's subtitle is replaced by the live phase label + progress bar once events arrive (falling back to '연결 중…' until the first event). The per-row error strip already surfaces failures in place, so a row now shows how far the bootstrap got before any failure.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Root cause of intermittent connect failures to some hosts: when a bootstrap SSH connection goes half-open (client gives up via keepalive, server has no ClientAlive so it never reaps the session), the remote sftp-server lingers holding the just-uploaded agent binary's inode open for WRITE. The next connect then either re-uploads (leaving '.tmp.<rand>' litter the dead connection's rm could not clean) or execs a binary another writer still holds, failing with exit 126 'Text file busy'.
Two client-side mitigations:
- buildRemoteAgentCommand wraps the spawn in 'shopt -s execfail' + a bounded exec-retry loop (~5s). A failed exec in non-interactive bash otherwise terminates the shell immediately, so execfail is required for the loop to run. exec replaces the shell on success, so the healthy path is unchanged; only a transient busy state retries. Clears ETXTBSY once the writer's fd closes within the window.
- uploadAndVerifyFile sweeps stale '<binary>.tmp.*' files (find -delete, which no-ops cleanly on an empty match under any login shell) before installing, so interrupted attempts stop accumulating temp litter.
Durable fix for the orphaned-writer source is server-side (ClientAliveInterval on the remote sshd); these make the client tolerate it.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
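The exec-retry wrapper from the first mitigation could be built roughly as follows. This is a sketch, not the actual buildRemoteAgentCommand: `buildExecRetryCommand`, `shellQuote`, the 0.1s sleep, and the loop form are all assumptions; only `shopt -s execfail`, the bounded retry, and exit 126 come from the commit.

```typescript
// Minimal POSIX single-quote escaping for one shell word.
function shellQuote(word: string): string {
  return "'" + word.replace(/'/g, "'\\''") + "'";
}

// Builds a bash command that retries exec while the binary is busy.
// With execfail set, a failed exec returns instead of killing the shell;
// a successful exec replaces the shell, so the loop never continues.
function buildExecRetryCommand(
  agentPath: string,
  args: string[],
  tries: number,
): string {
  const argv = [agentPath, ...args].map(shellQuote).join(" ");
  const script = [
    "shopt -s execfail",
    `for ((i = 0; i < ${tries}; i++)); do exec ${argv}; sleep 0.1; done`,
    "exit 126", // every attempt failed: report "cannot execute"
  ].join("; ");
  return `bash -c ${shellQuote(script)}`;
}
```

With `tries` set to 50 and a 0.1s sleep, the window is about 5 seconds, matching the bound mentioned above.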
… installs
On a shared host, multiple workspaces/users can bootstrap the same agent binary path concurrently, each writing its own .tmp.<rand>. The unconditional sweep could delete a sibling's in-flight upload. Restrict the sweep to temp files older than 5 minutes (-mmin +5): a genuine orphan is minutes old, an in-flight upload is seconds old, so concurrent installs are never clobbered.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
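The age-restricted sweep can be expressed as a single `find` invocation. A minimal sketch: `buildStaleTempSweep` and `quoteWord` are hypothetical names, and `-maxdepth 1 -type f` is an added assumption to keep the sweep inside the install directory; `-mmin +5` and `-delete` are the parts taken from the commits.

```typescript
// Minimal POSIX single-quote escaping for one shell word.
function quoteWord(word: string): string {
  return "'" + word.replace(/'/g, "'\\''") + "'";
}

// Builds a find command that deletes only temp uploads older than the given
// age, so a sibling's in-flight upload (seconds old) is left alone.
function buildStaleTempSweep(
  dir: string,
  binaryName: string,
  minAgeMinutes: number,
): string {
  return [
    "find", quoteWord(dir),
    "-maxdepth 1 -type f",
    "-name", quoteWord(`${binaryName}.tmp.*`),
    `-mmin +${minAgeMinutes}`,
    "-delete",
  ].join(" ");
}
```

Quoting the `-name` pattern matters: it stops the remote shell from expanding the glob before `find` sees it.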
moreih29 added a commit that referenced this pull request on Jun 2, 2026
Follow-up to the idle watchdog (e528ccd). Review surfaced six issues across correctness, scope, and recovery; this addresses all of them.
#1 Monotonic clock: lastInbound was stored as wall-clock UnixNano and compared via time.Since on a time.Unix value, which silently falls back to wall-clock arithmetic. A laptop waking from sleep (local agent) or an NTP step (remote) made elapsed jump past the limit and reap a live session. Now anchored to a monotonic startMono via stampInbound/idleElapsed.
#2 Scope: the watchdog ran for local agents too, where parent death already arrives as stdin EOF (plus Pdeathsig on Linux), so it was pure downside. Now gated on a new --idle-watchdog flag the SSH launch sets and the local launch omits.
#3 Threshold: 60s limit with 3-ping margin was tight enough that a stalled Electron main thread (ping is event-loop bound; ssh ServerAlive is not) could trip it. Widened to 90s limit / 15s ping (6 slots), with the check interval decoupled to limit/6 so the kill window stays tight.
#4 Contract: client ping was gated on heartbeat advertisement, the agent watchdog on nothing, which was drift-prone. The agent now advertises idleWatchdogMs in the Ready frame; the client pings iff positive, at idleWatchdogMs/6.
#5 Orphans: drainAndExit reaches os.Exit, which skips the `defer pty.Close()`. Linux survived via Pdeathsig; a darwin remote (supported, shipped) had only SIGHUP-on-fd-close, so SIGHUP-ignoring children orphaned. PTY cleanup is now a shutdown hook that SIGKILLs each process group on every OS.
#6 Recovery: the watchdog exited 0, which the client's handleClose treats as a clean terminal exit (no reconnect). On a false positive (client alive but stalled) the session died permanently. Now exits 75 (EX_TEMPFAIL) so the client reconnects.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
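Items #4 and #6 describe a small client-side contract: derive the ping period from the advertised limit, and treat exit code 75 as retryable. The sketch below is illustrative; `ReadyFrame` is reduced to the one field named in the commit, and `pingIntervalMs` and `shouldReconnectAfterExit` are hypothetical helpers. Exit codes other than 0 and 75 are deliberately left unmodelled.

```typescript
const EX_TEMPFAIL = 75; // sysexits.h: temporary failure, try again later

// Only the field named in the commit; other Ready frame fields are omitted.
interface ReadyFrame {
  idleWatchdogMs?: number;
}

// Ping period in ms, or null when the agent advertises no watchdog.
function pingIntervalMs(ready: ReadyFrame): number | null {
  const limit = ready.idleWatchdogMs ?? 0;
  return limit > 0 ? Math.floor(limit / 6) : null;
}

// true: reconnect (watchdog exit); false: clean terminal exit;
// undefined: not covered by this sketch.
function shouldReconnectAfterExit(code: number): boolean | undefined {
  if (code === EX_TEMPFAIL) return true;
  if (code === 0) return false;
  return undefined;
}
```

With the 90s limit from #3, `pingIntervalMs({ idleWatchdogMs: 90000 })` gives 15000, the 15s ping quoted above.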
Release v0.5.2 — file/image fixes, SSH connect progress UI, and SSH bootstrap hardening.
Versioned as patch per maintainer decision (contains feat commits; see RELEASING.md note).
Added
Changed
Fixed
Text file busy (exit 126) on agent exec: retry exec (shopt -s execfail) past a transient writer; sweep only stale (>5min) temp uploads so concurrent installs aren't clobbered.
Protocol & Remote
agent-0.5.2-<os>-<arch> on first connect.
ClientAliveInterval/ClientAliveCountMax on the server sshd is the durable fix for orphaned sessions.