Release v0.5.2 #5

Merged: moreih29 merged 11 commits into main from develop on Jun 2, 2026
moreih29 (Owner) commented Jun 2, 2026

Release v0.5.2 — file/image fixes, SSH connect progress UI, and SSH bootstrap hardening.

Versioned as patch per maintainer decision (contains feat commits; see RELEASING.md note).

Added

  • SSH agent bootstrap progress UI for registered workspaces (panel + sidebar dot).
  • Bootstrap progress in the add-workspace flow (new-connection form + saved-connection list), keyed by a client-minted progressId; rendered inline in the connecting row.
  • Agent idle watchdog (60s self-terminate) + 20s client keepalive ping to reap orphaned remote agents.

Changed

  • Readable file size cap raised 5MB → 50MB (drives image/preview/search gates).

Fixed

  • Image preview left-edge clipping; oversized images now show a size-specific error instead of a generic one.
  • Terminal tab title now updates for inline (normal-screen) TUIs like Claude Code over SSH.
  • Intermittent SSH connect failures: retry the atomic agent install when sftp's silent failure breaks the rename.
  • Text file busy (exit 126) on agent exec: retry exec (shopt -s execfail) past a transient writer; sweep only stale (>5min) temp uploads so concurrent installs aren't clobbered.

Protocol & Remote

  • Re-upload required on first SSH bootstrap: the agent binary name derives from the app version, so existing SSH workspaces re-upload agent-0.5.2-<os>-<arch> on first connect.
  • ssh ServerAlive keepalive added to agent/bootstrap connections. For shared multi-user hosts, setting ClientAliveInterval/ClientAliveCountMax on the server sshd is the durable fix for orphaned sessions.

moreih29 and others added 11 commits June 2, 2026 10:09
Three related file/image viewing fixes:

- Raise MaxReadableFileSize 5MB -> 50MB (and the renderer mirror
  MAX_READABLE_FILE_SIZE) so larger files and images open. Also raise the
  agent's inbound NDJSON scanner cap 4MiB -> 64MiB; the previous 4MiB cap
  was already below the 5MB limit, silently breaking writeFile for files
  4-5MB. Reads are unaffected by the scanner (they travel outbound).

- ImagePreview: replace flex center alignment (items-center/justify-center)
  with safe alignment (items-center-safe/justify-center-safe). When a zoomed
  image overflows the scroll container, centering pushed the top/left edges
  into negative scroll space that scrollLeft/scrollTop can never reach,
  leaving part of the image unviewable. Safe alignment falls back to start
  on overflow so the whole image is scrollable.

- ImagePreview: on image load failure, stat the file and show a
  size-specific message ("too large, max 50 MB") instead of a generic
  "could not load image", matching the text editor's TOO_LARGE message.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude Code sets its window title via OSC 2 while staying in the NORMAL
screen buffer (it never enters the alternate screen). The onTitleChange
handler's alt-screen guard (`buffer.active.type !== "alternate" → return`)
dropped those titles, and since claude emits no alt-screen enter, the
foregroundProcess fallback never fired either — so the tab name never
updated. lazygit/lazydocker/yazi were unaffected (they use the alt screen).
This was transport-independent; SSH was a red herring.

Replace the binary alt-screen guard with classifyOscTitle():
- alternate + non-shell-like  -> apply (unchanged)
- alternate + shell-like      -> ignore (alt-enter RPC labels these)
- normal   + shell-like       -> clear (prompt; also resets the tab when an
                                 inline TUI exits)
- normal   + non-shell-like   -> confirm via foregroundProcess, applying only
                                 when a real (non-login-shell) program holds
                                 the foreground, so starship-style preexec
                                 command echoes can't hijack the tab.

Decision logic extracted to pure functions (isLoginShell, classifyOscTitle,
foregroundConfirmsTitle) with unit tests; end-to-end behavior over a live
PTY still needs manual smoke (claude/lazygit/yazi/plain shell).
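
The four-way decision above can be sketched as a pair of pure functions. This is a minimal illustration rather than the project's code: the function names come from the commit message, but their signatures, the `isShellLikeTitle` helper, and the shell-name heuristic are assumptions made for the example.

```typescript
// Hypothetical signatures: the commit names classifyOscTitle, isLoginShell and
// foregroundConfirmsTitle but does not show how they are declared.
type BufferType = "normal" | "alternate";
type TitleAction = "apply" | "ignore" | "clear" | "confirm";

const SHELL_NAMES = new Set(["sh", "bash", "zsh", "fish", "dash"]);

// Assumed heuristic: empty titles, bare shell names, and prompt-style
// "user@host:..." strings count as shell-like.
function isShellLikeTitle(title: string): boolean {
  const t = title.trim();
  return t === "" || SHELL_NAMES.has(t) || /^[^\s@]+@[^\s:]+:/.test(t);
}

// Assumed: the user's shell appears as "zsh", "-zsh" or "/bin/zsh".
function isLoginShell(processName: string): boolean {
  const base = processName.slice(processName.lastIndexOf("/") + 1);
  const name = base.startsWith("-") ? base.slice(1) : base;
  return SHELL_NAMES.has(name);
}

// Maps (screen buffer, title) to one of the four outcomes in the list above.
function classifyOscTitle(buffer: BufferType, title: string): TitleAction {
  const shellLike = isShellLikeTitle(title);
  if (buffer === "alternate") return shellLike ? "ignore" : "apply";
  return shellLike ? "clear" : "confirm";
}

// A "confirm" result is applied only when a real program holds the foreground.
function foregroundConfirmsTitle(foregroundProcess: string | null): boolean {
  return (
    foregroundProcess !== null &&
    foregroundProcess !== "" &&
    !isLoginShell(foregroundProcess)
  );
}
```

Keeping the classification free of terminal state is what makes it unit-testable; only the `confirm` branch needs a live foreground-process lookup.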

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Connecting to an SSH workspace uploads the agent binary / Node runtime /
LSP archives to the remote, which can take seconds. The UI showed only a
static "connecting", so users couldn't tell it was working and sometimes
force-quit mid-upload (leaving a busy binary behind).

The bootstrap already emitted onProgress events, but the agent-bootstrap
path never forwarded them (only the LSP path did). Wire them through,
mirroring the existing lsp.bootstrap.progress pattern:

- New workspaceId-scoped IPC event workspace/connectionProgress
  { workspaceId, name, phase, bytesDone?, bytesTotal? }, reusing the
  bootstrap phase enum. Because both connect flows (add-new and
  app-startup reconnect) funnel through startSshProvider, one event
  covers both.
- WorkspaceManager.startSshProvider passes onProgress →
  broadcastConnectionProgress; main/index forwards deps.onProgress into
  ensureRemoteAgent.
- Renderer store tracks connectionProgressByWorkspaceId, cleared when the
  connection reaches a terminal status.
- Workspace panel placeholder renders phase label + artifact size + a bar.
  The bar is determinate only when 0 < bytesDone < bytesTotal; the transport
  reports uploads as 0→total rather than incrementally, so we don't fake a
  smooth percentage and otherwise show an indeterminate animated bar. Sidebar
  dot pulses while connecting.

Store reducer covered by unit tests; live connect smoke is manual.
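
As a rough sketch of the renderer side, the determinate/indeterminate rule and the clear-on-terminal-status behavior could look like the following. The payload fields mirror the event shape quoted above; the status names, the `BarState` type, and the function names are hypothetical.

```typescript
// Payload fields follow workspace/connectionProgress as quoted in the commit;
// phase is simplified to string because the enum is not shown.
interface ConnectionProgress {
  workspaceId: string;
  name: string;
  phase: string;
  bytesDone?: number;
  bytesTotal?: number;
}

type BarState =
  | { kind: "indeterminate" }
  | { kind: "determinate"; fraction: number };

// Determinate only when 0 < bytesDone < bytesTotal, per the commit message.
function barState(p: ConnectionProgress): BarState {
  const { bytesDone, bytesTotal } = p;
  if (
    bytesDone !== undefined &&
    bytesTotal !== undefined &&
    bytesDone > 0 &&
    bytesDone < bytesTotal
  ) {
    return { kind: "determinate", fraction: bytesDone / bytesTotal };
  }
  return { kind: "indeterminate" };
}

// Hypothetical status names; the real store's status values are not shown.
type ConnectionStatus = "connecting" | "connected" | "failed";
type ProgressMap = Readonly<Record<string, ConnectionProgress>>;

function recordProgress(map: ProgressMap, p: ConnectionProgress): ProgressMap {
  return { ...map, [p.workspaceId]: p };
}

// Drops the entry once the connection reaches a terminal status.
function clearOnTerminal(
  map: ProgressMap,
  workspaceId: string,
  status: ConnectionStatus,
): ProgressMap {
  if (status === "connecting") return map;
  const next: Record<string, ConnectionProgress> = { ...map };
  delete next[workspaceId];
  return next;
}
```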

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two hardening layers for the case where the client dies abnormally
(force-kill, sleep, network drop) while connected to an SSH workspace.

L1 — connection keepalive: add ServerAliveInterval=15 / ServerAliveCountMax=3
to the agent channel, the persistent ControlMaster, and bootstrap transport
commands. Previously there was no keepalive, so a dead client left the remote
agent (holding its binary) alive until the kernel's default TCP timeout
(hours), which then blocked the next launch's re-upload. Now a dead peer is
detected in ~45s, ssh exits, and the remote agent gets stdin EOF and shuts
down (killing its PTY children).
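
The ~45s figure follows from the two options: ssh sends a keepalive probe after each interval of silence and gives up after the configured number of unanswered probes. A small sketch of the arithmetic and the resulting `-o` arguments, where the helper names are illustrative and only the option names and values come from the commit:

```typescript
// Values from the commit message; both are standard OpenSSH client options.
const SERVER_ALIVE_INTERVAL_S = 15;
const SERVER_ALIVE_COUNT_MAX = 3;

// Hypothetical helper: arguments to prepend to an ssh invocation.
function keepaliveSshArgs(): string[] {
  return [
    "-o", `ServerAliveInterval=${SERVER_ALIVE_INTERVAL_S}`,
    "-o", `ServerAliveCountMax=${SERVER_ALIVE_COUNT_MAX}`,
  ];
}

// Approximate time until ssh declares the peer dead and exits.
function deadPeerDetectionSeconds(): number {
  return SERVER_ALIVE_INTERVAL_S * SERVER_ALIVE_COUNT_MAX;
}
```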

L3 — atomic agent install: upload each artifact to a unique temp path in the
same directory, then `mv -f` it into place. A rename over a file that a
lingering old agent is still executing succeeds (the running process keeps the
old inode), so a stale remote agent can never block reinstall with ETXTBSY
("Text file busy") — the exact failure seen after a force-quit. Same-dir
rename is atomic, so there's no missing-file window either. No `pkill`, so a
co-tenant session's agent on the same host is never killed.
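
A minimal sketch of the path handling behind the atomic install, assuming POSIX-style remote paths. The helper names are hypothetical; only the `.tmp.<rand>` suffix, the same-directory requirement, and `mv -f` come from the commit.

```typescript
// Temp file lives next to the target so the rename stays on one filesystem.
function tempPathFor(finalPath: string, rand: string): string {
  return `${finalPath}.tmp.${rand}`;
}

// Directory part of a POSIX path ("." when there is no slash).
function dirOf(p: string): string {
  const i = p.lastIndexOf("/");
  if (i < 0) return ".";
  return i === 0 ? "/" : p.slice(0, i);
}

// Returns the argv for the rename step. Refuses cross-directory renames,
// which could fall back to copy-and-delete and lose atomicity.
function renameArgv(tmpPath: string, finalPath: string): string[] {
  if (dirOf(tmpPath) !== dirOf(finalPath)) {
    throw new Error("temp file must share the target directory");
  }
  return ["mv", "-f", tmpPath, finalPath];
}
```

Because the running agent keeps its old inode open, the rename replaces only the directory entry, so a lingering process neither blocks the install nor sees its executable change underneath it.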

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Defense-in-depth (L2) for the case where the client is gone but the SSH
connection lingers without delivering stdin EOF — a hung client process or a
connection the kernel hasn't yet torn down. Previously the remote agent would
keep running (holding its binary) indefinitely, since it had no liveness check
of its own.

Agent: StartIdleWatchdog self-terminates (via drainAndExit, which kills PTY
children) when no inbound request line arrives within 60s. Run() stamps
lastInbound on every received line, so any real traffic resets it. drainAndExit
now calls an injectable `exit` (default os.Exit) so the watchdog and drain
paths are unit-testable without killing the runner.

Client: pipe.ts pings the agent every 20s (fire-and-forget `ping`, a no-op
handler registered on the agent) once heartbeat is enabled, so a healthy but
idle session keeps resetting the watchdog (~3 pings per 60s window). The timer
is unref'd and cleared on dispose/fail. Client and remote agent are always the
same build (the app uploads its own agent), so the `ping` method is always
present — no version-skew concern.

Combined with the L1 keepalive (ssh ServerAlive) this closes both paths: L1
reaps the connection at the SSH layer (~45s); L2 reaps the agent even if the
connection itself never reports dead.
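
The interplay between the 60s limit and the 20s ping can be modeled with a small watchdog that takes an injectable clock. The real watchdog lives in the Go agent; this TypeScript class is only an illustration of the timing, and its names are hypothetical.

```typescript
// Illustrative model of the idle watchdog; not the agent's implementation.
class IdleWatchdog {
  private lastInboundMs: number;
  private readonly limitMs: number;
  private readonly now: () => number;

  constructor(limitMs: number, now: () => number) {
    this.limitMs = limitMs;
    this.now = now;
    this.lastInboundMs = now();
  }

  // Called for every inbound request line, including pings.
  stampInbound(): void {
    this.lastInboundMs = this.now();
  }

  // True once no inbound line has arrived for the full limit.
  shouldExit(): boolean {
    return this.now() - this.lastInboundMs >= this.limitMs;
  }
}
```

With a 60000 ms limit and a ping every 20000 ms, each window sees about three resets, so up to two consecutive pings can be lost before the watchdog fires.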

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
sftp exits 0 even when an individual put silently fails, so a transient
upload error left the temp file missing; the subsequent 'mv -f tmp final'
then threw 'no such file' and aborted the whole bootstrap (intermittent
'SSH transport failed' that cleared on a manual retry). The atomic-install
change (ef5c26c) moved the failure ahead of the sha256 verify, removing
the pre-existing retry resilience.

Wrap each upload->rename->verify pass in try/catch so any failure (missing
temp, rename error, sha mismatch) retries the full upload instead of
propagating, and best-effort rm the orphaned temp between attempts so
failed runs never litter .tmp.<rand> beside the installed binary. The
sha256 check remains the sole correctness gate.
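
The retry structure described here might be sketched as follows. The transport steps are injected and synchronous purely to keep the example short and testable; the interface, the function name, and the attempt count are assumptions, not the project's API.

```typescript
// Hypothetical transport steps; a real version would be async.
interface InstallSteps {
  upload(tmpPath: string): void;
  rename(tmpPath: string, finalPath: string): void;
  verifySha256(finalPath: string): boolean;
  remove(tmpPath: string): void;
}

// Runs upload -> rename -> verify, retrying the whole pass on any failure.
// Returns the attempt number that succeeded.
function installWithRetry(
  steps: InstallSteps,
  finalPath: string,
  maxAttempts: number,
  mintSuffix: () => string,
): number {
  let lastError: unknown = new Error("no attempts made");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const tmpPath = `${finalPath}.tmp.${mintSuffix()}`;
    try {
      steps.upload(tmpPath);
      steps.rename(tmpPath, finalPath);
      // The sha256 check is the only thing that counts as success.
      if (steps.verifySha256(finalPath)) return attempt;
      lastError = new Error("sha256 mismatch");
    } catch (err) {
      lastError = err;
    }
    // Best-effort cleanup so failed passes leave no temp file behind.
    try {
      steps.remove(tmpPath);
    } catch (_cleanupError) {
      // Ignored: cleanup must never mask the original failure.
    }
  }
  throw new Error(`install failed after ${maxAttempts} attempts: ${String(lastError)}`);
}
```

A missing temp file after a silently failed `put` surfaces as a rename error, which this loop treats the same as any other failed pass.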

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The add-workspace dialog only showed a static '연결 중' ("Connecting") spinner while the
agent binary uploaded/verified — the heavy work happens inside
openBrowseSession, which (unlike registered workspaces' startSshProvider)
never wired an onProgress callback. And it could not: progress events are
keyed by workspaceId, but during 'add' no workspaceId (or sessionId) exists
until the call returns.

Key progress by a client-minted correlation id instead:
- openBrowseSession accepts an optional progressId; the handler forwards an
  onProgress that broadcasts ssh.browseProgress keyed by that id.
- The renderer mints the id before calling, subscribes via
  subscribeSshBrowseProgress, and renders the progress bar while connecting.

Both add entry points are covered (new-connection form + saved-connection
list). Extracted BootstrapProgressBar into a shared presentational component
so the panel and the dialog render identical progress; the panel keeps its
absolute-positioned placement via a className prop.
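
The correlation-id pattern can be illustrated with a tiny in-process bus: the caller mints an id before the request, subscribes under it, and receives only events carrying that id. The class, the id format, and the payload fields other than `progressId` are assumptions for the sketch; the real events travel over IPC as `ssh.browseProgress`.

```typescript
// Hypothetical payload; only the progressId key is taken from the commit.
interface BrowseProgress {
  progressId: string;
  phase: string;
  bytesDone?: number;
  bytesTotal?: number;
}

type ProgressListener = (event: BrowseProgress) => void;

let nextProgressSeq = 0;

// Client-side id minting; any unique string would do.
function mintProgressId(): string {
  nextProgressSeq += 1;
  return `browse-${Date.now().toString(36)}-${nextProgressSeq}`;
}

class BrowseProgressBus {
  private readonly listeners = new Map<string, Set<ProgressListener>>();

  // Returns an unsubscribe function.
  subscribe(progressId: string, listener: ProgressListener): () => void {
    let set = this.listeners.get(progressId);
    if (set === undefined) {
      set = new Set<ProgressListener>();
      this.listeners.set(progressId, set);
    }
    set.add(listener);
    const owned = set;
    return () => {
      owned.delete(listener);
      if (owned.size === 0) this.listeners.delete(progressId);
    };
  }

  // Delivers the event only to listeners registered under its progressId.
  broadcast(event: BrowseProgress): void {
    const set = this.listeners.get(event.progressId);
    if (set === undefined) return;
    set.forEach((listener) => listener(event));
  }
}
```

Minting the id on the client sidesteps the ordering problem described above: the subscription exists before the call that will eventually produce a workspaceId.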

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ng row

The connection-list progress bar sat at the top of the list while the row
still showed a static '연결 중…' ("Connecting…"). Move it into the connecting
row: the row's subtitle is replaced by the live phase label + progress bar
once events arrive, falling back to '연결 중…' until the first event. The
per-row error strip already surfaces failures in place, so a row now shows
how far the bootstrap got before any failure.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Root cause of intermittent connect failures to some hosts: when a bootstrap
SSH connection goes half-open (client gives up via keepalive, server has no
ClientAlive so it never reaps the session), the remote sftp-server lingers
holding the just-uploaded agent binary's inode open for WRITE. The next
connect then either re-uploads (leaving '.tmp.<rand>' litter the dead
connection's rm could not clean) or execs a binary another writer still
holds — failing with exit 126 'Text file busy'.

Two client-side mitigations:
- buildRemoteAgentCommand wraps the spawn in 'shopt -s execfail' + a bounded
  exec-retry loop (~5s). A failed exec in non-interactive bash otherwise
  terminates the shell immediately, so execfail is required for the loop to
  run. exec replaces the shell on success, so the healthy path is unchanged;
  only a transient busy state retries. Clears ETXTBSY once the writer's fd
  closes within the window.
- uploadAndVerifyFile sweeps stale '<binary>.tmp.*' files (find -delete, which
  no-ops cleanly on an empty match under any login shell) before installing,
  so interrupted attempts stop accumulating temp litter.

Durable fix for the orphaned-writer source is server-side (ClientAliveInterval
on the remote sshd); these make the client tolerate it.
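
One way to picture the exec-retry wrapper is as a generated bash script. The sketch below is an assumption about its shape, not the output of buildRemoteAgentCommand: the function name, the attempt count, and the sleep interval are illustrative, and fractional `sleep` relies on GNU or BSD behavior.

```typescript
// Single-quote a string for a POSIX shell.
function shellQuote(s: string): string {
  return "'" + s.replace(/'/g, "'\\''") + "'";
}

// Builds a bash script that retries `exec` while the binary is busy.
// With execfail set, a failed exec returns instead of killing the
// non-interactive shell, so the loop can continue.
function buildExecRetryScript(
  agentPath: string,
  args: string[],
  attempts: number,
  delaySeconds: number,
): string {
  const argv = [agentPath, ...args].map(shellQuote).join(" ");
  return [
    "shopt -s execfail",
    "i=0",
    `while [ "$i" -lt ${attempts} ]; do`,
    `  exec ${argv}`, // on success the shell is replaced and nothing below runs
    "  i=$((i+1))",
    `  sleep ${delaySeconds}`,
    "done",
    "exit 126",
  ].join("\n");
}
```

For example, `buildExecRetryScript(path, [], 10, 0.5)` bounds the retry window at roughly five seconds, matching the budget mentioned above.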

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… installs

On a shared host, multiple workspaces/users can bootstrap the same agent
binary path concurrently, each writing its own .tmp.<rand>. The unconditional
sweep could delete a sibling's in-flight upload. Restrict the sweep to temp
files older than 5 minutes (-mmin +5): a genuine orphan is minutes old, an
in-flight upload is seconds old, so concurrent installs are never clobbered.
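
A sketch of the age-restricted sweep as an argv for `find`. The `-mmin +5` test and `-delete` come from the commits; `-maxdepth 1` and `-type f` are extra guards assumed here, and the function name is hypothetical.

```typescript
// Builds a find invocation that deletes only stale temp uploads that sit
// beside the given binary.
function staleTempSweepArgv(binaryPath: string, olderThanMinutes: number): string[] {
  const slash = binaryPath.lastIndexOf("/");
  const dir = slash < 0 ? "." : slash === 0 ? "/" : binaryPath.slice(0, slash);
  const base = binaryPath.slice(slash + 1);
  return [
    "find", dir,
    "-maxdepth", "1",          // assumption: never descend into subdirectories
    "-type", "f",              // assumption: regular files only
    "-name", `${base}.tmp.*`,  // the '<binary>.tmp.*' pattern from the commit
    "-mmin", `+${olderThanMinutes}`,
    "-delete",
  ];
}
```

If the arguments are joined into a single remote shell command rather than passed as an argv, the `-name` pattern needs quoting so the shell does not expand the glob first.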

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@moreih29 moreih29 merged commit 5af12be into main Jun 2, 2026
1 check passed
moreih29 added a commit that referenced this pull request Jun 2, 2026
Follow-up to the idle watchdog (e528ccd). Review surfaced six issues
across correctness, scope, and recovery; this addresses all of them.

#1 Monotonic clock: lastInbound was stored as wall-clock UnixNano and
compared via time.Since on a time.Unix value, which silently falls back
to wall-clock arithmetic. A laptop waking from sleep (local agent) or an
NTP step (remote) made elapsed jump past the limit and reap a live
session. Now anchored to a monotonic startMono via stampInbound/
idleElapsed.

#2 Scope: the watchdog ran for local agents too, where parent death
already arrives as stdin EOF (plus Pdeathsig on Linux) — pure downside.
Now gated on a new --idle-watchdog flag the SSH launch sets and the local
launch omits.

#3 Threshold: 60s limit with 3-ping margin was tight enough that a
stalled Electron main thread (ping is event-loop bound; ssh ServerAlive
is not) could trip it. Widened to 90s limit / 15s ping (6 slots), with
the check interval decoupled to limit/6 so the kill window stays tight.

#4 Contract: client ping was gated on heartbeat advertisement, the agent
watchdog on nothing — drift-prone. The agent now advertises idleWatchdogMs
in the Ready frame; the client pings iff positive, at idleWatchdogMs/6.
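
The client side of this contract reduces to a single derivation. A minimal sketch, assuming the Ready frame is a plain object; the interface and function name are illustrative and other frame fields are omitted.

```typescript
// Only the field relevant to the watchdog contract is modeled here.
interface ReadyFrame {
  idleWatchdogMs?: number;
}

// Returns the ping interval in ms, or null when the agent runs no watchdog
// (field absent, zero, or negative) and the client should not ping.
function pingIntervalMs(ready: ReadyFrame): number | null {
  const limit = ready.idleWatchdogMs ?? 0;
  return limit > 0 ? Math.floor(limit / 6) : null;
}
```

With the 90s limit from #3 this yields a 15s ping, so both numbers stay in step when the limit changes.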

#5 Orphans: drainAndExit reaches os.Exit, which skips the `defer
pty.Close()`. Linux survived via Pdeathsig; a darwin remote (supported,
shipped) had only SIGHUP-on-fd-close, so SIGHUP-ignoring children
orphaned. PTY cleanup is now a shutdown hook that SIGKILLs each process
group on every OS.

#6 Recovery: the watchdog exited 0, which the client's handleClose treats
as a clean terminal exit (no reconnect). On a false positive (client
alive but stalled) the session died permanently. Now exits 75
(EX_TEMPFAIL) so the client reconnects.
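
The exit-code convention can be summarized in a small mapping. This is an illustration of the two cases the commit describes, not the body of handleClose; how other exit codes are treated is not stated in the source, so the sketch leaves them unspecified.

```typescript
// EX_TEMPFAIL from sysexits.h: temporary failure, the caller may retry.
const EX_TEMPFAIL = 75;

type CloseAction = "stay-closed" | "reconnect" | "unspecified";

// Hypothetical helper mapping an agent exit code to the client's reaction.
function actionForAgentExit(exitCode: number | null): CloseAction {
  if (exitCode === 0) return "stay-closed"; // clean terminal exit
  if (exitCode === EX_TEMPFAIL) return "reconnect"; // watchdog self-termination
  return "unspecified"; // not covered by the commit message
}
```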

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>