Skip to content

fix(ssh): stop benign tar warnings from aborting remote sync#569

Merged
wesm merged 5 commits into
mainfrom
fix/remote-sync-bug
May 29, 2026
Merged

fix(ssh): stop benign tar warnings from aborting remote sync#569
wesm merged 5 commits into
mainfrom
fix/remote-sync-bug

Conversation

@wesm
Copy link
Copy Markdown
Member

@wesm wesm commented May 29, 2026

Remote sync streamed tar cf over SSH into tar xf - and treated any non-zero exit from either tar as fatal, deleting the extracted temp directory. macOS bsdtar exits 1 on self-referential hardlinks (Antigravity .system_generated logs) while still extracting everything else, so an entire multi-GB sync was discarded over skipped junk files. The remote tar cf exits 1 the same way for "file changed as we read it", aborting through cleanup()/Wait().

Exit code 1 is not a safe warning boundary: bsdtar also returns 1 for truncated and corrupt archives. Tolerating exit 1 would persist partial transfers as successful syncs and poison the mtime skip cache, which the sync engine treats as authoritative.

  • Replace the local tar xf with a stdlib archive/tar extractor (internal/ssh/extract.go) that skips only self-referential hardlinks and fails on unexpected EOF, malformed headers, paths escaping the temp dir, escaping symlinks, and short writes; partial files are removed on error.
  • Carry the remote tar exit and its stderr in a typed commandError and classify it (remoteTarStderrBenign): a non-zero remote exit is tolerated only when every stderr line is "file changed"/"file removed as we read it" plus the delayed-exit summary as attached fallout. Everything else stays fatal.

🤖 Generated with Claude Code

Remote sync streamed 'tar cf' over SSH into 'tar xf -' and treated any non-zero exit from either tar as fatal, deleting the extracted temp dir. macOS bsdtar exits 1 for self-referential hardlinks (Antigravity .system_generated logs) yet extracts everything else fine, so a whole multi-GB sync was discarded over a skipped junk file. The remote 'tar cf' likewise exits 1 for "file changed as we read it", aborting through cleanup()/Wait().

Exit code 1 is not a safe warning boundary: bsdtar also returns 1 for truncated and corrupt archives, so tolerating exit 1 would persist partial transfers as successful syncs and poison the authoritative mtime skip cache.

Replace the local 'tar xf' with a stdlib archive/tar extractor that skips only self-referential hardlinks and fails closed on unexpected EOF, bad headers, path escapes, and short writes. Classify the remote tar exit by stderr, tolerating only file-changed/removed warnings (plus the delayed-exit summary as attached fallout) and treating everything else as fatal.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread internal/ssh/extract.go Fixed
Comment thread internal/ssh/extract.go Fixed
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented May 29, 2026

roborev: Combined Review (8e05a4c)

High: The PR has one blocking issue: the new in-process tar extraction drops source mtimes, which can break remote sync incremental behavior.

High

  • internal/ssh/extract.go:159
    The custom tar extractor writes files without restoring hdr.ModTime, unlike the previous tar xf path. Remote sync depends on source mtimes for skip-cache checks, file_mtime, and parser timestamp fallbacks, so unchanged remote files may appear newly modified and session activity metadata may be rewritten to sync time.

    Fix: After successfully writing each regular file, call os.Chtimes(target, hdr.ModTime, hdr.ModTime) when the header has a valid mod time. Add a test that verifies extracted files preserve tar mtimes.


Synthesized from 2 reviews (agents: codex | types: default, security)

The archive/tar extractor regressed two behaviors of the tar xf path
it replaced:

- It wrote files with the current time instead of the archived mtime.
  Remote sync's incremental skip cache keys on (path, mtime) and the
  engine treats it as authoritative, so every sync produced fresh
  mtimes that never matched and nothing was ever skipped (caught by
  the TestSSHSyncIncremental integration test). Restore the archived
  mtime with os.Chtimes after each regular file is written.

- It recreated symlinks, which CodeQL flagged (go/unsafe-unzip-symlink)
  because an extracted symlink can redirect a later write outside the
  extraction dir. Symlinks are not session data and any file they
  alias is extracted on its own, so skip them entirely; this removes
  the write-through risk rather than guarding it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented May 29, 2026

roborev: Combined Review (1e86fbc)

Medium issue found: the new tar extractor skips a valid regular-file type.

Medium

  • internal/ssh/extract.go:58: The extractor only handles tar.TypeReg, but Go also treats tar.TypeRegA ('\x00') as a regular file type. Archives using that valid typeflag will silently skip session files instead of extracting them.
    • Fix: Handle both cases, e.g. case tar.TypeReg, tar.TypeRegA:, and add test coverage with an entry using Typeflag: tar.TypeRegA.

Synthesized from 2 reviews (agents: codex | types: default, security)

tar.Reader.Next normalizes the deprecated TypeRegA ('\x00') marker to
TypeReg (or TypeDir) before extractEntry sees the header, so the
extractor needs no special case for it and adding one would be dead
code. Add a regression test that authors an entry as TypeRegA and
asserts it still extracts, guarding the assumption that extraction can
rely on the reader's normalization.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented May 29, 2026

roborev: Combined Review (58f9070)

Medium severity finding needs attention before merge.

Medium

  • internal/ssh/transfer.go:213 - remoteTarStderrBenign treats any stderr line containing a benign phrase as benign. This can suppress real tar failures when the phrase appears in a path, e.g. tar: .../file changed as we read it: Cannot open: Permission denied, causing an incomplete download to be processed as successful.

    Fix: Match the actual tar diagnostic text more precisely, such as checking the suffix after the path separator (: file changed as we read it) or using anchored patterns for known tar output formats instead of substring matching the entire line.


Synthesized from 2 reviews (agents: codex | types: default, security)

remoteTarStderrBenign classified a stderr line as benign if it
contained a benign phrase anywhere in the line, so a real failure on a
file whose path contained the phrase (e.g. ".../file changed as we
read it: Cannot open: Permission denied") was suppressed, letting an
incomplete download be processed as a successful sync and persisted to
the authoritative skip cache.

Match the phrase as a suffix after the "<path>: " separator, and the
end-of-run summary as a trailing line (tolerating a trailing period
from bsdtar). A benign phrase embedded in a path can no longer mask a
real error reported for that path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented May 29, 2026

roborev: Combined Review (076e669)

Medium issue found: remote tar warning classification may miss a benign GNU tar variant.

Medium

  • internal/ssh/transfer.go:184 - GNU tar emits the vanished-file warning as File removed before we read it with a capital F, but the classifier only accepts the lowercase form. This benign case can still abort remote sync.
    • Fix: Normalize tar warning suffixes case-insensitively, or include the capitalized GNU tar variant in benignRemoteTarPrimary.

Synthesized from 2 reviews (agents: codex | types: default, security)

GNU tar capitalizes inconsistently: create.c emits "File removed
before we read it" with a capital F while "file changed as we read it"
is lowercase. The classifier stored only lowercase forms and matched
case-sensitively, so a real, benign file-removed warning was treated
as fatal and aborted the sync -- a common case for active session
dirs whose files rotate or are deleted mid-archive.

Lowercase the stderr line before matching and store all benign phrases
lowercase. Verified the wording against GNU tar's create.c.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented May 29, 2026

roborev: Combined Review (9964f2c)

No Medium, High, or Critical issues found.

Both reviewers found the changes clean.


Synthesized from 2 reviews (agents: codex | types: default, security)

@wesm wesm merged commit 2197322 into main May 29, 2026
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants