fix(ssh): stop benign tar warnings from aborting remote sync #569
Remote sync streamed 'tar cf' over SSH into 'tar xf -' and treated any non-zero exit from either tar as fatal, deleting the extracted temp dir. macOS bsdtar exits 1 for self-referential hardlinks (Antigravity .system_generated logs) yet extracts everything else fine, so a whole multi-GB sync was discarded over a skipped junk file. The remote 'tar cf' likewise exits 1 for "file changed as we read it", aborting through cleanup()/Wait().

Exit code 1 is not a safe warning boundary: bsdtar also returns 1 for truncated and corrupt archives, so tolerating exit 1 would persist partial transfers as successful syncs and poison the authoritative mtime skip cache.

Replace the local 'tar xf' with a stdlib archive/tar extractor that skips only self-referential hardlinks and fails closed on unexpected EOF, bad headers, path escapes, and short writes. Classify the remote tar exit by stderr, tolerating only file-changed/removed warnings (plus the delayed-exit summary as attached fallout) and treating everything else as fatal.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
roborev: Combined Review (
The archive/tar extractor regressed two behaviors of the tar xf path it replaced:

- It wrote files with the current time instead of the archived mtime. Remote sync's incremental skip cache keys on (path, mtime) and the engine treats it as authoritative, so every sync produced fresh mtimes that never matched and nothing was ever skipped (caught by the TestSSHSyncIncremental integration test). Restore the archived mtime with os.Chtimes after each regular file is written.
- It recreated symlinks, which CodeQL flagged (go/unsafe-unzip-symlink) because an extracted symlink can redirect a later write outside the extraction dir. Symlinks are not session data and any file they alias is extracted on its own, so skip them entirely; this removes the write-through risk rather than guarding it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
tar.Reader.Next normalizes the deprecated TypeRegA ('\x00') marker to
TypeReg (or TypeDir) before extractEntry sees the header, so the
extractor needs no special case for it and adding one would be dead
code. Add a regression test that authors an entry as TypeRegA and
asserts it still extracts, guarding the assumption that extraction can
rely on the reader's normalization.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
remoteTarStderrBenign classified a stderr line as benign if it contained a benign phrase anywhere in the line, so a real failure on a file whose path contained the phrase (e.g. ".../file changed as we read it: Cannot open: Permission denied") was suppressed, letting an incomplete download be processed as a successful sync and persisted to the authoritative skip cache.

Match the phrase as a suffix after the "<path>: " separator, and the end-of-run summary as a trailing line (tolerating a trailing period from bsdtar). A benign phrase embedded in a path can no longer mask a real error reported for that path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
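The suffix rule might look something like the sketch below. `stderrBenign` is a hypothetical stand-in for `remoteTarStderrBenign`, the summary wording ("Error exit delayed from previous errors") is an assumption about tar's output, and only the file-changed phrase is handled to keep the example short. Requiring at least one benign warning before the summary is this sketch's reading of "attached fallout".

```go
package main

import (
	"fmt"
	"strings"
)

// Wording assumed for this sketch; the real classifier's phrase list may differ.
const (
	changedSuffix = ": file changed as we read it"
	exitSummary   = ": Error exit delayed from previous errors"
)

// stderrBenign (hypothetical) reports whether every stderr line is a tolerated
// warning, optionally followed by the end-of-run summary as the last line.
func stderrBenign(stderr string) bool {
	lines := strings.Split(strings.TrimSpace(stderr), "\n")
	warnings := 0
	for i, line := range lines {
		line = strings.TrimSpace(line)
		switch {
		case strings.HasSuffix(line, changedSuffix):
			// The phrase must end the line, after the "<path>: " separator.
			warnings++
		case i == len(lines)-1 && warnings > 0 &&
			strings.HasSuffix(strings.TrimSuffix(line, "."), exitSummary):
			// Summary counts only as fallout of earlier benign warnings.
		default:
			return false
		}
	}
	return warnings > 0
}

func main() {
	cases := []string{
		"tar: ./a.log: file changed as we read it",
		"tar: ./a.log: file changed as we read it\ntar: Error exit delayed from previous errors.",
		"tar: ./x/file changed as we read it: Cannot open: Permission denied",
	}
	for _, c := range cases {
		fmt.Printf("%v <- %q\n", stderrBenign(c), c)
	}
}
```

A substring check would accept the third case because the phrase appears inside the path; the suffix check rejects it because the line ends with the real error.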
GNU tar capitalizes inconsistently: create.c emits "File removed before we read it" with a capital F while "file changed as we read it" is lowercase. The classifier stored only lowercase forms and matched case-sensitively, so a real, benign file-removed warning was treated as fatal and aborted the sync -- a common case for active session dirs whose files rotate or are deleted mid-archive.

Lowercase the stderr line before matching and store all benign phrases lowercase. Verified the wording against GNU tar's create.c.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
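The case fold is small enough to show in isolation. In this sketch `endsWithBenignPhrase` is a hypothetical helper, and the two phrases are the ones quoted in the commit message, stored lowercase.

```go
package main

import (
	"fmt"
	"strings"
)

// Benign phrases stored lowercase, as the commit describes.
var benignPhrases = []string{
	"file changed as we read it",
	"file removed before we read it",
}

// endsWithBenignPhrase (hypothetical) lowercases the line before the suffix
// comparison, so "File removed ..." and "file changed ..." both match.
func endsWithBenignPhrase(line string) bool {
	folded := strings.ToLower(strings.TrimSpace(line))
	for _, p := range benignPhrases {
		if strings.HasSuffix(folded, ": "+p) {
			return true
		}
	}
	return false
}

func main() {
	for _, line := range []string{
		"tar: ./s/1.jsonl: file changed as we read it",
		"tar: ./s/2.jsonl: File removed before we read it",
		"tar: ./s/3.jsonl: Cannot open: Permission denied",
	} {
		fmt.Printf("%v <- %s\n", endsWithBenignPhrase(line), line)
	}
}
```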
Remote sync streamed `tar cf` over SSH into `tar xf -` and treated any non-zero exit from either tar as fatal, deleting the extracted temp directory. macOS bsdtar exits 1 on self-referential hardlinks (Antigravity `.system_generated` logs) while still extracting everything else, so an entire multi-GB sync was discarded over skipped junk files. The remote `tar cf` exits 1 the same way for "file changed as we read it", aborting through `cleanup()`/`Wait()`.

Exit code 1 is not a safe warning boundary: bsdtar also returns 1 for truncated and corrupt archives. Tolerating exit 1 would persist partial transfers as successful syncs and poison the mtime skip cache, which the sync engine treats as authoritative.

- Replace the local `tar xf` with a stdlib `archive/tar` extractor (`internal/ssh/extract.go`) that skips only self-referential hardlinks and fails on unexpected EOF, malformed headers, paths escaping the temp dir, escaping symlinks, and short writes; partial files are removed on error.
- Classify the remote tar exit by its stderr (`commandError`, `remoteTarStderrBenign`): a non-zero remote exit is tolerated only when every stderr line is a "file changed as we read it" or "File removed before we read it" warning, plus the delayed-exit summary as attached fallout. Everything else stays fatal.

🤖 Generated with Claude Code