
Bubble errors up from worker goroutines; exit non-zero on failure #66

Merged
dolph merged 3 commits into main from claude/fix-issue-6-bubble-errors
May 18, 2026

@dolph dolph commented May 18, 2026

Summary

  • Replaces every log.Fatal* outside of main in find_replace.go and file_handling.go with returned errors so deferred cleanup runs on failure.
  • Error-aggregation primitive: a small errAccumulator (sync.Mutex + []error, exposed via errors.Join) embedded in findReplace. Each error is also log.Print'd at the point of failure so the existing operator-visible stderr stream (Renaming/Rewriting lines and per-error context) is preserved; the join exists so main can surface a non-zero exit code, and so errors.Is / errors.As can match any error in the accumulated set. This matches the "log and continue" semantics the issue calls for and intentionally avoids errgroup (which reports only the first error and, when built with WithContext, cancels the remaining work). The struct stays small so the upcoming bounded-worker-pool change (Unbounded goroutine fan-out exhausts file descriptors and memory on large trees #7) can drop in without rearchitecting again.
  • main is now a thin os.Exit wrapper around run(args, stderr) int, making argument parsing and the success/failure exit code testable from inside the package.
  • File.Write now defers os.Remove(tempName) immediately after creating its temp file. After a successful rename the file no longer exists at tempName, so the remove is a no-op; on rename failure the temp file is cleaned up instead of being leaked.

Architectural shape (for #7 / #8 follow-ups)

| Function | Before | After |
| --- | --- | --- |
| NewFile | log.Fatalf on Abs error | (*File, error) |
| File.Info | log.Fatalf on stat error | (os.FileInfo, error) |
| File.Read | log.Fatalf on open/seek/read | (string, error) |
| File.Write | log.Fatalf, no temp cleanup | error, deferred temp os.Remove |
| WalkDir | log.Fatalf on ReadDir | logs + records on errAccumulator, continues siblings |
| HandleFile / RenameFile / ReplaceContents | log.Fatalf | error (walker logs + records) |
| main | log.Fatal for usage; WalkDir(NewFile(".")) | run(args, stderr) int; non-zero on any recorded error |

Test plan

New tests (each verified to fail for the right reason against the previous behavior before the matching fix landed):

  • TestWalkDir_PermissionDeniedSubdirContinues — a chmod 0 subdirectory no longer aborts the walk. The sibling subtree is still rewritten; the walker records an fs.ErrPermission referencing the failing path. Skipped when running as root (root bypasses permission checks, so chmod 0 does not deny access) and on Windows.
  • TestRenameFile_ReturnsErrorOnExistingDestination — an occupied destination yields an error rather than a log.Fatal. Source and destination both still exist afterwards.
  • TestWalkDir_BadRenameTargetDoesNotAbortSiblings — a sibling whose post-rename name is already occupied does not abort the rest of the tree; the walker logs and records the failure and continues with the free file.
  • TestWriteCleansUpTempFileOnRenameFailure — forcing the rename to fail (target is a non-empty directory; deterministic under both root and non-root on Linux) leaves no stray temp file behind. (Verified the test fails without the deferred os.Remove.)
  • TestRun_ExitsZeroOnSuccess, TestRun_ExitsNonZeroOnTraversalError, TestRun_BadArgCountPrintsUsage: run() returns 0 on a clean walk, non-zero on any recorded error, and prints the usage line to its stderr writer on bad arg count.

Dev loop:

  • gofmt -l . — no output
  • go vet ./... — no output
  • go build ./... — no output
  • go test -race ./... reports PASS (also verified by running the full suite as the nobody user so the root-guarded test exercises the real permission path)
  • ./build.sh — green, coverage 80.5%

Issues closed

Closes #6
Closes #11
Closes #5

The --strict flag mentioned in #6's body is intentionally not part of this PR (it's an additive CLI surface change requiring release:minor + a README update); the per-error-log-and-continue policy described in the issue is the default. File-able as a separate enhancement if desired.

https://claude.ai/code/session_01Tep5t8h97Q9KKbpLMbUEJr


Generated by Claude Code

Replace every log.Fatal* in find_replace.go and file_handling.go with
returned errors so deferred cleanup can run, then collect errors across
walker goroutines and surface them at main.

Architecture:

- File methods (NewFile, Info, Read, Write, plus a new error-returning
  Mode) now return (T, error). The walker calls Info on each child so
  Write can rely on cached Mode without re-statting.
- findReplace.{WalkDir,HandleFile,RenameFile,ReplaceContents} return
  errors (WalkDir records them on an embedded accumulator so its
  goroutines can stay fire-and-forget).
- errAccumulator is a sync.Mutex-guarded []error with an errors.Join
  exit point. Each error is also log.Print'd at the point of failure so
  the existing operator-visible stderr UX is preserved; the join exists
  so main can surface a non-zero exit code and so errors.Is unwraps the
  whole chain.
- main is now a thin os.Exit wrapper around a testable run(args, stderr)
  that returns 1 on bad arg count or any traversal error and 0 on a
  clean walk.
- File.Write now defer-Removes its temp file immediately after creating
  it; on a successful rename the file no longer exists at tempName so
  the deferred remove is a no-op, but on a rename failure the temp file
  is cleaned up instead of being leaked.

New tests (each was confirmed to fail for the right reason before the
matching production change was finalized):

- TestWalkDir_PermissionDeniedSubdirContinues: a chmod-0 subdirectory
  no longer aborts the walk; the sibling subtree is rewritten and the
  walker records an fs.ErrPermission referencing the failing path.
  Skips under root (where chmod 0 is bypassed) and Windows.
- TestRenameFile_ReturnsErrorOnExistingDestination: an occupied
  destination is refused with an error, not log.Fatal.
- TestWalkDir_BadRenameTargetDoesNotAbortSiblings: a sibling whose
  post-rename name is already occupied does not abort the rest of the
  tree; the walker logs and records the failure and continues.
- TestWriteCleansUpTempFileOnRenameFailure: forcing the rename step to
  fail (target is a non-empty directory; deterministic across users)
  leaves no stray temp file behind.
- TestRun_{ExitsZeroOnSuccess,ExitsNonZeroOnTraversalError,
  BadArgCountPrintsUsage}: run() returns 0 on a clean walk, non-zero
  when any error was recorded, and non-zero with a usage line on bad
  arg count.

Closes #6
Closes #11
Closes #5

https://claude.ai/code/session_01Tep5t8h97Q9KKbpLMbUEJr
@dolph dolph added the release:patch label May 18, 2026 — with Claude
claude added 2 commits May 18, 2026 02:56
The error-aggregation primitive introduced in this PR uses errors.Join,
which was added in Go 1.20. CI was pinned to 1.19 via go.mod and 'go vet'
failed with 'Join not declared by package errors'.

This is the minimum bump needed for the chosen primitive. The broader
'be on a supported Go release' work (Go 1.19 has been EOL since August
2023) stays as #28.

Since Go 1.20 (the new toolchain floor in this PR), the math/rand global
generator is automatically seeded by the runtime; an explicit rand.Seed
call is deprecated and no longer needed, and staticcheck (run by
golangci-lint) flags it as SA1019.

The line was already vestigial -- left in place by the original commit
to keep the diff narrow. The toolchain bump in the previous commit makes
that compromise no longer viable.

strings.go's RandomString still uses math/rand.Intn; the change to
crypto/rand for filesystem paths remains tracked under issue #3.
@dolph dolph merged commit 5184f23 into main May 18, 2026
2 checks passed
@dolph dolph deleted the claude/fix-issue-6-bubble-errors branch May 18, 2026 03:32