macOS: skip set_readwrite_recursive walk after clonefile (~1.85x materialization speedup) #3

Merged: erneestoc merged 1 commit into ec/macos-clonefile-optimizations from ec/macos-skip-chmod-walk on May 19, 2026

@erneestoc
Owner

Stacked on top of TraceMachina#2338.

Summary

The macOS clonefile(2) fast path added in TraceMachina#2338 is followed by a recursive set_readwrite_recursive chmod walk that makes every file in the cloned tree writable. On real Bazel SwiftCompile shapes (~2000 inputs / ~466 MB) that walk accounts for ~46% of materialization time — ~33 µs per file, ~67 ms per action.

This PR replaces the walk with a single chmod(2) on the destination root. Existing entries inside the clone inherit the source's read-only mode (0o555 dirs, 0o444 files). The worker can still create the action's declared output files because the root itself is 0o755.
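
A minimal sketch of the idea above, using only the Rust standard library on a Unix target. The names `make_root_writable` and `mode_of` are hypothetical and are not the functions in nativelink-util; the sketch only illustrates that one permission change on the root leaves every entry beneath it untouched.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

/// Hypothetical helper (not the actual nativelink-util code): make only the
/// root of a cloned tree writable and leave everything beneath it alone.
pub fn make_root_writable(dest_root: &Path) -> io::Result<()> {
    // A single chmod(2) on the root, instead of one chmod per entry in the tree.
    fs::set_permissions(dest_root, fs::Permissions::from_mode(0o755))
}

/// Returns the permission bits (lower nine bits of the mode) of `path`.
pub fn mode_of(path: &Path) -> io::Result<u32> {
    Ok(fs::metadata(path)?.permissions().mode() & 0o777)
}
```

Calling `make_root_writable` on a tree whose root was 0o555 changes only the root to 0o755, so a new file can be created directly under the root while existing subdirectories and files keep their 0o555 and 0o444 modes.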

Why this is correct (and not a regression)

Leaving inputs read-only is what Bazel itself does:

  • linux-sandbox bind-mounts inputs read-only — kernel rejects writes regardless of file mode.
  • darwin-sandbox / sandbox-exec denies writes outside declared output paths via Seatbelt.
  • REAPI Action.output_files / Action.output_directories are the only paths an action may write to.

Workers without sandbox primitives (Nativelink, bazel-remote, BuildBuddy, Buildfarm) substitute file mode for kernel enforcement: 0o444 files and 0o555 dirs are the hermeticity enforcement. The original chmod walk weakened that by making inputs writable to "be nice" to misbehaving actions. Skipping the walk brings Nativelink in line with the REAPI contract and with what Bazel's own sandboxes do.

An action that does try to mutate an input now hits EACCES, which is the correct REAPI behavior — same failure mode as on Bazel's own sandbox.
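
The failure mode can be sketched as follows. This is an illustration under assumptions, not code from the PR: `try_mutate_input` is a hypothetical stand-in for an action that writes to one of its inputs, and `is_hermeticity_violation` is a hypothetical classifier. Rust's standard library reports EACCES as `io::ErrorKind::PermissionDenied`.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical stand-in for an action that tries to modify one of its inputs.
pub fn try_mutate_input(input: &Path) -> io::Result<()> {
    // Opening for write is where the kernel checks the file's mode bits.
    let mut file = OpenOptions::new().write(true).open(input)?;
    file.write_all(b"mutated")
}

/// True when the error is the permission failure described above.
pub fn is_hermeticity_violation(err: &io::Error) -> bool {
    err.kind() == io::ErrorKind::PermissionDenied
}
```

For an unprivileged user, calling `try_mutate_input` on a 0o444 file is expected to return an error for which `is_hermeticity_violation` is true. A process running as root bypasses mode bits, so this enforcement only holds when the worker runs unprivileged.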

Bench evidence

From nativelink-util/benches/chmod_strategy.rs on ec/macos-clonefile-optimizations-benchmarks:

| shape | walk (current) | toplevel_only (this PR) | speedup |
| --- | --- | --- | --- |
| small_flat (64 files, 64 KB) | 4.66 ms | 2.61 ms | 1.79x |
| pcm_cluster (219 files, 40 MB) | 15.17 ms | 8.19 ms | 1.85x |
| medium_flat (635 files, 180 MB) | 46.36 ms | 25.10 ms | 1.85x |
| large_flat (1978 files, 466 MB) | 147.39 ms | 80.17 ms | 1.84x |

Walk fraction is ~46% across shapes regardless of file size — confirming the cost is per-file syscall, not per-byte.
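
The derived figures can be re-checked from the table with a few lines of arithmetic. The sketch below assumes the walk's cost equals the difference between the two columns (the single root chmod in toplevel_only is treated as negligible); the `Row` type and its methods are illustrative only and are not part of the benchmark.

```rust
/// One row of the benchmark table above (times in milliseconds).
pub struct Row {
    pub shape: &'static str,
    pub files: u32,
    pub walk_ms: f64,
    pub toplevel_ms: f64,
}

pub const ROWS: [Row; 4] = [
    Row { shape: "small_flat", files: 64, walk_ms: 4.66, toplevel_ms: 2.61 },
    Row { shape: "pcm_cluster", files: 219, walk_ms: 15.17, toplevel_ms: 8.19 },
    Row { shape: "medium_flat", files: 635, walk_ms: 46.36, toplevel_ms: 25.10 },
    Row { shape: "large_flat", files: 1978, walk_ms: 147.39, toplevel_ms: 80.17 },
];

impl Row {
    /// Ratio of the old materialization time to the new one.
    pub fn speedup(&self) -> f64 {
        self.walk_ms / self.toplevel_ms
    }

    /// Share of the old materialization time attributed to the chmod walk.
    pub fn walk_fraction(&self) -> f64 {
        (self.walk_ms - self.toplevel_ms) / self.walk_ms
    }

    /// Time saved per file, in microseconds.
    pub fn saved_us_per_file(&self) -> f64 {
        (self.walk_ms - self.toplevel_ms) * 1000.0 / f64::from(self.files)
    }
}
```

Under that assumption the walk fraction comes out at roughly 0.44 to 0.46 for all four shapes and the saving at roughly 32 to 34 µs per file, even though the total input size varies from 64 KB to 466 MB. A per-byte cost would instead make the per-file saving grow with file size.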

Scope

  • macOS only. Linux/Windows fall straight through to hardlink_directory_tree_recursive and never ran set_readwrite_recursive in this code path.
  • set_readwrite_recursive stays public — nativelink-worker/src/directory_cache.rs:451 still uses it on the source side during eviction.
  • No worker-side write wrapper, no audit phase. Trusting Bazel's hermeticity contract — if a future audit discovers worker-internal writes that hit EACCES inside the cloned tree, that's a bug to fix at the write site, not paper over by mutating input perms.

Test plan

  • cargo test -p nativelink-util --lib fs_util:: — 9/9 pass on macOS arm64
  • cargo test -p nativelink-worker --lib directory_cache:: — 2/2 pass
  • cargo build -p nativelink-util -p nativelink-worker — clean
  • cargo clippy -p nativelink-util --all-targets -- -D warnings — clean
  • cargo clippy -p nativelink-worker --lib -- -D warnings — clean (pre-existing test-only errors in multi_worker_cas_test.rs unrelated)

New tests:

  • test_clonefile_root_writable_inputs_readonly — root 0o755, subdirs 0o555, files 0o444 (replaces the old test_clonefile_dest_is_writable which assumed subdirs would be made writable by the walk).
  • test_clonefile_root_accepts_new_files — worker can create outputs at the root even though everything inside the clone is read-only.
  • test_clonefile_input_mutation_fails — writes to existing input files fail with PermissionDenied — encodes the hermeticity contract.

The macOS clonefile fast path was followed by a recursive chmod walk that
made every file in the cloned tree writable (0o644 / 0o755). On real
Bazel input shapes (~2000-file SwiftCompile) that walk accounted for
~46% of materialization time — ~33 µs per file, ~67 ms per action.

Replace the walk with a single chmod(2) on the destination root.
Existing entries inherit the source's read-only mode (0o555 dirs,
0o444 files). The worker can still create the action's declared output
files inside the root because the root itself is 0o755.

This matches the hermeticity contract enforced by Bazel's local
sandbox (linux-sandbox bind-mounts inputs read-only;
darwin-sandbox / sandbox-exec denies writes outside declared output
paths) and the REAPI Action.output_files / output_directories
semantics: actions write only to declared outputs, never mutate
inputs. An action that does try to mutate an input now hits EACCES,
which is the correct REAPI behavior — same failure mode as on
Bazel's own sandbox.

Bench (nativelink-util/benches/chmod_strategy.rs on the bench branch),
toplevel_only vs full walk:

  shape                         walk      toplevel_only   speedup
  small_flat   (64 files)       4.66 ms   2.61 ms         1.79x
  pcm_cluster  (219 files)     15.17 ms   8.19 ms         1.85x
  medium_flat  (635 files)     46.36 ms  25.10 ms         1.85x
  large_flat   (1978 files)   147.39 ms  80.17 ms         1.84x

set_readwrite_recursive stays public — directory_cache.rs:451 still
uses it on the source side during eviction.

Tests:
- test_clonefile_root_writable_inputs_readonly: root 0o755, subdirs
  0o555, files 0o444 (replaces the old test_clonefile_dest_is_writable
  which assumed subdirs would be made writable).
- test_clonefile_root_accepts_new_files: worker can create outputs at
  the root even though everything inside the clone is read-only.
- test_clonefile_input_mutation_fails: writes to existing input files
  fail with PermissionDenied — encodes the hermeticity contract.
@erneestoc erneestoc merged commit 13b62f5 into ec/macos-clonefile-optimizations May 19, 2026
1 check passed
erneestoc added a commit that referenced this pull request May 22, 2026
Reverts the bounded-concurrency construct on this POC branch. After #1 makes construct metadata-only, intra-tree parallelism is bounded by APFS metadata serialization; on a busy worker (inter-action concurrency already saturates the box) 64-wide spawn_blocking fan-out risks oversubscription and stealing cycles from the compiles. Keeping #1/#2/TraceMachina#5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>