macOS: APFS clonefile fast path + concurrency cap + zero-byte fix for Bazel input materialization#2338
Merged
palfrey merged 8 commits intoMay 19, 2026
Conversation
On macOS, try APFS clonefile(2) before falling back to the existing per-file hardlink walk. clonefile is O(1) in tree size and uses copy-on-write, so subtree-cache hits no longer scale with input count. After a successful clone the destination is made writable (0o755/0o644) because the clone inherits the source's permissions and cached subtrees are 0o555/0o444. The COW semantics of clonefile mean writes to the destination do not affect the source, so this is safe. On EXDEV (cross-volume), ENOTSUP, or any other errno, we log at debug and fall through to hardlink_directory_tree_recursive. Linux and Windows paths are unchanged. Extracted from TraceMachina/nativelink PR TraceMachina#2243 (commit 13fcc0c).
hardlink_directory_tree now returns CloneMethod (Clonefile | Hardlink) so callers can see which kernel path was taken. DirectoryCache records the result in two atomic counters and exposes them via CacheStats. Without this telemetry, a silent fall-through from clonefile to per-file hardlinks (e.g. cache and workspace on different volumes, or APFS clonefile failing for any other reason) would be invisible. On Linux/Windows hardlink_directory_tree always returns Hardlink — no behavioral change. The new CacheStats fields default to zero on those platforms. Extracted from TraceMachina/nativelink PR TraceMachina#2243 (commit 13fcc0c).
Previously download_to_directory pushed every file-hardlink, subdir recursion, and symlink future onto an unbounded FuturesUnordered, then drained it. On macOS this produced thousands of parallel hardlink(2) calls fighting APFS's per-volume metadata lock — the observed exec-log shape was ~4 ms per input file at scale, consistent with serialized metadata mutations plus tokio scheduling overhead. This commit gates each directory level to at most 64 in-flight futures via stream::buffer_unordered(64). 64 is well above the inflection point on any modern Linux filesystem, so Linux is unaffected beyond replacing tokio scheduling overhead with simpler stream polling. Scope notes (vs PR TraceMachina#2243 ee85fdc): - The chunked has_with_results sub-change does not apply directly: the current code calls populate_fast_store per-digest, not a batched has_with_results. - Level-parallel BFS mkdir is not applied here; the recursion structure is unchanged. The 64-cap is per recursive call, not global. Deep trees can therefore still have 64 * depth in-flight futures. A full flatten pass is a follow-up. Extracted from TraceMachina/nativelink PR TraceMachina#2243 (commit ee85fdc), narrowed to fit the current code shape.
FilesystemStore (and several other CAS backends) refuse to store zero-byte blobs, so a get_part_unchunked for the zero-byte digest (af1349b9... / e3b0c449...) returns NotFound. Bazel input trees routinely contain empty marker/config files (.linksearchpaths, empty .env, .toml, etc.), so without this fix a single such file in any directory causes the entire DirectoryCache construction to fail — roughly 30% of cache attempts per PR TraceMachina#2243. Short-circuit create_file: if the digest is the zero-byte digest, write b"" to disk directly and never consult the CAS. Cross-platform correctness fix. Extracted from TraceMachina/nativelink PR TraceMachina#2243 (commit d198902).
5 tasks
Member
This PR is a lot more readable and reviewable than that one, and I'll take 4x I can review over 10x with harder to evaluate side-effects! Thanks for this, very happy with it. Will also be able to tick off DirectoryCache from #2034 after this merges. |
19 tasks
The pre-step to ensure dst's parent exists is only needed because clonefile(2) requires the parent to exist while dst itself must not. On non-macOS targets the function falls through to per-file hardlink, which calls fs::create_dir_all(dst_dir) further down — that already creates dst_dir along with any missing parents — so the pre-step was pure redundant syscalls on Linux/Windows. Wrap the block in `#[cfg(target_os = "macos")]` to skip it on platforms that don't need it. Behavior on macOS is unchanged; behavior on Linux/Windows loses one redundant `create_dir_all` per call. Per review feedback on TraceMachina#2338.
This was referenced May 18, 2026
palfrey
approved these changes
May 19, 2026
Merged
8 tasks
erneestoc
added a commit
to erneestoc/nativelink
that referenced
this pull request
May 19, 2026
Previously download_to_directory pushed every file's fetch+hardlink as
one fused future into a single FuturesUnordered. Each future ran the
fetch (populate_fast_store) and then the hardlink sequentially within
the same task. With per-future store-side serialization the fetch leg
becomes the bottleneck for SwiftCompile-style workloads (1978+ input
files at p95), where blob-fetch wall time dominates total action time.
This commit restructures download_to_directory into a two-phase
pipeline driven by try_join3:
1. Fetcher launches ALL missing-blob populate_fast_store calls up
front, bounded at DOWNLOAD_TO_DIRECTORY_FETCH_CONCURRENCY = 128
via stream::try_for_each_concurrent. Digests are deduplicated so a
blob referenced by N files is only fetched once. As each blob
arrives, every file index that needs it is sent through an mpsc
channel.
2. Hardlinker drains the channel and performs hardlink + chmod +
set_mtime, bounded at DOWNLOAD_TO_DIRECTORY_HARDLINK_CONCURRENCY
= 64 (the same cap that PR TraceMachina#2338 added to compose with APFS
metadata-lock behavior on macOS). Hardlinking starts as soon as
the first blob arrives - it does not wait for the fetcher to
finish.
3. Subdirectory recursion and symlink creation run in a separate
FuturesUnordered alongside the file pipeline.
Net win on a Mac-Mini RBE worker: fetch wall time becomes
max(per-blob latency) bounded by the 128-cap, not sum(per-blob latency)
serialized inside one future.
Concurrency caps compose: fetch (128) is network/store-bound where extra
in-flight requests are cheap; hardlink (64) is metadata-bound where
extra in-flight syscalls fight APFS's per-volume lock on macOS. The
two phases run independently with their own bound, so on cache-hot
runs only the hardlink leg is active and the fetch cap is irrelevant.
Tests:
- download_to_directory_fetches_blobs_concurrently: a per-blob 75 ms
sleep on the slow store. Serial fetch of 20 blobs would be 1500 ms;
concurrent fetch finishes well under 500 ms, and max_in_flight is
observed to reach >=4 (would be 1 if serial).
- download_to_directory_streams_hardlinks_while_fetching: 30 blobs
with 50 ms per-blob latency. Serial fetch+hardlink would be 1500 ms;
pipelined version finishes under 750 ms. Catches regression to a
fetch-all-then-hardlink-all two-phase shape that drops overlap.
Both tests use a minimal DelayedStore harness (StoreDriver wrapping
MemoryStore) added inline. This pulls in async-trait as a dev-dependency
(version 0.1.88, matching the rest of the workspace).
Scope notes (vs PR TraceMachina#2243 a8db953):
- The upstream commit depends on infrastructure introduced earlier in
the PR that is NOT being ported here: batch_read_small_blobs,
populate_fast_store_unchecked, get_file_entries_batch,
BATCH_READ_MAX_BLOB_SIZE, BatchReadBlobs / ByteStream partitioning.
These hooks are entangled with the peer-to-peer infrastructure and
proto changes we explicitly excluded. The upstream commit was
reduced to its core idea: launch all fetches upfront, hardlink as
blobs arrive.
- Upstream uses a Notify + shared HashSet<DigestInfo> "fetched_set"
to coordinate fetcher and producer. Here we use a simpler mpsc
channel: the fetcher sends file indices directly as their blob
becomes available. There is no separate producer step because there
is no batch-vs-bytestream partitioning to defer to a second pass.
- Follow-on upstream commits 7b093e8 and b64944a are excluded
per task scope - they depend on peer hints and proto changes.
Extracted from TraceMachina/nativelink PR TraceMachina#2243 (commit a8db953),
adapted to the current code shape.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
palfrey
pushed a commit
that referenced
this pull request
May 20, 2026
…ro-digest files) (#2346) * Resolve absolute symlinks in output upload instead of uploading raw targets When the worker's work directory is populated via DirectoryCache, output paths can be absolute symlinks pointing into the cache directory (/var/.../directory_cache/...). The previous output collection code uploaded these as raw SymlinkInfo with absolute targets that are meaningless on the Bazel client, causing "No such file or directory" errors when the client materialised the action result. Detect absolute symlinks in output paths and resolve them: upload directory contents as Tree protos and file contents as regular files. Relative symlinks (intentionally created by the action) are still preserved as symlinks. Updates upload_dir_and_symlink_test to use a relative symlink (the previous /dev/null absolute symlink is now resolved, breaking the old assertion) and adds a new upload_absolute_symlink_resolves_contents regression test that verifies an out-of-tree payload reachable via an absolute symlink lands in CAS as a file. Ported from upstream 0807153 (PR #2243). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Fix zero-digest files missing from worker execution directories FilesystemStore::get_file_entry_for_digest previously returned a synthetic FileEntry for zero-digest blobs pointing to a content_path that never exists on disk (FilesystemStore deliberately never persists zero-byte files). Downstream worker output-materialisation code that took the prefetched hardlink path would try to hard_link from this non-existent source, fail, and either silently produce a missing file or fall back without enough context to diagnose the failure. This is the OUTPUT-materialisation companion to PR #2338's DirectoryCache zero-byte fix, which lives on the input-fetch path. The two changes are complementary: #2338 covers the directory cache short-circuit; this covers the FilesystemStore API and the running_actions_manager fallback. Changes: - FilesystemStore::get_file_entry_for_digest now returns Code::NotFound for zero digests instead of a synthetic FileEntry, forcing callers to materialise empty files via fs::create_file rather than hard_linking from a phantom source. - running_actions_manager::download_to_directory adds err_tip context on the zero-digest fs::create_file / write_all so a failure here is attributable to the empty-file path. - Updates the existing get_file_entry_for_zero_digest test to assert the new NotFound behaviour and renames it to get_file_entry_for_zero_digest_returns_not_found. - Adds download_to_directory_zero_digest_empty_file_test that exercises an empty file declared inside an input directory and verifies it materialises on disk with zero bytes. This test intentionally does NOT collide with PR #2338's DirectoryCache zero-byte test (which validates a different short-circuit path). Ported from upstream 19d7e20 (PR #2243). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Close test coverage gaps for ported output-upload fixes Audit of the two ported commits (e57b675b, b1dc3a1c) flagged two correctness-relevant gaps. Both are now closed with strict assertions so the underlying bugs cannot silently regress. 1. upload_absolute_symlink_to_directory_uploads_tree The original e57b675b regression test covers absolute-symlink-to-file but not absolute-symlink-to-directory, which exercises an entirely separate code path (upload_directory + Tree proto serialisation). The new test creates an out-of-tree directory containing inner.txt, absolute-symlinks it into the work dir, runs the action, and: - asserts output_directory_symlinks and output_file_symlinks are BOTH empty (the symlink must NOT be preserved), - asserts output_folders contains exactly one entry at the symlink path, proving Tree-upload happened, - walks the uploaded Tree, locates inner.txt, fetches its blob from CAS and verifies the content. This proves the directory was actually traversed and uploaded — not stubbed. 2. download_to_directory_zero_digest_empty_file_test (strengthened) Extended the existing single-file test to cover: - a second sibling zero-digest file (proves the path is not a single-file fluke), - a zero-digest file nested inside a subdirectory (proves the recursive download_to_directory caller also handles NotFound from get_file_entry_for_digest — not just the top-level invocation), - strict per-file assertions: is_file, !is_symlink, len == 0, and a read-back that confirms the file is actually empty rather than a phantom dirent. Verdict on the dropped checks: - Dangling absolute symlink: source returns OutputType::None which is the correct graceful behaviour, but writing a test for it requires fabricating a broken symlink mid-action and is comparatively low value; left uncovered intentionally. - File mode on zero-digest materialisation: the worker uses fs::create_file with no explicit mode, so the result follows umask and would be brittle across CI environments. Length + type assertions are the stronger invariant. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * running_actions_manager_test: use /dev/null in upload_dir_and_symlink_test Previous iterations of this PR sidestepped `/dev/null` by swapping the absolute-symlink fixture for either a relative symlink (lost coverage) or an out-of-tree real empty file (added scaffolding to dodge a corner case we should just test directly). Restore the original `ln -s /dev/null empty_sym` fixture. The post-fix worker code resolves the absolute symlink via `fs::metadata` (follows), sees a character device (`is_dir() == false`), and falls into the "upload as file" branch. `upload_file` opens `/dev/null` and reads to EOF — `/dev/null`'s contract is that reads return 0 bytes immediately — producing the canonical sha256 empty digest (e3b0c44…) with size 0. That's well-defined, harmless behavior and a real-world Bazel pattern (rules sometimes use `ln -sf /dev/null x` to create empty outputs). Test now locks in this contract directly, no scaffolding required. Drops the `external_root` / `external_file` setup since it was only needed to avoid `/dev/null`. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Marcus Eagan <marcuseagan@gmail.com>
MarcusSorealheis
added a commit
that referenced
this pull request
May 21, 2026
…rialization speedup) (#2349) * fs_util: skip set_readwrite_recursive walk after clonefile The macOS clonefile fast path was followed by a recursive chmod walk that made every file in the cloned tree writable (0o644 / 0o755). On real Bazel input shapes (~2000-file SwiftCompile) that walk accounted for ~46% of materialization time — ~33 µs per file, ~67 ms per action. Replace the walk with a single chmod(2) on the destination root. Existing entries inherit the source's read-only mode (0o555 dirs, 0o444 files). The worker can still create the action's declared output files inside the root because the root itself is 0o755. This matches the hermeticity contract enforced by Bazel's local sandbox (linux-sandbox bind-mounts inputs read-only; darwin-sandbox / sandbox-exec denies writes outside declared output paths) and the REAPI Action.output_files / output_directories semantics: actions write only to declared outputs, never mutate inputs. An action that does try to mutate an input now hits EACCES, which is the correct REAPI behavior — same failure mode as on Bazel's own sandbox. Bench (nativelink-util/benches/chmod_strategy.rs on the bench branch), toplevel_only vs full walk: shape walk toplevel_only speedup small_flat (64 files) 4.66 ms 2.61 ms 1.79x pcm_cluster (219 files) 15.17 ms 8.19 ms 1.85x medium_flat (635 files) 46.36 ms 25.10 ms 1.85x large_flat (1978 files) 147.39 ms 80.17 ms 1.84x set_readwrite_recursive stays public — directory_cache.rs:451 still uses it on the source side during eviction. Tests: - test_clonefile_root_writable_inputs_readonly: root 0o755, subdirs 0o555, files 0o444 (replaces the old test_clonefile_dest_is_writable which assumed subdirs would be made writable). - test_clonefile_root_accepts_new_files: worker can create outputs at the root even though everything inside the clone is read-only. - test_clonefile_input_mutation_fails: writes to existing input files fail with PermissionDenied — encodes the hermeticity contract. * fs_util: remove dead set_readwrite_recursive after post-#2347 cleanup After #2347 (DirectoryCache cleanup uses set_dir_writable_recursive) and this PR (post-clonefile path uses chmod_dir_writable), the generic set_readwrite_recursive helper has zero remaining callers. Both replacements are intentionally narrower - set_dir_writable_recursive chmods only directories so file inodes aren't mutated, and chmod_dir_writable chmods only the destination root so the clone tree stays read-only inside. Delete the old helper and its private companion set_readwrite_one_path. Addresses palfrey's review feedback on #2347 ("set_readwrite_recursive and set_readwrite_one_path can be dropped as part of this") - sequenced on this PR instead because both PRs had to land before the helpers became dead. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * worker: make the APFS clonefile directory-cache path work end-to-end PR #2338's clonefile fast path is effectively dead in production — it materialized 1 of 1771 actions in a real build. Three coupled bugs kept the directory cache silently falling back to the slow download path, and two of them only surface once the fast path actually fires, so they must land together. Bug 1: prepare_action_inputs received a work directory the caller had already created, but hardlink_directory_tree and clonefile(2) both require the destination to NOT exist. Every cache attempt failed its precondition and fell back to download_to_directory. Fix: remove the empty pre-created directory before invoking the cache, and recreate it on cache failure so the download fallback (which needs an existing destination) still works. Adds fs::remove_dir for the empty-dir removal. Bug 2: set_readonly_recursive chmod'd files to 0o444, stripping the execute bit from cached executables. Once a tree is cloned into a workspace this makes an action's interpreter/wrapper script fail with EACCES. Fix: mark files 0o555 instead of 0o444 — read + execute, still no write bit, so the hermeticity contract is unchanged. Bug 3: the clonefile path chmods only the destination root writable; cloned subdirectories keep the source's 0o555 mode. Bazel actions declare outputs at paths nested inside input subdirectories, and creating those files needs write permission on the parent directory. Fix: after a cache hit, set_dir_writable_recursive makes every directory in the materialized tree writable. Files stay read-only — they may be CAS-hardlinked and chmoding them would corrupt the shared inode. Adds regression tests for nested output creation, which the existing root-only clonefile tests did not cover. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Marcus Eagan <marcuseagan@gmail.com>
erneestoc
added a commit
to erneestoc/nativelink
that referenced
this pull request
May 21, 2026
Previously download_to_directory pushed every file's fetch+hardlink as
one fused future into a single FuturesUnordered. Each future ran the
fetch (populate_fast_store) and then the hardlink sequentially within
the same task. With per-future store-side serialization the fetch leg
becomes the bottleneck for SwiftCompile-style workloads (1978+ input
files at p95), where blob-fetch wall time dominates total action time.
This commit restructures download_to_directory into a two-phase
pipeline driven by try_join3:
1. Fetcher launches ALL missing-blob populate_fast_store calls up
front, bounded at DOWNLOAD_TO_DIRECTORY_FETCH_CONCURRENCY = 128
via stream::try_for_each_concurrent. Digests are deduplicated so a
blob referenced by N files is only fetched once. As each blob
arrives, every file index that needs it is sent through an mpsc
channel.
2. Hardlinker drains the channel and performs hardlink + chmod +
set_mtime, bounded at DOWNLOAD_TO_DIRECTORY_HARDLINK_CONCURRENCY
= 64 (the same cap that PR TraceMachina#2338 added to compose with APFS
metadata-lock behavior on macOS). Hardlinking starts as soon as
the first blob arrives - it does not wait for the fetcher to
finish.
3. Subdirectory recursion and symlink creation run in a separate
FuturesUnordered alongside the file pipeline.
Net win on a Mac-Mini RBE worker: fetch wall time becomes
max(per-blob latency) bounded by the 128-cap, not sum(per-blob latency)
serialized inside one future.
Concurrency caps compose: fetch (128) is network/store-bound where extra
in-flight requests are cheap; hardlink (64) is metadata-bound where
extra in-flight syscalls fight APFS's per-volume lock on macOS. The
two phases run independently with their own bound, so on cache-hot
runs only the hardlink leg is active and the fetch cap is irrelevant.
Tests:
- download_to_directory_fetches_blobs_concurrently: a per-blob 75 ms
sleep on the slow store. Serial fetch of 20 blobs would be 1500 ms;
concurrent fetch finishes well under 500 ms, and max_in_flight is
observed to reach >=4 (would be 1 if serial).
- download_to_directory_streams_hardlinks_while_fetching: 30 blobs
with 50 ms per-blob latency. Serial fetch+hardlink would be 1500 ms;
pipelined version finishes under 750 ms. Catches regression to a
fetch-all-then-hardlink-all two-phase shape that drops overlap.
Both tests use a minimal DelayedStore harness (StoreDriver wrapping
MemoryStore) added inline. This pulls in async-trait as a dev-dependency
(version 0.1.88, matching the rest of the workspace).
Scope notes (vs PR TraceMachina#2243 a8db953):
- The upstream commit depends on infrastructure introduced earlier in
the PR that is NOT being ported here: batch_read_small_blobs,
populate_fast_store_unchecked, get_file_entries_batch,
BATCH_READ_MAX_BLOB_SIZE, BatchReadBlobs / ByteStream partitioning.
These hooks are entangled with the peer-to-peer infrastructure and
proto changes we explicitly excluded. The upstream commit was
reduced to its core idea: launch all fetches upfront, hardlink as
blobs arrive.
- Upstream uses a Notify + shared HashSet<DigestInfo> "fetched_set"
to coordinate fetcher and producer. Here we use a simpler mpsc
channel: the fetcher sends file indices directly as their blob
becomes available. There is no separate producer step because there
is no batch-vs-bytestream partitioning to defer to a second pass.
- Follow-on upstream commits 7b093e8 and b64944a are excluded
per task scope - they depend on peer hints and proto changes.
Extracted from TraceMachina/nativelink PR TraceMachina#2243 (commit a8db953),
adapted to the current code shape.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Four small, atomic commits that make macOS RBE workers materialize Bazel
action input trees dramatically faster on APFS. This is a focused
re-extraction of the safest, most independently-verifiable subset of
#2243 — split out so each change can be reviewed, benched, and rolled
back on its own.
b3b0cd3ffs_util: APFSclonefile(2)fast path inhardlink_directory_tree(withset_readwrite_recursiveso cloned trees stay writable). Falls through to per-filehard_linkon Linux/Windows.ab49f162DirectoryCache:clonefile_hits/hardlink_hitsAtomicU64 telemetry so operators can confirm the fast path is firing in production.1ddce0fcrunning_actions_manager::download_to_directory: cap concurrenthard_linkcalls at 64 viastream::buffer_unordered(was unboundedFuturesUnordered).8051ca9eDirectoryCache: handle the case where the CAS has no entry for the zero-byte digest — short-circuit and create an empty file rather than panic.Motivation
Bazel exec-log analysis on our macOS RBE worker fleet shows that input
materialization dominates wall time:
File-fetching wall time exceeds all process wall time combined — the
worker spends more time materializing input trees than running actions.
The dominant action shape is SwiftCompile (635 input files / 183 MB
mean, 1,978 files / 466 MB at p95), which is exactly the workload APFS
`clonefile(2)` was designed for.
#2243 is large and bundles several unrelated changes. This PR isolates
the four pieces that (a) directly attack the file-fetching bottleneck
on macOS and (b) can be A/B tested independently via the telemetry in
`ab49f162`.
Benchmark proof
See erneestoc#1 — adds two criterion
benches under `nativelink-util/benches/` that reproduce the wins on a
real APFS volume.
Headline numbers on Apple M4 Max / APFS:
all five SwiftCompile-shaped trees. p95 shape (1,978 files / 466 MB)
drops from 590 ms → 150 ms per action.
on a single-process microbench (1.01× vs unbounded at 1,978 files).
Confirms the cap is not a regression; any multi-action contention
wins will show up in production telemetry.
Caveat: 4× not 10×
#2243 reported ≥10× wins on this path. Our benches show a stable ~4×,
bounded by the O(N) `set_readwrite_recursive` chmod walk that runs
after the O(1) `clonefile(2)`. Diagnosis and follow-up (single
top-level chmod + lazy per-file) documented in the bench PR. 4× on
the dominant shape already moves the needle hard — proposing this
lands now, with the walk optimization as a separate follow-up.
Telemetry & rollback
`ab49f162` adds `DirectoryCache::stats()` with `clonefile_hits` and
`hardlink_hits` counters so operators can confirm in production:
`hardlink_hits` near zero
macOS-gated at compile time)
The four commits revert independently — each is its own atomic change
with its own test coverage.
Tests
10 `fs_util` tests cover the clonefile path (COW isolation, symlink
handling under `CLONE_NOFOLLOW`, error paths, permission transitions
from `0o555` → `0o755`). 2 `DirectoryCache` tests cover the telemetry
counters and the zero-byte short-circuit.
This change is