Skip to content

directory_cache: fix CAS inode corruption from chmod-during-eviction#2347

Merged
MarcusSorealheis merged 1 commit into
TraceMachina:mainfrom
erneestoc:ec/pr2243-fix-cas-inode-corruption
May 19, 2026
Merged

directory_cache: fix CAS inode corruption from chmod-during-eviction#2347
MarcusSorealheis merged 1 commit into
TraceMachina:mainfrom
erneestoc:ec/pr2243-fix-cas-inode-corruption

Conversation

@erneestoc
Copy link
Copy Markdown
Contributor

@erneestoc erneestoc commented May 18, 2026

Description

DirectoryCache::evict_lru calls set_readwrite_recursive to make the cache entry writable before remove_dir_all. That helper chmods both directories AND files. The files in the cache entry are hardlinked from the CAS — so chmod follows the hardlink and mutates the CAS file's mode mid-flight.

End result: a cached executable like cc_wrapper.sh with CAS mode 0o555 gets silently clobbered to 0o644 when an unrelated DirectoryCache eviction runs. The next action that materializes that CAS file hits EACCES on exec. The symptom looks like a totally unrelated bug because the eviction and the failing action are in different code paths.

Fix: introduce set_dir_writable_recursive in nativelink-util/src/fs_util.rs that chmods only directories (which is all you need to enable unlink on the files inside), and swap evict_lru to use it.

Why this matters

This is silent data corruption that may be happening in any deployment running the DirectoryCache today. The symptom (EACCES from a cached file) presents as "my toolchain wrapper script suddenly stopped working" — operators correlate it with whatever they last touched, not with the eviction path of an unrelated action.

Relationship to in-flight PRs

The two existing callers of set_readwrite_recursive are being replaced by safer variants across two PRs:

After both land, set_readwrite_recursive and its private helper set_readwrite_one_path will have zero callers and can be removed in a small follow-up PR.

Provenance

Equivalent to upstream commit a47774d544 from #2243. Adapted for our codebase: the original commit was ~32 LOC removing a shell-exec chmod -R u+rw; our codebase uses an in-Rust helper, so the port had to be reshaped to chmod only directories (preserving remove_dir_all's ability to unlink files inside) rather than skipping chmod entirely.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Added test_eviction_cleanup_preserves_hardlinked_file_mode in nativelink-worker/src/directory_cache.rs. The test creates a file with mode 0o444, hardlinks it into a cache entry (simulating CAS sharing), runs the eviction path, and asserts the original file's mode is unchanged.

Empirical FAIL-at-HEAD~1 / PASS-at-HEAD proof: locally swapped the fix back to call set_readwrite_recursive and re-ran — test fails with assertion left == right failed: eviction cleanup mutated the inode mode (was 0o444, now 0o644). Restored the fix → passes.

  • cargo build -p nativelink-util -p nativelink-worker clean
  • cargo test -p nativelink-worker --lib directory_cache:: 2/2 pass (3/3 post-rebase, includes upstream's test_directory_cache_zero_byte_file)
  • cargo clippy -p nativelink-worker --lib --tests -- -D warnings clean for changed files
  • cargo fmt --check clean
  • Disallowed-methods grep against clippy.toml — no matches in diff

Checklist

  • Updated documentation if needed (in-code doc comments updated; no public API change)
  • Tests added/amended
  • bazel test //... passes locally (verified via cargo only)
  • PR is contained in a single commit

This change is Reviewable

Copy link
Copy Markdown
Member

@palfrey palfrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch, nasty one, good catch! One minor item, otherwise good.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set_readwrite_recursive and set_readwrite_one_path can be dropped as part of this, as the only user of them is removed in this PR

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after the rebase onto current main (which now includes #2338), set_readwrite_recursive has a new caller at fs_util.rs:104 (the post-clonefile chmod walk). #2349 will replace that caller with a different helper, and then both helpers genuinely become dead code

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can resolve here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleaned up in here #2349

@MarcusSorealheis
Copy link
Copy Markdown
Collaborator

🔥🔥🔥

@MarcusSorealheis
Copy link
Copy Markdown
Collaborator

MarcusSorealheis commented May 19, 2026

@erneestoc the generated descriptions from Claude did not specify whether or not this is a breaking change which is important for our Changelog generation. I believe this is only a bugfix but it would be good to include the PR template and wrap all of your Claude info in the Description

@MarcusSorealheis
Copy link
Copy Markdown
Collaborator

Nevertheless, fantastic PR and than you.

Port of TraceMachina/nativelink PR TraceMachina#2243 commit a47774d.

Root cause: when DirectoryCache evicts an entry, the cleanup path calls
`set_readwrite_recursive` on the cached tree before `remove_dir_all`. That
helper chmods every entry — including files — to 0o755/0o644. Files in
a cached entry are hardlinked into in-flight action workspaces (via
`hardlink_directory_tree` in `get_or_create`) and ultimately share an
inode with the underlying `FilesystemStore` CAS blob (via `fs::hard_link`
in `download_to_directory`). Chmoding the cached-side file therefore
silently mutates the shared inode's mode for every other in-flight
action holding a hardlink to the same blob.

Production symptom: EACCES on exec for `cc_wrapper.sh`. The CAS mode of
0o555 (r-xr-xr-x) gets clobbered to 0o644 (rw-r--r--), dropping the +x
bit while an unrelated action is mid-exec.

Fix: introduce `set_dir_writable_recursive` which only chmods
directories, never files. On unix, write permission on the parent
directory is sufficient to unlink files inside; the files' own modes
are irrelevant for unlinking. Switch the eviction cleanup path in
`DirectoryCache::evict_lru` to the new helper.

Empirically verified: a regression test that hardlinks a 0o555 file
into a cached tree and runs the cleanup helper FAILS on pre-fix code
(file mode mutated to 0o644) and PASSES on post-fix code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@MarcusSorealheis MarcusSorealheis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@MarcusSorealheis MarcusSorealheis merged commit a00bf8c into TraceMachina:main May 19, 2026
42 checks passed
@erneestoc erneestoc deleted the ec/pr2243-fix-cas-inode-corruption branch May 19, 2026 19:01
erneestoc added a commit to erneestoc/nativelink that referenced this pull request May 19, 2026
…2347 cleanup

After TraceMachina#2347 (DirectoryCache cleanup uses set_dir_writable_recursive)
and this PR (post-clonefile path uses chmod_dir_writable), the
generic set_readwrite_recursive helper has zero remaining callers.
Both replacements are intentionally narrower - set_dir_writable_recursive
chmods only directories so file inodes aren't mutated, and
chmod_dir_writable chmods only the destination root so the clone
tree stays read-only inside. Delete the old helper and its private
companion set_readwrite_one_path.

Addresses palfrey's review feedback on TraceMachina#2347 ("set_readwrite_recursive
and set_readwrite_one_path can be dropped as part of this") - sequenced
on this PR instead because both PRs had to land before the helpers
became dead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MarcusSorealheis added a commit that referenced this pull request May 21, 2026
…rialization speedup) (#2349)

* fs_util: skip set_readwrite_recursive walk after clonefile

The macOS clonefile fast path was followed by a recursive chmod walk that
made every file in the cloned tree writable (0o644 / 0o755). On real
Bazel input shapes (~2000-file SwiftCompile) that walk accounted for
~46% of materialization time — ~33 µs per file, ~67 ms per action.

Replace the walk with a single chmod(2) on the destination root.
Existing entries inherit the source's read-only mode (0o555 dirs,
0o444 files). The worker can still create the action's declared output
files inside the root because the root itself is 0o755.

This matches the hermeticity contract enforced by Bazel's local
sandbox (linux-sandbox bind-mounts inputs read-only;
darwin-sandbox / sandbox-exec denies writes outside declared output
paths) and the REAPI Action.output_files / output_directories
semantics: actions write only to declared outputs, never mutate
inputs. An action that does try to mutate an input now hits EACCES,
which is the correct REAPI behavior — same failure mode as on
Bazel's own sandbox.

Bench (nativelink-util/benches/chmod_strategy.rs on the bench branch),
toplevel_only vs full walk:

  shape                         walk      toplevel_only   speedup
  small_flat   (64 files)       4.66 ms   2.61 ms         1.79x
  pcm_cluster  (219 files)     15.17 ms   8.19 ms         1.85x
  medium_flat  (635 files)     46.36 ms  25.10 ms         1.85x
  large_flat   (1978 files)   147.39 ms  80.17 ms         1.84x

set_readwrite_recursive stays public — directory_cache.rs:451 still
uses it on the source side during eviction.

Tests:
- test_clonefile_root_writable_inputs_readonly: root 0o755, subdirs
  0o555, files 0o444 (replaces the old test_clonefile_dest_is_writable
  which assumed subdirs would be made writable).
- test_clonefile_root_accepts_new_files: worker can create outputs at
  the root even though everything inside the clone is read-only.
- test_clonefile_input_mutation_fails: writes to existing input files
  fail with PermissionDenied — encodes the hermeticity contract.

* fs_util: remove dead set_readwrite_recursive after post-#2347 cleanup

After #2347 (DirectoryCache cleanup uses set_dir_writable_recursive)
and this PR (post-clonefile path uses chmod_dir_writable), the
generic set_readwrite_recursive helper has zero remaining callers.
Both replacements are intentionally narrower - set_dir_writable_recursive
chmods only directories so file inodes aren't mutated, and
chmod_dir_writable chmods only the destination root so the clone
tree stays read-only inside. Delete the old helper and its private
companion set_readwrite_one_path.

Addresses palfrey's review feedback on #2347 ("set_readwrite_recursive
and set_readwrite_one_path can be dropped as part of this") - sequenced
on this PR instead because both PRs had to land before the helpers
became dead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* worker: make the APFS clonefile directory-cache path work end-to-end

PR #2338's clonefile fast path is effectively dead in production — it
materialized 1 of 1771 actions in a real build. Three coupled bugs kept
the directory cache silently falling back to the slow download path, and
two of them only surface once the fast path actually fires, so they must
land together.

Bug 1: prepare_action_inputs received a work directory the caller had
already created, but hardlink_directory_tree and clonefile(2) both
require the destination to NOT exist. Every cache attempt failed its
precondition and fell back to download_to_directory. Fix: remove the
empty pre-created directory before invoking the cache, and recreate it
on cache failure so the download fallback (which needs an existing
destination) still works. Adds fs::remove_dir for the empty-dir removal.

Bug 2: set_readonly_recursive chmod'd files to 0o444, stripping the
execute bit from cached executables. Once a tree is cloned into a
workspace this makes an action's interpreter/wrapper script fail with
EACCES. Fix: mark files 0o555 instead of 0o444 — read + execute, still
no write bit, so the hermeticity contract is unchanged.

Bug 3: the clonefile path chmods only the destination root writable;
cloned subdirectories keep the source's 0o555 mode. Bazel actions
declare outputs at paths nested inside input subdirectories, and
creating those files needs write permission on the parent directory.
Fix: after a cache hit, set_dir_writable_recursive makes every directory
in the materialized tree writable. Files stay read-only — they may be
CAS-hardlinked and chmoding them would corrupt the shared inode.

Adds regression tests for nested output creation, which the existing
root-only clonefile tests did not cover.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Marcus Eagan <marcuseagan@gmail.com>
erneestoc added a commit to erneestoc/nativelink that referenced this pull request May 22, 2026
`DirectoryCache` locks each cache entry down with `set_readonly_recursive`
after construction. Previously that helper made the entire entry tree mode
0o555 — directories included — so every materialization had to follow up
with a separate `set_dir_writable_recursive` recursive chmod walk in
`prepare_action_inputs` to re-add write permission to directories (Bazel
actions declare outputs at paths nested inside input subdirectories).

That post-walk is redundant work. Directories are not hardlink-shared
between cache entries — only file content inodes are — so directory mode
can safely be made writable once, at the cache entry, instead of on every
materialization.

`set_readonly_recursive` now locks a tree down as a cache entry by making
only FILES read-only (0o555) and leaving DIRECTORIES writable (0o755).
Both materialization paths then produce a directly-usable tree:

- macOS `clonefile(2)` copies the source's modes verbatim, so the clone's
  directories are writable and its files read-only.
- The Linux per-file hardlink walk creates fresh directories (writable)
  and hardlinks files (which keep the source inode's read-only mode).

Files stay read-only on both paths, so the hermeticity contract and the
CAS-hardlink shared-inode invariant (PR TraceMachina#2347) are preserved. With the
materialized tree already correct, the `set_dir_writable_recursive` call
is removed from `prepare_action_inputs`. `set_dir_writable_recursive`
itself is unchanged and still used by the cache eviction cleanup path.

Tests:
- fs_util: `test_set_readonly_recursive` now also asserts directories stay
  writable; the macOS clonefile tests assert cloned subdirs are writable
  and that a nested output can be created with no `set_dir_writable_recursive`
  walk; `test_set_dir_writable_recursive_walks_nested_dirs` keeps covering
  the eviction-cleanup helper.
- directory_cache: new `test_materialized_tree_dirs_writable_files_readonly`
  builds a nested tree and asserts that, after `get_or_create` on both the
  fresh-materialize and cache-hit paths, every directory is writable and
  every file is read-only, with no separate chmod walk.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MarcusSorealheis added a commit that referenced this pull request May 23, 2026
The recursive permission walk `set_perms_recursive_impl` (driving both
`set_readonly_recursive` and `set_dir_writable_recursive`) used
`fs::metadata` (stat), which follows symlinks. On input trees containing
symlinks - e.g. `.venv/bin/python3` produced by rules_python /
rules_apple venv tooling - this had two failure modes:

  * A symlink to a directory reported `is_dir() == true`, so the walk
    recursed *through* the link, escaping the materialized tree or
    descending into an unrelated directory.
  * A symlink was passed to `set_permissions`; `chmod` follows symlinks,
    so it mutated the link's target. When the target did not exist (a
    dangling link - common when a venv points outside the action's
    input set) the `chmod` returned ENOENT and failed the entire walk.

That ENOENT failure surfaced as `set_readonly_recursive` erroring inside
`DirectoryCache::get_or_create`, which made `prepare_action_inputs` log
"Directory cache failed, falling back to traditional download" and take
the slow `download_to_directory` path.

Fix: `set_perms_recursive_impl` now uses `symlink_metadata` (lstat) and
returns early on symlink entries - it never chmods a symlink and never
recurses through one. Regular files keep their existing read-only
(0o555) treatment, so the CAS-hardlinked-inode hermeticity contract
(PR #2347) is unchanged.

`hardlink_directory_tree_recursive` already recreated symlinks as
symlinks; its symlink branch is reordered ahead of the `is_dir()` /
`is_file()` branches to make the symlink-first intent explicit and
robust.

Adds regression tests covering set-readonly, set-dir-writable, and
hardlink/clone walks over a tree containing a symlink to an in-tree
file, a dangling relative symlink, and a symlink to an in-tree
directory, asserting each walk succeeds and the symlinks are preserved
with their targets intact.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Marcus Eagan <marcuseagan@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants