Skip to content

macOS: APFS clonefile fast path + concurrency cap + zero-byte fix for Bazel input materialization#2338

Merged
palfrey merged 8 commits into
TraceMachina:mainfrom
erneestoc:ec/macos-clonefile-optimizations
May 19, 2026
Merged

macOS: APFS clonefile fast path + concurrency cap + zero-byte fix for Bazel input materialization#2338
palfrey merged 8 commits into
TraceMachina:mainfrom
erneestoc:ec/macos-clonefile-optimizations

Conversation

@erneestoc
Copy link
Copy Markdown
Contributor

@erneestoc erneestoc commented May 16, 2026

Summary

Four small, atomic commits that make macOS RBE workers materialize Bazel
action input trees dramatically faster on APFS. This is a focused
re-extraction of the safest, most independently-verifiable subset of
#2243 — split out so each change can be reviewed, benched, and rolled
back on its own.

Commit Scope
b3b0cd3f fs_util: APFS clonefile(2) fast path in hardlink_directory_tree (with set_readwrite_recursive so cloned trees stay writable). Falls through to per-file hard_link on Linux/Windows.
ab49f162 DirectoryCache: clonefile_hits / hardlink_hits AtomicU64 telemetry so operators can confirm the fast path is firing in production.
1ddce0fc running_actions_manager::download_to_directory: cap concurrent hard_link calls at 64 via stream::buffer_unordered (was unbounded FuturesUnordered).
8051ca9e DirectoryCache: handle the case where the CAS has no entry for the zero-byte digest — short-circuit and create an empty file rather than panic.

Motivation

Bazel exec-log analysis on our macOS RBE worker fleet shows that input
materialization dominates wall time:

Metric Value
Sum of action durations (process wall time) 4,947 s
Sum of file fetching durations 7,313 s
Critical path 1,074 s
Total wall time 1,213 s

File-fetching wall time exceeds all process wall time combined — the
worker spends more time materializing input trees than running actions.
The dominant action shape is SwiftCompile (635 input files / 183 MB
mean, 1,978 files / 466 MB at p95), which is exactly the workload APFS
`clonefile(2)` was designed for.

#2243 is large and bundles several unrelated changes. This PR isolates
the four pieces that (a) directly attack the file-fetching bottleneck
on macOS and (b) can be A/B tested independently via the telemetry in
`ab49f162`.

Benchmark proof

See erneestoc#1 — adds two criterion
benches under `nativelink-util/benches/` that reproduce the wins on a
real APFS volume.

Headline numbers on Apple M4 Max / APFS:

  • `hardlink_directory_tree` (b3b0cd3) — 3.6×–4.0× faster across
    all five SwiftCompile-shaped trees. p95 shape (1,978 files / 466 MB)
    drops from 590 ms → 150 ms per action.
  • `download_to_directory` 64-cap (1ddce0f) — performance-neutral
    on a single-process microbench (1.01× vs unbounded at 1,978 files).
    Confirms the cap is not a regression; any multi-action contention
    wins will show up in production telemetry.

Caveat: 4× not 10×

#2243 reported ≥10× wins on this path. Our benches show a stable ~4×,
bounded by the O(N) `set_readwrite_recursive` chmod walk that runs
after the O(1) `clonefile(2)`. Diagnosis and follow-up (single
top-level chmod + lazy per-file) documented in the bench PR. 4× on
the dominant shape already moves the needle hard — proposing this
lands now, with the walk optimization as a separate follow-up.

Telemetry & rollback

`ab49f162` adds `DirectoryCache::stats()` with `clonefile_hits` and
`hardlink_hits` counters so operators can confirm in production:

  • macOS workers should show `clonefile_hits` climbing and
    `hardlink_hits` near zero
  • Linux workers should show the inverse (clonefile path is
    macOS-gated at compile time)

The four commits revert independently — each is its own atomic change
with its own test coverage.

Tests

10 `fs_util` tests cover the clonefile path (COW isolation, symlink
handling under `CLONE_NOFOLLOW`, error paths, permission transitions
from `0o555` → `0o755`). 2 `DirectoryCache` tests cover the telemetry
counters and the zero-byte short-circuit.


This change is Reviewable

erneestoc added 4 commits May 15, 2026 23:29
On macOS, try APFS clonefile(2) before falling back to the existing
per-file hardlink walk. clonefile is O(1) in tree size and uses
copy-on-write, so subtree-cache hits no longer scale with input count.

After a successful clone the destination is made writable (0o755/0o644)
because the clone inherits the source's permissions and cached subtrees
are 0o555/0o444. The COW semantics of clonefile mean writes to the
destination do not affect the source, so this is safe.

On EXDEV (cross-volume), ENOTSUP, or any other errno, we log at debug
and fall through to hardlink_directory_tree_recursive. Linux and Windows
paths are unchanged.

Extracted from TraceMachina/nativelink PR TraceMachina#2243 (commit 13fcc0c).
hardlink_directory_tree now returns CloneMethod (Clonefile | Hardlink) so
callers can see which kernel path was taken. DirectoryCache records the
result in two atomic counters and exposes them via CacheStats. Without
this telemetry, a silent fall-through from clonefile to per-file
hardlinks (e.g. cache and workspace on different volumes, or APFS
clonefile failing for any other reason) would be invisible.

On Linux/Windows hardlink_directory_tree always returns Hardlink — no
behavioral change. The new CacheStats fields default to zero on those
platforms.

Extracted from TraceMachina/nativelink PR TraceMachina#2243 (commit 13fcc0c).
Previously download_to_directory pushed every file-hardlink, subdir
recursion, and symlink future onto an unbounded FuturesUnordered, then
drained it. On macOS this produced thousands of parallel hardlink(2)
calls fighting APFS's per-volume metadata lock — the observed exec-log
shape was ~4 ms per input file at scale, consistent with serialized
metadata mutations plus tokio scheduling overhead.

This commit gates each directory level to at most 64 in-flight futures
via stream::buffer_unordered(64). 64 is well above the inflection point
on any modern Linux filesystem, so Linux is unaffected beyond replacing
tokio scheduling overhead with simpler stream polling.

Scope notes (vs PR TraceMachina#2243 ee85fdc):
- The chunked has_with_results sub-change does not apply directly: the
  current code calls populate_fast_store per-digest, not a batched
  has_with_results.
- Level-parallel BFS mkdir is not applied here; the recursion structure
  is unchanged. The 64-cap is per recursive call, not global. Deep trees
  can therefore still have 64 * depth in-flight futures. A full flatten
  pass is a follow-up.

Extracted from TraceMachina/nativelink PR TraceMachina#2243 (commit ee85fdc),
narrowed to fit the current code shape.
FilesystemStore (and several other CAS backends) refuse to store
zero-byte blobs, so a get_part_unchunked for the zero-byte digest
(af1349b9... / e3b0c449...) returns NotFound. Bazel input trees
routinely contain empty marker/config files (.linksearchpaths, empty
.env, .toml, etc.), so without this fix a single such file in any
directory causes the entire DirectoryCache construction to fail —
roughly 30% of cache attempts per PR TraceMachina#2243.

Short-circuit create_file: if the digest is the zero-byte digest, write
b"" to disk directly and never consult the CAS.

Cross-platform correctness fix.

Extracted from TraceMachina/nativelink PR TraceMachina#2243 (commit d198902).
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented May 16, 2026

CLA assistant check
All committers have signed the CLA.

@palfrey
Copy link
Copy Markdown
Member

palfrey commented May 18, 2026

Caveat: 4× not 10×

This PR is a lot more readable and reviewable than that one, and I'll take 4x I can review over 10x with harder to evaluate side-effects!

Thanks for this, very happy with it. Will also be able to tick off DirectoryCache from #2034 after this merges.

Copy link
Copy Markdown
Member

@palfrey palfrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor item

Comment thread nativelink-util/src/fs_util.rs Outdated
@palfrey palfrey mentioned this pull request May 18, 2026
19 tasks
The pre-step to ensure dst's parent exists is only needed because
clonefile(2) requires the parent to exist while dst itself must not.
On non-macOS targets the function falls through to per-file hardlink,
which calls fs::create_dir_all(dst_dir) further down — that already
creates dst_dir along with any missing parents — so the pre-step was
pure redundant syscalls on Linux/Windows.

Wrap the block in `#[cfg(target_os = "macos")]` to skip it on platforms
that don't need it. Behavior on macOS is unchanged; behavior on
Linux/Windows loses one redundant `create_dir_all` per call.

Per review feedback on TraceMachina#2338.
@palfrey palfrey merged commit e6def51 into TraceMachina:main May 19, 2026
42 checks passed
erneestoc added a commit to erneestoc/nativelink that referenced this pull request May 19, 2026
Previously download_to_directory pushed every file's fetch+hardlink as
one fused future into a single FuturesUnordered. Each future ran the
fetch (populate_fast_store) and then the hardlink sequentially within
the same task. With per-future store-side serialization the fetch leg
becomes the bottleneck for SwiftCompile-style workloads (1978+ input
files at p95), where blob-fetch wall time dominates total action time.

This commit restructures download_to_directory into a two-phase
pipeline driven by try_join3:

  1. Fetcher launches ALL missing-blob populate_fast_store calls up
     front, bounded at DOWNLOAD_TO_DIRECTORY_FETCH_CONCURRENCY = 128
     via stream::try_for_each_concurrent. Digests are deduplicated so a
     blob referenced by N files is only fetched once. As each blob
     arrives, every file index that needs it is sent through an mpsc
     channel.

  2. Hardlinker drains the channel and performs hardlink + chmod +
     set_mtime, bounded at DOWNLOAD_TO_DIRECTORY_HARDLINK_CONCURRENCY
     = 64 (the same cap that PR TraceMachina#2338 added to compose with APFS
     metadata-lock behavior on macOS). Hardlinking starts as soon as
     the first blob arrives - it does not wait for the fetcher to
     finish.

  3. Subdirectory recursion and symlink creation run in a separate
     FuturesUnordered alongside the file pipeline.

Net win on a Mac-Mini RBE worker: fetch wall time becomes
max(per-blob latency) bounded by the 128-cap, not sum(per-blob latency)
serialized inside one future.

Concurrency caps compose: fetch (128) is network/store-bound where extra
in-flight requests are cheap; hardlink (64) is metadata-bound where
extra in-flight syscalls fight APFS's per-volume lock on macOS. The
two phases run independently with their own bound, so on cache-hot
runs only the hardlink leg is active and the fetch cap is irrelevant.

Tests:

- download_to_directory_fetches_blobs_concurrently: a per-blob 75 ms
  sleep on the slow store. Serial fetch of 20 blobs would be 1500 ms;
  concurrent fetch finishes well under 500 ms, and max_in_flight is
  observed to reach >=4 (would be 1 if serial).

- download_to_directory_streams_hardlinks_while_fetching: 30 blobs
  with 50 ms per-blob latency. Serial fetch+hardlink would be 1500 ms;
  pipelined version finishes under 750 ms. Catches regression to a
  fetch-all-then-hardlink-all two-phase shape that drops overlap.

Both tests use a minimal DelayedStore harness (StoreDriver wrapping
MemoryStore) added inline. This pulls in async-trait as a dev-dependency
(version 0.1.88, matching the rest of the workspace).

Scope notes (vs PR TraceMachina#2243 a8db953):

- The upstream commit depends on infrastructure introduced earlier in
  the PR that is NOT being ported here: batch_read_small_blobs,
  populate_fast_store_unchecked, get_file_entries_batch,
  BATCH_READ_MAX_BLOB_SIZE, BatchReadBlobs / ByteStream partitioning.
  These hooks are entangled with the peer-to-peer infrastructure and
  proto changes we explicitly excluded. The upstream commit was
  reduced to its core idea: launch all fetches upfront, hardlink as
  blobs arrive.

- Upstream uses a Notify + shared HashSet<DigestInfo> "fetched_set"
  to coordinate fetcher and producer. Here we use a simpler mpsc
  channel: the fetcher sends file indices directly as their blob
  becomes available. There is no separate producer step because there
  is no batch-vs-bytestream partitioning to defer to a second pass.

- Follow-on upstream commits 7b093e8 and b64944a are excluded
  per task scope - they depend on peer hints and proto changes.

Extracted from TraceMachina/nativelink PR TraceMachina#2243 (commit a8db953),
adapted to the current code shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
palfrey pushed a commit that referenced this pull request May 20, 2026
…ro-digest files) (#2346)

* Resolve absolute symlinks in output upload instead of uploading raw targets

When the worker's work directory is populated via DirectoryCache, output
paths can be absolute symlinks pointing into the cache directory
(/var/.../directory_cache/...). The previous output collection code
uploaded these as raw SymlinkInfo with absolute targets that are
meaningless on the Bazel client, causing "No such file or directory"
errors when the client materialised the action result.

Detect absolute symlinks in output paths and resolve them: upload
directory contents as Tree protos and file contents as regular files.
Relative symlinks (intentionally created by the action) are still
preserved as symlinks.

Updates upload_dir_and_symlink_test to use a relative symlink (the
previous /dev/null absolute symlink is now resolved, breaking the old
assertion) and adds a new upload_absolute_symlink_resolves_contents
regression test that verifies an out-of-tree payload reachable via an
absolute symlink lands in CAS as a file.

Ported from upstream
0807153 (PR
#2243).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix zero-digest files missing from worker execution directories

FilesystemStore::get_file_entry_for_digest previously returned a
synthetic FileEntry for zero-digest blobs pointing to a content_path
that never exists on disk (FilesystemStore deliberately never persists
zero-byte files). Downstream worker output-materialisation code that
took the prefetched hardlink path would try to hard_link from this
non-existent source, fail, and either silently produce a missing file
or fall back without enough context to diagnose the failure.

This is the OUTPUT-materialisation companion to PR #2338's DirectoryCache
zero-byte fix, which lives on the input-fetch path. The two changes are
complementary: #2338 covers the directory cache short-circuit; this
covers the FilesystemStore API and the running_actions_manager fallback.

Changes:
- FilesystemStore::get_file_entry_for_digest now returns
  Code::NotFound for zero digests instead of a synthetic FileEntry,
  forcing callers to materialise empty files via fs::create_file rather
  than hard_linking from a phantom source.
- running_actions_manager::download_to_directory adds err_tip context
  on the zero-digest fs::create_file / write_all so a failure here is
  attributable to the empty-file path.
- Updates the existing get_file_entry_for_zero_digest test to assert
  the new NotFound behaviour and renames it to
  get_file_entry_for_zero_digest_returns_not_found.
- Adds download_to_directory_zero_digest_empty_file_test that
  exercises an empty file declared inside an input directory and
  verifies it materialises on disk with zero bytes. This test
  intentionally does NOT collide with PR #2338's DirectoryCache
  zero-byte test (which validates a different short-circuit path).

Ported from upstream
19d7e20 (PR
#2243).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Close test coverage gaps for ported output-upload fixes

Audit of the two ported commits (e57b675b, b1dc3a1c) flagged two
correctness-relevant gaps. Both are now closed with strict assertions so
the underlying bugs cannot silently regress.

1. upload_absolute_symlink_to_directory_uploads_tree
   The original e57b675b regression test covers absolute-symlink-to-file
   but not absolute-symlink-to-directory, which exercises an entirely
   separate code path (upload_directory + Tree proto serialisation). The
   new test creates an out-of-tree directory containing inner.txt,
   absolute-symlinks it into the work dir, runs the action, and:
   - asserts output_directory_symlinks and output_file_symlinks are
     BOTH empty (the symlink must NOT be preserved),
   - asserts output_folders contains exactly one entry at the symlink
     path, proving Tree-upload happened,
   - walks the uploaded Tree, locates inner.txt, fetches its blob from
     CAS and verifies the content. This proves the directory was
     actually traversed and uploaded — not stubbed.

2. download_to_directory_zero_digest_empty_file_test (strengthened)
   Extended the existing single-file test to cover:
   - a second sibling zero-digest file (proves the path is not a
     single-file fluke),
   - a zero-digest file nested inside a subdirectory (proves the
     recursive download_to_directory caller also handles NotFound from
     get_file_entry_for_digest — not just the top-level invocation),
   - strict per-file assertions: is_file, !is_symlink, len == 0, and
     a read-back that confirms the file is actually empty rather than
     a phantom dirent.

Verdict on the dropped checks:
- Dangling absolute symlink: source returns OutputType::None which is
  the correct graceful behaviour, but writing a test for it requires
  fabricating a broken symlink mid-action and is comparatively low
  value; left uncovered intentionally.
- File mode on zero-digest materialisation: the worker uses
  fs::create_file with no explicit mode, so the result follows umask
  and would be brittle across CI environments. Length + type
  assertions are the stronger invariant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* running_actions_manager_test: use /dev/null in upload_dir_and_symlink_test

Previous iterations of this PR sidestepped `/dev/null` by swapping the
absolute-symlink fixture for either a relative symlink (lost coverage)
or an out-of-tree real empty file (added scaffolding to dodge a corner
case we should just test directly).

Restore the original `ln -s /dev/null empty_sym` fixture. The post-fix
worker code resolves the absolute symlink via `fs::metadata` (follows),
sees a character device (`is_dir() == false`), and falls into the
"upload as file" branch. `upload_file` opens `/dev/null` and reads to
EOF — `/dev/null`'s contract is that reads return 0 bytes immediately
— producing the canonical sha256 empty digest (e3b0c44…) with size 0.

That's well-defined, harmless behavior and a real-world Bazel pattern
(rules sometimes use `ln -sf /dev/null x` to create empty outputs).
Test now locks in this contract directly, no scaffolding required.

Drops the `external_root` / `external_file` setup since it was only
needed to avoid `/dev/null`.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Marcus Eagan <marcuseagan@gmail.com>
MarcusSorealheis added a commit that referenced this pull request May 21, 2026
…rialization speedup) (#2349)

* fs_util: skip set_readwrite_recursive walk after clonefile

The macOS clonefile fast path was followed by a recursive chmod walk that
made every file in the cloned tree writable (0o644 / 0o755). On real
Bazel input shapes (~2000-file SwiftCompile) that walk accounted for
~46% of materialization time — ~33 µs per file, ~67 ms per action.

Replace the walk with a single chmod(2) on the destination root.
Existing entries inherit the source's read-only mode (0o555 dirs,
0o444 files). The worker can still create the action's declared output
files inside the root because the root itself is 0o755.

This matches the hermeticity contract enforced by Bazel's local
sandbox (linux-sandbox bind-mounts inputs read-only;
darwin-sandbox / sandbox-exec denies writes outside declared output
paths) and the REAPI Action.output_files / output_directories
semantics: actions write only to declared outputs, never mutate
inputs. An action that does try to mutate an input now hits EACCES,
which is the correct REAPI behavior — same failure mode as on
Bazel's own sandbox.

Bench (nativelink-util/benches/chmod_strategy.rs on the bench branch),
toplevel_only vs full walk:

  shape                         walk      toplevel_only   speedup
  small_flat   (64 files)       4.66 ms   2.61 ms         1.79x
  pcm_cluster  (219 files)     15.17 ms   8.19 ms         1.85x
  medium_flat  (635 files)     46.36 ms  25.10 ms         1.85x
  large_flat   (1978 files)   147.39 ms  80.17 ms         1.84x

set_readwrite_recursive stays public — directory_cache.rs:451 still
uses it on the source side during eviction.

Tests:
- test_clonefile_root_writable_inputs_readonly: root 0o755, subdirs
  0o555, files 0o444 (replaces the old test_clonefile_dest_is_writable
  which assumed subdirs would be made writable).
- test_clonefile_root_accepts_new_files: worker can create outputs at
  the root even though everything inside the clone is read-only.
- test_clonefile_input_mutation_fails: writes to existing input files
  fail with PermissionDenied — encodes the hermeticity contract.

* fs_util: remove dead set_readwrite_recursive after post-#2347 cleanup

After #2347 (DirectoryCache cleanup uses set_dir_writable_recursive)
and this PR (post-clonefile path uses chmod_dir_writable), the
generic set_readwrite_recursive helper has zero remaining callers.
Both replacements are intentionally narrower - set_dir_writable_recursive
chmods only directories so file inodes aren't mutated, and
chmod_dir_writable chmods only the destination root so the clone
tree stays read-only inside. Delete the old helper and its private
companion set_readwrite_one_path.

Addresses palfrey's review feedback on #2347 ("set_readwrite_recursive
and set_readwrite_one_path can be dropped as part of this") - sequenced
on this PR instead because both PRs had to land before the helpers
became dead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* worker: make the APFS clonefile directory-cache path work end-to-end

PR #2338's clonefile fast path is effectively dead in production — it
materialized 1 of 1771 actions in a real build. Three coupled bugs kept
the directory cache silently falling back to the slow download path, and
two of them only surface once the fast path actually fires, so they must
land together.

Bug 1: prepare_action_inputs received a work directory the caller had
already created, but hardlink_directory_tree and clonefile(2) both
require the destination to NOT exist. Every cache attempt failed its
precondition and fell back to download_to_directory. Fix: remove the
empty pre-created directory before invoking the cache, and recreate it
on cache failure so the download fallback (which needs an existing
destination) still works. Adds fs::remove_dir for the empty-dir removal.

Bug 2: set_readonly_recursive chmod'd files to 0o444, stripping the
execute bit from cached executables. Once a tree is cloned into a
workspace this makes an action's interpreter/wrapper script fail with
EACCES. Fix: mark files 0o555 instead of 0o444 — read + execute, still
no write bit, so the hermeticity contract is unchanged.

Bug 3: the clonefile path chmods only the destination root writable;
cloned subdirectories keep the source's 0o555 mode. Bazel actions
declare outputs at paths nested inside input subdirectories, and
creating those files needs write permission on the parent directory.
Fix: after a cache hit, set_dir_writable_recursive makes every directory
in the materialized tree writable. Files stay read-only — they may be
CAS-hardlinked and chmoding them would corrupt the shared inode.

Adds regression tests for nested output creation, which the existing
root-only clonefile tests did not cover.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Marcus Eagan <marcuseagan@gmail.com>
erneestoc added a commit to erneestoc/nativelink that referenced this pull request May 21, 2026
Previously download_to_directory pushed every file's fetch+hardlink as
one fused future into a single FuturesUnordered. Each future ran the
fetch (populate_fast_store) and then the hardlink sequentially within
the same task. With per-future store-side serialization the fetch leg
becomes the bottleneck for SwiftCompile-style workloads (1978+ input
files at p95), where blob-fetch wall time dominates total action time.

This commit restructures download_to_directory into a two-phase
pipeline driven by try_join3:

  1. Fetcher launches ALL missing-blob populate_fast_store calls up
     front, bounded at DOWNLOAD_TO_DIRECTORY_FETCH_CONCURRENCY = 128
     via stream::try_for_each_concurrent. Digests are deduplicated so a
     blob referenced by N files is only fetched once. As each blob
     arrives, every file index that needs it is sent through an mpsc
     channel.

  2. Hardlinker drains the channel and performs hardlink + chmod +
     set_mtime, bounded at DOWNLOAD_TO_DIRECTORY_HARDLINK_CONCURRENCY
     = 64 (the same cap that PR TraceMachina#2338 added to compose with APFS
     metadata-lock behavior on macOS). Hardlinking starts as soon as
     the first blob arrives - it does not wait for the fetcher to
     finish.

  3. Subdirectory recursion and symlink creation run in a separate
     FuturesUnordered alongside the file pipeline.

Net win on a Mac-Mini RBE worker: fetch wall time becomes
max(per-blob latency) bounded by the 128-cap, not sum(per-blob latency)
serialized inside one future.

Concurrency caps compose: fetch (128) is network/store-bound where extra
in-flight requests are cheap; hardlink (64) is metadata-bound where
extra in-flight syscalls fight APFS's per-volume lock on macOS. The
two phases run independently with their own bound, so on cache-hot
runs only the hardlink leg is active and the fetch cap is irrelevant.

Tests:

- download_to_directory_fetches_blobs_concurrently: a per-blob 75 ms
  sleep on the slow store. Serial fetch of 20 blobs would be 1500 ms;
  concurrent fetch finishes well under 500 ms, and max_in_flight is
  observed to reach >=4 (would be 1 if serial).

- download_to_directory_streams_hardlinks_while_fetching: 30 blobs
  with 50 ms per-blob latency. Serial fetch+hardlink would be 1500 ms;
  pipelined version finishes under 750 ms. Catches regression to a
  fetch-all-then-hardlink-all two-phase shape that drops overlap.

Both tests use a minimal DelayedStore harness (StoreDriver wrapping
MemoryStore) added inline. This pulls in async-trait as a dev-dependency
(version 0.1.88, matching the rest of the workspace).

Scope notes (vs PR TraceMachina#2243 a8db953):

- The upstream commit depends on infrastructure introduced earlier in
  the PR that is NOT being ported here: batch_read_small_blobs,
  populate_fast_store_unchecked, get_file_entries_batch,
  BATCH_READ_MAX_BLOB_SIZE, BatchReadBlobs / ByteStream partitioning.
  These hooks are entangled with the peer-to-peer infrastructure and
  proto changes we explicitly excluded. The upstream commit was
  reduced to its core idea: launch all fetches upfront, hardlink as
  blobs arrive.

- Upstream uses a Notify + shared HashSet<DigestInfo> "fetched_set"
  to coordinate fetcher and producer. Here we use a simpler mpsc
  channel: the fetcher sends file indices directly as their blob
  becomes available. There is no separate producer step because there
  is no batch-vs-bytestream partitioning to defer to a second pass.

- Follow-on upstream commits 7b093e8 and b64944a are excluded
  per task scope - they depend on peer hints and proto changes.

Extracted from TraceMachina/nativelink PR TraceMachina#2243 (commit a8db953),
adapted to the current code shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants