perf(workspace): skip checking out oversized submodules during prep #60
Merged
A fresh worker checkout of denoland/deno spends ~7 minutes in prep, and the dominant cost is materializing the WPT submodule (tests/wpt/suite): ~160k files, roughly 3x the rest of the repo's working tree. Even with the --reference object cache, git still has to write every one of those files; that's what a worker sees as 'queued for ~7 min' before its session even flips to running.
Skip it generically: after warming each submodule's mirror, count its pinned tree's files from the cache (no working tree is written to decide) and leave any submodule over maxEagerSubmoduleFiles (50k) uninitialized, checking out only the small ones via an explicit pathspec. No repo- or path-specific knowledge is baked in; it's a file-count heuristic (the superproject itself is ~50k files, so anything bigger is bulk test data). Unmeasurable submodules are kept, so the rule never drops one it merely failed to size.
Workers/reviewers/follow-ups get a one-line note in their preamble that an oversized submodule may not be checked out and to run 'git submodule update --init <path>' if their task needs it.
Cuts deno worker prep from ~7 min to ~1-2 min and shrinks each checkout by ~1GB / 160k inodes.
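The commit message says each submodule's size is read from the warmed cache rather than from a checkout. One way to do that, sketched here in Go under two assumptions (that the project is Go, as the test name suggests, and that each mirror is a local repository already holding the pinned commit), is to list the pinned tree with `git ls-tree -r` and count its blob entries. `countTreeFiles` and `pinnedFileCount` are hypothetical names, not the PR's code.

```go
package main

import (
	"bufio"
	"fmt"
	"os/exec"
	"strings"
)

// countTreeFiles counts file entries in `git ls-tree -r` output.
// Each line has the form "<mode> SP <type> SP <object> TAB <path>".
// Regular files and symlinks are "blob" entries; nested submodules
// (gitlinks) are "commit" entries and are not counted as files.
func countTreeFiles(lsTreeOutput string) (int, error) {
	n := 0
	sc := bufio.NewScanner(strings.NewReader(lsTreeOutput))
	for sc.Scan() {
		line := sc.Text()
		if line == "" {
			continue
		}
		meta, _, ok := strings.Cut(line, "\t")
		if !ok {
			return 0, fmt.Errorf("malformed ls-tree line: %q", line)
		}
		fields := strings.Fields(meta)
		if len(fields) != 3 {
			return 0, fmt.Errorf("malformed ls-tree line: %q", line)
		}
		if fields[1] == "blob" {
			n++
		}
	}
	return n, sc.Err()
}

// pinnedFileCount (hypothetical) sizes a submodule from its warmed mirror:
// it reads the pinned commit's tree directly from the object store, so no
// working tree is written just to make the skip decision.
func pinnedFileCount(mirrorDir, pinnedSHA string) (int, error) {
	out, err := exec.Command("git", "-C", mirrorDir, "ls-tree", "-r", pinnedSHA).Output()
	if err != nil {
		return 0, err
	}
	return countTreeFiles(string(out))
}
```

If `pinnedFileCount` returns an error, a caller following the rule above would treat the submodule as unmeasured and check it out anyway rather than skip it.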
The problem (measured)
A fresh deno worker checkout sits in prep ~7 min before its session even flips to running (it shows as `queued` the whole time).
I measured a live deno checkout on Vultr: the WPT submodule, `tests/wpt/suite`, is 76% of the working-tree file count. The cost isn't download (objects come from the `--reference` cache); it's git materializing 159k files into every fresh worker's working tree. That's the bulk of the 7 minutes, and it repeats on every new worker. It also bloats each checkout by ~1 GB / 160k inodes, which adds to the disk pressure.
The fix (generic, no hardcoded paths)
After warming each submodule's mirror, count its pinned tree's files from the cache (no working tree is written to decide), and leave any submodule over `maxEagerSubmoduleFiles` (50k) uninitialized, checking out only the small ones via an explicit pathspec. The superproject itself is ~50k files, so anything bigger is bulk test data: WPT (160k) trips it, while deno's small submodules (std, node_compat, lzld) don't.
No repo- or path-specific knowledge is baked in; it's a pure file-count heuristic. (You'd flagged hardcoding `tests/wpt/suite` as ugly; this avoids naming it at all.)
Safety:
- A worker that needs a skipped submodule runs `git submodule update --init <path>` itself (the objects are already in the warmed cache). Build / lint / unit tests don't need WPT.
- `git status` ignores the uninitialized submodule and `git add -A` won't sweep it.
- `maxEagerSubmoduleFiles <= 0` disables the skip (old behavior).
Expected: deno worker prep drops from ~7 min to ~1–2 min.
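The skip rule can be sketched as a pure function plus the pathspec it feeds. This is a minimal illustration, not the PR's implementation: `submoduleSize`, `partitionSubmodules`, and `updateArgs` are hypothetical, `maxFiles` stands in for `maxEagerSubmoduleFiles`, and a negative count is assumed to mean "couldn't measure".

```go
package main

// submoduleSize is a hypothetical record of one submodule's measured size.
// Files < 0 means the count could not be obtained.
type submoduleSize struct {
	Path  string
	Files int
}

// partitionSubmodules splits submodules into those to check out now and
// those to leave uninitialized. A submodule is skipped only when it was
// measured and its file count is strictly over maxFiles; maxFiles <= 0
// disables skipping entirely (the old behavior).
func partitionSubmodules(subs []submoduleSize, maxFiles int) (eager, skipped []string) {
	for _, s := range subs {
		measured := s.Files >= 0
		if maxFiles > 0 && measured && s.Files > maxFiles {
			skipped = append(skipped, s.Path)
		} else {
			eager = append(eager, s.Path)
		}
	}
	return eager, skipped
}

// updateArgs returns the git arguments for checking out only the eager
// submodules. An empty pathspec would make `git submodule update --init`
// initialize every submodule, so nil means "don't run the command at all".
func updateArgs(eager []string) []string {
	if len(eager) == 0 {
		return nil
	}
	args := []string{"submodule", "update", "--init", "--"}
	return append(args, eager...)
}
```

The nil return matters: if every submodule were oversized, passing an empty path list would silently undo the skip, so a caller would run `exec.Command("git", args...)` in the superproject only when `args` is non-nil.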
Test
`TestPrepareIsolated_SkipsOversizedSubmodule`: with the ceiling lowered, a 3-file submodule is left uninitialized while a 1-file submodule alongside it is checked out normally. Existing submodule tests are unchanged (the default ceiling keeps them).
Follow-up (not in this PR)
Worth a one-line hint in the worker preamble ('large submodules like tests/wpt/suite aren't checked out by default; run `git submodule update --init <path>` if your task needs one') so a WPT-test worker isn't surprised. Happy to add if you want.
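The suggested hint could be generated from whatever the prep step actually skipped, so it names real paths instead of a fixed example. A minimal sketch; `preambleHint` is a hypothetical helper and the exact wording is only an assumption based on the text above.

```go
package main

import (
	"fmt"
	"strings"
)

// preambleHint returns a one-line note for a worker's preamble listing the
// submodules left uninitialized, or "" when nothing was skipped so that
// repos without oversized submodules get no extra noise.
func preambleHint(skipped []string) string {
	if len(skipped) == 0 {
		return ""
	}
	return fmt.Sprintf(
		"Large submodules (%s) aren't checked out by default; run `git submodule update --init <path>` if your task needs one.",
		strings.Join(skipped, ", "))
}
```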