Summary
On Windows, soldr cargo build adds roughly 6-17 seconds of wall-clock overhead per invocation to a warm incremental build that cargo itself reports finishing in ~1.8s. The spread is wide (7.6s to 19.2s across 5 consecutive identical runs) and correlates with zccache --private-daemon lifecycle behavior, which suggests the per-session private daemon is being started, stopped, or lost between runs instead of staying warm across invocations.
This makes hybrid Python+Rust workflows (uv sync → setup.py → soldr cargo build) pay that wrapper tax on every rebuild, separate from anything cargo, rustc, or the linker are doing.
Environment
- OS: Windows 10 Pro 10.0.19045
- soldr: 0.7.55
- zccache: 1.12.7 (pinned, via C:\Users\<user>\.soldr\bin\zccache-pinned)
- rustc: 1.94.1-x86_64-pc-windows-msvc (rustup-managed)
- cargo target dir: ~/.fbuild/cargo-target/wheel-build (separate from project target/, kept hot)
- Workspace: 437 crates total (15 first-party + transitive deps), fbuild-cli bin target. Repo: FastLED/fbuild.
Reproduction
The fbuild workspace is convenient because it is large enough that the overhead is unmistakable. Smaller workspaces will likely show smaller absolute numbers but the same relative pattern.
git clone https://github.com/FastLED/fbuild
cd fbuild
soldr cargo build -p fbuild-cli # cold; populates cache
# Now measure 5 back-to-back warm rebuilds. We touch a leaf-ish crate
# to force cargo to invalidate ~8 first-party crates downstream, then
# rebuild. Each iteration appends the same blank line and then reverts it,
# so rustc inputs are identical from run to run and zccache should hit
# every invocation; what is left is pure overhead.
export CARGO_TARGET_DIR=$HOME/.fbuild/cargo-target/wheel-build
RS_FILE=crates/fbuild-core/src/lib.rs
for i in 1 2 3 4 5; do
echo "" >> "$RS_FILE"
time soldr cargo build -p fbuild-cli >/dev/null 2>&1
git checkout -- "$RS_FILE"
done
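For anyone who would rather not depend on the shell's time builtin, the measurement part of the loop above could be sketched in Python roughly as follows. The helper name time_runs is hypothetical, and the command is passed in as a parameter, so the sketch assumes nothing about soldr beyond the soldr cargo build -p fbuild-cli invocation already shown. It does not perform the touch-and-revert step; that would still need to happen around each run.

```python
import subprocess
import sys
import time


def time_runs(cmd, runs=5, env=None):
    """Run `cmd` `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard output so terminal I/O does not inflate the measurement.
        subprocess.run(cmd, env=env, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; swap in
    # ["soldr", "cargo", "build", "-p", "fbuild-cli"] to reproduce the report.
    cmd = [sys.executable, "-c", "pass"]
    for i, secs in enumerate(time_runs(cmd, runs=3), start=1):
        print(f"run {i}: {secs:.2f}s")
```

Passing env lets the same helper compare a default run against one with RUSTC_WRAPPER set to the empty string.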
Observed timings
5 consecutive runs, same touch-but-no-content-change pattern:
| Run | Wall-clock |
| --- | --- |
| 1 | 19.2s |
| 2 | 10.8s |
| 3 | 7.6s |
| 4 | 17.0s |
| 5 | 11.0s |
Mean ~13s, min 7.6s, max 19.2s, spread (max minus min) ~11.6s. Cargo's own "Finished" report in the same runs: 1.6s to 1.8s. Subtracting that from the wall-clock times leaves roughly 6-17s of overhead per invocation in the soldr/zccache layer.
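The summary figures can be rechecked from the five timings with a few lines of Python. This is only a sketch of the arithmetic: the function names are made up for illustration, and because the report does not say which cargo time belongs to which run, the overhead bounds pair the fastest run with the slowest cargo time and vice versa.

```python
from statistics import mean

# Wall-clock seconds for the five runs reported above.
wall_clock = [19.2, 10.8, 7.6, 17.0, 11.0]
# Range of cargo's own "Finished" times across the same runs.
cargo_min, cargo_max = 1.6, 1.8


def summarize(samples):
    """Return (mean, minimum, maximum, spread) for a list of timings."""
    return mean(samples), min(samples), max(samples), max(samples) - min(samples)


def overhead_bounds(samples, inner_min, inner_max):
    """Smallest and largest wrapper overhead consistent with the samples."""
    return min(samples) - inner_max, max(samples) - inner_min


avg, lo, hi, spread = summarize(wall_clock)
low, high = overhead_bounds(wall_clock, cargo_min, cargo_max)
print(f"mean {avg:.1f}s, min {lo:.1f}s, max {hi:.1f}s, spread {spread:.1f}s")
print(f"overhead between {low:.1f}s and {high:.1f}s")
```

The difference between the slowest and fastest run is a range rather than a statistical variance; with only five samples the range is the more honest figure to quote.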
Where the overhead is
Per cargo build -v and the soldr session-start banner:
soldr: zccache source: pinned (...\.soldr\bin\zccache-pinned) version=1.12.7
soldr: zccache session-start --stats
--log ...\soldr-dev-5c2d0a9ed30121ce\logs\last-session.log
--journal ...\soldr-dev-5c2d0a9ed30121ce\logs\last-session.jsonl
--private-daemon
--daemon-name soldr-dev-5c2d0a9ed30121ce
--cache-dir ...\soldr-dev-5c2d0a9ed30121ce
--owner-pid 49292
--private-env ZCCACHE_PATH_REMAP=auto
--private-env ZCCACHE_WORKTREE_ROOT=...
Key observations:
- --private-daemon is per-session. The --daemon-name includes a hash and the --owner-pid is the current process. Each soldr cargo invocation appears to spin up its own daemon and tear it down.
- One run failed at session-start with zccache[err][R]: lost connection to daemon (no response). The daemon may have crashed or been killed mid-request. So the daemon lifecycle is not even reliably surviving a single invocation under load.
- Disabling zccache entirely (RUSTC_WRAPPER='' soldr cargo build) measured 7.6s, so soldr wrapping cargo still has overhead even with zccache out of the picture, but most of the variance disappears when zccache is bypassed.
The 100% hit-rate session summary at the end:
soldr: zccache session summary
compilations: 9; hits: 9; misses: 0; non-cacheable: 0; errors: 0
hit rate: 100.0%
time saved: 348 ms; duration: 2074 ms
zccache reports it saved 348ms while the wrapper adds 6-17s. The cache is working — the wrapper is the cost.
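To make the comparison between reported savings and wrapper overhead repeatable across runs, the session summary can be parsed mechanically. The sketch below assumes the summary keeps the exact "name: number" layout shown above (that layout is taken from this one log, not from any documented format), and parse_summary is a hypothetical helper name.

```python
import re

# Session summary copied from the log above.
SUMMARY = """\
soldr: zccache session summary
compilations: 9; hits: 9; misses: 0; non-cacheable: 0; errors: 0
hit rate: 100.0%
time saved: 348 ms; duration: 2074 ms
"""


def parse_summary(text):
    """Extract every `name: <number>` pair from a session summary into a dict."""
    fields = {}
    for name, value in re.findall(r"([a-z][a-z\- ]*?):\s*([\d.]+)", text):
        fields[name.strip()] = float(value)
    return fields


fields = parse_summary(SUMMARY)
saved_s = fields["time saved"] / 1000.0
overhead_low_s = 6.0  # lower end of the 6-17s overhead reported above
print(f"hits {fields['hits']:.0f}/{fields['compilations']:.0f}, saved {saved_s:.3f}s")
print(f"overhead is at least {overhead_low_s / saved_s:.0f}x the reported saving")
```

Even at the low end of the measured overhead, the wrapper costs about 17 times what the cache reports saving on this build.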
Why this matters
In a hybrid Python+Rust project (the immediate case: FastLED/fbuild, where setup.py invokes soldr cargo build to produce a CLI binary shipped via pip/uv), this overhead lands on every uv sync/pip install that uv decides is stale. Reproduced full breakdown:
| Component | Time |
| --- | --- |
| Cargo's own Finished report | 1.8s |
| soldr + zccache wrapper overhead (with RUSTC_WRAPPER='') | 7.6s |
| uv sync + reinstall dance (mtime-skip fires, zero cargo work) | 1.5s |
| Total warm rebuild via uv run | ~9-15s |
That ~7.6s is the chunk this issue is about. With it gone, warm rebuilds would land at ~3-4s, which is the level users would expect from a workspace where cargo itself is 1.8s.
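The ~3-4s estimate follows from summing the breakdown with and without the wrapper row. A minimal sketch of that arithmetic, taking the table's rows at face value (the dictionary keys are labels invented for this example):

```python
# Rows of the breakdown table, in seconds, taken at face value.
components = {
    "cargo_finished": 1.8,
    "soldr_wrapper": 7.6,
    "uv_sync_reinstall": 1.5,
}


def total(parts, exclude=()):
    """Sum the component times, skipping any names listed in `exclude`."""
    return sum(secs for name, secs in parts.items() if name not in exclude)


with_wrapper = total(components)
without_wrapper = total(components, exclude=("soldr_wrapper",))
print(f"with wrapper: {with_wrapper:.1f}s")
print(f"without wrapper: {without_wrapper:.1f}s")
```

The components add up to 10.9s, inside the ~9-15s observed total, and drop to 3.3s once the wrapper row is removed.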
Suggested directions (not prescriptive)
- Persistent / shared zccache daemon across invocations. A daemon mode that's owned by something other than --owner-pid so successive soldr cargo calls reuse it instead of starting fresh. This is the highest-leverage change; it likely eliminates the 1-3s daemon-startup tax and the variance.
- Investigate the lost connection to daemon path. Even one occurrence in 5 runs is suspicious; under heavier load (CI runners, parallel builds) this would be a steady-state failure mode.
- Document an opt-out for users who explicitly want minimum wrapper overhead in tight inner loops, e.g. SOLDR_BYPASS_ZCCACHE=1 (functionally equivalent to RUSTC_WRAPPER=, but discoverable). Useful for hybrid Python+Rust projects where the wrapper cost lands inside setup.py and the user is already paying for incremental rebuilds.
- A --quiet or --no-session-summary mode so users / build backends can suppress the multi-line session banner + summary. Measured ~3s of pure terminal-IO cost when output goes through a terminal vs >/dev/null (10s vs 7s on the same build).
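Until an opt-out exists in soldr itself, a build script could approximate one on its side. The sketch below is an assumption about how setup.py might do that, not how fbuild does it today: SOLDR_BYPASS_ZCCACHE is only the name proposed in the list above and soldr is not known to read it, so the script translates it into the RUSTC_WRAPPER='' workaround measured in this report. The helper names cargo_env, wants_bypass, and build are hypothetical.

```python
import os
import subprocess


def wants_bypass(environ):
    """True when the (proposed, not yet real) opt-out variable is set to 1."""
    return environ.get("SOLDR_BYPASS_ZCCACHE") == "1"


def cargo_env(base_env=None, bypass_cache=False):
    """Return the environment for the `soldr cargo build` subprocess.

    When `bypass_cache` is true, RUSTC_WRAPPER is set to the empty string,
    which is the workaround measured in this report.
    """
    env = dict(os.environ if base_env is None else base_env)
    if bypass_cache:
        env["RUSTC_WRAPPER"] = ""
    return env


def build():
    """Invoke the build the way setup.py is described as doing it."""
    env = cargo_env(bypass_cache=wants_bypass(os.environ))
    subprocess.run(["soldr", "cargo", "build", "-p", "fbuild-cli"],
                   env=env, check=True)
```

Keeping the environment construction in a pure function makes the opt-out easy to unit test without invoking soldr.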
Related downstream context
This was found while optimizing uv run rebuild times for FastLED/fbuild. The fbuild-side changes (PR FastLED/fbuild#743 and #744) cut the no-source-change reinstall from 14.9s → 1.1s (mtime-skip) and the real-source-change rebuild from 100s → 19s (dev profile + rust-lld). The remaining ~9-15s wall-clock for touch-only rebuilds is the wrapper tax described here.
Happy to gather more data — cargo build -v output, soldr session logs, --timings HTML reports — on request.