Windows: per-invocation overhead ~6-12s on warm builds (zccache --private-daemon lifecycle)

## Summary

On Windows, `soldr cargo build` adds **6-12 seconds of wall-clock overhead per invocation** to a warm incremental build that cargo itself reports finishing in ~1.8s. The spread is large (7.6s to 19.2s across 5 consecutive identical runs) and correlates with `zccache --private-daemon` lifecycle behavior, suggesting the per-session private daemon is being started, stopped, or lost between runs instead of staying warm across invocations.

This makes hybrid Python+Rust workflows (`uv sync` → setup.py → `soldr cargo build`) pay a 6-12s wrapper tax on every rebuild — separate from anything cargo, rustc, or the linker are doing.

## Environment

- **OS**: Windows 10 Pro 10.0.19045
- **soldr**: 0.7.55
- **zccache**: 1.12.7 (pinned, via `C:\Users\<user>\.soldr\bin\zccache-pinned`)
- **rustc**: 1.94.1-x86_64-pc-windows-msvc (rustup-managed)
- **cargo target dir**: `~/.fbuild/cargo-target/wheel-build` (separate from project `target/`, kept hot)
- **Workspace**: 437 crates total (15 first-party + transitive deps), `fbuild-cli` bin target. Repo: `FastLED/fbuild`.

## Reproduction

The fbuild workspace is a convenient reproducer because it is large enough that the overhead is unmistakable. Smaller workspaces likely show smaller absolute numbers but the same relative pattern.

```bash
git clone https://github.com/FastLED/fbuild
cd fbuild

# Set the target dir before the cold build so the warm runs reuse the same artifacts.
export CARGO_TARGET_DIR=$HOME/.fbuild/cargo-target/wheel-build
RS_FILE=crates/fbuild-core/src/lib.rs

soldr cargo build -p fbuild-cli   # cold; populates cache

# Now measure 5 back-to-back warm rebuilds. We touch a leaf-ish crate
# to force cargo to invalidate ~8 first-party crates downstream, then
# rebuild. Every iteration builds identical content, so zccache should hit
# every rustc invocation and we're measuring pure overhead.

for i in 1 2 3 4 5; do
    echo "" >> "$RS_FILE"
    time soldr cargo build -p fbuild-cli >/dev/null 2>&1
    git checkout -- "$RS_FILE"
done
```
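The loop above discards cargo's output, so it only shows wall-clock. A minimal sketch for capturing both numbers in one pass is below. The helper names `cargo_finished_secs` and `measure_overhead` are made up for this issue, and it assumes GNU `date` (as in Git Bash) and a sub-minute `Finished ... in N.NNs` line on stderr.

```bash
# Extract the seconds from cargo's final "Finished ... in 1.80s" line (read from stdin).
# Assumption: sub-minute builds only; cargo switches to an "Nm NNs" style beyond that.
cargo_finished_secs() {
    sed -n 's/.*Finished.* in \([0-9][0-9.]*\)s.*/\1/p' | tail -n 1
}

# Run a command, then print wall-clock, cargo-reported time, and the difference.
# Assumption: GNU date is available for nanosecond timestamps (%N).
measure_overhead() {
    local log start end wall_ms cargo_s
    log=$(mktemp)
    start=$(date +%s%N)
    "$@" >/dev/null 2>"$log"          # cargo writes its status lines to stderr
    end=$(date +%s%N)
    wall_ms=$(( (end - start) / 1000000 ))
    cargo_s=$(cargo_finished_secs <"$log")
    rm -f "$log"
    awk -v w="$wall_ms" -v c="${cargo_s:-0}" \
        'BEGIN { printf "wall=%.2fs cargo=%.2fs overhead=%.2fs\n", w / 1000, c, w / 1000 - c }'
}

# Example, inside the same touch-and-rebuild loop:
#   measure_overhead soldr cargo build -p fbuild-cli
```

If no `Finished` line is found, the cargo time falls back to 0 and the whole wall-clock is reported as overhead, so treat that case as "parse failed" rather than a real measurement.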

## Observed timings

5 consecutive runs, same touch-but-no-content-change pattern:

| Run | Wall-clock |
|---:|---:|
| 1 | 19.2s |
| 2 | 10.8s |
| 3 | 7.6s |
| 4 | 17.0s |
| 5 | 11.0s |

Mean ~13.1s, min 7.6s, max 19.2s, range ~11.6s. Cargo's own "Finished" report in the same runs: **1.6 to 1.8s**. So **roughly 6-17s of overhead per invocation** comes from the soldr/zccache layer.
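For anyone re-running this, the summary figures can be recomputed from the table with a few lines of awk. This is just a convenience sketch; `run_stats` is a hypothetical helper name, and it expects one timing in seconds per line.

```bash
# Summarize wall-clock timings (one value in seconds per input line).
run_stats() {
    awk '
        NR == 1 { min = max = $1 }
        {
            sum += $1
            if ($1 < min) min = $1
            if ($1 > max) max = $1
        }
        END {
            if (NR) printf "n=%d mean=%.2f min=%.1f max=%.1f range=%.1f\n",
                           NR, sum / NR, min, max, max - min
        }
    '
}

# The five runs from the table above.
printf '%s\n' 19.2 10.8 7.6 17.0 11.0 | run_stats
# prints: n=5 mean=13.12 min=7.6 max=19.2 range=11.6
```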

## Where the overhead is

Per `cargo build -v` and the soldr session-start banner:

```
soldr: zccache source: pinned (...\.soldr\bin\zccache-pinned) version=1.12.7
soldr: zccache session-start --stats
  --log ...\soldr-dev-5c2d0a9ed30121ce\logs\last-session.log
  --journal ...\soldr-dev-5c2d0a9ed30121ce\logs\last-session.jsonl
  --private-daemon
  --daemon-name soldr-dev-5c2d0a9ed30121ce
  --cache-dir ...\soldr-dev-5c2d0a9ed30121ce
  --owner-pid 49292
  --private-env ZCCACHE_PATH_REMAP=auto
  --private-env ZCCACHE_WORKTREE_ROOT=...
```

Key observations:

1. **`--private-daemon` is per-session.** The `--daemon-name` includes a hash and the `--owner-pid` is the current process. Each `soldr cargo` invocation appears to spin up its own daemon and tear it down.
2. **One run failed at session-start** with `zccache[err][R]: lost connection to daemon (no response). The daemon may have crashed or been killed mid-request`. So the daemon lifecycle is not even reliably surviving a single invocation under load.
3. **Disabling zccache entirely** (`RUSTC_WRAPPER='' soldr cargo build`) measured 7.6s — so soldr-wrapping cargo itself still has overhead even with zccache out of the picture, but most of the variance disappears when zccache is bypassed.

The 100% hit-rate session summary at the end:

```
soldr: zccache session summary
  compilations: 9; hits: 9; misses: 0; non-cacheable: 0; errors: 0
  hit rate: 100.0%
  time saved: 348 ms; duration: 2074 ms
```

zccache reports `time saved: 348 ms` while the wrapper adds 6-17s. The cache is working; the wrapper is the cost.

## Why this matters

In a hybrid Python+Rust project (the immediate case: FastLED/fbuild, where `setup.py` invokes `soldr cargo build` to produce a CLI binary shipped via pip/uv), this overhead lands on every `uv sync`/`pip install` that uv decides is stale. Reproduced full breakdown:

| Component | Time |
|---|---:|
| Cargo's own `Finished` report | 1.8s |
| **soldr + zccache wrapper overhead (with RUSTC_WRAPPER='')** | **7.6s** |
| uv sync + reinstall dance (mtime-skip fires, zero cargo work) | 1.5s |
| **Total warm rebuild via `uv run`** | **~9-15s** |

That ~7.6s is the chunk this issue is about. With it gone, warm rebuilds would land at ~3-4s, which is the level users would expect from a workspace where cargo itself is 1.8s.

## Suggested directions (not prescriptive)

1. **Persistent / shared zccache daemon across invocations.** A daemon mode that's owned by something other than `--owner-pid` so successive `soldr cargo` calls reuse it instead of starting fresh. This is the highest-leverage change — it likely eliminates the 1-3s daemon-startup tax and the variance.
2. **Investigate the `lost connection to daemon` path.** Even one occurrence in 5 runs is suspicious; under heavier load (CI runners, parallel builds) this would be a steady-state failure mode.
3. **Document an opt-out** for users who explicitly want minimum wrapper overhead in tight inner loops, e.g. `SOLDR_BYPASS_ZCCACHE=1` (functionally equivalent to `RUSTC_WRAPPER=`, but discoverable). Useful for hybrid Python+Rust projects where the wrapper cost lands inside `setup.py` and the user is already paying for incremental rebuilds.
4. **A `--quiet` or `--no-session-summary` mode** so users / build backends can suppress the multi-line session banner + summary. I measured ~3s of pure terminal-IO cost when output goes to a terminal vs `>/dev/null` (10s vs 7s on the same build).
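Until something like direction 3 exists, the same effect can be approximated on the caller's side. A rough sketch follows; `SOLDR_BYPASS_ZCCACHE` is only the name proposed above (soldr does not read it today, as far as I know), and `run_maybe_bypassing_zccache` is a made-up helper. It relies on the empty `RUSTC_WRAPPER` bypass already used for the 7.6s measurement.

```bash
# Hypothetical opt-out: SOLDR_BYPASS_ZCCACHE is the variable name proposed in
# this issue, not an existing soldr setting.
run_maybe_bypassing_zccache() {
    if [ "${SOLDR_BYPASS_ZCCACHE:-0}" = "1" ]; then
        # An empty RUSTC_WRAPPER tells cargo not to use a rustc wrapper.
        RUSTC_WRAPPER='' "$@"
    else
        "$@"
    fi
}

# Example:
#   SOLDR_BYPASS_ZCCACHE=1 run_maybe_bypassing_zccache soldr cargo build -p fbuild-cli
```

A build backend such as `setup.py` could export the variable once and keep its `soldr cargo build` call unchanged otherwise.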

## Related downstream context

This was found while optimizing `uv run` rebuild times for FastLED/fbuild. The fbuild-side changes (PR FastLED/fbuild#743 and #744) cut the no-source-change reinstall from 14.9s → 1.1s (mtime-skip) and the real-source-change rebuild from 100s → 19s (dev profile + rust-lld). The remaining ~9-15s wall-clock for touch-only rebuilds is the wrapper tax described here.

Happy to gather more data — `cargo build -v` output, soldr session logs, `--timings` HTML reports — on request.
