Adopt zccache embedded service (drop the IPC roundtrip + zccache subprocess) #977

Description

@zackees

Summary

Replace soldr's current shell-out-to-zccache-bin compile dispatch with an in-process zccache::embedded::ZccacheService call. The MVP Rust API landed upstream in zccache v1.12.11 (crates/zccache/src/embedded.rs) — this issue tracks the soldr-side integration (zccache#907).

Anchor design doc: docs/architecture/embedded-service.md in the zccache repo.

What soldr does today vs target

| | Today | Target |
|---|---|---|
| Process model | soldr shells out to zccache binary, one IPC roundtrip per compile (UDS on Linux/macOS, named pipe on Windows) | soldr calls service.compile(…) in its own Tokio runtime |
| Runtime | 2 processes, 2 runtimes | 1 process, 1 runtime |
| Audit | Cross-process stitching at the broker | soldr's run_id flows directly into zccache events |
| Compile dispatch latency | + ~50–200 µs IPC roundtrip per command (Windows named pipes are the worst case) | Function call |

The standalone zccache-daemon binary stays — drop-in CLI use unchanged. Embedded is an alternative front door, not a replacement.

Plan

Phase 1 — vendor for iteration

zccache::embedded may move during the integration. Don't lock to a crates.io version yet.

  • Add a git/path dep on zccache behind an embedded feature flag in soldr's main crate
  • Pin to a specific zccache commit; bump as needed during integration

Phase 2 — wrapper + dispatcher

Keep the upstream-API blast radius to one soldr-side type.

  • Add SoldrZccacheService — the only soldr module that imports zccache::embedded::*. Owns the ZccacheService handle and soldr's identity defaults
  • Add a tagged-enum dispatcher: enum CompileBackend { Embedded(Arc<SoldrZccacheService>), Wrapped(WrappedZccacheBin) }. Default Wrapped initially; flip the default after the parity test passes
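
A minimal sketch of the tagged-enum dispatcher described above. Only the names SoldrZccacheService, WrappedZccacheBin and CompileBackend come from this issue; CompileRequest, CompileOutcome and the compile method signatures are hypothetical stand-ins, and the real wrapper would call into zccache::embedded (likely async) rather than return a canned value.

```rust
use std::sync::Arc;

/// Hypothetical request type; soldr's real one would carry more than argv.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CompileRequest {
    pub args: Vec<String>,
}

/// Hypothetical outcome type; `backend` exists only to make the sketch observable.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CompileOutcome {
    pub exit_code: i32,
    pub backend: &'static str,
}

/// Stand-in for the one soldr module allowed to import zccache::embedded.
pub struct SoldrZccacheService;

impl SoldrZccacheService {
    pub fn compile(&self, _req: &CompileRequest) -> CompileOutcome {
        // Placeholder: the real wrapper would forward to the ZccacheService handle.
        CompileOutcome { exit_code: 0, backend: "embedded" }
    }
}

/// Stand-in for today's shell-out path.
pub struct WrappedZccacheBin;

impl WrappedZccacheBin {
    pub fn compile(&self, _req: &CompileRequest) -> CompileOutcome {
        // Placeholder: the real path would spawn the zccache binary.
        CompileOutcome { exit_code: 0, backend: "wrapped" }
    }
}

pub enum CompileBackend {
    Embedded(Arc<SoldrZccacheService>),
    Wrapped(WrappedZccacheBin),
}

impl Default for CompileBackend {
    /// Wrapped stays the default until the parity test passes.
    fn default() -> Self {
        CompileBackend::Wrapped(WrappedZccacheBin)
    }
}

impl CompileBackend {
    /// Callers see one entry point regardless of which backend is active.
    pub fn compile(&self, req: &CompileRequest) -> CompileOutcome {
        match self {
            CompileBackend::Embedded(svc) => svc.compile(req),
            CompileBackend::Wrapped(bin) => bin.compile(req),
        }
    }
}
```

Flipping the default later would then be a one-line change in the Default impl, with no call sites touched.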

Phase 3 — audit context plumbing

Most invasive non-mechanical change.

  • Map soldr's run/build/command IDs → AuditContext { run_id, trace_id, parent_span_id, command_id, session_id }
  • Pick default audit mode (recommend normal for local, verbose for CI runs that emit build reports)
  • Document the mapping in soldr's design docs
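
One possible shape for the ID mapping and the audit-mode default, as a sketch only. The AuditContext field names are taken from the bullet above; the String types, the SoldrIds struct, the AuditMode enum, and the specific choice of which soldr ID feeds which field are all assumptions that the Phase 3 design doc would need to settle.

```rust
/// Field names come from the issue text; the types are assumed.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AuditContext {
    pub run_id: String,
    pub trace_id: String,
    pub parent_span_id: Option<String>,
    pub command_id: String,
    pub session_id: String,
}

/// Hypothetical bundle of soldr-side identifiers.
pub struct SoldrIds<'a> {
    pub run_id: &'a str,
    pub build_id: &'a str,
    pub command_id: &'a str,
    pub daemon_session: &'a str,
}

/// Hypothetical audit modes matching the "normal" / "verbose" wording above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AuditMode {
    Normal,
    Verbose,
}

/// Assumed mapping: build ID doubles as the trace ID, and a top-level
/// compile has no parent span.
pub fn to_audit_context(ids: &SoldrIds<'_>) -> AuditContext {
    AuditContext {
        run_id: ids.run_id.to_string(),
        trace_id: ids.build_id.to_string(),
        parent_span_id: None,
        command_id: ids.command_id.to_string(),
        session_id: ids.daemon_session.to_string(),
    }
}

/// Verbose only for CI runs that emit build reports; normal otherwise.
pub fn default_audit_mode(is_ci: bool, emits_build_report: bool) -> AuditMode {
    if is_ci && emits_build_report {
        AuditMode::Verbose
    } else {
        AuditMode::Normal
    }
}
```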

Phase 4 — lifecycle wiring

  • ZccacheService::start once at soldr daemon startup; cache root at <soldr_cache_root>/zccache/
  • Expose service.stats() through soldr's existing diagnostics endpoint (so operators see hits/misses without a separate CLI roundtrip)
  • Call service.flush() from the "writing build report" phase
  • Call service.shutdown(ShutdownMode::Graceful) from the normal-exit path; Force from the abort path
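
The two purely mechanical pieces of the lifecycle wiring, the cache-root location and the exit-path-to-shutdown-mode choice, can be sketched without touching the upstream API. ShutdownMode here is a local stand-in mirroring the Graceful and Force variants named above; ExitPath and both function names are hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Local stand-in for the upstream enum; only the variant names come from the issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ShutdownMode {
    Graceful,
    Force,
}

/// Hypothetical description of how the soldr daemon is exiting.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExitPath {
    Normal,
    Abort,
}

/// Cache root for the embedded service: <soldr_cache_root>/zccache/.
pub fn zccache_cache_root(soldr_cache_root: &Path) -> PathBuf {
    soldr_cache_root.join("zccache")
}

/// Graceful on the normal-exit path, Force on the abort path.
pub fn shutdown_mode_for(exit: ExitPath) -> ShutdownMode {
    match exit {
        ExitPath::Normal => ShutdownMode::Graceful,
        ExitPath::Abort => ShutdownMode::Force,
    }
}
```

Keeping these as plain functions means they can be unit-tested without starting the service.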

Phase 5 — parity gate

  • CI test that runs the same compile through both backends and asserts byte-identical outputs (object file + stdout + stderr + exit code + cache outcome)
  • Flip the default backend to Embedded only after this passes on Linux + macOS + Windows
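
The comparison at the heart of the parity gate might look like the following sketch. The five compared fields are the ones listed above; the CompileResult and CacheOutcome types and the parity_mismatches helper are hypothetical, and a real test would fill both results by running the same compile through each backend.

```rust
/// Hypothetical cache outcome; the real one may have more states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CacheOutcome {
    Hit,
    Miss,
}

/// Everything the parity gate compares for one compile.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CompileResult {
    pub object: Vec<u8>,
    pub stdout: Vec<u8>,
    pub stderr: Vec<u8>,
    pub exit_code: i32,
    pub cache: CacheOutcome,
}

/// Returns the names of the fields that differ, so a failing test says
/// which part of the output diverged instead of only "not equal".
pub fn parity_mismatches(embedded: &CompileResult, wrapped: &CompileResult) -> Vec<&'static str> {
    let mut diffs = Vec::new();
    if embedded.object != wrapped.object {
        diffs.push("object file");
    }
    if embedded.stdout != wrapped.stdout {
        diffs.push("stdout");
    }
    if embedded.stderr != wrapped.stderr {
        diffs.push("stderr");
    }
    if embedded.exit_code != wrapped.exit_code {
        diffs.push("exit code");
    }
    if embedded.cache != wrapped.cache {
        diffs.push("cache outcome");
    }
    diffs
}
```

A CI test would assert that the returned list is empty on each of the three platforms.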

Phase 6 — ship + measure

  • Cut a soldr release with embedded default-on
  • Document the perf delta (latency / IPC count / audit completeness)

Open questions to resolve in Phase 1–2

  1. RuntimeHooks is a placeholder — currently { service_name: Option<String> }. Likely needs a tokio::runtime::Handle field or an explicit "use the calling runtime" contract before soldr depends on it. May warrant an upstream zccache PR.
  2. Process-spawn priority sharing. zccache's CompilePriority::Auto ticket (PR zccache#919) consults zccache's own in-flight counter. When embedded, that counter doesn't know about soldr's own spawns. Decide whether to:
    • extend ServiceLimits with an external-counter knob upstream, or
    • document that the counter is zccache-internal-only and soldr accepts that.
  3. Cancellation token plumbing. The embedded contract specifies host cancellation participation, but ZccacheConfig does not yet accept a CancellationToken. Additive upstream change.
  4. Cache namespacing. Endpoint is derived from HostIdentity { product, instance_id, workspace_id }. soldr needs stable values across runs (for cache continuity) or explicit per-workspace opt-out.
  5. Audit emission landing late. Schema is shipped; durable hot-path emission is still TBD upstream. Decide whether to ship the embedded transport first with audit as a no-op, or wait.
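
For question 4, one way to get a workspace_id that is stable across runs is to hash the workspace path with a fixed algorithm. This is a sketch under assumptions: the HostIdentity field names come from the issue, but the String types, the stable_identity helper, the literal product value "soldr", and the choice of FNV-1a are all illustrative. FNV-1a is hand-written here because the standard library's default hasher does not promise the same output across Rust releases.

```rust
/// Field names are taken from the issue; the types are assumed.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct HostIdentity {
    pub product: String,
    pub instance_id: String,
    pub workspace_id: String,
}

/// 64-bit FNV-1a over raw bytes, with the standard offset basis and prime.
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Hypothetical helper: same instance and workspace path in, same identity out.
/// The caller is expected to pass an already-canonicalized path string.
pub fn stable_identity(instance_id: &str, workspace_path: &str) -> HostIdentity {
    HostIdentity {
        product: "soldr".to_string(),
        instance_id: instance_id.to_string(),
        workspace_id: format!("{:016x}", fnv1a64(workspace_path.as_bytes())),
    }
}
```

A per-workspace opt-out could then be modeled as substituting a random workspace_id for the hashed one.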

Risks

  • Upstream API churn during integration → mitigated by Phase 1 (vendor first).
  • Audit emission gap → first embedded soldr release may be "fast but no causal audit". Track explicitly.
  • Shared-runtime contention → soldr's existing async work now shares a Tokio runtime with zccache's compile/persist tasks. PR zccache#919's adaptive priority and #920's expanded pool sizing were designed for the standalone daemon; the same numbers may need re-tuning when both products share one runtime. Plan a measurement pass before flipping the default.

What we do NOT change

  • Standalone zccache-daemon binary stays — drop-in CLI users unaffected.
  • Broker / global / private daemon modes stay (per the embedded doc's Non-Goals: "Do not replace the process/global daemon mode").
  • soldr update-zccache stays — the perf-cluster path deliberately builds from source and does not use the embedded service (see PERF.md).
  • fbuild's analogous integration (zccache#908) is a parallel follow-up; not entangled with this issue.

Suggested first PR shape

One PR behind --features embedded:

  1. Add the zccache dep (git or path).
  2. Add SoldrZccacheService and the CompileBackend enum (default Wrapped).
  3. Add the parity test (initially soldr-only, can be promoted to default CI when Embedded is the default).
  4. Add an opt-in env var (e.g. SOLDR_ZCCACHE_EMBEDDED=1) so early adopters can flip the backend at runtime without rebuilding.
  5. Update soldr's docs to describe the embedded mode.
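
The runtime opt-in from step 4 could be as small as the sketch below. SOLDR_ZCCACHE_EMBEDDED is the example name given above; treating only the value 1 as opt-in, the BackendChoice enum, and both function names are assumptions. The parsing is split from the environment read so it can be tested without mutating process state.

```rust
/// Example variable name from the issue; the final name may differ.
pub const ENV_VAR: &str = "SOLDR_ZCCACHE_EMBEDDED";

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackendChoice {
    Embedded,
    Wrapped,
}

/// Pure decision function: only an explicit "1" opts in (assumed policy).
/// Anything else, including an unset variable, keeps the Wrapped default.
pub fn choose_backend(value: Option<&str>) -> BackendChoice {
    match value.map(str::trim) {
        Some("1") => BackendChoice::Embedded,
        _ => BackendChoice::Wrapped,
    }
}

/// Reads the process environment once and delegates to `choose_backend`.
pub fn choose_backend_from_env() -> BackendChoice {
    choose_backend(std::env::var(ENV_VAR).ok().as_deref())
}
```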

The parity-gate PR (Phase 5) is what flips the default backend in a later release.

Definition of done

  • SoldrZccacheService wrapper exists in soldr
  • CompileBackend::{Embedded, Wrapped} enum in the dispatcher
  • Parity test passes on Linux + macOS + Windows
  • Embedded backend default-on in a soldr release
  • Perf delta documented (latency / IPC count / audit completeness)
  • zccache#907 closed; README + feature matrix updated to list embedded-soldr as shipping
  • Vendored hotfix workflow exercised at least once (proves the upstream loop works for fbuild #908)
