perf(build): default to dev profile + rust-lld for ~5x faster Rust-edit rebuild #744
… on Rust edits
setup.py was always invoking `soldr cargo build --release -p fbuild-cli`,
so every `pip install`/`uv sync` rebuild went through the full release
codegen pipeline (opt-level=3 + LLVM passes + link.exe) — about 100s
per real source edit even with a hot cache. The shipped wheel needs
that pipeline; the dev iteration loop does not.
Three coordinated changes:
- `setup.py`: build with the dev profile by default. Pass `--release`
only when `FBUILD_BUILD_RELEASE=1` is set, so packaging / perf-test
flows can still produce an optimized binary. The PyPI release path
bypasses setup.py entirely (it calls `cargo zigbuild --release` directly
in `release-auto.yml`), so this only affects local installs.
- `Cargo.toml`: `[profile.dev.package."*"] opt-level = 3` so third-party
deps stay optimized even in the dev profile — they compile once on
first install and the runtime hot paths (serde, tokio, reqwest) keep
their performance. Only fbuild's own ~15 crates compile at opt-level=0,
which is exactly where edit-then-rebuild churn happens.
- `.cargo/config.toml`: `[target.x86_64-pc-windows-msvc] linker =
"rust-lld.exe"`. Ships with the Rust toolchain, ~2-5x faster than
link.exe on link-heavy rebuilds. Cross-profile, cross-tool.
Measurements (semantic edit to `crates/fbuild-core/src/lib.rs`, then
`uv run python --version`):
| Profile                   |   Time |
|---------------------------|-------:|
| Release (was the default) | 100.1s |
| Dev (new default)         |  18.9s |
That is a 5.3x speedup on the path users actually iterate on. Touch-only
edits stay at ~14s, dominated by soldr+uv overhead: zccache hits cover the
rustc work, so dev-vs-release codegen never runs. Those were never slower
to begin with; the cost there is process-spawn overhead.
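As a quick check of the arithmetic behind these figures, the snippet below recomputes the speedup from the two measured times. The final line is only a rough estimate: it assumes the ~14s touch-only floor carries over unchanged into a real-edit rebuild, which the measurements do not directly confirm.

```python
# Timings reported in the table above, in seconds.
release_s = 100.1
dev_s = 18.9
# Approximate touch-only floor quoted in the text (soldr+uv overhead).
floor_s = 14.0

speedup = release_s / dev_s
saved_s = release_s - dev_s
# Rough estimate, assuming the touch-only floor applies unchanged.
above_floor_s = dev_s - floor_s

print(f"speedup: {speedup:.1f}x")
print(f"saved per edit: {saved_s:.1f}s")
print(f"dev rebuild time above the floor: ~{above_floor_s:.1f}s")
```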
If you ever need the optimized CLI locally (perf testing, debugging a
real-world slowdown):
    FBUILD_BUILD_RELEASE=1 uv sync --reinstall-package fbuild
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Stop tracking ad-hoc `port_scan_*.txt` captures. PR #741 added the `fbuild port scan` command; users sometimes redirect its output to a local file for diagnostic comparison. Those captures aren't useful to anyone else and never belonged in the working tree.
- Refresh `ci/bench-results/REPORT.md` to cover the full cumulative story after PR #743 and PR #744:
  - No-source-change reinstall: 14.9s -> 1.1s (13.6x, PR #743).
  - Real `.rs` edit + uv run: 100.1s -> 18.9s (5.3x, PR #744).
  - The remaining ~14s touch-only floor is soldr/zccache wrapper overhead; profiled it down to the second and filed upstream as zackees/soldr#883.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… items 2+3) (#3)
Two cargo-config knobs ported from FastLED/fbuild#744 that together collapse the local Rust-edit iteration loop without any release-build regression:
1. `[profile.dev.package."*"] opt-level = 3` in Cargo.toml. Cargo compiles each upstream crate once and caches the opt-level-3 artifact, so runtime hot paths (pyo3, serde, etc.) stay at release-grade perf while first-party crates compile unoptimized for fast iteration. This is what makes "default to dev for local iteration" safe for downstream consumers; without it, defaulting to dev silently regresses runtime perf.
2. `[target.{x86_64,aarch64}-pc-windows-msvc] linker = "rust-lld.exe"` in .cargo/config.toml. rust-lld ships with the Rust toolchain (no extra install) and links 2-5x faster than MSVC's link.exe on link-heavy rebuilds. Escape hatch: RUSTFLAGS="-C linker=link.exe" for the rare case rust-lld trips on a foreign object file. POSIX builds keep their platform default linker.
Verified `cargo build -p template-cli` succeeds with both changes in place. fbuild measured ~5x faster Rust-edit rebuild from this pair of changes on its workspace; will be smaller here because the template's dep tree is tiny, but the proportional win scales with downstream consumers as they grow.
Refs #2 (items 2 + 3).
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
….pth (not a PEP 660 finder) (#748)
The per-package `package-dir = {fbuild = "python/fbuild", "fbuild.api" = "python/fbuild/api"}` map worked for the wheel build but made setuptools' editable install emit a PEP 660 meta-path finder (`__editable___fbuild_*_finder.py`) whose `MAPPING` only registered `fbuild`. `fbuild.api` resolved at runtime via `importlib.machinery.PathFinder`: fine for `import fbuild.api`, but invisible to static analyzers (ty, pyright, mypy) that walk `top_level.txt` and `.pth` files. Downstream consumers (FastLED hit this on `from fbuild.api import SerialMonitor`) saw `Cannot resolve imported module 'fbuild.api'` even though runtime imports worked.
Replace the per-package map with the canonical src-layout pattern `package-dir = {"" = "python"}`. Setuptools now emits a plain `.pth` pointing at `python/`, which every static analyzer handles, and the wheel-build path is unaffected.
Verified:
- Wheel (`python -m build`): byte-for-byte identical to baseline (10,454,675 B). Same contents: `fbuild/__init__.py`, `fbuild/_native.pyd`, `fbuild/api/__init__.py`, `fbuild-2.2.31.data/scripts/fbuild.exe`, dist-info.
- Sdist: +2,811 B / +0.12%, entirely from `fbuild.egg-info/` relocating to `python/fbuild.egg-info/`. Cosmetic.
- `ci/publish.py::build_wheel` is independent of setuptools: it hand-rolls wheels from the hard-coded `PYTHON_SHIMS_DIR = ROOT / "python"`. Smoke-tested: still finds the same 2 shims (`fbuild/__init__.py`, `fbuild/api/__init__.py`). Release path unaffected.
- Editable reinstall path: ~8s (FastLED → fbuild via uv sources), within noise of baseline. No regression to the perf work in #743 / #744.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- `setup.py` was always running `cargo build --release`, so every local `uv sync` / `pip install` rebuild went through the full release codegen pipeline (~100s after a real `.rs` edit). Switched the default to the dev profile; `FBUILD_BUILD_RELEASE=1` opts back into release.
- Third-party deps stay at `opt-level = 3` via `[profile.dev.package."*"]`: only fbuild's own ~15 crates compile unoptimized, so runtime hot paths (serde/tokio/reqwest) keep their performance.
- `rust-lld` becomes the default Windows linker via `.cargo/config.toml`. Ships with the Rust toolchain, ~2-5× faster than `link.exe` on link-heavy rebuilds. Verified active via rustc verbose output.
- The PyPI release path bypasses `setup.py` entirely (it calls `cargo zigbuild --release` directly in `release-auto.yml`), so this only affects local dev installs. Published wheels are unchanged.
Numbers
Semantic edit to `crates/fbuild-core/src/lib.rs`, then `uv run python --version`:
| Profile                   |   Time |
|---------------------------|-------:|
| Release (was the default) | 100.1s |
| Dev (new default)         |  18.9s |
5.3× speedup on the path you actually iterate on. Touch-only and no-source-change cases stay at ~14s and ~1s respectively because they're dominated by soldr+uv orchestration, not rustc: zccache covers the compile, so dev-vs-release codegen never runs.
Files
- `setup.py`: `_use_release_profile()` reads the `FBUILD_BUILD_RELEASE` env var; profile-aware artifact search.
- `Cargo.toml`: `[profile.dev.package."*"] opt-level = 3` keeps deps fast.
- `.cargo/config.toml`: `[target.x86_64-pc-windows-msvc] linker = "rust-lld.exe"`.
Test plan
- `uv sync --reinstall-package fbuild` produces a working `fbuild.exe` (dev profile)
- `FBUILD_BUILD_RELEASE=1 uv sync --reinstall-package fbuild` produces a release binary
- `from fbuild.api import SerialMonitor` imports against the rebuilt wheel
- `rust-lld.exe` is the linker, verified via `cargo build -p fbuild-cli -v`
- `opt-level=3` is applied to third-party crates, verified via rustc verbose output
- `.cargo/config.toml` change is Windows-target-scoped only, so no effect on non-Windows targets; CI will confirm
- `.github/workflows/release-auto.yml` unaffected (calls `cargo zigbuild --release` directly, doesn't go through `setup.py`)
🤖 Generated with Claude Code