perf(build): default to dev profile + rust-lld for ~5x faster Rust-edit rebuild #744

Merged: zackees merged 1 commit into main from perf/dev-profile-default on Jun 22, 2026
zackees (Member) commented Jun 22, 2026

Summary

  • setup.py was always running cargo build --release, so every local uv sync/pip install rebuild went through the full release codegen pipeline (~100s after a real .rs edit). Switched the default to the dev profile; FBUILD_BUILD_RELEASE=1 opts back into release.
  • Third-party deps stay opt-level=3 via [profile.dev.package."*"] — only fbuild's own ~15 crates compile unoptimized, so runtime hot paths (serde/tokio/reqwest) keep their performance.
  • rust-lld becomes the default Windows linker via .cargo/config.toml. It ships with the Rust toolchain and is ~2-5× faster than link.exe on link-heavy rebuilds. Verified active via verbose rustc output.
  • The PyPI release flow bypasses setup.py entirely (it calls cargo zigbuild --release directly in release-auto.yml), so this only affects local dev installs. Published wheels are unchanged.

Numbers

Semantic edit to crates/fbuild-core/src/lib.rs, then uv run python --version:

| Build profile             | Time   |
|---------------------------|-------:|
| Release (was the default) | 100.1s |
| Dev (new default)         |  18.9s |

5.3× speedup on the path you actually iterate on. Touch-only / no-source-change cases stay at ~14s and ~1s respectively because they're dominated by soldr+uv orchestration, not rustc — zccache covers the compile and dev-vs-release codegen never runs.
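
For reference, the quoted speedup follows directly from the two measured times. The variable names below are only for illustration.

```python
release_s = 100.1  # release profile rebuild time, as measured above
dev_s = 18.9       # dev profile rebuild time, as measured above

speedup = release_s / dev_s
saved_s = release_s - dev_s

print(f"speedup: {speedup:.1f}x, saved per rebuild: {saved_s:.1f}s")
```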

Files

  • setup.py: _use_release_profile() reads FBUILD_BUILD_RELEASE env; profile-aware artifact search.
  • Cargo.toml: [profile.dev.package."*"] opt-level = 3 keeps deps fast.
  • .cargo/config.toml: [target.x86_64-pc-windows-msvc] linker = "rust-lld.exe".
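
A profile-aware artifact search might look roughly like the sketch below. It relies on Cargo's standard layout (dev output in `target/debug`, release output in `target/release`) and assumes the default target directory with no `--target` triple subdirectory. Both function names and the `exe_name` default are hypothetical; the real setup.py may search differently.

```python
from pathlib import Path


def artifact_candidates(workspace_root, release, exe_name="fbuild.exe"):
    """Hypothetical helper: where to look for the built CLI for a given profile."""
    # Cargo writes dev-profile output to target/debug and release output to target/release.
    profile_dir = "release" if release else "debug"
    return [Path(workspace_root) / "target" / profile_dir / exe_name]


def find_artifact(workspace_root, release, exe_name="fbuild.exe"):
    """Return the first existing candidate, or None if nothing has been built yet."""
    for candidate in artifact_candidates(workspace_root, release, exe_name):
        if candidate.is_file():
            return candidate
    return None
```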

Test plan

  • uv sync --reinstall-package fbuild produces a working fbuild.exe (dev profile)
  • FBUILD_BUILD_RELEASE=1 uv sync --reinstall-package fbuild produces a release binary
  • from fbuild.api import SerialMonitor imports against the rebuilt wheel
  • Verified rust-lld.exe is the linker via cargo build -p fbuild-cli -v
  • Verified opt-level=3 is applied to third-party crates via verbose rustc output
  • Linux/macOS — .cargo/config.toml change is Windows-target-scoped only, so no effect there; CI will confirm
  • .github/workflows/release-auto.yml unaffected (calls cargo zigbuild --release directly, doesn't go through setup.py)

🤖 Generated with Claude Code

… on Rust edits

setup.py was always invoking `soldr cargo build --release -p fbuild-cli`,
so every `pip install`/`uv sync` rebuild went through the full release
codegen pipeline (opt-level=3 + LLVM passes + link.exe) — about 100s
per real source edit even with a hot cache. The shipped wheel needs
that pipeline; the dev iteration loop does not.

Three coordinated changes:

- `setup.py`: build with the dev profile by default. Pass `--release`
  only when `FBUILD_BUILD_RELEASE=1` is set, so packaging / perf-test
  flows can still produce an optimized binary. The PyPI release path
  bypasses setup.py entirely (it calls cargo zigbuild --release
  directly in `release-auto.yml`), so this only affects local installs.
- `Cargo.toml`: `[profile.dev.package."*"] opt-level = 3` so third-party
  deps stay optimized even in the dev profile — they compile once on
  first install and the runtime hot paths (serde, tokio, reqwest) keep
  their performance. Only fbuild's own ~15 crates compile at opt-level=0,
  which is exactly where edit-then-rebuild churn happens.
- `.cargo/config.toml`: `[target.x86_64-pc-windows-msvc] linker =
  "rust-lld.exe"`. Ships with the Rust toolchain, ~2-5x faster than
  link.exe on link-heavy rebuilds. Cross-profile, cross-tool.

Measurements (semantic edit to `crates/fbuild-core/src/lib.rs`, then
`uv run python --version`):

| Profile                  | Time   |
|--------------------------|-------:|
| Release (was the default)| 100.1s |
| Dev (new default)        |  18.9s |

5.3x speedup on the path the user actually iterates on. Touch-only
edits remain dominated by soldr+uv overhead (~14s) because zccache hits
cover the rustc work and dev-vs-release codegen never runs. Those
weren't slower to begin with; the cost there is process-spawn overhead.

If you ever need the optimized CLI locally (perf testing, debugging a
real-world slowdown):

    FBUILD_BUILD_RELEASE=1 uv sync --reinstall-package fbuild

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 22, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2adb87da-b3f1-43e6-8368-217d420c9037

📥 Commits

Reviewing files that changed from the base of the PR and between 325d8c3 and 38fa437.

📒 Files selected for processing (3)
  • .cargo/config.toml
  • Cargo.toml
  • setup.py

zackees merged commit 5087c8f into main on Jun 22, 2026
84 of 91 checks passed
zackees deleted the perf/dev-profile-default branch on June 22, 2026 00:23
zackees added a commit that referenced this pull request Jun 22, 2026
- Stop tracking ad-hoc `port_scan_*.txt` captures. PR #741 added the
  `fbuild port scan` command; users sometimes redirect its output to a
  local file for diagnostic comparison. Those captures aren't useful
  to anyone else and never belonged in the working tree.

- Refresh `ci/bench-results/REPORT.md` to cover the full cumulative
  story after PR #743 and PR #744:
  - No-source-change reinstall: 14.9s -> 1.1s (13.6x, PR #743).
  - Real `.rs` edit + uv run: 100.1s -> 18.9s (5.3x, PR #744).
  - The remaining ~14s touch-only floor is soldr/zccache wrapper
    overhead; profiled it down to the second and filed upstream as
    zackees/soldr#883.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fastled-project-sync Bot moved this to Triage in FastLED Tracker on Jun 22, 2026
zackees added a commit to zackees/template-python-rust-cmd that referenced this pull request Jun 22, 2026
… items 2+3) (#3)

Two cargo-config knobs ported from FastLED/fbuild#744 that together
collapse the local Rust-edit iteration loop without any release-build
regression:

1. `[profile.dev.package."*"] opt-level = 3` in Cargo.toml. Cargo
   compiles each upstream crate once and caches the opt-level-3
   artifact, so runtime hot paths (pyo3, serde, etc.) stay at
   release-grade perf while first-party crates compile unoptimized for
   fast iteration. This is what makes "default to dev for local
   iteration" safe for downstream consumers; without it, defaulting to
   dev silently regresses runtime perf.

2. `[target.{x86_64,aarch64}-pc-windows-msvc] linker = "rust-lld.exe"`
   in .cargo/config.toml. rust-lld ships with the Rust toolchain (no
   extra install) and links 2-5x faster than MSVC's link.exe on
   link-heavy rebuilds. Escape hatch: RUSTFLAGS="-C linker=link.exe"
   for the rare case rust-lld trips on a foreign object file. POSIX
   builds keep their platform default linker.
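
The `{x86_64,aarch64}` in item 2 reads as shorthand for two separate tables, since TOML has no brace expansion. As a rough sketch, the two fragments could be rendered like this; the function names are hypothetical and the output only restates the settings quoted above.

```python
def render_dev_profile_override(opt_level=3):
    """Cargo.toml fragment: optimize every dependency even in the dev profile."""
    return f'[profile.dev.package."*"]\nopt-level = {opt_level}\n'


def render_windows_linker_config(arches=("x86_64", "aarch64"), linker="rust-lld.exe"):
    """.cargo/config.toml fragment: one [target.<triple>] table per architecture."""
    tables = [
        f'[target.{arch}-pc-windows-msvc]\nlinker = "{linker}"\n' for arch in arches
    ]
    return "\n".join(tables)


if __name__ == "__main__":
    print(render_dev_profile_override())
    print(render_windows_linker_config())
```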

Verified `cargo build -p template-cli` succeeds with both changes in
place. fbuild measured ~5x faster Rust-edit rebuild from this pair of
changes on its workspace; will be smaller here because the template's
dep tree is tiny, but the proportional win scales with downstream
consumers as they grow.

Refs #2 (items 2 + 3).

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
zackees added a commit that referenced this pull request Jun 22, 2026
….pth (not a PEP 660 finder) (#748)

The per-package `package-dir = {fbuild = "python/fbuild",
"fbuild.api" = "python/fbuild/api"}` map worked for the wheel build but
made setuptools' editable install emit a PEP 660 meta-path finder
(`__editable___fbuild_*_finder.py`) whose `MAPPING` only registered
`fbuild`. `fbuild.api` resolved at runtime via
`importlib.machinery.PathFinder` — fine for `import fbuild.api`, but
invisible to static analyzers (ty, pyright, mypy) that walk
`top_level.txt` and `.pth` files. Downstream consumers (FastLED hit this
on `from fbuild.api import SerialMonitor`) saw
`Cannot resolve imported module 'fbuild.api'` even though runtime imports
worked.

Replace the per-package map with the canonical src-layout pattern
`package-dir = {"" = "python"}`. Setuptools now emits a plain
`.pth` pointing at `python/`, which every static analyzer handles, and
the wheel-build path is unaffected.
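
To see why the wheel contents do not change, here is a simplified model of the `package-dir` lookup (longest matching package prefix wins, with the `""` key as the root fallback). It is a sketch for illustration, not setuptools' own code, and `resolve_package_dir` is a hypothetical name.

```python
def resolve_package_dir(package, package_dir):
    """Map a dotted package name to its source directory under a package-dir map."""
    parts = package.split(".")
    tail = []
    while parts:
        key = ".".join(parts)
        if key in package_dir:
            return "/".join([package_dir[key], *tail])
        # No entry for this prefix: move its last component onto the tail and retry.
        tail.insert(0, parts.pop())
    root = package_dir.get("")
    return "/".join(([root] if root else []) + tail)


old_map = {"fbuild": "python/fbuild", "fbuild.api": "python/fbuild/api"}
new_map = {"": "python"}

for pkg in ("fbuild", "fbuild.api"):
    print(pkg, resolve_package_dir(pkg, old_map), resolve_package_dir(pkg, new_map))
```

Both maps resolve each package to the same directory, so only the editable-install mechanism differs.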

Verified:
- Wheel (`python -m build`): byte-for-byte identical to baseline
  (10,454,675 B). Same contents: `fbuild/__init__.py`,
  `fbuild/_native.pyd`, `fbuild/api/__init__.py`,
  `fbuild-2.2.31.data/scripts/fbuild.exe`, dist-info.
- Sdist: +2,811 B / +0.12%, entirely from `fbuild.egg-info/` relocating
  to `python/fbuild.egg-info/`. Cosmetic.
- `ci/publish.py::build_wheel` is independent of setuptools — it
  hand-rolls wheels from the hard-coded `PYTHON_SHIMS_DIR = ROOT /
  "python"`. Smoke-tested: still finds the same 2 shims
  (`fbuild/__init__.py`, `fbuild/api/__init__.py`). Release path
  unaffected.
- Editable reinstall path: ~8s (FastLED → fbuild via uv sources),
  within noise of baseline. No regression to the perf work in #743 /
  #744.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>