dylint: candidate code-quality lints (gotcha sweep)

## Goal

zccache ships three dylints (`ban_std_pathbuf`, `ban_unrooted_tempdir`, `ban_raw_subprocess_in_daemon`) to lock in architectural invariants that have caused real bugs. fbuild has the equivalent of the third (`ban_raw_subprocess`) but not the others, and the codebase has its own set of recurring gotchas worth pinning with custom lints.

This issue is a punch list from a multi-agent sweep across five gotcha dimensions: path normalization, async/sync boundaries, process & device containment, module-level env state, test isolation, and panic discipline. Each item lists a candidate lint name, *why* it matters (with a concrete bug class the rule prevents), evidence file:line examples, and a "live violations" count to help triage which lints land as net-new enforcement vs. fix-and-then-lint.

---

## Candidate lints (ranked by impact)

### Tier 1 — direct ports from zccache, would catch real bugs today

**1. `ban_std_pathbuf`** — *direct port from zccache*
- Raw `std::path::PathBuf` doesn't carry fbuild's normalization invariant. Mixed-separator and case-variant paths flowing into cache keys / fingerprints / compile args have caused Windows-only cache key mismatches (#436, #437, #282).
- fbuild already has a `NormalizedPath` type at `crates/fbuild-core/src/path.rs` — the lint forces its use outside an explicit legacy allowlist.
- Evidence: `crates/fbuild-build/src/build_info.rs` BuildInfo struct uses `NormalizedPath` for `*_path` fields (correct pattern) but `crates/fbuild-packages/src/cache.rs` `Cache::cache_root: PathBuf` (raw) gets serialized; `crates/fbuild-build/src/zccache.rs:34` `FingerprintWatch::{cache_file, root}: PathBuf` participates in cache-key generation.

**2. `ban_unrooted_tempdir`** — *direct port from zccache*
- `tempfile::TempDir::new()` / `tempfile::tempdir()` land in `$TMPDIR`, which on Windows hits MAX_PATH and on CI may be tmpfs/RAM-disk — known cause of flaky tests.
- ~250 unrooted-TempDir uses across the workspace (production + tests). Highest concentration in `crates/fbuild-packages/src/toolchain/esp32.rs` (49+) and `crates/fbuild-test-support/src/mini_framework.rs:76`.
- Allowlist legacy callers; force new code to pass an explicit root path.

**3. `ban_std_sync_mutex_in_async` / `prefer_tokio_mutex_in_handlers`**
- After #813's full-async migration, lingering `std::sync::Mutex` in async-reachable code is a poison-panic + scheduler-starvation hazard. clippy's `await_holding_lock` catches the worst cases but misses many.
- ~40 `std::sync::Mutex` + `RwLock` declarations across `fbuild-daemon` and `fbuild-serial`. Examples: `crates/fbuild-daemon/src/device_manager.rs:209,288`, `crates/fbuild-daemon/src/context.rs:580`, `crates/fbuild-daemon/src/handlers/health.rs:193` (`.write().unwrap()` on RwLock).
- Implementation note: the lint should target the *declaration* in types reachable from async fn, not just the `.await` call site clippy already covers.

### Tier 2 — high-value lints to lock in *current clean state* and prevent regression

**4. `ban_direct_serialport`** — *explicitly tracked at FastLED/fbuild#605 Phase 1*
- Only `fbuild-serial` may touch the `serialport` crate; everyone else uses `SharedSerialManager` via the daemon's HTTP API.
- Live violations (intentional, exempt): `crates/fbuild-cli/src/cli/port_scan.rs:63`, `crates/fbuild-cli/src/cli/serial_probe.rs:224,272` — diagnostic subcommands per the crates/CLAUDE.md exception list.
- Allowlist: `fbuild-serial` (owner), `fbuild-deploy` (hardware reset ops), plus the two diagnostic CLI files above.

**5. `ban_deploy_tool_direct_invocation`**
- Per CLAUDE.md (FastLED/fbuild#694): never invoke `esptool` / `avrdude` / `picotool` / `dfu-util` / `pyocd` directly — go through `fbuild deploy`. The existing `ban_raw_subprocess` catches `.spawn()/.output()/.status()` on `Command` but doesn't pattern-match on the binary name.
- Current state: **clean**. No `Command::new("esptool" | "avrdude" | ...)` matches. The lint locks in the invariant.
- Allowlist: `fbuild-deploy` (legitimate owner), tests with `--ignored` hardware-required gates.

**6. `ban_file_based_locks`**
- CLAUDE.md: "No file-based locks — all locking through daemon's in-memory managers."
- Current state: **clean**. `OpenOptions::create_new(true)`, `fs2::FileExt`, `flock()` all return zero hits.
- The lint prevents anyone re-introducing OS-level port locks or lock-file patterns.

**7. `ban_await_in_spawn_blocking`**
- `spawn_blocking` is a sync escape hatch; `.await` inside requires `Handle::current().block_on(...)` which defeats the point and risks deadlock.
- Current state: **clean** — every `spawn_blocking` callsite in `fbuild-packages/src/disk_cache.rs` and friends is purely sync.
- No equivalent clippy lint.

### Tier 3 — error-handling discipline (high-volume, allowlist-heavy)

**8. `ban_unwrap_in_daemon_handlers`** (or narrower: `ban_serde_json_unwrap_in_handlers`)
- A panic inside a handler kills the daemon and disconnects every client. clippy's `unwrap_used` is too noisy for the whole workspace; scoping to `crates/fbuild-daemon/src/handlers/**` makes it precise.
- ~8 high-confidence violations all in one file: `crates/fbuild-daemon/src/handlers/websockets.rs:119,131,140,173,187` (`serde_json::to_string(&err_msg).unwrap()` in error-reply paths).
- 2 violations on streaming output parsing: `crates/fbuild-daemon/src/handlers/operations/build.rs:639,683` (`str::from_utf8(&chunk).unwrap()` on subprocess stdout — toolchains can and do emit invalid UTF-8).
- Recommended scope: handlers/ only; fix the ~10 sites first, then enable the lint to prevent regression.

**9. `require_env_lock_in_tests`**
- Process-wide env mutation in parallel-running cargo tests is a classic flake source. `crates/fbuild-daemon/src/handlers/emulator/tests_npm_cache.rs::env_lock()` is the canonical pattern.
- Violations: `crates/fbuild-build/tests/eh_frame_strip_esp32.rs:93,109,112,113,125` mutate `FBUILD_KEEP_EH_FRAME` without locking. `crates/fbuild-build/tests/avr_build.rs:401` has an `EnvGuard` but it's per-test, not cross-test serializing.
- Scope: `#[test]` and `#[tokio::test]` only.

### Tier 4 — preventive lints worth considering, lower priority

**10. `ban_env_var_set_after_import`** — Rust analog of the Python `paths.py` import-time bug. `crates/fbuild-daemon/src/main.rs:38` does `unsafe { std::env::set_var("FBUILD_DEV_MODE", "1") }` AFTER importing `fbuild_paths` / `fbuild_build`. `fbuild_paths` uses lazy accessors (safe), but any future `OnceLock`/`Lazy` that reads `FBUILD_DEV_MODE` at first-access time would freeze the wrong value.

**11. `require_multi_thread_flavor_when_spawning`** — `#[tokio::test]` defaults to current-thread; tests that `tokio::spawn()` may serialize and deadlock. Examples: `crates/fbuild-daemon/tests/build_streaming.rs:33-42`, `crates/fbuild-serial/src/manager/tests.rs:252`.

**12. `cli_no_build_deploy_direct_use`** — `fbuild-cli` is supposed to be a thin HTTP client. Currently clean (diagnostic subcommands are explicitly exempted per crates/CLAUDE.md) — lint locks it in.

**13. `require_oncelock_install_before_use`** — `crates/fbuild-build/src/compile_backend.rs` `GLOBAL: OnceLock<CompileBackend>` requires `install_global()` before any compile fires. The existing error message is good, but a lint that flags `.unwrap()` on `get_global()` outside of compile-flow contexts would catch ordering bugs at compile-time instead of runtime.

---

## Implementation notes

- Each lint should follow the pattern in `dylints/ban_raw_subprocess/` and zccache's `dylints/ban_std_pathbuf/`: separate crate pinning its own nightly toolchain, registered via `[workspace.metadata.dylint] libraries = [{ path = "dylints/*" }]`.
- Allowlists: zccache's `ban_std_pathbuf` uses `include_str!("allowlist.txt")` for legacy paths. Adopt the same shape so the grandfathered call sites are visible in the repo, not buried in lint code.
- Order: land Tier 1 first (`ban_std_pathbuf`, `ban_unrooted_tempdir`, `ban_std_sync_mutex_in_async`) — these catch *current* bugs. Tier 2 prevents regression on already-clean invariants. Tier 3 needs a fix-first sweep before enabling. Tier 4 is "if/when we have time."
- CI: `.github/workflows/dylint.yml` already runs `cargo dylint --all`, so adding new crates under `dylints/*` picks up automatically.

---

## Out of scope (mentioned, deliberately deferred)

- `require_normalized_path_for_cache_keys` (Tier 1 candidate but overlaps with `ban_std_pathbuf` — fold into it).
- Mocks-as-primary-test-surface (a CLAUDE.md principle but very hard to lint without false positives).
- Ignored-test rot audit (process question, not a lint).
- `process::exit` outside main (current state clean; very low priority).

🤖 Drafted by Claude Code from a 5-agent parallel sweep


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

dylint: candidate code-quality lints (gotcha sweep) #826

Goal

Candidate lints (ranked by impact)

Tier 1 — direct ports from zccache, would catch real bugs today

Tier 2 — high-value lints to lock in current clean state and prevent regression

Tier 3 — error-handling discipline (high-volume, allowlist-heavy)

Tier 4 — preventive lints worth considering, lower priority

Implementation notes

Out of scope (mentioned, deliberately deferred)

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

dylint: candidate code-quality lints (gotcha sweep) #826

Description

Goal

Candidate lints (ranked by impact)

Tier 1 — direct ports from zccache, would catch real bugs today

Tier 2 — high-value lints to lock in current clean state and prevent regression

Tier 3 — error-handling discipline (high-volume, allowlist-heavy)

Tier 4 — preventive lints worth considering, lower priority

Implementation notes

Out of scope (mentioned, deliberately deferred)

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions