Skip to content

dylint: candidate code-quality lints (gotcha sweep) #826

Description

@zackees

Goal

zccache ships three dylints (ban_std_pathbuf, ban_unrooted_tempdir, ban_raw_subprocess_in_daemon) to lock in architectural invariants that have caused real bugs. fbuild has the equivalent of the third (ban_raw_subprocess) but not the others, and the codebase has its own set of recurring gotchas worth pinning with custom lints.

This issue is a punch list from a multi-agent sweep across five gotcha dimensions: path normalization, async/sync boundaries, process & device containment, module-level env state, test isolation, and panic discipline. Each item lists a candidate lint name, why it matters (with a concrete bug class the rule prevents), evidence file:line examples, and a "live violations" count to help triage which lints land as net-new enforcement vs. fix-and-then-lint.


Candidate lints (ranked by impact)

Tier 1 — direct ports from zccache, would catch real bugs today

1. ban_std_pathbufdirect port from zccache

2. ban_unrooted_tempdirdirect port from zccache

  • tempfile::TempDir::new() / tempfile::tempdir() land in $TMPDIR, which on Windows hits MAX_PATH and on CI may be tmpfs/RAM-disk — known cause of flaky tests.
  • ~250 unrooted-TempDir uses across the workspace (production + tests). Highest concentration in crates/fbuild-packages/src/toolchain/esp32.rs (49+) and crates/fbuild-test-support/src/mini_framework.rs:76.
  • Allowlist legacy callers; force new code to pass an explicit root path.

3. ban_std_sync_mutex_in_async / prefer_tokio_mutex_in_handlers

  • After audit: go fully async — whole-app tokio runtime sharing for tokio-console (meta) #813's full-async migration, lingering std::sync::Mutex in async-reachable code is a poison-panic + scheduler-starvation hazard. clippy's await_holding_lock catches the worst cases but misses many.
  • ~40 std::sync::Mutex + RwLock declarations across fbuild-daemon and fbuild-serial. Examples: crates/fbuild-daemon/src/device_manager.rs:209,288, crates/fbuild-daemon/src/context.rs:580, crates/fbuild-daemon/src/handlers/health.rs:193 (.write().unwrap() on RwLock).
  • Implementation note: the lint should target the declaration in types reachable from async fn, not just the .await call site clippy already covers.

Tier 2 — high-value lints to lock in current clean state and prevent regression

4. ban_direct_serialportexplicitly tracked at #605 Phase 1

  • Only fbuild-serial may touch the serialport crate; everyone else uses SharedSerialManager via the daemon's HTTP API.
  • Live violations (intentional, exempt): crates/fbuild-cli/src/cli/port_scan.rs:63, crates/fbuild-cli/src/cli/serial_probe.rs:224,272 — diagnostic subcommands per the crates/CLAUDE.md exception list.
  • Allowlist: fbuild-serial (owner), fbuild-deploy (hardware reset ops), plus the two diagnostic CLI files above.

5. ban_deploy_tool_direct_invocation

6. ban_file_based_locks

  • CLAUDE.md: "No file-based locks — all locking through daemon's in-memory managers."
  • Current state: clean. OpenOptions::create_new(true), fs2::FileExt, flock() all return zero hits.
  • The lint prevents anyone re-introducing OS-level port locks or lock-file patterns.

7. ban_await_in_spawn_blocking

  • spawn_blocking is a sync escape hatch; .await inside requires Handle::current().block_on(...) which defeats the point and risks deadlock.
  • Current state: clean — every spawn_blocking callsite in fbuild-packages/src/disk_cache.rs and friends is purely sync.
  • No equivalent clippy lint.

Tier 3 — error-handling discipline (high-volume, allowlist-heavy)

8. ban_unwrap_in_daemon_handlers (or narrower: ban_serde_json_unwrap_in_handlers)

  • A panic inside a handler kills the daemon and disconnects every client. clippy's unwrap_used is too noisy for the whole workspace; scoping to crates/fbuild-daemon/src/handlers/** makes it precise.
  • ~8 high-confidence violations all in one file: crates/fbuild-daemon/src/handlers/websockets.rs:119,131,140,173,187 (serde_json::to_string(&err_msg).unwrap() in error-reply paths).
  • 2 violations on streaming output parsing: crates/fbuild-daemon/src/handlers/operations/build.rs:639,683 (str::from_utf8(&chunk).unwrap() on subprocess stdout — toolchains can and do emit invalid UTF-8).
  • Recommended scope: handlers/ only; fix the ~10 sites first, then enable the lint to prevent regression.

9. require_env_lock_in_tests

  • Process-wide env mutation in parallel-running cargo tests is a classic flake source. crates/fbuild-daemon/src/handlers/emulator/tests_npm_cache.rs::env_lock() is the canonical pattern.
  • Violations: crates/fbuild-build/tests/eh_frame_strip_esp32.rs:93,109,112,113,125 mutate FBUILD_KEEP_EH_FRAME without locking. crates/fbuild-build/tests/avr_build.rs:401 has an EnvGuard but it's per-test, not cross-test serializing.
  • Scope: #[test] and #[tokio::test] only.

Tier 4 — preventive lints worth considering, lower priority

10. ban_env_var_set_after_import — Rust analog of the Python paths.py import-time bug. crates/fbuild-daemon/src/main.rs:38 does unsafe { std::env::set_var("FBUILD_DEV_MODE", "1") } AFTER importing fbuild_paths / fbuild_build. fbuild_paths uses lazy accessors (safe), but any future OnceLock/Lazy that reads FBUILD_DEV_MODE at first-access time would freeze the wrong value.

11. require_multi_thread_flavor_when_spawning#[tokio::test] defaults to current-thread; tests that tokio::spawn() may serialize and deadlock. Examples: crates/fbuild-daemon/tests/build_streaming.rs:33-42, crates/fbuild-serial/src/manager/tests.rs:252.

12. cli_no_build_deploy_direct_usefbuild-cli is supposed to be a thin HTTP client. Currently clean (diagnostic subcommands are explicitly exempted per crates/CLAUDE.md) — lint locks it in.

13. require_oncelock_install_before_usecrates/fbuild-build/src/compile_backend.rs GLOBAL: OnceLock<CompileBackend> requires install_global() before any compile fires. The existing error message is good, but a lint that flags .unwrap() on get_global() outside of compile-flow contexts would catch ordering bugs at compile-time instead of runtime.


Implementation notes

  • Each lint should follow the pattern in dylints/ban_raw_subprocess/ and zccache's dylints/ban_std_pathbuf/: separate crate pinning its own nightly toolchain, registered via [workspace.metadata.dylint] libraries = [{ path = "dylints/*" }].
  • Allowlists: zccache's ban_std_pathbuf uses include_str!("allowlist.txt") for legacy paths. Adopt the same shape so the grandfathered call sites are visible in the repo, not buried in lint code.
  • Order: land Tier 1 first (ban_std_pathbuf, ban_unrooted_tempdir, ban_std_sync_mutex_in_async) — these catch current bugs. Tier 2 prevents regression on already-clean invariants. Tier 3 needs a fix-first sweep before enabling. Tier 4 is "if/when we have time."
  • CI: .github/workflows/dylint.yml already runs cargo dylint --all, so adding new crates under dylints/* picks up automatically.

Out of scope (mentioned, deliberately deferred)

  • require_normalized_path_for_cache_keys (Tier 1 candidate but overlaps with ban_std_pathbuf — fold into it).
  • Mocks-as-primary-test-surface (a CLAUDE.md principle but very hard to lint without false positives).
  • Ignored-test rot audit (process question, not a lint).
  • process::exit outside main (current state clean; very low priority).

🤖 Drafted by Claude Code from a 5-agent parallel sweep

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    Status
    Triage

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions