Skip to content

Harden cook stability: graceful shutdown, preheat race, PID guards#20

Merged
CamSoper merged 1 commit into
mainfrom
claude/focused-fermat-cqmw6d
Jun 15, 2026
Merged

Harden cook stability: graceful shutdown, preheat race, PID guards#20
CamSoper merged 1 commit into
mainfrom
claude/focused-fermat-cqmw6d

Conversation

@CamSoper

Copy link
Copy Markdown
Owner

What & why

Implements the highest-value, low-controversy findings from a full project review, focused on cook stability and safety. Build is clean (0 warnings) and all tests pass (68, incl. 2 new).

🔴 Critical

Graceful shutdown — relays no longer left energized on stop (C1).
Previously, hardware was built at top level and app.Run() just blocked; nothing disposed the Smoker or devices, so a systemctl stop / SIGTERM / Ctrl-C left the auger and igniter relays in their last commanded state. Now:

  • Smoker implements IDisposable; the host's ApplicationStopping hook tears it down and disposes the shared GpioController.
  • Smoker.Dispose() drives a hard safe-off (auger + igniter off) and releases every device (RelayDevice.Dispose drives pins off and closes them; RtdArray frees the ADC/SPI; Display frees the LCD).
  • All background loops (mode, fire-minder, display, ADC read, preheat) now stop on a lifetime cancellation token.

Note: this covers clean stops and SIGTERM. A hardware watchdog (/dev/watchdog, systemd WatchdogSec=) is still the right follow-up to cover hard crashes / SIGKILL / power-supervisor faults, since software can't guarantee cleanup there.

Preheat detector data race (C2).
PreheatMonitor.Update() ran inside the Smoker.Status property getter, mutating a non-concurrent Queue concurrently from the display loop, every /api/status request, and the MQTT poller. Moved sampling to a dedicated 1 Hz loop and guarded the monitor with a lock.

🟡 Nice to have

  • Preheat sampling cadence (N1). Sampling on a fixed 1 Hz loop instead of per-Status-read keeps the 60-sample window a true ~60 s regardless of how many clients poll — no more premature "Preheated."
  • CancellationTokenSource lifecycle (N3). The per-mode token is now created/disposed by ModeLoop under a lock that SetMode also takes to cancel, eliminating the dispose-vs-cancel race that could throw ObjectDisposedException and drop a mode change.
  • PID guards (N4). Seed _lastUpdate/_lastTemp; return proportional-only on the first (or post-dropout) reading; skip integral/derivative when dt<=0; re-seed after a NaN so a sensor dropout can't spike the derivative. (All existing PID tests still pass.)
  • SmokerProxy (N8). 10 s HttpClient timeout so a hung API can't stall callers (e.g. the MQTT bridge's state loop); defensive p-value parse.

Tests

  • SmokerPidTests: back-to-back calls (dt≈0) return finite (no divide-by-zero NaN/Inf).
  • PreheatMonitorTests: concurrent Update/Reset from many threads doesn't throw/corrupt.

Deliberately NOT included (needs a product decision)

FireMinder.GetFireCheckTemp() integer division (N2). SetPoint/180 is integer division, so the reignition margin is a flat 30 °F below setpoint up to 359, then jumps to 60 °F at 360. The intended proportional scaling is lost. Fixing it changes tested cook behavior, and the right target (flat margin vs. proportional SetPoint*(1-30/180)) is a judgment call — happy to do it in a follow-up once you pick the behavior.

Other review items left for follow-up: making pins/timeouts/PID gains configurable + a dev/sim device mode (which would unblock unit-testing the Smoker state machine), and the home-LAN security notes (no API auth, MQTT without TLS, Pulumi writing the MQTT password into the systemd unit).

Verification

  • dotnet build Inferno.sln → clean
  • dotnet test Inferno.Tests → 68 passed

https://claude.ai/code/session_01T8v7xa23G8v1GjsSFBwzme


Generated by Claude Code

Address the highest-value stability/safety findings from the project review:

- Graceful shutdown (C1): Smoker is now IDisposable and the host's
  ApplicationStopping hook tears it down, so a clean stop/SIGTERM drives the
  auger and igniter to a safe off-state and releases GPIO/SPI/I2C instead of
  leaving relays in their last commanded state. All background loops (mode,
  fire minder, display, ADC read, preheat) now stop on a lifetime token.

- Preheat sampling (C2/N1): moved PreheatMonitor.Update out of the Status
  property getter into a dedicated 1 Hz loop, and guarded the monitor's
  non-concurrent queue with a lock. Fixes a data race (display loop + every
  /status caller) and keeps the 60-sample window a true ~60s regardless of
  how many clients poll status.

- CancellationTokenSource lifecycle (N3): the per-mode token is now created
  and disposed by ModeLoop under a lock that SetMode also takes to cancel,
  eliminating the dispose-vs-cancel race that could throw
  ObjectDisposedException and drop a mode change.

- PID guards (N4): seed _lastUpdate/_lastTemp, return proportional-only on the
  first (or post-dropout) reading, skip integral/derivative when dt<=0, and
  re-seed after a NaN so a sensor dropout can't spike the derivative.

- SmokerProxy (N8): add a 10s HTTP timeout so a hung API can't stall callers
  (e.g. the MQTT bridge), and parse p-value defensively.

Adds regression tests for the PID same-instant divide and concurrent
PreheatMonitor access. Full review notes captured separately.
@pulumi

pulumi Bot commented Jun 15, 2026

Copy link
Copy Markdown

🚀 The Update (preview) for CamSoper-org/inferno-deploy/main (at 0826138) was successful.

✨ Neo Code Review

Routine redeployment of updated application binaries to the target device; the infrastructure is unchanged. ✅ Low Risk

This is a routine code deployment. The trigger hashes on the publish/restart commands changed because the application source changed, causing Pulumi to rebuild and redeploy all three service binaries (API, CLI, MQTT) and restart the corresponding services on the target device.

The PR itself is a correctness/safety hardening pass: cooperative cancellation on shutdown (so auger and igniter relays are de-energized cleanly on systemctl stop), a PID fix for divide-by-zero on same-instant calls and NaN seeding, a lock on PreheatMonitor to prevent queue corruption between the preheat loop and mode-change API threads, and an HttpClient timeout in SmokerProxy. None of these changes affect infrastructure resources.

Resource Changes

    Name          Type                    Operation
+-  restart-api   command:remote:Command  replaced
+-  publish-api   command:local:Command   replaced
+-  publish-cli   command:local:Command   replaced
+-  publish-mqtt  command:local:Command   replaced
+-  restart-mqtt  command:remote:Command  replaced

@CamSoper CamSoper marked this pull request as ready for review June 15, 2026 17:45
@CamSoper CamSoper merged commit 5f4c485 into main Jun 15, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants