feat(ops): Tier 3 operational features - heartbeat + session-freshness #18
Merged
…s + docs
Operational hardening (PR 3 of the plan; verify-order deferred to its own PR):
- Dead-man's-switch heartbeat (heartbeat.ping + Engine._heartbeat_tick): the worker pings ROOMIEORDER_HEARTBEAT_URL on a timer (ROOMIEORDER_HEARTBEAT_INTERVAL_SECONDS, default 300), so a wedged worker thread, which systemd Restart=on-failure can't see, stops the pings and an external monitor alerts. The URL is the only coupling, so it works with hosted Healthchecks.io, a self-hosted open-source Healthchecks instance, Uptime Kuma push, etc. The tick runs before the loop's pause/idle continues, so a paused-but-alive loop still reports liveness. No-op when unset; best-effort.
- Proactive session-freshness probe (Engine._session_check_tick): every ROOMIEORDER_SESSION_CHECK_HOURS (default 0 = off) the worker relaunches each present store profile read-only via the buy flow's verify_session and notifies if it reloads logged out, catching an expired session before a real order hits the sign-in wall. Runs on the worker thread between claims (never overlaps a buy), per-provider best-effort. Adds a shared purchase.build_purchaser factory (cli._purchaser_for now delegates to it).
- Docs: README status enum synced (adds needs_review/blocked/spend_capped/unavailable/skipped_debounce), `catalog show` → `catalog`, a Health-monitoring section, and a state backup/restore note; env.example documents the new vars; AGENTS.md status list adds `blocked`.
Tests cover ping (empty/success/error-swallowed), the heartbeat interval gate, session-check notify-on-logged-out / disabled-by-default, and the new config vars.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01J322vufhqKDvdJmukTZdkq
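To make the heartbeat behaviour in this commit message concrete, here is a minimal sketch of a best-effort ping plus an interval gate. It is not the project's code: `HeartbeatGate`, `tick`, `send`, and `clock` are hypothetical names, the sketch uses the standard library's `urllib.request` instead of the `httpx` client the tests refer to, and pinging on the very first tick is an assumption.

```python
import logging
import time
import urllib.request
from typing import Callable, Optional

log = logging.getLogger("heartbeat_sketch")


def ping(url: str, timeout: float = 10.0) -> bool:
    """Best-effort GET: returns False instead of raising on any failure."""
    if not url:
        return False  # unset URL means the heartbeat is disabled
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except Exception as exc:  # monitoring must never take the worker down
        log.warning("heartbeat ping failed: %s", exc)
        return False


class HeartbeatGate:
    """Hypothetical helper: sends at most one ping per interval."""

    def __init__(
        self,
        url: str,
        interval_seconds: float = 300.0,
        send: Callable[[str], bool] = ping,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.url = url
        self.interval = interval_seconds
        self.send = send
        self.clock = clock
        self._last: Optional[float] = None  # time of the last attempt

    def tick(self) -> bool:
        """Return True if a ping was attempted on this call."""
        if not self.url:
            return False  # no-op when the URL is unset
        now = self.clock()
        if self._last is not None and now - self._last < self.interval:
            return False  # interval not yet elapsed
        self._last = now  # assumption: advance even if the ping then fails
        self.send(self.url)
        return True
```

In a worker loop, `tick()` would be called at the top of each iteration, before any pause or idle `continue`, so that a paused but live loop keeps pinging while a wedged one goes silent.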
The nix pre-commit-check runs mypy across all files, including tests, which the
local `mypy src` run didn't cover. Fixes:
- test_heartbeat.py: monkeypatch httpx via the string target
("roomieorder.heartbeat.httpx.get") instead of the heartbeat.httpx attribute,
which mypy flags as a non-exported re-import [attr-defined].
- test_main.py: replace `lambda url: pings.append(url) or True` with named
helpers (list.append returns None → [func-returns-value]); drop the now-unused
`# type: ignore[assignment]` on the recording-notifier assignment.
Behavior unchanged; pytest/ruff/mypy(src+tests) all green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01J322vufhqKDvdJmukTZdkq
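As an illustration of the test_main.py fix described in this commit, the sketch below shows a named recording helper of the kind that can replace `lambda url: pings.append(url) or True`. The names `pings` and `record_ping` are hypothetical; only the pattern comes from the commit message.

```python
from typing import List

pings: List[str] = []


def record_ping(url: str) -> bool:
    """Record the URL, then report success.

    Unlike the lambda, this never uses the None returned by list.append
    as a value, so mypy's [func-returns-value] check has nothing to flag.
    """
    pings.append(url)
    return True


# The string-target patch from the commit message would look like this
# inside a pytest test (fake_get is a hypothetical stub):
#   monkeypatch.setattr("roomieorder.heartbeat.httpx.get", fake_get)
```

The helper behaves the same as the lambda at runtime; the change only affects what the type checker sees.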
Summary
PR 3 of the plan — Tier 3 operational features, the CI-testable subset. verify-order (3.1) is intentionally not here — its order-history selectors are entirely unverified and need live bring-up, so it ships on its own branch next (operator decision). This PR is heartbeat + session-freshness + doc fixes.
3.3 Dead-man's-switch heartbeat — open-source friendly
systemd Restart=on-failure only catches a full process exit, so a worker thread that wedges (hung browser, deadlock) leaves the process up and the queue silently undrained. New heartbeat.ping() + Engine._heartbeat_tick() ping ROOMIEORDER_HEARTBEAT_URL on a timer (ROOMIEORDER_HEARTBEAT_INTERVAL_SECONDS, default 300); when the pings stop, the monitor alerts.
The URL is the only coupling, so it works with any push-style monitor: hosted Healthchecks.io or a self-hosted open-source Healthchecks instance (https://hc.example.com/ping/<uuid>), Uptime Kuma push, etc. The tick runs before the loop's pause/idle continues, so a paused-but-alive loop still reports liveness (that's exactly the signal a wedged worker can't send). No-op when unset; ping failures are logged and swallowed, so monitoring can never take the worker down.
3.2 Proactive session-freshness check + notify
Sessions expire silently; today the first symptom is a failed real order at the sign-in wall. Engine._session_check_tick() runs every ROOMIEORDER_SESSION_CHECK_HOURS (default 0 = off): it relaunches each present store profile read-only via the buy flow's existing verify_session() and notifies if it reloads logged out, before a real order fails. It runs on the worker thread between claims (never overlaps a buy) and is per-provider best-effort; the timestamp advances even on error, so a broken probe can't hammer the stores.
Adds a shared purchase.build_purchaser(config, provider) factory (the worker's probe, the orchestrator, and cli._purchaser_for now share one source of truth; _purchaser_for delegates to it).
3.4 Documentation fixes
README: status enum synced with store.Status (adds needs_review, blocked, spend_capped, unavailable, skipped_debounce); catalog show → catalog.
examples/env.example documents the three new vars; AGENTS.md status list adds blocked.
Tests
pytest (198 passed / 11 skipped), ruff, mypy src all green.
- test_heartbeat.py: empty-URL no-op, success path, error swallowed
- test_main.py: heartbeat interval gate (due / skip / due-again), no-op when URL unset; session-check notifies only the logged-out provider, and is a no-op when disabled (default)
- test_config.py: the three new vars parse with defaults + overrides
Not in this PR (follow-up)
verify-order (3.1): read-only order-history scrape to resolve needs_review, on its own branch since its selectors require live bring-up (dump-dom) before they can be trusted.
🤖 Generated with Claude Code