feat(ops): Tier 3 operational features — heartbeat + session-freshness#18

Merged: Multipixelone merged 2 commits into main from claude/tier3-operational on Jun 22, 2026

@Multipixelone (Owner)

Summary

PR 3 of the plan — Tier 3 operational features, the CI-testable subset. verify-order (3.1) is intentionally not here — its order-history selectors are entirely unverified and need live bring-up, so it ships on its own branch next (operator decision). This PR is heartbeat + session-freshness + doc fixes.

3.3 Dead-man's-switch heartbeat — open-source friendly

systemd Restart=on-failure only catches a full process exit, so a worker thread that wedges (hung browser, deadlock) leaves the process up and the queue silently undrained. New heartbeat.ping() + Engine._heartbeat_tick() ping ROOMIEORDER_HEARTBEAT_URL on a timer (ROOMIEORDER_HEARTBEAT_INTERVAL_SECONDS, default 300); when the pings stop, the monitor alerts.
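As a rough illustration of the "never take the worker down" contract, a ping helper could look like the sketch below. This is not the PR's code: the tests suggest the real module calls httpx.get, while this version uses the standard library so it runs without dependencies. The `timeout` parameter, the injectable `get` callable, and the boolean return value are assumptions made for the example.

```python
import logging
import urllib.request
from typing import Callable

log = logging.getLogger("heartbeat")


def _http_get(url: str, timeout: float) -> int:
    """Stdlib GET returning the HTTP status code (stand-in for httpx.get)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def ping(
    url: str,
    *,
    timeout: float = 10.0,  # assumed value, not taken from the PR
    get: Callable[[str, float], int] = _http_get,  # injectable for tests
) -> bool:
    """Ping a push-style monitor; return True only on a 2xx response."""
    if not url:
        return False  # URL unset: heartbeat is disabled, nothing to do
    try:
        status = get(url, timeout)
    except Exception as exc:  # monitoring must never crash the worker
        log.warning("heartbeat ping failed: %s", exc)
        return False
    return 200 <= status < 300
```

Catching the broad `Exception` is deliberate here: DNS failures, timeouts, and HTTP errors are all logged and dropped, so a monitor outage cannot stop the queue.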

The URL is the only coupling, so it works with any push-style monitor — hosted Healthchecks.io or a self-hosted open-source Healthchecks instance (https://hc.example.com/ping/<uuid>), Uptime Kuma push, etc. The tick runs before the loop's pause/idle continues, so a paused-but-alive loop still reports liveness (that's exactly the signal a wedged worker can't send). No-op when unset; ping failures are logged and swallowed — monitoring can never take the worker down.
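The interval gate and the tick-before-pause ordering can be sketched as follows. `HeartbeatGate` and `loop_iteration` are hypothetical names chosen for the example (the PR puts this logic in Engine._heartbeat_tick()), and advancing the timestamp before the ping is sent is an assumption.

```python
import time
from typing import Callable, Optional


class HeartbeatGate:
    """Hypothetical helper: sends a ping at most once per interval."""

    def __init__(
        self,
        url: str,
        interval_seconds: float,
        ping: Callable[[str], bool],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.url = url
        self.interval_seconds = interval_seconds
        self.ping = ping
        self.clock = clock
        self._last: Optional[float] = None

    def tick(self) -> bool:
        """Ping if due; return True when a ping was attempted."""
        if not self.url:
            return False  # heartbeat disabled
        now = self.clock()
        if self._last is not None and now - self._last < self.interval_seconds:
            return False  # not due yet
        self._last = now  # assumption: advance even if the ping then fails
        self.ping(self.url)
        return True


def loop_iteration(
    gate: HeartbeatGate,
    is_paused: Callable[[], bool],
    claim_next: Callable[[], Optional[Callable[[], None]]],
) -> str:
    """One pass of a worker loop; the heartbeat runs before any early exit."""
    gate.tick()  # liveness first, so a paused or idle loop still reports in
    if is_paused():
        return "paused"
    job = claim_next()
    if job is None:
        return "idle"
    job()
    return "worked"
```

Because `tick()` sits above the pause and idle branches, only a loop that has stopped iterating goes silent, which is the case the monitor is meant to catch.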

3.2 Proactive session-freshness check + notify

Sessions expire silently; today the first symptom is a failed real order at the sign-in wall. Engine._session_check_tick() runs every ROOMIEORDER_SESSION_CHECK_HOURS (default 0 = off): it relaunches each present store profile read-only via the buy flow's existing verify_session() and notifies if the profile reloads logged out, so the expiry surfaces before a real order fails. The check runs on the worker thread between claims, so it never overlaps a buy. It is best-effort per provider, and the timestamp advances even on error so a broken probe can't hammer the stores.
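A minimal sketch of that tick, under stated assumptions: `SessionChecker`, the `probes` mapping, and the notification text are hypothetical stand-ins for the engine state, and each probe callable plays the role of verify_session() for one provider (True means still logged in).

```python
import logging
import time
from typing import Callable, Dict, List, Optional

log = logging.getLogger("session-check")


class SessionChecker:
    """Hypothetical helper mirroring the described session-freshness tick."""

    def __init__(
        self,
        probes: Dict[str, Callable[[], bool]],  # provider -> "still logged in?"
        notify: Callable[[str], None],
        check_hours: float = 0.0,  # 0 = off, matching the documented default
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.probes = probes
        self.notify = notify
        self.check_hours = check_hours
        self.clock = clock
        self._last: Optional[float] = None

    def tick(self) -> List[str]:
        """Probe every provider if due; return the providers found logged out."""
        if self.check_hours <= 0:
            return []  # feature disabled
        now = self.clock()
        if self._last is not None and now - self._last < self.check_hours * 3600:
            return []  # not due yet
        self._last = now  # advance first, so a failing probe cannot retry-loop
        logged_out: List[str] = []
        for provider, probe in self.probes.items():
            try:
                ok = probe()
            except Exception as exc:  # best-effort per provider
                log.warning("session probe for %s failed: %s", provider, exc)
                continue
            if not ok:
                logged_out.append(provider)
                self.notify(f"{provider}: session expired, sign in again")
        return logged_out
```

A probe that raises is skipped rather than reported as logged out, so a broken probe does not produce a false "session expired" notification.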

Adds a shared purchase.build_purchaser(config, provider) factory (the worker's probe, the orchestrator, and cli._purchaser_for now share one source of truth — _purchaser_for delegates to it).

3.4 Documentation fixes

  • README status enum synced to store.Status (adds needs_review, blocked, spend_capped, unavailable, skipped_debounce); catalog show → catalog.
  • New README Health monitoring section + a state backup/restore note (the SQLite DB + per-store browser profiles are the only durable state).
  • examples/env.example documents the three new vars; AGENTS.md status list adds blocked.

Tests

pytest (198 passed / 11 skipped), ruff, mypy src all green.

  • test_heartbeat.py — empty-URL no-op, success path, error swallowed
  • test_main.py — heartbeat interval gate (due / skip / due-again), no-op when URL unset; session-check notifies only the logged-out provider, and is a no-op when disabled (default)
  • test_config.py — the three new vars parse with defaults + overrides
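For orientation, parsing the three variables with their defaults might look like the sketch below. The variable names and the defaults (300 seconds, 0 hours) come from this PR; the `OpsConfig` dataclass, the `load_ops_config` function, and the choice of int versus float are assumptions for the example.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class OpsConfig:
    """Hypothetical container for the three new settings."""

    heartbeat_url: str = ""  # empty = heartbeat disabled
    heartbeat_interval_seconds: int = 300
    session_check_hours: float = 0.0  # 0 = session check disabled


def load_ops_config(env: Mapping[str, str] = os.environ) -> OpsConfig:
    """Read the settings from an environment-like mapping."""
    return OpsConfig(
        heartbeat_url=env.get("ROOMIEORDER_HEARTBEAT_URL", "").strip(),
        heartbeat_interval_seconds=int(
            env.get("ROOMIEORDER_HEARTBEAT_INTERVAL_SECONDS", "300")
        ),
        session_check_hours=float(env.get("ROOMIEORDER_SESSION_CHECK_HOURS", "0")),
    )
```

Passing a plain dict instead of `os.environ` keeps default and override cases easy to test without touching the process environment.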

Not in this PR (follow-up)

verify-order (3.1) — read-only order-history scrape to resolve needs_review, on its own branch since its selectors require live bring-up (dump-dom) before they can be trusted.

🤖 Generated with Claude Code



claude added 2 commits June 22, 2026 05:25
…s + docs

Operational hardening (PR 3 of the plan; verify-order deferred to its own PR):

- Dead-man's-switch heartbeat (heartbeat.ping + Engine._heartbeat_tick): the
  worker pings ROOMIEORDER_HEARTBEAT_URL on a timer
  (ROOMIEORDER_HEARTBEAT_INTERVAL_SECONDS, default 300), so a wedged worker
  thread — which systemd Restart=on-failure can't see — stops the pings and an
  external monitor alerts. The URL is the only coupling, so it works with hosted
  Healthchecks.io, a self-hosted open-source Healthchecks instance, Uptime Kuma
  push, etc. The tick runs before the loop's pause/idle continues, so a
  paused-but-alive loop still reports liveness. No-op when unset; best-effort.

- Proactive session-freshness probe (Engine._session_check_tick): every
  ROOMIEORDER_SESSION_CHECK_HOURS (default 0 = off) the worker relaunches each
  present store profile read-only via the buy flow's verify_session and notifies
  if it reloads logged out — catching an expired session before a real order
  hits the sign-in wall. Runs on the worker thread between claims (never overlaps
  a buy), per-provider best-effort. Adds a shared purchase.build_purchaser
  factory (cli._purchaser_for now delegates to it).

- Docs: README status enum synced (adds needs_review/blocked/spend_capped/
  unavailable/skipped_debounce), `catalog show` → `catalog`, a Health-monitoring
  section, and a state backup/restore note; env.example documents the new vars;
  AGENTS.md status list adds `blocked`.

Tests cover ping (empty/success/error-swallowed), the heartbeat interval gate,
session-check notify-on-logged-out / disabled-by-default, and the new config vars.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01J322vufhqKDvdJmukTZdkq

The nix pre-commit-check runs mypy across all files, including tests, which the
local `mypy src` run didn't cover. Fixes:

- test_heartbeat.py: monkeypatch httpx via the string target
  ("roomieorder.heartbeat.httpx.get") instead of the heartbeat.httpx attribute,
  which mypy flags as a non-exported re-import [attr-defined].
- test_main.py: replace `lambda url: pings.append(url) or True` with named
  helpers (list.append returns None → [func-returns-value]); drop the now-unused
  `# type: ignore[assignment]` on the recording-notifier assignment.
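The test_main.py change can be illustrated with a small sketch; `record_ping` is a hypothetical name, not necessarily the helper used in the test suite.

```python
from typing import List

pings: List[str] = []


def record_ping(url: str) -> bool:
    """Named replacement for `lambda url: pings.append(url) or True`."""
    pings.append(url)  # list.append returns None, so no value is used here
    return True  # explicit success result for the code under test
```

The runtime behavior matches the lambda, but the return value no longer depends on the None returned by `list.append`, which is what mypy reported as [func-returns-value].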

Behavior unchanged; pytest/ruff/mypy(src+tests) all green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01J322vufhqKDvdJmukTZdkq
@Multipixelone Multipixelone merged commit 9becc97 into main Jun 22, 2026
2 checks passed