
reliability improvements #11

Merged
Multipixelone merged 15 commits into main from reliability-improvements on Jun 23, 2026

Conversation

@Multipixelone

Owner
  • test(cli): widen pending-ping window so snooze tests survive morning runs
  • feat(reliability): re-fire actionable pings after transient send failure
  • feat(reliability): retry transient LLM failures
  • feat(reliability): surface calendar auth failure in morning digest
  • feat(reliability): cache routes + fall back when live routing is down
  • feat(observability): heartbeat dead-man's-switch for the poll timer
  • fix(venues): replace char-bag fuzzy match with edit-distance + digit guard
  • test(routing): lock in transfer counting across walking transfers
  • fix(mta): match alert routes on whole line codes, not substrings
  • fix(mta): carry structured stop names instead of re-parsing the summary
  • chore(cli): dedup LLM client, warn on inert scheduling keys, english strings
  • docs: fix broken plan.md link, refresh command list, single-source version
  • chore(store,ci): stamp schema version; type-check tests in CI
  • ci: add coverage gate (pytest-cov, 80% floor)
  • feat(weather): pad departure buffer when rain/snow is forecast

Previously a claimed ping was marked fired before the network send, and a
failed send was logged but never retried -- a transient Telegram/relay blip
silently dropped the one notification that matters (the leave/prep alarm).

Keep the atomic cross-process claim, but on send failure of a prep/leave ping
hand the row back (release_ping: 1->0, bump send_attempts) so the next poll
re-attempts. Bounded by a grace window (no stale alarms) and an attempt cap
(no storm from a persistently-broken notifier).
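
The claim/release cycle described above can be sketched roughly as follows, using an in-memory SQLite table. Only the `release_ping` name and the `send_attempts` counter come from the commit message; the table layout, the `fired` and `due_at` columns, the `MAX_ATTEMPTS` and `GRACE_SECONDS` values, and the other function names are assumptions made for illustration.

```python
import sqlite3

MAX_ATTEMPTS = 3      # assumed attempt cap
GRACE_SECONDS = 600   # assumed grace window after the due time

def open_ping_db() -> sqlite3.Connection:
    """Create a throwaway database with a hypothetical pings table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE pings (id INTEGER PRIMARY KEY, due_at REAL, "
        "fired INTEGER DEFAULT 0, send_attempts INTEGER DEFAULT 0)"
    )
    return db

def claim_ping(db: sqlite3.Connection, ping_id: int, now: float) -> bool:
    """Atomically flip fired 0->1; refuse stale or over-retried rows."""
    cur = db.execute(
        "UPDATE pings SET fired = 1 WHERE id = ? AND fired = 0 "
        "AND send_attempts < ? AND ? - due_at <= ?",
        (ping_id, MAX_ATTEMPTS, now, GRACE_SECONDS),
    )
    db.commit()
    return cur.rowcount == 1

def release_ping(db: sqlite3.Connection, ping_id: int) -> None:
    """Hand the row back after a failed send: fired 1->0, bump attempts."""
    db.execute(
        "UPDATE pings SET fired = 0, send_attempts = send_attempts + 1 "
        "WHERE id = ? AND fired = 1",
        (ping_id,),
    )
    db.commit()

def fire(db: sqlite3.Connection, ping_id: int, now: float, send) -> bool:
    """Claim, send, and release on failure so the next poll can retry."""
    if not claim_ping(db, ping_id, now):
        return False
    try:
        send(ping_id)
    except Exception:
        release_ping(db, ping_id)
        return False
    return True
```

Putting the grace window and attempt cap into the claim's `WHERE` clause keeps the bound checks in the same atomic statement as the claim itself; the real code may enforce them elsewhere.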

Geocode and MTA fetches already retry transient HTTP errors; the opencode-go
client did not, so a single 5xx/timeout dropped a location resolution or alert
classification for that run. Wrap the chat-completion request in retry() (one
retry, transient-only). Also collapse the duplicate _call/_chat_completion
network logic into one path.
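
A transient-only, single-retry wrapper of the kind described above might look like the sketch below. The project's actual `retry()` signature is not shown in this PR, so the parameters here and the `TransientError` marker class are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

class TransientError(Exception):
    """Hypothetical marker for 5xx responses and timeouts."""

def retry(fn: Callable[[], T], retries: int = 1, delay: float = 0.0) -> T:
    """Call fn; on TransientError try again up to `retries` more times.

    Any other exception propagates immediately without a retry.
    """
    for attempt in range(retries + 1):
        try:
            return fn()
        except TransientError:
            if attempt == retries:
                raise  # out of attempts: surface the last failure
            time.sleep(delay)
    raise AssertionError("unreachable")
```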

An expired/invalid OAuth refresh token was caught by the generic except and
degraded to 'no events', so the user saw an empty day with no hint to re-auth.
Catch AuthError separately, set an auth_failed flag, and prepend a loud
're-run commutecompass oauth' note to the digest's Operations footer (which is
sent even when there are zero events).
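
As a rough illustration of the change above, the sketch below catches the auth failure before the generic handler and carries it to the footer. `AuthError` and `auth_failed` are named in the commit message; the `Digest` dataclass, `fetch_digest`, `operations_footer`, and the exact note wording are assumptions.

```python
from dataclasses import dataclass, field

class AuthError(Exception):
    """Stand-in for the calendar client's auth failure."""

@dataclass
class Digest:
    events: list = field(default_factory=list)
    auth_failed: bool = False

def fetch_digest(fetch_events) -> Digest:
    try:
        return Digest(events=list(fetch_events()))
    except AuthError:
        # Distinct branch: remember why the day looks empty.
        return Digest(auth_failed=True)
    except Exception:
        # Other failures still degrade to 'no events'.
        return Digest()

def operations_footer(digest: Digest) -> list:
    """Footer notes; built even when the digest has zero events."""
    notes = []
    if digest.auth_failed:
        notes.insert(0, "Calendar auth failed: re-run commutecompass oauth")
    return notes
```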

Previously a Directions outage (or missing key) produced no route -> no plan ->
no alarms for the entire day, and every plan hit the API afresh. Now a
successful route is cached per (origin, dest, mode); when live routing fails the
planner reuses the last good cached route, then falls back to a coarse
haversine distance/speed estimate, and only then gives up with no_route.
Fallback routes are flagged approximate and labelled '(estimated)' in the
digest so timing is known to be best-effort.
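
The fallback order above (live route, then cached route, then haversine estimate, then no route) can be sketched as below. The dict-shaped routes, the in-process cache, the `ASSUMED_SPEED_KMH` table, and the function names are all illustrative assumptions; the project's cache is presumably persistent.

```python
import math

_route_cache = {}  # (origin, dest, mode) -> last good route

ASSUMED_SPEED_KMH = {"walking": 5.0, "transit": 20.0}  # hypothetical speeds

def haversine_km(a, b) -> float:
    """Great-circle distance between two (lat, lon) pairs in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def plan_route(origin, dest, mode, live_route, coords=None):
    """Return a route dict, or None when every fallback is exhausted."""
    key = (origin, dest, mode)
    try:
        route = live_route(origin, dest, mode)
        _route_cache[key] = route
        return {**route, "approximate": False}
    except Exception:
        pass  # live routing is down; fall through to the fallbacks
    if key in _route_cache:
        return {**_route_cache[key], "approximate": True}
    if coords is not None and mode in ASSUMED_SPEED_KMH:
        km = haversine_km(*coords)
        minutes = round(km / ASSUMED_SPEED_KMH[mode] * 60)
        return {"minutes": minutes, "approximate": True}
    return None  # corresponds to no_route
```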

A self-hosted alarm fails silently if the per-minute poll timer dies: the user
just stops getting notifications. Record a per-job heartbeat in SQLite; the
morning digest now warns when the poll loop has gone stale (timer dead = no
alarms today). Add an optional [monitoring].heartbeat_url (healthchecks.io-style)
that the poll and morning jobs ping on success, as an off-host dead-man's-switch.
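
A minimal sketch of the local half of this heartbeat, assuming a `heartbeats` table keyed by job name and a 300-second staleness threshold (both hypothetical). The off-host `heartbeat_url` ping is left out to keep the example free of network calls.

```python
import sqlite3

def open_heartbeat_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE heartbeats (job TEXT PRIMARY KEY, last_ok REAL)")
    return db

def record_heartbeat(db: sqlite3.Connection, job: str, now: float) -> None:
    """Upsert the last successful run time for a job."""
    db.execute(
        "INSERT OR REPLACE INTO heartbeats (job, last_ok) VALUES (?, ?)",
        (job, now),
    )
    db.commit()

def stale_warning(db: sqlite3.Connection, job: str, now: float,
                  max_age: float = 300.0):
    """Return a warning string if the job looks dead, else None."""
    row = db.execute(
        "SELECT last_ok FROM heartbeats WHERE job = ?", (job,)
    ).fetchone()
    if row is None:
        return f"{job}: no heartbeat recorded"
    age = now - row[0]
    if age > max_age:
        return f"{job}: last ran {int(age)}s ago (timer dead?)"
    return None
```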
…guard

The fuzzy matcher built set(collapsed_input) — a set of individual characters —
so it compared character bags: anagrams matched perfectly and 'studio100' vs
'studio200' (different rooms) nearly matched. Use rapidfuzz edit-distance on the
collapsed strings (order-sensitive), gated on digit runs matching exactly so
room/studio numbers can't collide.
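
To illustrate the idea, the sketch below pairs an order-sensitive edit distance with an exact digit-run guard. It swaps in a small pure-Python Levenshtein routine so the example has no dependencies, whereas the commit uses rapidfuzz; the `collapse` rule, the `max_ratio` threshold, and the function names are assumptions.

```python
import re

def collapse(s: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", s.lower())

def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def venues_match(a: str, b: str, max_ratio: float = 0.2) -> bool:
    ca, cb = collapse(a), collapse(b)
    # Digit guard: room/studio numbers must agree exactly.
    if re.findall(r"\d+", ca) != re.findall(r"\d+", cb):
        return False
    longest = max(len(ca), len(cb)) or 1
    return edit_distance(ca, cb) / longest <= max_ratio
```

Unlike a character-set comparison, the edit distance penalises reordering, so anagrams no longer score as perfect matches.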

Add a test documenting that transit->walk->transit counts as one transfer (the
WALKING branch intentionally does not reset the prev-transit flag); the bug
reported here was a misread of that branch.
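
The counting rule being locked in can be sketched as follows. The mode strings and the `count_transfers` name are assumptions; only the behaviour (a walking leg does not reset the previous-transit flag) is taken from the commit message.

```python
def count_transfers(modes) -> int:
    """Count transit-to-transit changes, looking through walking legs."""
    transfers = 0
    prev_transit = False
    for mode in modes:
        if mode == "TRANSIT":
            if prev_transit:
                transfers += 1
            prev_transit = True
        elif mode == "WALKING":
            # Intentionally leave prev_transit alone: a walking transfer
            # between two transit legs still counts as one transfer.
            continue
        else:
            prev_transit = False
    return transfers
```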

Substring matching made affected route '1' match bus lines 'B41' and 'M15', along
with any other line containing that digit. Match case-insensitively against the
leg line and its alphanumeric tokens instead, so 'C' still matches a decorated
'C-local' while '1' no longer matches 'B41'.
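
A small sketch of whole-code matching as described above; the function name and the tokenising regex are assumptions rather than the project's actual implementation.

```python
import re

def route_matches_leg(route: str, leg_line: str) -> bool:
    """True if `route` equals the leg's line or one of its alphanumeric tokens."""
    route = route.strip().lower()
    line = leg_line.strip().lower()
    tokens = re.findall(r"[a-z0-9]+", line)  # 'c-local' -> ['c', 'local']
    return route == line or route in tokens
```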

_build_route_context recovered boarding/alighting stops by splitting the human
'{line} from {dep} to {arr}' summary on ' from ', ' to ', and ' and ', which
shredded stop names that themselves contain those words. Add departure_stop and
arrival_stop fields to TransitLeg (populated in routing), and read them directly.
…strings

- morning built OpencodeGoClient twice; reuse the planning client for alerts.
- config set scheduling.{morning_run_time,poll_interval_seconds} is inert under
  systemd/cron (an external scheduler drives the jobs); warn so a chat user
  isn't misled into thinking it changed the schedule.
- replace the lone Chinese oauth-success string with English.
…rsion

README linked to a nonexistent plan.md (now AGENTS.md) and omitted status,
snooze/mute/unmute/undo, geocode-cache, mta-alerts, and config unset/reset.
package.nix now reads version from pyproject.toml so it can't drift.

Add a SCHEMA_VERSION and stamp it into PRAGMA user_version so future migrations
have a version to branch on instead of probing every table (existing idempotent
column-adds are unchanged). Extend CI's mypy step to cover tests/ too.
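
Branching on `PRAGMA user_version` might look like the sketch below. The version number, the `events` table, and the migration steps are invented for illustration; `PRAGMA user_version` itself is a standard SQLite feature and defaults to 0 on a fresh database.

```python
import sqlite3

SCHEMA_VERSION = 2  # assumed value for this sketch

def migrate(db: sqlite3.Connection) -> int:
    """Apply pending steps and stamp the version; return the prior version."""
    (version,) = db.execute("PRAGMA user_version").fetchone()
    if version < 1:
        db.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY)")
    if version < 2:
        db.execute("ALTER TABLE events ADD COLUMN note TEXT")
    # PRAGMA values cannot be bound as parameters, so format an int directly.
    db.execute(f"PRAGMA user_version = {int(SCHEMA_VERSION)}")
    db.commit()
    return version
```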

Add pytest-cov to the dev shell and enforce a branch-coverage floor of 80% in CI
(current coverage ~84%). The floor catches large regressions without tripping on
normal PRs. A NixOS-module integration test for the systemd timers remains a
follow-up; it needs CI/VM iteration to land safely.

Add an optional [weather] block (Open-Meteo, keyless): when precipitation is
likely around the commute, extra minutes are folded into the buffer so the alarm
fires earlier, and the digest shows a per-event note ('+N min for rain'). Weather
failures are swallowed, so a forecast blip never breaks a plan. Disabled by
default.
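
A sketch of the padding logic under stated assumptions: the forecast is reduced to a single precipitation probability supplied by a caller-provided function, and the threshold and extra minutes are made-up defaults. The real Open-Meteo integration and config keys are not shown in this PR.

```python
def weather_padding_minutes(fetch_precip_probability,
                            threshold: float = 0.5,
                            extra_minutes: int = 10) -> int:
    """Extra buffer minutes when rain/snow is likely; 0 on any failure."""
    try:
        probability = fetch_precip_probability()
    except Exception:
        return 0  # swallow forecast errors so planning still succeeds
    return extra_minutes if probability >= threshold else 0

def digest_note(padding: int) -> str:
    """Per-event note for the digest; empty when no padding was applied."""
    return f"+{padding} min for rain" if padding else ""
```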
@Multipixelone merged commit 3b0d34f into main on Jun 23, 2026
2 checks passed
@Multipixelone deleted the reliability-improvements branch on June 23, 2026 15:16