reliability improvements #11
Merged
Multipixelone (Owner) commented on Jun 23, 2026
- test(cli): widen pending-ping window so snooze tests survive morning runs
- feat(reliability): re-fire actionable pings after transient send failure
- feat(reliability): retry transient LLM failures
- feat(reliability): surface calendar auth failure in morning digest
- feat(reliability): cache routes + fall back when live routing is down
- feat(observability): heartbeat dead-man's-switch for the poll timer
- fix(venues): replace char-bag fuzzy match with edit-distance + digit guard
- test(routing): lock in transfer counting across walking transfers
- fix(mta): match alert routes on whole line codes, not substrings
- fix(mta): carry structured stop names instead of re-parsing the summary
- chore(cli): dedup LLM client, warn on inert scheduling keys, english strings
- docs: fix broken plan.md link, refresh command list, single-source version
- chore(store,ci): stamp schema version; type-check tests in CI
- ci: add coverage gate (pytest-cov, 80% floor)
- feat(weather): pad departure buffer when rain/snow is forecast
Previously a claimed ping was marked fired before the network send, and a failed send was logged but never retried -- a transient Telegram/relay blip silently dropped the one notification that matters (the leave/prep alarm). Keep the atomic cross-process claim, but on send failure of a prep/leave ping hand the row back (release_ping: 1->0, bump send_attempts) so the next poll re-attempts. Bounded by a grace window (no stale alarms) and an attempt cap (no storm from a persistently-broken notifier).
geocode and MTA fetches already retry transient HTTP errors; the opencode-go client did not, so a single 5xx/timeout dropped a location resolution or alert classification for that run. Wrap the chat-completion request in retry() (one retry, transient-only). Also collapses the duplicate _call/_chat_completion network logic into one path.
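The transient-only retry might look roughly like this sketch. `TransientError` is a hypothetical stand-in for however the client classifies 5xx responses and timeouts, and the exponential backoff is an assumption; the PR only states one retry, transient errors only.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical: raised for 5xx responses and timeouts, never for 4xx."""


def retry(fn: Callable[[], T], retries: int = 1, base_delay: float = 0.5,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn; on TransientError, retry up to `retries` more times with backoff."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except TransientError:
            if attempt == retries:
                raise  # out of retries: surface the original error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Non-transient errors (an auth failure, a malformed request) propagate on the first attempt, so retrying never masks a real bug. Injecting `sleep` keeps tests fast.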
An expired/invalid OAuth refresh token was caught by the generic except and degraded to 'no events' — the user saw an empty day with no hint to re-auth. Catch AuthError distinctly, set an auth_failed flag, and prepend a loud 're-run commutecompass oauth' note to the digest's Operations footer (which is sent even when there are zero events).
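A sketch of keeping "auth failed" distinct from "no events". `AuthError`, `DayEvents`, `load_events`, and `operations_footer` are hypothetical names, and the note wording is illustrative; the `commutecompass oauth` command is the one the description refers to.

```python
from dataclasses import dataclass, field
from typing import Callable


class AuthError(Exception):
    """Hypothetical: raised when the OAuth refresh token is expired or revoked."""


@dataclass
class DayEvents:
    events: list[str] = field(default_factory=list)
    auth_failed: bool = False


def load_events(fetch: Callable[[], list[str]]) -> DayEvents:
    try:
        return DayEvents(events=fetch())
    except AuthError:
        # Distinct from "no events": the user has to re-authorize.
        return DayEvents(auth_failed=True)
    except Exception:
        # Other failures still degrade to an empty day, as before.
        return DayEvents()


def operations_footer(day: DayEvents, notes: list[str]) -> list[str]:
    """The footer is sent even on zero-event days, so the warning always lands."""
    lines = list(notes)
    if day.auth_failed:
        lines.insert(0, "!! Calendar auth failed: re-run `commutecompass oauth`")
    return lines
```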
Previously a Directions outage (or missing key) produced no route -> no plan -> no alarms for the entire day, and every plan hit the API afresh. Now a successful route is cached per (origin, dest, mode); when live routing fails the planner reuses the last good cached route, then a coarse haversine distance/speed estimate, before giving up with no_route. Fallback routes are flagged approximate and labelled '(estimated)' in the digest so timing is known to be best-effort.
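The fallback chain (live route, then last good cached route, then a haversine estimate, then no route) can be sketched as below. `Route`, `plan_route`, the in-memory dict cache, and the per-mode speeds are all assumptions, and `live` stands in for the Directions call. The haversine formula itself is standard.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Coord = Tuple[float, float]  # (lat, lon) in degrees


@dataclass(frozen=True)
class Route:
    minutes: float
    approximate: bool = False  # True for cached or estimated fallbacks


# Hypothetical average speeds (km/h) for the last-resort estimate.
SPEED_KMH = {"walking": 4.8, "transit": 20.0, "driving": 30.0}
_cache: Dict[Tuple[Coord, Coord, str], Route] = {}


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance on a spherical Earth (R = 6371 km)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def plan_route(origin: Coord, dest: Coord, mode: str,
               live: Callable[[Coord, Coord, str], Optional[Route]]) -> Optional[Route]:
    key = (origin, dest, mode)
    try:
        route = live(origin, dest, mode)
    except Exception:
        route = None  # outage or missing API key
    if route is not None:
        _cache[key] = route  # remember the last good answer per (origin, dest, mode)
        return route
    if key in _cache:
        return Route(_cache[key].minutes, approximate=True)
    if mode in SPEED_KMH:
        return Route(haversine_km(origin, dest) / SPEED_KMH[mode] * 60, approximate=True)
    return None  # the planner reports no_route


def label(route: Route) -> str:
    return f"{route.minutes:.0f} min" + (" (estimated)" if route.approximate else "")
```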
A self-hosted alarm fails silently if the per-minute poll timer dies — the user just stops getting notifications. Record a per-job heartbeat in SQLite; the morning digest now warns when the poll loop has gone stale (timer dead = no alarms today). Add an optional [monitoring].heartbeat_url (healthchecks.io-style) that poll/morning ping on success for an off-host dead-man's-switch.
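A sketch of the heartbeat record and staleness check, assuming a hypothetical `heartbeat` table and a five-minute staleness threshold. The optional `url` ping uses a plain GET, which is how healthchecks.io-style checks are commonly pinged; it is best-effort so a monitoring outage never fails the job.

```python
import sqlite3
import time
import urllib.request
from typing import Optional

STALE_AFTER_S = 5 * 60  # hypothetical: five missed one-minute polls count as dead


def _ensure(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS heartbeat (job TEXT PRIMARY KEY, last_ok REAL NOT NULL)"
    )


def beat(conn: sqlite3.Connection, job: str, now: Optional[float] = None,
         url: Optional[str] = None) -> None:
    """Record a successful run of `job`; optionally ping an off-host check URL."""
    _ensure(conn)
    conn.execute("INSERT OR REPLACE INTO heartbeat (job, last_ok) VALUES (?, ?)",
                 (job, time.time() if now is None else now))
    conn.commit()
    if url:
        try:
            urllib.request.urlopen(url, timeout=10).close()
        except OSError:
            pass  # off-host ping is best-effort; URLError subclasses OSError


def poll_is_stale(conn: sqlite3.Connection, now: float) -> bool:
    """What the morning digest would check: no recent poll means no alarms today."""
    _ensure(conn)
    row = conn.execute("SELECT last_ok FROM heartbeat WHERE job = 'poll'").fetchone()
    return row is None or now - row[0] > STALE_AFTER_S
```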
The fuzzy matcher built set(collapsed_input), a set of individual characters, so it compared character bags: anagrams matched perfectly and 'studio100' vs 'studio200' (different rooms) nearly matched. Use rapidfuzz edit-distance on the collapsed strings (order-sensitive), gated on digit runs matching exactly so room/studio numbers can't collide.
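The order-sensitive match with a digit guard can be sketched as follows. To stay dependency-free, this swaps in a plain-Python Levenshtein ratio instead of rapidfuzz (whose scorers normalize differently), and the 0.85 threshold and helper names are assumptions.

```python
import re


def collapse(s: str) -> str:
    """Lowercase and keep only ASCII letters/digits: 'Studio 100' -> 'studio100'."""
    return re.sub(r"[^0-9a-z]", "", s.lower())


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def venue_matches(query: str, venue: str, threshold: float = 0.85) -> bool:
    q, v = collapse(query), collapse(venue)
    # Digit guard: room/studio numbers must agree exactly before fuzzing the rest.
    if re.findall(r"\d+", q) != re.findall(r"\d+", v):
        return False
    longest = max(len(q), len(v)) or 1
    return 1 - levenshtein(q, v) / longest >= threshold
```

Unlike a character-set comparison, "listen" and "silent" are far apart here, while a one-letter typo in a long venue name still clears the threshold.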
Documents that transit->walk->transit counts as one transfer (the WALKING branch intentionally does not reset the prev-transit flag); a reported bug here was a misread of the branch.
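The behaviour the test locks in can be illustrated with a small sketch. `Leg` and `count_transfers` are hypothetical names; the mode strings follow the WALKING/TRANSIT naming the description uses.

```python
from dataclasses import dataclass


@dataclass
class Leg:
    mode: str  # "TRANSIT" or "WALKING"


def count_transfers(legs: list[Leg]) -> int:
    transfers = 0
    prev_transit = False
    for leg in legs:
        if leg.mode == "WALKING":
            # Deliberately leave prev_transit alone: transit -> walk -> transit
            # is still a single transfer between two vehicles.
            continue
        if prev_transit:
            transfers += 1
        prev_transit = True
    return transfers
```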
Substring matching made affected route '1' match bus lines 'B41'/'M15' and any line containing the digit. Match case-insensitively against the leg line and its alphanumeric tokens instead, so 'C' still matches a decorated 'C-local' while '1' no longer matches 'B41'.
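A sketch of whole-code matching; `alert_matches_line` is a hypothetical name. It compares the affected route against the full leg line and against its alphanumeric tokens, case-insensitively, never against substrings.

```python
import re


def alert_matches_line(affected: str, leg_line: str) -> bool:
    a = affected.strip().lower()
    line = leg_line.strip().lower()
    # 'C-local' -> ['c', 'local']; 'B41' -> ['b41'] (so '1' cannot match it)
    tokens = re.findall(r"[0-9a-z]+", line)
    return a == line or a in tokens
```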
_build_route_context recovered boarding/alighting stops by splitting the human-readable
'{line} from {dep} to {arr}' summary on ' from '/' to '/' and ', which shredded
stop names that themselves contain those words. Add departure_stop/arrival_stop
fields to TransitLeg (populated in routing), and read them directly.
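A sketch of carrying the stops as data. The `TransitLeg` field names `departure_stop`/`arrival_stop` come from the description; the other fields, the `summary` property, `build_route_context`'s output shape, and the stop names in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransitLeg:
    line: str
    departure_stop: str  # populated by routing, not parsed back out of text
    arrival_stop: str

    @property
    def summary(self) -> str:
        """Human-readable text, for display only."""
        return f"{self.line} from {self.departure_stop} to {self.arrival_stop}"


def build_route_context(legs: list[TransitLeg]) -> list[dict[str, str]]:
    # Read the structured fields directly instead of splitting `summary`.
    return [
        {"line": leg.line, "board": leg.departure_stop, "alight": leg.arrival_stop}
        for leg in legs
    ]
```

With a stop such as "Ferry to Governors Island", splitting the summary on " to " yields three pieces instead of two, which is exactly the shredding the structured fields avoid.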
- morning built OpencodeGoClient twice; reuse the planning client for alerts.
- config set scheduling.{morning_run_time,poll_interval_seconds} is inert under
systemd/cron (an external scheduler drives the jobs); warn so a chat user
isn't misled into thinking it changed the schedule.
- replace the lone Chinese oauth-success string with English.
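The inert-key warning can be sketched as below. `config_set`, the flat dotted-key dict, and the message text are hypothetical; the two key names come from the list above.

```python
INERT_UNDER_EXTERNAL_SCHEDULER = frozenset({
    "scheduling.morning_run_time",
    "scheduling.poll_interval_seconds",
})


def config_set(config: dict[str, str], key: str, value: str) -> list[str]:
    """Store a dotted key and return any warnings to show the chat user."""
    config[key] = value  # still saved, so the value is visible to `config` reads
    if key in INERT_UNDER_EXTERNAL_SCHEDULER:
        return [f"Saved {key}={value}, but systemd/cron decides when jobs run; "
                "this setting will not change the schedule."]
    return []
```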
README linked to a nonexistent plan.md (now AGENTS.md) and its command list omitted status, snooze/mute/unmute/undo, geocode-cache, mta-alerts, and config unset/reset. package.nix now reads version from pyproject.toml so it can't drift.
Add a SCHEMA_VERSION stamped into PRAGMA user_version so future migrations have a version to branch on instead of probing every table (existing idempotent column-adds are unchanged). Extend CI's mypy step to cover tests/ too.
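Stamping `PRAGMA user_version` might look like this sketch; `stamp_schema` and the version number are hypothetical. SQLite stores `user_version` as an integer in the database header, and PRAGMA statements do not accept bound parameters, hence the `int()`-guarded f-string.

```python
import sqlite3

SCHEMA_VERSION = 2  # hypothetical; bump whenever the schema changes


def stamp_schema(conn: sqlite3.Connection) -> int:
    """Return the version found on disk, then stamp the current one."""
    found = conn.execute("PRAGMA user_version").fetchone()[0]
    if found < SCHEMA_VERSION:
        # Future migrations can branch on `found` here instead of probing tables.
        conn.execute(f"PRAGMA user_version = {int(SCHEMA_VERSION)}")
        conn.commit()
    return found
```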
Add pytest-cov to the dev shell and enforce a branch-coverage floor of 80% in CI (current coverage ~84%). The floor catches large regressions without tripping on normal PRs. A NixOS-module integration test for the systemd timers remains a follow-up — it needs CI/VM iteration to land safely.
Adds an optional [weather] block (Open-Meteo, keyless): when precipitation is
likely around the commute, extra minutes are folded into the buffer so the alarm
fires earlier, and the digest shows a per-event note ('+N min for rain'). Weather
failures are swallowed — a forecast blip never breaks a plan. Disabled by
default.
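A sketch of the buffer padding, assuming the Open-Meteo response has already been reduced to per-hour precipitation probability plus a snow flag. `HourForecast`, `weather_pad`, the 50% threshold, and the pad sizes are all hypothetical; the PR does not state its numbers.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class HourForecast:
    hour: int                # local hour, 0-23
    precip_probability: int  # percent
    snow: bool


# Hypothetical thresholds and pads.
LIKELY_PCT = 50
RAIN_PAD_MIN = 5
SNOW_PAD_MIN = 10


def weather_pad(fetch: Callable[[], List[HourForecast]], depart_hour: int,
                window: int = 1) -> Tuple[int, Optional[str]]:
    """Extra buffer minutes plus a digest note; (0, None) when nothing is likely."""
    try:
        hours = fetch()
    except Exception:
        return 0, None  # a forecast blip never breaks the plan
    likely = [h for h in hours
              if abs(h.hour - depart_hour) <= window and h.precip_probability >= LIKELY_PCT]
    if not likely:
        return 0, None
    if any(h.snow for h in likely):
        return SNOW_PAD_MIN, f"+{SNOW_PAD_MIN} min for snow"
    return RAIN_PAD_MIN, f"+{RAIN_PAD_MIN} min for rain"
```

The planner would add the returned minutes to the existing buffer, so the leave alarm fires that much earlier, and show the note next to the event in the digest.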