feat(routing): automatic fallback to OSRM when Valhalla is unreachable #190

@jasoneplumb

Problem

Depends on #189 (provider abstraction + OSRM client).

Once both Valhalla and OSRM backends exist, we still need a policy for when to use each. Multi-day Valhalla outages (the current FOSSGIS outage is the proximate trigger) shouldn't require a code change to recover — the app should detect the failure and fall back transparently.

A naive per-request fallback (try Valhalla → on error, try OSRM) pays the Valhalla connect-timeout on every routing request during an outage: every startGuidance, every setGuidanceCosting, and every off-route recalc fired by maybeRecalc(). Over a 30-minute drive that adds up to noticeable latency before each maneuver-recalc lands. We need circuit-breaking so the first failure flips the app to OSRM and subsequent requests skip Valhalla until the breaker resets.
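
To put rough numbers on that cost, here is a back-of-envelope sketch comparing the two policies. The request count, connect timeout, and function names are illustrative assumptions, not measurements from the app; the 10-minute cooldown matches the example value suggested under Expected.

```typescript
// Illustrative cost model only: every number below is a made-up assumption.

/** Naive fallback: every request waits out the Valhalla connect timeout first. */
function naiveOverheadMs(requests: number, connectTimeoutMs: number): number {
  return requests * connectTimeoutMs;
}

/**
 * Circuit breaker: only the first failure and each later half-open probe wait
 * on Valhalla. Probes are at least one cooldown apart, so this is an upper
 * bound on the time spent waiting over a drive of the given length.
 */
function breakerOverheadMs(
  requests: number,
  connectTimeoutMs: number,
  driveMs: number,
  cooldownMs: number,
): number {
  const maxValhallaAttempts = 1 + Math.floor(driveMs / cooldownMs);
  return Math.min(requests, maxValhallaAttempts) * connectTimeoutMs;
}

const MINUTE_MS = 60_000;

// Hypothetical drive: 20 routing requests in 30 minutes, 5 s connect timeout.
const naive = naiveOverheadMs(20, 5_000); // 100000 ms
const withBreaker = breakerOverheadMs(20, 5_000, 30 * MINUTE_MS, 10 * MINUTE_MS); // 20000 ms

console.log({ naive, withBreaker });
```

Under these assumed numbers the breaker caps the wait at four Valhalla attempts instead of twenty, and the gap grows with the number of recalcs.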

Expected

  • Circuit breaker: a session-scoped flag in routing.ts. The first network-level Valhalla failure (the TypeError-equivalent path that already maps to "routing service unavailable" in routing.ts:121-130) flips the breaker open and routes subsequent calls straight to OSRM. HTTP 4xx/5xx responses from Valhalla itself do not trip the breaker (those are per-request issues, not service-down signals).
  • Half-open probe: after N minutes (e.g. 10) of open state, the next call probes Valhalla once; success closes the breaker, failure resets the timer.
  • Reset on page load — no persistence across sessions; treat each session as fresh.
  • AbortError: still propagates unchanged from whichever provider was active. The breaker logic must not swallow it.
  • Both providers down: surface the existing routing-unavailable dialog from #188 (fix(guidance): show routing failures in a dialog, not an obscured toast), with copy that names the providers tried, e.g. "Routing services (Valhalla, OSRM) are both unreachable."
  • UI signal: small visual indicator in the guidance pill or near the costing chip when the active provider is OSRM (fallback) — distinguishes "service degraded but routing works" from "service down". Exact placement is a design call during implementation.
  • State exposure: AppState.guidance gains a routingProvider: 'valhalla' | 'osrm' | null field so guidance.ts can render the indicator and the dialog can name the right provider.
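
The breaker behaviour listed above could be sketched roughly as follows. This is a minimal illustration, not the intended implementation: the names `ValhallaBreaker`, `routeWithFallback`, `isAbortError`, and `isNetworkFailure` are hypothetical, the provider clients from #189 are stood in for by plain async functions, and a network failure is assumed to surface as a `TypeError` (as `fetch` does when the host is unreachable).

```typescript
type ProviderId = 'valhalla' | 'osrm';

// The issue suggests "e.g. 10" minutes; the exact value is an assumption.
const DEFAULT_COOLDOWN_MS = 10 * 60 * 1000;

/** Session-scoped breaker state. Nothing is persisted, so a page load resets it. */
class ValhallaBreaker {
  private openedAt: number | null = null; // null means the breaker is closed
  private cooldownMs: number;

  constructor(cooldownMs: number = DEFAULT_COOLDOWN_MS) {
    this.cooldownMs = cooldownMs;
  }

  /** True when closed, or when the cooldown has elapsed and a half-open probe is due. */
  shouldTryValhalla(now: number): boolean {
    return this.openedAt === null || now - this.openedAt >= this.cooldownMs;
  }

  /** A successful Valhalla call closes the breaker. */
  recordSuccess(): void {
    this.openedAt = null;
  }

  /** A network-level failure opens the breaker, or restarts the cooldown after a failed probe. */
  recordNetworkFailure(now: number): void {
    this.openedAt = now;
  }
}

function isAbortError(err: unknown): boolean {
  return err instanceof Error && err.name === 'AbortError';
}

// Assumption: network-level failures surface as TypeError, as fetch() rejects.
function isNetworkFailure(err: unknown): boolean {
  return err instanceof TypeError;
}

async function routeWithFallback<T>(
  breaker: ValhallaBreaker,
  valhalla: () => Promise<T>, // hypothetical stand-ins for the #189 clients
  osrm: () => Promise<T>,
  now: () => number = Date.now,
): Promise<{ provider: ProviderId; route: T }> {
  if (breaker.shouldTryValhalla(now())) {
    try {
      const route = await valhalla();
      breaker.recordSuccess();
      return { provider: 'valhalla', route };
    } catch (err) {
      // AbortError and HTTP-level errors propagate unchanged and leave the breaker alone.
      if (isAbortError(err) || !isNetworkFailure(err)) throw err;
      breaker.recordNetworkFailure(now());
    }
  }
  // Breaker is open (or just opened): go straight to OSRM. If OSRM also
  // throws, the caller sees that error and can show the both-down dialog.
  return { provider: 'osrm', route: await osrm() };
}
```

Keeping the state machine in a small synchronous class, with the clock passed in, is one way to make the transitions listed for routing.test.ts testable without real timers or network calls.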

Scope

  • src/routing.ts — orchestration layer above the two per-provider clients from #189 (feat(routing): add OSRM as an alternate routing backend behind a provider abstraction): breaker state, half-open probe timing, error aggregation, AbortError pass-through.
  • src/routing.test.ts — cover the breaker transitions: closed → open on network failure, open stays open during cooldown, open → half-open after timeout, half-open → closed on success / → open on failure.
  • src/types.ts — add routingProvider to the guidance slice of AppState.
  • src/guidance.ts — set state.guidance.routingProvider from the Route result or returned provider id.
  • src/style.css + render hook in guidance.ts — small fallback indicator (a single chip / dot / label).
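
As a rough illustration of the types.ts and guidance.ts items above, the state field and the two pieces of UI logic that read it might look like this. `GuidanceSlice`, `showFallbackIndicator`, `unreachableCopy`, and `PROVIDER_LABELS` are hypothetical names, and the slice shows only the new field; the real AppState.guidance carries other fields not described in this issue.

```typescript
type RoutingProvider = 'valhalla' | 'osrm' | null;

// Hypothetical minimal slice: only the field this issue adds.
interface GuidanceSlice {
  routingProvider: RoutingProvider;
}

/** The fallback indicator is shown only while OSRM is the active provider. */
function showFallbackIndicator(guidance: GuidanceSlice): boolean {
  return guidance.routingProvider === 'osrm';
}

const PROVIDER_LABELS = { valhalla: 'Valhalla', osrm: 'OSRM' } as const;

/**
 * Dialog copy naming the providers that were tried. The two-provider wording
 * follows the example in this issue; the single-provider wording is an assumption.
 */
function unreachableCopy(tried: Array<'valhalla' | 'osrm'>): string {
  const names = tried.map((id) => PROVIDER_LABELS[id]).join(', ');
  return tried.length > 1
    ? `Routing services (${names}) are both unreachable.`
    : `Routing service (${names}) is unreachable.`;
}

console.log(showFallbackIndicator({ routingProvider: 'osrm' }));
console.log(unreachableCopy(['valhalla', 'osrm']));
```

A `null` provider (no route requested yet) and `'valhalla'` both hide the indicator, which keeps the "degraded but working" signal distinct from the both-down dialog.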

Out of scope

  • Manual provider toggle in the UI (could be v2 if users want to pin to one provider, e.g. for known-bad routes on the OSRM bike profile).
  • Persistent breaker state across sessions (cookie / localStorage).
  • Multi-provider quorum or comparison.

Metadata

Labels: P2 (Important, fix this cycle); enhancement (New feature or request); webmap.dev (webmap.dev app issues)
