Fix: don't mark connections unhealthy due to a shared destination#434

Draft
chrisdoehring wants to merge 3 commits into main from gundi-fix-shared-destination-health

@chrisdoehring chrisdoehring commented Jun 19, 2026

This is a draft/RFC because it changes app behavior. It also highlights the need for health statuses at a more granular level (e.g. "destination unhealthy" rather than "connection unhealthy").

Problem

Several connections show as Unhealthy in the Gundi portal even though they have no error activity logs of their own within the last few hours.

Root cause

A Connection's health is derived live from the stored IntegrationStatus of its provider and all of its destinations (ConnectionRetrieveSerializer.get_status, and the queryset equivalent filter_connections_by_status).

A destination integration is shared across every connection that routes to it (Integration.destinations is an M2M through routing rules). So before this change, when one destination was marked UNHEALTHY (typically from its own dispatcher/custom-log errors), every connection routing to that destination was reported as UNHEALTHY, regardless of whether that connection's own provider had any errors.

From the operator's point of view this looks broken: the connection is Unhealthy but its activity log is clean, because the errors live under the shared destination, not the connection.

Note: connection-level delivery failures are already attributed to the provider (event_consumers/dispatcher_events_consumer.py), so a genuinely failing connection still surfaces as Unhealthy via its provider status. The destination's aggregate health was the source of the false positives.
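The false positive can be reproduced with a small model of the pre-fix rule. This is only an illustrative sketch: `Status`, `Integration`, `Connection`, and `status_before_fix` are hypothetical stand-ins rather than the project's models or `ConnectionRetrieveSerializer.get_status`, and it assumes the old rule amounted to "unhealthy if the provider or any destination is unhealthy".

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(str, Enum):
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"


@dataclass
class Integration:
    # Hypothetical stand-in for a stored integration and its status.
    name: str
    status: Status = Status.HEALTHY


@dataclass
class Connection:
    # Hypothetical stand-in: one provider routed to one or more destinations.
    provider: Integration
    destinations: list[Integration] = field(default_factory=list)


def status_before_fix(conn: Connection) -> Status:
    """Assumed old rule: any unhealthy member taints the whole connection."""
    members = [conn.provider, *conn.destinations]
    if any(m.status is Status.UNHEALTHY for m in members):
        return Status.UNHEALTHY
    return Status.HEALTHY


# One destination object shared by two otherwise unrelated connections.
shared_dest = Integration("shared-destination")
conn_a = Connection(Integration("provider-a"), [shared_dest])
conn_b = Connection(Integration("provider-b"), [shared_dest])

# The destination fails on its own; neither provider has logged an error.
shared_dest.status = Status.UNHEALTHY

statuses = [status_before_fix(c) for c in (conn_a, conn_b)]
print([s.value for s in statuses])
```

Both connections come back unhealthy even though each provider is still healthy, which matches what operators saw: an Unhealthy connection with a clean activity log.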

Fix

A connection whose provider is healthy but whose shared destination is unhealthy or disabled is now surfaced as Needs review instead of Unhealthy, consistently in both code paths:

  • ConnectionRetrieveSerializer.get_status — the live status shown in the UI
  • filter_connections_by_status — the ?status= filter and the daily unhealthy-connections email

A connection is reported UNHEALTHY only when its own provider is unhealthy.

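| Provider  | Destination | Before       | After        |
|-----------|-------------|--------------|--------------|
| unhealthy | any         | unhealthy    | unhealthy    |
| healthy   | unhealthy   | unhealthy    | needs_review |
| healthy   | disabled    | needs_review | needs_review |
| healthy   | healthy     | healthy      | healthy      |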
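The "After" column of the table can be written as a small pure function. This is a sketch under assumptions, not the code in this PR: `Status` and `derive_connection_status` are hypothetical names, and only the four rows of the table are modeled (provider states other than healthy/unhealthy are not covered by the table, so they are not handled specially here).

```python
from enum import Enum
from typing import Iterable


class Status(str, Enum):
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"
    DISABLED = "disabled"
    NEEDS_REVIEW = "needs_review"


def derive_connection_status(provider: Status, destinations: Iterable[Status]) -> Status:
    """Hypothetical model of the post-fix rule from the table."""
    # Only the connection's own provider can make the connection unhealthy.
    if provider is Status.UNHEALTHY:
        return Status.UNHEALTHY
    # An unhealthy or disabled (possibly shared) destination is a review flag.
    if any(d in (Status.UNHEALTHY, Status.DISABLED) for d in destinations):
        return Status.NEEDS_REVIEW
    return Status.HEALTHY


# The table rows as (provider, destination, expected "After") triples.
rows = [
    (Status.UNHEALTHY, Status.HEALTHY, Status.UNHEALTHY),
    (Status.UNHEALTHY, Status.UNHEALTHY, Status.UNHEALTHY),
    (Status.HEALTHY, Status.UNHEALTHY, Status.NEEDS_REVIEW),
    (Status.HEALTHY, Status.DISABLED, Status.NEEDS_REVIEW),
    (Status.HEALTHY, Status.HEALTHY, Status.HEALTHY),
]
results = [derive_connection_status(p, [d]) for p, d, _ in rows]
```

Checking the provider first is what keeps a shared destination from ever escalating a connection past Needs review.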

New management command

Adds recalculate_integration_statuses, an on-demand way to run the same health calculation as the hourly "Calculate Integration Statuses" beat task. Previously the calculation could only be triggered by the schedule or by an enable/disable save; there was no management command for it.

# Recalculate every integration, synchronously, printing each result
python manage.py recalculate_integration_statuses

# Limit scope to specific integration(s) (--integration-id is repeatable)
python manage.py recalculate_integration_statuses --integration-id <uuid> --integration-id <uuid>

# Enqueue the Celery task instead of running inline
python manage.py recalculate_integration_statuses --integration-id <uuid> --async

Unknown ids are reported to stderr rather than being silently ignored. Useful for ops/debugging and for forcing a recalc after this fix lands.
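For reviewers who want to picture the flag handling, here is a minimal standalone sketch using plain `argparse` (Django passes an argparse-based parser to a command's `add_arguments`). Everything beyond the two flag names is an assumption: the `integration_ids` and `run_async` destination names, the UUID parsing, the helper functions, and the message text are hypothetical and may not match the actual command.

```python
import argparse
import sys
import uuid


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="recalculate_integration_statuses")
    parser.add_argument(
        "--integration-id",
        action="append",          # repeatable: each use appends to the list
        dest="integration_ids",   # hypothetical destination name
        type=uuid.UUID,           # assumed: reject malformed ids at parse time
        default=None,             # None means "recalculate every integration"
    )
    parser.add_argument(
        "--async",
        action="store_true",
        dest="run_async",         # `async` is a reserved word in Python
    )
    return parser


def split_known_unknown(requested, existing):
    """Partition requested ids into those that exist and those that do not."""
    known = [i for i in requested if i in existing]
    unknown = [i for i in requested if i not in existing]
    return known, unknown


def report_unknown(unknown, stream=sys.stderr):
    # Hypothetical message format; the point is that nothing is dropped silently.
    for integration_id in unknown:
        print(f"Integration {integration_id} not found", file=stream)


# Stand-in for the ids that exist in the database.
existing = {uuid.UUID("11111111-1111-1111-1111-111111111111")}
args = build_parser().parse_args([
    "--integration-id", "11111111-1111-1111-1111-111111111111",
    "--integration-id", "22222222-2222-2222-2222-222222222222",
    "--async",
])
known, unknown = split_known_unknown(args.integration_ids, existing)
report_unknown(unknown)
```

Mapping `--async` to a differently named destination avoids attribute access on a reserved word (`args.async` is a syntax error), which is a common reason to set `dest` explicitly for a flag like this.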

Tests

  • New integrations/tests/test_connection_status_derivation.py — function-level coverage of both filter_connections_by_status and get_status for the shared-destination case.
  • Updated test_filter_connections_by_status_unhealthy/needs_review_as_superuser to the new buckets, plus a new HTTP-level assertion on the serialized status field.
  • New tests in integrations/tests/test_commands.py covering the management command's sync path (stale status recomputed to unhealthy from error logs) and --async path (enqueues the Celery task).
  • Existing test_email_alerts.py and test_calc_integration_status.py remain green (17 passed).

🤖 Generated with Claude Code

Chris Doehring and others added 3 commits June 19, 2026 10:52
A Connection's health was derived from the stored status of its provider AND
all of its destinations. Because a destination integration is shared across
every connection that routes to it, an unhealthy destination marked ALL of its
connections "Unhealthy" — even connections whose own provider had no errors.
This produced connections shown as Unhealthy in the portal with no error
activity logs of their own (the destination's errors live under the
destination, not the connection).

Connection-level delivery failures are already attributed to the provider, so a
genuinely failing connection still surfaces as Unhealthy. A healthy provider
with an unhealthy (or disabled) shared destination is now surfaced as
"Needs review" instead of "Unhealthy", in both:
- ConnectionRetrieveSerializer.get_status (live UI status)
- filter_connections_by_status (status filter + unhealthy-connections email)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Provides an on-demand way to run the same health calculation as the hourly
"Calculate Integration Statuses" beat task. Recalculates all integrations by
default, or specific ones via --integration-id (repeatable). Runs inline and
reports each resulting status; --async enqueues the Celery task instead.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@chrisdoehring chrisdoehring force-pushed the gundi-fix-shared-destination-health branch from 378a558 to 4f97a39 Compare June 19, 2026 17:54
@chrisdoehring chrisdoehring marked this pull request as draft June 19, 2026 18:09

@victorlujanearthranger victorlujanearthranger left a comment

Thanks!
