Fix: don't mark connections unhealthy due to a shared destination #434
Draft
chrisdoehring wants to merge 3 commits into
A Connection's health was derived from the stored status of its provider AND all of its destinations. Because a destination integration is shared across every connection that routes to it, an unhealthy destination marked ALL of its connections "Unhealthy", even connections whose own provider had no errors. This produced connections shown as Unhealthy in the portal with no error activity logs of their own (the destination's errors live under the destination, not the connection).

Connection-level delivery failures are already attributed to the provider, so a genuinely failing connection still surfaces as Unhealthy.

A healthy provider with an unhealthy (or disabled) shared destination is now surfaced as "Needs review" instead of "Unhealthy", in both:
- ConnectionRetrieveSerializer.get_status (live UI status)
- filter_connections_by_status (status filter + unhealthy-connections email)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
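As a rough illustration of the rule this commit describes, the derivation might be sketched as below. This is not the code from ConnectionRetrieveSerializer.get_status or filter_connections_by_status; the `Status` enum, its member names, and `derive_connection_status` are hypothetical stand-ins, and the handling of provider states other than healthy and unhealthy is an assumption.

```python
from enum import Enum
from typing import Iterable


class Status(Enum):
    # Hypothetical stand-ins for the stored statuses; the real names may differ.
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"
    DISABLED = "disabled"
    NEEDS_REVIEW = "needs_review"


def derive_connection_status(provider: Status, destinations: Iterable[Status]) -> Status:
    """Derive a connection's status from its provider and its destinations."""
    # Only the connection's own provider can make the connection Unhealthy.
    if provider is Status.UNHEALTHY:
        return Status.UNHEALTHY
    # A healthy provider with an unhealthy or disabled (possibly shared)
    # destination is flagged for review instead of being reported Unhealthy.
    if provider is Status.HEALTHY and any(
        d in (Status.UNHEALTHY, Status.DISABLED) for d in destinations
    ):
        return Status.NEEDS_REVIEW
    # Assumption: any other provider state is passed through unchanged.
    return provider


# Two connections share one unhealthy destination; only the connection whose
# own provider is failing ends up Unhealthy.
shared_destination = Status.UNHEALTHY
clean_connection = derive_connection_status(Status.HEALTHY, [shared_destination])
failing_connection = derive_connection_status(Status.UNHEALTHY, [shared_destination])
```

Under this sketch, `clean_connection` lands in the needs-review bucket while `failing_connection` stays unhealthy, which is the distinction the change is meant to restore.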
Provides an on-demand way to run the same health calculation as the hourly "Calculate Integration Statuses" beat task. Recalculates all integrations by default, or specific ones via --integration-id (repeatable). Runs inline and reports each resulting status; --async enqueues the Celery task instead.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
378a558 to 4f97a39
This is a draft/RFC because it changes app behavior. It highlights the need for health statuses at a more granular level (e.g., destination unhealthy rather than connection unhealthy).
Problem
Several connections show as Unhealthy in the Gundi portal even though they have no error activity logs of their own within the last few hours.
Root cause
A Connection's health is derived live from the stored IntegrationStatus of its provider and all of its destinations (ConnectionRetrieveSerializer.get_status, and the queryset equivalent filter_connections_by_status).

A destination integration is shared across every connection that routes to it (Integration.destinations is an M2M through routing rules). So when one destination is marked UNHEALTHY, typically from its own dispatcher/custom-log errors, every connection routing to that destination was reported as UNHEALTHY, regardless of whether that connection's own provider had any errors.

From the operator's point of view this looks broken: the connection is Unhealthy but its activity log is clean, because the errors live under the shared destination, not the connection.

Note: connection-level delivery failures are already attributed to the provider (event_consumers/dispatcher_events_consumer.py), so a genuinely failing connection still surfaces as Unhealthy via its provider status. The destination's aggregate health was the source of the false positives.

Fix
A healthy provider with an unhealthy or disabled shared destination is now surfaced as Needs review instead of Unhealthy, consistently in both code paths:
- ConnectionRetrieveSerializer.get_status: the live status shown in the UI
- filter_connections_by_status: the ?status= filter and the daily unhealthy-connections email

A connection is reported UNHEALTHY only when its own provider is unhealthy.

New management command
Adds recalculate_integration_statuses, an on-demand way to run the same health calculation as the hourly "Calculate Integration Statuses" beat task (previously only triggerable via the schedule or an enable/disable save; there was no management command for it).

Unknown ids are reported to stderr rather than silently doing nothing. Useful for ops/debugging and for forcing a recalc after this fix lands.
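The option handling described for this command could look roughly like the following. It is a standalone sketch using plain argparse (the parser type Django hands to a command's add_arguments), not the command from this PR: `build_parser`, `select_ids`, the `dest` names, and the stderr message text are all hypothetical, and the real command looks up integrations in the database rather than in a set.

```python
import argparse
import sys
from typing import Iterable, List, Optional, Set, Tuple


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="recalculate_integration_statuses")
    # Repeatable flag: each occurrence appends one id; absent means "all".
    parser.add_argument(
        "--integration-id",
        dest="integration_ids",
        action="append",
        default=None,
        help="Integration to recalculate (repeatable). Default: all integrations.",
    )
    # "async" is a Python keyword, so the value is stored under another name.
    parser.add_argument(
        "--async",
        dest="run_async",
        action="store_true",
        help="Enqueue the task instead of running inline.",
    )
    return parser


def select_ids(
    requested: Optional[Iterable[str]], existing: Set[str]
) -> Tuple[List[str], List[str]]:
    """Return (ids to recalculate, unknown ids), reporting unknown ids on stderr."""
    if not requested:
        # No ids given: recalculate everything.
        return sorted(existing), []
    requested = list(requested)
    found = [i for i in requested if i in existing]
    unknown = [i for i in requested if i not in existing]
    for i in unknown:
        # Hypothetical message wording; the point is that it is not silent.
        print(f"Unknown integration id: {i}", file=sys.stderr)
    return found, unknown


args = build_parser().parse_args(["--integration-id", "a", "--integration-id", "z"])
to_recalculate, unknown_ids = select_ids(args.integration_ids, existing={"a", "b"})
```

Storing `--async` under `run_async` avoids attribute access on a reserved word, and returning the unknown ids alongside the found ones lets the caller decide whether to exit non-zero.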
Tests
- integrations/tests/test_connection_status_derivation.py: function-level coverage of both filter_connections_by_status and get_status for the shared-destination case.
- Updated test_filter_connections_by_status_unhealthy/needs_review_as_superuser to the new buckets, plus a new HTTP-level assertion on the serialized status field.
- integrations/tests/test_commands.py, covering the management command's sync path (stale status recomputed to unhealthy from error logs) and --async path (enqueues the Celery task).
- test_email_alerts.py and test_calc_integration_status.py remain green (17 passed).

🤖 Generated with Claude Code