
Fix #1145: Webhook (Teams/Slack) alerts re-fire after an app restart #1147

Merged: erikdarlingdata merged 1 commit into dev from feature/1145-restart-resend on Jun 18, 2026


@erikdarlingdata

Owner

Closes #1145.

Problem

#981 added restart-dedup for the email channel only. A restart cleared both guards that suppress a webhook re-send, so reopening Lite re-posted a Teams/Slack alert that had already been delivered before the restart, with an identical Dedup Key and Occurrences. A webhook-only deployment had no restart protection at all.

A duplicate goes out only when two independent guards fail together, and a restart defeats both: the in-memory edge-trigger watermark (#1091) resets to 0, so the gate fires; and the in-memory webhook cooldown starts empty, so the post goes through.
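To make the two-guard failure concrete, here is a minimal Python model (not the project's C# code). `InMemoryAlertGate`, its fields, and the 15-minute cooldown are illustrative assumptions. The point is only that both guards live in per-process dictionaries, so a freshly started instance forgets both and re-posts for events still in the lookback.

```python
from datetime import datetime, timedelta

COOLDOWN = timedelta(minutes=15)  # assumed; matches the 15-min cooldown in the repro


class InMemoryAlertGate:
    """Hypothetical model of the two pre-fix guards, both held only in memory."""

    def __init__(self):
        self.watermark = {}  # (server_id, metric) -> rolling count last alerted on
        self.last_sent = {}  # (server_id, metric) -> time of last webhook post

    def should_post(self, key, rolling_count, now):
        # Guard 1, edge trigger: fire only if the rolling count rose above the
        # watermark. A fresh process starts every watermark at 0.
        if rolling_count <= self.watermark.get(key, 0):
            return False
        # Guard 2, cooldown: suppress if this key posted within COOLDOWN.
        last = self.last_sent.get(key)
        if last is not None and now - last < COOLDOWN:
            return False
        self.watermark[key] = rolling_count
        self.last_sent[key] = now
        return True


key = ("server-1", "deadlocks")
t0 = datetime(2026, 6, 18, 12, 0)
gate = InMemoryAlertGate()
first = gate.should_post(key, 3, t0)                          # True: 3 > 0
repeat = gate.should_post(key, 3, t0 + timedelta(minutes=1))  # False: 3 <= 3
restarted = InMemoryAlertGate()  # restart: both dictionaries are empty again
resent = restarted.should_post(key, 3, t0 + timedelta(minutes=17))  # True: duplicate
print(first, repeat, resent)
```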

Fix

1. Webhook cooldown seed (shared, BOTH apps).
WebhookAlertService now seeds its per-(serverId, metricName) cooldown from alert history on first use, mirroring the email seed (#981), via a new IAlertHistoryStore.GetLastWebhookSentUtcAsync. Lite filters notification_type IN ('webhook','email+webhook'); Dashboard filters NotificationType == "webhook". send_error is deliberately not filtered on: it tracks the email channel, so a row where the email failed but the webhook was sent must still seed the cooldown. Wired into the WebhookAlertService DI in both MainWindows.
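A minimal sketch of the seed-on-first-use idea, in Python rather than the project's C#. `HistoryStore.get_last_webhook_sent_utc` loosely stands in for IAlertHistoryStore.GetLastWebhookSentUtcAsync and applies only Lite's notification_type filter; the row shape, class names, and 15-minute cooldown are hypothetical. The sample history row where the email failed still seeds the cooldown, because send_error is never consulted.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

COOLDOWN = timedelta(minutes=15)              # assumed cooldown length
WEBHOOK_TYPES = ("webhook", "email+webhook")  # Lite's notification_type filter


@dataclass
class HistoryRow:
    server_id: str
    metric_name: str
    notification_type: str
    sent_utc: datetime
    send_error: Optional[str]  # describes the email channel only


class HistoryStore:
    """Hypothetical in-memory stand-in for an alert history store."""

    def __init__(self, rows):
        self.rows = rows

    def get_last_webhook_sent_utc(self, server_id, metric_name):
        # send_error is deliberately not checked: an email failure says
        # nothing about whether the webhook post went out.
        times = [r.sent_utc for r in self.rows
                 if r.server_id == server_id
                 and r.metric_name == metric_name
                 and r.notification_type in WEBHOOK_TYPES]
        return max(times, default=None)


class WebhookCooldown:
    def __init__(self, store=None):
        self.store = store
        self.last_sent = {}
        self.seeded = set()

    def should_send(self, server_id, metric_name, now):
        key = (server_id, metric_name)
        if key not in self.seeded:
            # Seed once per key, on first use. With no store there is
            # nothing to seed from, so the send is attempted.
            self.seeded.add(key)
            if self.store is not None:
                last = self.store.get_last_webhook_sent_utc(server_id, metric_name)
                if last is not None:
                    self.last_sent[key] = last
        last = self.last_sent.get(key)
        if last is not None and now - last < COOLDOWN:
            return False
        self.last_sent[key] = now
        return True


t0 = datetime(2026, 6, 18, 12, 0)
store = HistoryStore([
    HistoryRow("srv", "blocking", "email+webhook", t0, "smtp error"),         # still seeds
    HistoryRow("srv", "blocking", "email", t0 + timedelta(minutes=5), None),  # ignored
])
within = WebhookCooldown(store).should_send("srv", "blocking", t0 + timedelta(minutes=10))
expired = WebhookCooldown(store).should_send("srv", "blocking", t0 + timedelta(minutes=20))
no_store = WebhookCooldown(None).should_send("srv", "blocking", t0 + timedelta(minutes=10))
print(within, expired, no_store)  # False True True
```

The three calls loosely mirror the three WebhookAlertService cases listed under Tests: a seed within the window suppresses, a seed older than the cooldown does not, and a missing store still attempts the send.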

2. Edge-trigger watermark persistence (Lite).
The rolling-count gate's in-memory watermark (_lastAlertedBlockingCount / _lastAlertedDeadlockCount) reset to 0 on restart, so the first post-restart sweep re-fired for events still in the 1-hour lookback. Because that gap can exceed the cooldown (in gotqn's repro it was 17 min, longer than the 15-min cooldown), the time-bounded seed alone does not cover it; only persisting the watermark does. The watermark now persists to a new config_edge_trigger_watermarks DuckDB table (upserted on change) and is seeded before the first sweep at startup.
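A sketch of persisting the watermark, using Python's built-in sqlite3 as a stand-in for DuckDB. The table name comes from this PR; its columns, `WatermarkStore`, and `EdgeTriggerGate` are hypothetical, as is the choice to let the watermark follow the count down when events age out. The restart is simulated by building a second gate over the same database, which reloads the watermark before its first sweep.

```python
import sqlite3

DDL = """
CREATE TABLE IF NOT EXISTS config_edge_trigger_watermarks (
    server_id   TEXT    NOT NULL,
    metric_name TEXT    NOT NULL,
    watermark   INTEGER NOT NULL,
    PRIMARY KEY (server_id, metric_name)
)"""


class WatermarkStore:
    """Hypothetical persistence layer; sqlite3 stands in for DuckDB."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(DDL)

    def save(self, server_id, metric_name, value):
        # Upsert: one row per (server, metric), overwritten on change.
        self.conn.execute(
            "INSERT OR REPLACE INTO config_edge_trigger_watermarks "
            "(server_id, metric_name, watermark) VALUES (?, ?, ?)",
            (server_id, metric_name, value))
        self.conn.commit()

    def load_all(self):
        rows = self.conn.execute(
            "SELECT server_id, metric_name, watermark "
            "FROM config_edge_trigger_watermarks")
        return {(s, m): w for s, m, w in rows}


class EdgeTriggerGate:
    def __init__(self, store):
        self.store = store
        # Seeded at startup, before the first sweep runs.
        self.watermarks = store.load_all()

    def sweep(self, server_id, metric_name, rolling_count):
        key = (server_id, metric_name)
        previous = self.watermarks.get(key, 0)
        fire = rolling_count > previous
        if rolling_count != previous:
            # Assumption: the watermark tracks the count in both directions,
            # so it also drops as events age out of the lookback.
            self.watermarks[key] = rolling_count
            self.store.save(server_id, metric_name, rolling_count)
        return fire


conn = sqlite3.connect(":memory:")
first_run = EdgeTriggerGate(WatermarkStore(conn))
fired = first_run.sweep("srv", "deadlocks", 3)       # True: 3 > 0
second_run = EdgeTriggerGate(WatermarkStore(conn))   # simulated restart, same database
refired = second_run.sweep("srv", "deadlocks", 3)    # False: reloaded watermark is 3
new_event = second_run.sweep("srv", "deadlocks", 4)  # True: a genuinely new event
print(fired, refired, new_event)
```

With only the seeded cooldown, a 17-minute restart gap is already past a 15-minute cooldown, so nothing would stop the resend. The reloaded watermark blocks it no matter how much time has elapsed.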

Why Dashboard needs no watermark persistence (parity rationale)

Dashboard's deadlock gate either re-baselines on restart (raw delta) or is 5-minute-windowed (FilteredDeadlockCount, which always falls within the cooldown the seed now covers). Its blocking gate is level-triggered plus cooldown, so it re-fires on its normal cadence, now bounded by the seeded cooldown. None of these produces the byte-identical duplicate that the Lite edge-trigger gate does. The webhook cooldown seed (part 1) is the shared half and applies to both apps.
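For contrast, a raw-delta gate needs no persistence: after a restart its first sample only sets a new baseline. The hypothetical `RawDeltaGate` below is a Python illustration of that behavior as described above, not Dashboard's implementation.

```python
class RawDeltaGate:
    """Hypothetical gate over a cumulative counter (e.g. total deadlocks)."""

    def __init__(self):
        self.baseline = None  # in memory only; lost on restart

    def sample(self, cumulative_count):
        if self.baseline is None:
            # First sample after (re)start just records the baseline.
            self.baseline = cumulative_count
            return False
        delta = cumulative_count - self.baseline
        self.baseline = cumulative_count
        return delta > 0  # alert only on growth since the previous sample


before = RawDeltaGate()
warmup = before.sample(10)    # False: baseline only
alerted = before.sample(12)   # True: two new deadlocks
after = RawDeltaGate()        # simulated restart
replayed = after.sample(12)   # False: re-baselines, no duplicate
fresh = after.sample(13)      # True: a genuinely new deadlock
print(warmup, alerted, replayed, fresh)
```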

Tests

  • Lite 505 + Dashboard 487 green; 0 warnings.
  • New: webhook-row history filter (GetLastWebhookSentUtc_FiltersToWebhookRows_IncludingEmailWebhook); watermark save/load/upsert round-trip (EdgeTriggerWatermark_SaveLoad_RoundTripsAndUpserts); WebhookAlertService seed-suppresses-within-window / seed-older-than-cooldown-does-not-suppress / null-store-attempts.

🤖 Generated with Claude Code

#981 added restart-dedup for the email channel only; a restart cleared the
two guards that suppress a webhook re-send, so reopening Lite re-posted a
Teams/Slack alert already delivered before the restart (identical Dedup Key
and Occurrences). Two-part fix:

1. Webhook cooldown seed (shared, BOTH apps). WebhookAlertService now seeds
   its per-(serverId, metricName) cooldown from alert history on first use,
   mirroring the email seed, via a new IAlertHistoryStore.GetLastWebhookSentUtcAsync.
   Lite filters notification_type IN ('webhook','email+webhook'); Dashboard
   filters NotificationType == "webhook". send_error is NOT filtered on -- it
   tracks the email channel, so an email-failed-but-webhook-sent row must still
   seed. Wired into the WebhookAlertService DI in both MainWindows.

2. Edge-trigger watermark persistence (Lite). The rolling-count gate's
   in-memory watermark (#1091) reset to 0 on restart, so the first sweep
   re-fired for events still in the 1-hour lookback -- and because that gap can
   exceed the cooldown, the seed alone (time-bounded) does not cover it. The
   watermark now persists to a new config_edge_trigger_watermarks DuckDB table
   (upsert on change), seeded before the first sweep at startup.

Dashboard needs no watermark persistence: its deadlock gate re-baselines on
restart (raw delta) or is 5-min-windowed (always within the cooldown the seed
now covers), and blocking is level+cooldown -- none produces the byte-identical
duplicate the Lite edge-trigger gate does.

Tests: Lite 505 + Dashboard 487 green. New: webhook-row history filter +
watermark save/load/upsert round-trips + WebhookAlertService seed-suppresses /
seed-older-than-cooldown-does-not / null-store-attempts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@erikdarlingdata merged commit 8ae9dbf into dev on Jun 18, 2026
2 checks passed
@erikdarlingdata deleted the feature/1145-restart-resend branch on June 18, 2026 at 22:34