[Critical] Wire DLQ size alerts to Slack/PagerDuty and use configurable threshold #163

@robertocarlous

## Summary

The dead-letter queue (DLQ) has a size threshold concept, but alerting only writes to application logs. In production, operators will not see DLQ growth until users report missing deposits/withdrawals.

## Problem

`DLQ_ALERT_THRESHOLD` is documented in `.env.example` and parsed in `src/config/env.ts`, but `src/stellar/dlq.ts` ignores it and uses a hardcoded constant:

```ts
const SIZE_ALERT_THRESHOLD = 50

private static checkSizeAlert(size: number): void {
  if (size >= SIZE_ALERT_THRESHOLD) {
    logger.error(`[DLQ ALERT] Dead-letter queue size is critically high: ${size} events...`)
  }
}
```
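
For contrast, here is a minimal sketch of the same check driven by a configured value rather than a module constant. The helper names `parseDlqAlertThreshold` and `isDlqSizeCritical` are hypothetical, as is the fallback of 50 (chosen only to mirror the current constant). The real parsing in `src/config/env.ts` may look different, and the raw string is passed in here so the snippet does not depend on `process.env`.

```typescript
// Hypothetical helper: the real parsing lives in src/config/env.ts and may differ.
function parseDlqAlertThreshold(raw: string | undefined, fallback = 50): number {
  const parsed = Number.parseInt(raw ?? "", 10);
  // Reject NaN, zero, and negatives so a bad env value falls back to the default.
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

// Pure predicate: the threshold is an argument, not a hardcoded constant.
function isDlqSizeCritical(size: number, threshold: number): boolean {
  return size >= threshold;
}

// Example: an operator raises the threshold to 200 via DLQ_ALERT_THRESHOLD.
const exampleThreshold = parseDlqAlertThreshold("200");
console.log(isDlqSizeCritical(120, exampleThreshold)); // false
console.log(isDlqSizeCritical(200, exampleThreshold)); // true
```

Keeping the predicate pure makes the threshold logic easy to unit test without touching the queue or the logger.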

## Proposed solution
- Replace the hardcoded `SIZE_ALERT_THRESHOLD` with `config.dlq.alertThreshold` (or equivalent).
- Add an `AlertingService` abstraction with pluggable channels:
  - `LOG` (default, always on)
  - `SLACK_WEBHOOK_URL` (optional)
  - `PAGERDUTY_ROUTING_KEY` (optional)
- Emit an alert payload including:
  - current DLQ size
  - count by status (`PENDING`, `RETRIED`, `RESOLVED`)
  - oldest pending event age
  - link to the admin DLQ inspect endpoint
- Add alert deduplication/cooldown (e.g. re-alert only every 15 minutes while above threshold).
- Expose DLQ alert state as a Prometheus gauge (e.g. `dlq_alert_active`).
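
One possible shape for the pieces listed above, as a rough sketch rather than a prescribed design. Every type and method name here (`DlqAlertPayload`, `AlertChannel`, `shouldAlert`, `alertActive`, `notify`) is an assumption made for illustration. Resetting the cooldown when the queue drops back under the threshold is also a design choice of this sketch, not something the issue specifies.

```typescript
type DlqStatus = "PENDING" | "RETRIED" | "RESOLVED";

interface DlqAlertPayload {
  size: number;
  countByStatus: Record<DlqStatus, number>;
  oldestPendingAgeMs: number | null; // null when nothing is pending
  inspectUrl: string; // link to the admin DLQ inspect endpoint
}

// A channel is anything that can deliver a payload (log, Slack, PagerDuty, ...).
interface AlertChannel {
  readonly name: string;
  send(payload: DlqAlertPayload): Promise<void>;
}

class AlertingService {
  private lastSentAt: number | null = null;
  private readonly channels: AlertChannel[];
  private readonly threshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(
    channels: AlertChannel[],
    threshold: number,
    cooldownMs: number,
    now: () => number = () => Date.now(), // injectable clock for tests
  ) {
    this.channels = channels;
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // Decides whether to alert now, and records the send time when it says yes.
  shouldAlert(size: number): boolean {
    if (size < this.threshold) {
      this.lastSentAt = null; // back under threshold: the next breach alerts at once
      return false;
    }
    const t = this.now();
    if (this.lastSentAt !== null && t - this.lastSentAt < this.cooldownMs) {
      return false; // still inside the cooldown window
    }
    this.lastSentAt = t;
    return true;
  }

  // Could back a gauge such as dlq_alert_active: 1 while an alert episode is open.
  alertActive(): 0 | 1 {
    return this.lastSentAt === null ? 0 : 1;
  }

  // Sends to every channel; returns false when the alert was suppressed.
  async notify(payload: DlqAlertPayload): Promise<boolean> {
    if (!this.shouldAlert(payload.size)) return false;
    for (const channel of this.channels) {
      try {
        await channel.send(payload);
      } catch (err) {
        // One failing channel must not block the others.
        console.error(`[DLQ ALERT] channel ${channel.name} failed`, err);
      }
    }
    return true;
  }
}
```

Because the clock is injected, cooldown behavior can be tested by advancing a fake time value instead of sleeping. One trade-off of resetting on recovery is that a queue hovering around the threshold could re-alert more often than the cooldown suggests. If that matters, the reset could be dropped or given some hysteresis.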

## Acceptance criteria

- [ ] `DLQ_ALERT_THRESHOLD` from env is the single source of truth
- [ ] Crossing threshold triggers at least one external notification when configured
- [ ] Alert includes actionable metadata (size, status breakdown, oldest pending age)
- [ ] Cooldown prevents duplicate alerts within configured window
- [ ] Unit tests for threshold logic and cooldown behavior
- [ ] Integration test stubs external webhook without hitting real services
- [ ] README/runbook section: “What to do when DLQ alert fires” (inspect → dry-run retry → resolve)
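
The webhook-stubbing criterion could be met by injecting the HTTP call into the channel, roughly as sketched below. `PostJson`, `SlackWebhookChannel`, and `makeStubPoster` are hypothetical names, the URL is a placeholder, and the channel takes a plain summary string to keep the example self-contained. The only Slack-specific detail relied on is that incoming webhooks accept a JSON body with a `text` field.

```typescript
// Minimal shape of the HTTP call the channel needs. A real build could pass a
// thin wrapper around fetch; a test passes a stub.
type PostJson = (url: string, body: unknown) => Promise<{ ok: boolean; status: number }>;

class SlackWebhookChannel {
  readonly name = "SLACK";
  private readonly webhookUrl: string;
  private readonly postJson: PostJson;

  constructor(webhookUrl: string, postJson: PostJson) {
    this.webhookUrl = webhookUrl;
    this.postJson = postJson;
  }

  async send(summary: string): Promise<void> {
    // Slack incoming webhooks accept a JSON body with a "text" field.
    const res = await this.postJson(this.webhookUrl, { text: summary });
    if (!res.ok) {
      throw new Error(`Slack webhook returned HTTP ${res.status}`);
    }
  }
}

// Test double: records calls instead of doing network I/O.
function makeStubPoster(status = 200) {
  const calls: Array<{ url: string; body: unknown }> = [];
  const postJson: PostJson = async (url, body) => {
    calls.push({ url, body });
    return { ok: status >= 200 && status < 300, status };
  };
  return { calls, postJson };
}

async function demoStubbedWebhook(): Promise<void> {
  const stub = makeStubPoster();
  // Placeholder URL; nothing is sent over the network.
  const channel = new SlackWebhookChannel("https://hooks.slack.example/T000/B000", stub.postJson);
  await channel.send("[DLQ ALERT] size=73 threshold=50");
  console.log(stub.calls.length); // 1
}

demoStubbedWebhook().catch((err) => console.error(err));
```

The same injection point lets a test simulate a failing webhook (for example `makeStubPoster(500)`) and assert that the error surfaces without affecting other channels.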
