Feat/multi region replication monitoring by TheCreatorNode · Pull Request #715 · SoroScan/soroscan

TheCreatorNode · 2026-06-01T12:06:28Z

fixes #537

- Create `prune_events` command to delete old ContractEvents
- Support configurable retention period via `--retention-days` or the `EVENT_RETENTION_DAYS` setting
- Include `--dry-run` flag to preview deletions without executing
- Add comprehensive test suite covering all functionality
- Log number of deleted records with colored output

Fixes SoroScan#366
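The retention and dry-run behaviour listed above can be sketched in plain Python. This is only an illustration under stated assumptions, not the PR's Django management command: the function name `prune_events`, the `(event_id, created_at)` tuple shape, and the in-memory list are all hypothetical stand-ins for the real `ContractEvent` queryset.

```python
from datetime import datetime, timedelta, timezone


def prune_events(events, retention_days, now, dry_run=False):
    """Drop events older than the retention cutoff.

    `events` is a list of (event_id, created_at) tuples (hypothetical shape).
    Returns (remaining_events, stale_count). With dry_run=True nothing is
    removed, but the count of events that would be deleted is still reported.
    """
    cutoff = now - timedelta(days=retention_days)
    stale = [event for event in events if event[1] < cutoff]
    if dry_run:
        # Preview only: report the count and leave the data untouched.
        return list(events), len(stale)
    kept = [event for event in events if event[1] >= cutoff]
    return kept, len(stale)


if __name__ == "__main__":
    now = datetime(2026, 6, 1, tzinfo=timezone.utc)
    sample = [(1, now - timedelta(days=40)), (2, now - timedelta(days=5))]
    remaining, count = prune_events(sample, retention_days=30, now=now, dry_run=True)
    print(f"would delete {count} event(s), {len(remaining)} still stored")
```

Passing `now` explicitly keeps the cutoff deterministic, which makes a helper like this easy to test.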

## Description

Implements comprehensive replication lag monitoring for multi-region database deployments.

## Changes

### Core Implementation

- **soroscan/ingest/replication.py**: New `ReplicationLagMonitor` class with:
  - LSN-based lag measurement (fast method)
  - Write-test lag measurement (accurate method)
  - Replica health status checks
  - Configurable threshold-based alerting

### Prometheus Metrics

- `soroscan_replication_lag_seconds`: Current lag in seconds (Gauge)
- `soroscan_replication_lag_checks_total`: Total checks performed (Counter)
- `soroscan_replication_status`: Health status, 1=healthy, 0=unhealthy (Gauge)
- `soroscan_replication_alerts_total`: Total alerts triggered (Counter)

### Celery Tasks

- `monitor_replication_lag()`: Periodic lag measurement and alerting
- `check_replica_health()`: Comprehensive replica status validation

### CLI Tool

- `check_replication_lag` command with:
  - Single check or continuous monitoring modes
  - Configurable check intervals
  - LSN or write-test measurement methods

### Monitoring & Alerting

- Grafana dashboard panels for lag visualization
- Alert rules for warning/critical thresholds
- Health status monitoring
- Check failure detection

### Documentation

- Comprehensive configuration guide
- Usage instructions
- Troubleshooting guide
- Performance impact analysis

## Acceptance Criteria ✅

- [x] Replication lag measured
- [x] Metrics exported to Prometheus
- [x] Dashboard shows lag over time
- [x] Alerts on lag > threshold

## Configuration

Set environment variables:

- `REPLICA_DB_ALIAS`: Database alias for replica connection
- `REPLICATION_LAG_THRESHOLD_SECONDS`: Warning threshold (default: 5s)
- `REPLICATION_LAG_ALERT_THRESHOLD_SECONDS`: Critical threshold (default: 10s)
- `REGION_NAME`: Region identifier for multi-region deployments

## Testing

- Run single check: `python manage.py check_replication_lag`
- Run continuous: `python manage.py check_replication_lag --continuous`
- View metrics: `curl http://localhost:8000/metrics | grep replication`
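For reviewers, the LSN comparison and the threshold check described above might look roughly like the sketch below. It is not the code in `soroscan/ingest/replication.py`. It assumes PostgreSQL-style LSNs written as two hexadecimal numbers separated by a slash, and the helper names `parse_lsn`, `lsn_lag_bytes` and `classify_lag` are hypothetical. Only the two environment variable names and their 5s/10s defaults come from this description. Note that an LSN difference is a byte distance, so the real monitor would need a separate step to report lag in seconds.

```python
import os


def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL-style LSN such as '16/B374D848' to a byte offset."""
    high, low = lsn.split("/")
    # The part before the slash is the upper 32 bits, the part after the lower 32.
    return (int(high, 16) << 32) | int(low, 16)


def lsn_lag_bytes(primary_lsn: str, replica_lsn: str) -> int:
    """Bytes of WAL the replica still has to replay (never negative)."""
    return max(0, parse_lsn(primary_lsn) - parse_lsn(replica_lsn))


def classify_lag(lag_seconds: float, warning=None, critical=None) -> str:
    """Map a lag in seconds to 'ok', 'warning' or 'critical'.

    Thresholds default to the environment variables named in this PR,
    falling back to the documented defaults of 5s and 10s.
    """
    if warning is None:
        warning = float(os.environ.get("REPLICATION_LAG_THRESHOLD_SECONDS", "5"))
    if critical is None:
        critical = float(os.environ.get("REPLICATION_LAG_ALERT_THRESHOLD_SECONDS", "10"))
    # Strict comparison, matching the "lag > threshold" acceptance criterion.
    if lag_seconds > critical:
        return "critical"
    if lag_seconds > warning:
        return "warning"
    return "ok"


if __name__ == "__main__":
    print(lsn_lag_bytes("0/3000060", "0/3000000"), "bytes behind")
    print(classify_lag(7.5, warning=5, critical=10))
```

Keeping the classification separate from the measurement means the same thresholds could be applied to either the LSN or the write-test method.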

drips-wave · 2026-06-01T12:06:38Z

@TheCreatorNode Great news! 🎉 Based on an automated assessment of this PR, the linked Wave issue(s) no longer count against your application limits.

You can now already apply to more issues while waiting for a review of this PR. Keep up the great work! 🚀

Learn more about application limits

TheCreatorNode added 2 commits April 27, 2026 14:52

TheCreatorNode wants to merge 2 commits into SoroScan:main from TheCreatorNode:feat/multi-region-replication-monitoring
