Which component(s) does this affect?
Problem Statement
I'd like to be able to monitor the health of availability group replicas, with alerting if the secondaries fall significantly behind the primary. That way, if we need to use a replica, we know it is not significantly out of date and that we can fail over safely if required.
Proposed Solution
An additional collector that queries the availability group DMVs to collect stats such as redo_queue_size, redo_rate, log_send_queue_size, log_send_rate and synchronization_state, plus visualisation in the dashboard and alerting if the secondaries fall significantly behind the primary.
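A rough sketch of what the collector and alert check could look like. The T-SQL uses real SQL Server objects (`sys.dm_hadr_database_replica_states`, `sys.availability_replicas`, `sys.availability_groups`), where the queue sizes are in KB and the rates in KB/s. The `ReplicaStats` class, the `lag_alerts` helper and the default thresholds are hypothetical and only illustrate one way to define "significantly behind": either a queue over a size limit, or a queue that would take too long to drain at the current rate.

```python
import math
from dataclasses import dataclass
from typing import List, Optional

# T-SQL the collector could run on the primary. The DMV/view names are real;
# the column selection is only a suggestion.
AG_HEALTH_QUERY = """
SELECT ag.name                        AS ag_name,
       ar.replica_server_name,
       DB_NAME(drs.database_id)       AS database_name,
       drs.is_primary_replica,
       drs.synchronization_state_desc,
       drs.log_send_queue_size,       -- KB not yet sent to the secondary
       drs.log_send_rate,             -- KB/s
       drs.redo_queue_size,           -- KB not yet redone on the secondary
       drs.redo_rate                  -- KB/s
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar ON ar.replica_id = drs.replica_id
JOIN sys.availability_groups  AS ag ON ag.group_id   = drs.group_id;
"""

@dataclass
class ReplicaStats:  # hypothetical container for one row of the query above
    replica: str
    database: str
    is_primary: bool
    sync_state: str
    log_send_queue_kb: Optional[int]
    log_send_rate_kbps: Optional[int]
    redo_queue_kb: Optional[int]
    redo_rate_kbps: Optional[int]

def seconds_to_drain(queue_kb: Optional[int], rate_kbps: Optional[int]) -> float:
    """Rough time to clear a queue at the current rate; inf if it is not draining."""
    if not queue_kb:
        return 0.0
    if not rate_kbps:
        return math.inf
    return queue_kb / rate_kbps

def lag_alerts(row: ReplicaStats, max_queue_kb: int = 100_000,
               max_drain_s: float = 60.0) -> List[str]:
    """Return alert messages for a secondary; thresholds are hypothetical defaults."""
    if row.is_primary:
        return []
    alerts = []
    if row.sync_state not in ("SYNCHRONIZED", "SYNCHRONIZING"):
        alerts.append(f"{row.replica}/{row.database}: state {row.sync_state}")
    for name, queue, rate in (
        ("log send", row.log_send_queue_kb, row.log_send_rate_kbps),
        ("redo", row.redo_queue_kb, row.redo_rate_kbps),
    ):
        drain = seconds_to_drain(queue, rate)
        if (queue or 0) > max_queue_kb or drain > max_drain_s:
            alerts.append(f"{row.replica}/{row.database}: {name} queue "
                          f"{queue or 0} KB, ~{drain:.0f}s to drain")
    return alerts
```

For example, a secondary with a 250,000 KB redo queue draining at 2,000 KB/s exceeds the size limit and would take about 125 seconds to catch up, so `lag_alerts(ReplicaStats("SQL02", "Sales", False, "SYNCHRONIZING", 500, 1000, 250_000, 2_000))` returns one redo alert. Checking drain time as well as raw size means a large queue on a fast secondary and a small queue on a stalled one can both be caught.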
Use Case
Useful for anyone using Always On availability groups for HA. It would let them monitor the health of the availability group and identify issues, such as the secondaries falling behind during periods of heavy usage, so that they can resolve them.
Alternatives Considered
No response
Additional Context
No response