Description
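app/models/health_check.rb lines 153–158 run a check up to 3 times in total (the initial attempt plus 2 retries) with no delay between attempts. If the third attempt also fails, the exception is silently discarded: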
retry_times = 3
begin
  runner.run!
rescue Exception => e
  retry_times -= 1
  retry unless retry_times == 0
end
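The current loop's behaviour can be reproduced outside Rails with a runner stub that fails on every call. AlwaysFailingRunner and run_with_current_retry are hypothetical names used only for illustration; the control flow inside the method is copied from the snippet above.

```ruby
# Hypothetical stand-in for the real runner: fails on every call and counts calls.
class AlwaysFailingRunner
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def run!
    @calls += 1
    raise "check failed"
  end
end

# Same control flow as health_check.rb, wrapped in a method so it can be called.
def run_with_current_retry(runner)
  retry_times = 3
  begin
    runner.run!
  rescue Exception => e
    retry_times -= 1
    retry unless retry_times == 0
  end
end

runner = AlwaysFailingRunner.new
result = run_with_current_retry(runner)
puts runner.calls # => 3 (one initial run plus two retries, with no pause between them)
p result          # => nil (the final exception never reaches the caller)
```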
Problems
- No delay: All 3 attempts run back-to-back, within milliseconds of each other. If the failure is a server-side 500 error, a timeout, or a flapping element, an immediate retry will most likely hit the same condition.
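- Retries non-transient errors: rescue Exception catches everything (even Interrupt and SignalException), including permanent failures such as element not found, a wrong URL, or a failed assertion. These fail the same way on every attempt, so retrying them only wastes time and inflates check_run counts.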
- No jitter: If many checks fail at the same moment (for example, when the site goes down), they all retry in lockstep, creating a request storm.
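The first and third problems both concern when a retry fires. The sketch below assumes a 1s base delay that doubles on each retry, plus an additive random jitter of up to 0.5s; backoff_delay and its parameters are hypothetical, not existing code in this repo. It shows that five checks failing at the same moment all choose the same delay without jitter, but spread out once jitter is added.

```ruby
# Hypothetical helper: delay in seconds before retry number `attempt` (1-based).
# Exponential backoff (base, 2*base, 4*base, ...) plus uniform jitter in [0, max_jitter).
def backoff_delay(attempt, base: 1.0, max_jitter: 0.0, rng: Random.new)
  base * 2**(attempt - 1) + rng.rand * max_jitter
end

rng = Random.new(42) # seeded only so the run is reproducible

# Five checks that all failed at the same instant, each scheduling its first retry.
no_jitter   = Array.new(5) { backoff_delay(1) }
with_jitter = Array.new(5) { backoff_delay(1, max_jitter: 0.5, rng: rng) }

p no_jitter   # => [1.0, 1.0, 1.0, 1.0, 1.0]: every check retries at the same moment
p with_jitter # five different values in [1.0, 1.5): retries are spread out
```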
Suggested approach
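Retry only transient network errors, back off exponentially with jitter between attempts, and let every other error fail on the first attempt:
# At class level in HealthCheck (defining constants inside a method is a "dynamic constant assignment" SyntaxError)
MAX_ATTEMPTS = 3
RETRYABLE_ERRORS = [Net::ReadTimeout, Net::OpenTimeout, Errno::ECONNREFUSED].freeze

# Inside the method that runs the check
attempts = 0
begin
  attempts += 1
  runner.run!
rescue *RETRYABLE_ERRORS
  raise if attempts >= MAX_ATTEMPTS # out of attempts: surface the last error instead of swallowing it
  sleep(2**(attempts - 1) + rand * 0.5) # backoff of 1s, then 2s, plus up to 0.5s of jitter
  retry
end
# Any other error (element not found, assertion failure) is not rescued here, so it fails immediately.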
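To test the suggested approach in isolation, the loop can be pulled into a helper that takes the sleep function as a parameter. This is a sketch under assumptions: run_with_backoff, FlakyRunner, and the base, max_jitter, and sleeper parameters are hypothetical names, and in the app the block would call the real runner.run!.

```ruby
require "net/http" # loads Net::ReadTimeout and Net::OpenTimeout

RETRYABLE_ERRORS = [Net::ReadTimeout, Net::OpenTimeout, Errno::ECONNREFUSED].freeze

# Hypothetical helper: runs the block up to max_attempts times, retrying only
# RETRYABLE_ERRORS with exponential backoff plus jitter. `sleeper` is injectable
# so tests can record the delays instead of actually sleeping.
def run_with_backoff(max_attempts: 3, base: 1.0, max_jitter: 0.5,
                     rng: Random.new, sleeper: ->(s) { sleep(s) })
  attempts = 0
  begin
    attempts += 1
    yield
  rescue *RETRYABLE_ERRORS
    raise if attempts >= max_attempts # out of attempts: re-raise the last error
    sleeper.call(base * 2**(attempts - 1) + rng.rand * max_jitter)
    retry
  end
end

# Hypothetical runner stub: raises `error` on the first `failures` calls, then succeeds.
class FlakyRunner
  attr_reader :calls

  def initialize(failures:, error: Net::ReadTimeout)
    @failures = failures
    @error = error
    @calls = 0
  end

  def run!
    @calls += 1
    raise @error if @calls <= @failures
    :ok
  end
end

delays = []
runner = FlakyRunner.new(failures: 2)
result = run_with_backoff(sleeper: ->(s) { delays << s }) { runner.run! }
p result       # => :ok
p runner.calls # => 3
```

Injecting sleeper keeps the test fast and lets it assert on the exact delay schedule; in production the default simply calls Kernel#sleep.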
Effort: small