Retry transient request failures with a short per-attempt timeout #164
Merged
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #164      +/-   ##
==========================================
- Coverage   99.87%   99.87%   -0.01%
==========================================
  Files          12       12
  Lines         792      791       -1
==========================================
- Hits          791      790       -1
  Misses          1        1
kellerza
reviewed
Jun 28, 2026
94e480a to
06b3263
Compare
_request_json retried only ServerDisconnectedError; asyncio.TimeoutError and ClientError raised SmaConnectionException immediately on a single 15s request. Retry all transient connection failures with a fresh connection, and make DEFAULT_TIMEOUT per-attempt (5s) with DEFAULT_REQUEST_RETRIES (3), keeping total request time about 15s across 3 attempts. Measured on a real inverter on a 64-86% packet-loss link, per-poll success rose from 44% to 72%.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
06b3263 to
2d1c389
Compare
Problem
_request_json retries only ServerDisconnectedError. An asyncio.TimeoutError or ClientError (the usual symptom of packet loss) raises SmaConnectionException immediately with no retry, on a single request with a 15s timeout. On a lossy network a single long attempt often fails outright, so Home Assistant marks the whole device unavailable on one missed poll and the entities flap (cycling between available and unavailable every minute or two), even though most polls would succeed on a quick retry.
Change
Retry all transient connection failures (timeouts, disconnects, generic client errors) with a fresh connection, not just ServerDisconnectedError.
DEFAULT_TIMEOUT becomes a per-attempt timeout (5s) with DEFAULT_REQUEST_RETRIES (3) attempts, so the total request time stays about 15s, now spread over 3 independent attempts instead of 1.
A fresh short attempt resets TCP backoff and samples the link several times, which recovers better than one long attempt whose stuck connection keeps retransmitting.
Evidence
Measured against a real SMA inverter on a WiFi link with 64 to 86% packet loss, over 18 poll cycles on the identical link: per-poll success rose from 44% to 72% with the change.
Retry roughly halves the flapping. The residual failures are multi-second dead bursts longer than any timeout budget, which the client cannot fix. A separate live test in Home Assistant showed about 45% fewer dropouts at the same poll interval.
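For anyone wanting to repeat this kind of comparison, here is a minimal sketch of how per-poll success and flapping could be computed from a log of poll outcomes. The function name poll_stats and the sample sequence are hypothetical and made up for illustration; they are not the measurements quoted above.

```python
from typing import Sequence, Tuple


def poll_stats(outcomes: Sequence[bool]) -> Tuple[float, int]:
    """Return (success rate, number of available/unavailable transitions).

    Hypothetical helper: each entry is True if that poll succeeded.
    """
    if not outcomes:
        return 0.0, 0
    rate = sum(outcomes) / len(outcomes)
    # A "flap" is any change of state between two consecutive polls.
    flaps = sum(1 for prev, cur in zip(outcomes, outcomes[1:]) if prev != cur)
    return rate, flaps


# Made-up outcomes for illustration only, not data from this PR.
example = [True, False, True, True, False, True]
rate, flaps = poll_stats(example)
```

Counting transitions as well as the success rate matters here because a device that fails in one long burst flaps far less than one that fails on alternating polls at the same success rate.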
Notes
The exact values (5s and 3 attempts) keep the original 15s worst-case budget and are easy to tune.
DEFAULT_TIMEOUT changes meaning from a single request timeout to a per-attempt timeout; I can add a separate constant instead if you prefer to keep its semantics.
The previous "Server at ... disconnected N times." message is folded into the existing "Could not connect to SMA at ...: ..." message.
Existing tests pass unchanged (25 passed) and ruff format and ruff check are clean.
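The retry behaviour described in this PR can be sketched roughly as follows. This is a stdlib-only illustration, not the code in the diff: request_with_retries and its attempt parameter are hypothetical names, SmaConnectionException is redefined locally as a stand-in, and the built-in ConnectionError stands in for aiohttp's ClientError and ServerDisconnectedError. Only the two constant names and their values (5s, 3 attempts) come from the description above.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")

DEFAULT_TIMEOUT = 5  # seconds per attempt (value from the PR description)
DEFAULT_REQUEST_RETRIES = 3  # number of attempts (value from the PR description)


class SmaConnectionException(Exception):
    """Local stand-in for the library's exception of the same name."""


async def request_with_retries(
    attempt: Callable[[], Awaitable[T]],
    timeout: float = DEFAULT_TIMEOUT,
    retries: int = DEFAULT_REQUEST_RETRIES,
) -> T:
    """Run attempt() up to `retries` times, each bounded by `timeout` seconds."""
    last_error: Optional[Exception] = None
    for _ in range(retries):
        try:
            # Calling attempt() again starts a new request, so a stuck
            # connection from the previous try is abandoned, not reused.
            return await asyncio.wait_for(attempt(), timeout=timeout)
        except (asyncio.TimeoutError, ConnectionError) as exc:
            # Transient failure: remember it and try again.
            last_error = exc
    # Every attempt failed: report one error covering all of them.
    raise SmaConnectionException(
        f"Could not connect after {retries} attempts: {last_error!r}"
    ) from last_error
```

With the defaults the worst case is 3 × 5s = 15s, matching the old single-request budget. If DEFAULT_TIMEOUT should keep its original meaning, the per-attempt value could be passed through a separately named constant instead, as offered in the notes.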