fix(retry-budget): close 7 audit findings (0.8.3)#34
Merged
Conversation
Pairs the 3 RetryBudget-cluster code fixes (deposit per-request hoist, math.ceil ceiling, Retry-After give-up when > max_delay) with the 2 chunk-3 test rewrites that should have caught them, plus the 2 adjacent retry nits (streaming-note scoping, RuntimeError → TransportError via is_closed). Decisions: deposit() hoists above the loop, math.ceil ceiling, Retry-After > max_delay raises the underlying StatusError with PEP 678 note rather than silently capping. All behavioral changes documented in 0.8.3 release notes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each task pairs a failing test (or test rewrite that pins the buggy behavior) with the production fix that satisfies it. Task 1 = math.ceil + budget property test rewrite; Task 2 = deposit() per-request hoist; Task 3 = Retry-After give-up (rewrites the existing capped test); Task 4 = budget-exhaustion property test; Task 5 = streaming-note scoping; Task 6 = is_closed mapping; Task 7 = release notes + PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`RetryBudget.try_withdraw`'s ceiling computed `int(deposits * percent) + floor`, truncating so 4 deposits at percent_can_retry=0.2 yielded `int(0.8) = 0` — the configured percentage was never reached at deposit counts below `1/percent_can_retry`. Replace with math.ceil so the threshold is hit at the first deposit-count where it is mathematically expressible. The Hypothesis property test mirrored the same int() formula, so it could not detect the off-by-one. Rewrite it to use math.ceil for the expected ceiling and tighten `permitted <= ceiling` to `permitted == ceiling` so it discriminates between the two roundings going forward. Also update test_budget.py::test_deposit_after_exhaustion_does_not_immediately_unblock, which was written to document the old truncation behaviour; the test is renamed and rewritten to document the correct ceil semantics. Closes audit Low findings (budget.py:67 truncation, test_budget_props.py:52 formula-mirror) from planning/audit/2026-06-07-deep-audit.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`AsyncRetry.__call__` and `Retry.__call__` deposited a token inside the per-attempt for-loop. A request that retried twice contributed three deposits and two withdrawals — inflating the budget's deposits-to- withdrawals ratio by `(attempts-1)/attempts` and letting through more retries than `percent_can_retry` should permit. Move `self.budget.deposit()` immediately above the loop in both worlds so the Finagle contract holds: deposits count original requests; withdrawals count retries. Behavioral impact: tighter retry pacing under load — users with active retry traffic see the budget refuse retries earlier than before. This is the documented contract; the previous behavior was the bug. Closes audit Low finding (retry.py:105 + :236) from planning/audit/2026-06-07-deep-audit.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously when `respect_retry_after=True` and the server sent `Retry-After: 120` while the client had `max_delay=5`, AsyncRetry/Retry silently clamped to 5s and retried — almost certainly hitting the same 503/429 and burning an attempt. The option name implied the header was honored. New behavior: when the header value exceeds max_delay, re-raise the last observed StatusError with a PEP 678 note `httpware: Retry-After (Ns) exceeded max_delay (Ms); giving up`. Same treatment in both AsyncRetry and Retry. `respect_retry_after=False` and raising `max_delay` to accommodate the header are the opt-outs. Replaces the existing `test_retry_after_capped_at_max_delay` (which pinned the old silent-cap) with two tests: the give-up assertion at Retry-After > max_delay, and the boundary case at Retry-After == max_delay which still retries. Sync mirrors added. Also updates `test_retry_after_seconds_honored` (sync) which implicitly tested clamping and now tests the within-budget path instead. docs/resilience.md's respect_retry_after row describes the new give-up behavior. Closes audit Low finding (retry.py:189 + :320) from planning/audit/2026-06-07-deep-audit.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The existing AsyncRetry property tests build budgets with `min_retries_per_sec=1000.0` — floor of 60_000 retries permitted unconditionally — so the budget-exhaustion path is never exercised by Hypothesis. Add a focused property test on RetryBudget directly that configures `min_retries_per_sec=0.0` and asserts: exactly ceiling permits succeed, then the next one fails. Pairs with the math.ceil fix in commit 1 of this branch — the same ceiling formula is now the regression guard for both production and tests. Closes audit Nit finding (test_retry_props.py:60) from planning/audit/2026-06-07-deep-audit.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The early-out branch for method ineligibility AND non-retryable status also attached the streaming-body refusal note whenever `retryable_status and request.extensions.get(STREAMING_BODY_MARKER)` — telling users their stream couldn't replay when the actual reason for not retrying was method exclusion (e.g. POST not in retry_methods). The same misrouting existed on the NetworkError/TimeoutError arm. Drop the misleading add_note() inside the early-out branches in both AsyncRetry and Retry. The note still fires at the existing dedicated streaming-refusal site (the `# ---- retryable failure path` block) where streaming is the actual blocker. Also update existing tests that exercised POST (method-ineligible) to use PUT (method-eligible) so streaming IS the actual reason for refusal, and flip the two non-idempotent sync tests to assert the note is NOT attached. Closes audit Nit finding (retry.py:111 + :242/:249) from planning/audit/2026-06-07-deep-audit.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…string Both AsyncClient._terminal and Client._terminal mapped RuntimeError to TransportError by checking `"closed" in str(exc)`. This conflates two unrelated conditions: any RuntimeError whose message happens to contain "closed" (from any transport layer, plugin, or future httpx2 wording change) was mis-classified as TransportError; conversely, if httpx2 reworded its closed-client error to "shut down" or "disposed", the mapping would silently break. Replace with `self._httpx2_client.is_closed` — the same public attribute already used elsewhere in client.py for the borrowed-client teardown guards (lines 774, 784, 870, 880). Message-independent. Tests pin both directions: RuntimeError raised while the client is open propagates as-is (even when its message contains "closed"); RuntimeError raised after close (real httpx2 case) maps to TransportError. Closes audit Nit finding (client.py:135 + :852) from planning/audit/2026-06-07-deep-audit.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… robustness Three behavioral changes (deposit per-request, ceiling math.ceil, Retry-After give-up), one mapping cleanup (RuntimeError → TransportError via is_closed), one diagnostic fix (streaming-note scoping), plus two non-user-visible test improvements. Closes 7 audit findings. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes 7 of the 35 deep-audit findings — the RetryBudget cross-cutting cluster plus 2 adjacent retry-surface nits. See
planning/specs/2026-06-08-retry-budget-cluster-design.mdfor the design andplanning/releases/0.8.3.mdfor the user-facing release notes.Behavioral changes (read these)
RetryBudgetdeposits once per request, not per attempt. Tighter retry pacing under load.RetryBudgetceiling usesmath.ceil. No more silent off-by-one againstpercent_can_retry.Retry-After > max_delayraises with a PEP 678 note instead of silently capping.RuntimeError → TransportErrorviais_closed. Message-independent mapping.Audit findings closed
retry.py:105+:236retry.py:189+:320budget.py:67test_budget_props.py:52test_retry_props.py:60retry.py:111+:242/:249client.py:135+:852Test plan
just lint-cigreen after each commitRelease
Tag
0.8.3from the merge SHA after this PR lands.🤖 Generated with Claude Code