Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
1b5d15a
scaffold(resilience): empty subpackage for Retry + RetryBudget
lesnik512 Jun 4, 2026
5318eaf
fix(resilience): defer __init__.py re-exports until Task 7
lesnik512 Jun 5, 2026
478fa37
feat(errors): add NetworkError(TransportError) for transient httpx2 f…
lesnik512 Jun 5, 2026
08ced9c
fix(errors): correct NetworkError docstring — close, not pool
lesnik512 Jun 5, 2026
f383fc0
feat(errors): add RetryBudgetExhaustedError
lesnik512 Jun 5, 2026
ed70e0e
fix(errors): make RetryBudgetExhaustedError picklable + tighten str()…
lesnik512 Jun 5, 2026
a487e3a
feat(resilience): RetryBudget token-bucket math + tests
lesnik512 Jun 5, 2026
b19b887
docs(resilience): clarify _purge window boundary + test docstring
lesnik512 Jun 5, 2026
9ac10fe
test(resilience): Hypothesis property tests for RetryBudget
lesnik512 Jun 5, 2026
7593f1d
test(resilience): broaden property-test parameter space + self-eviden…
lesnik512 Jun 5, 2026
6325345
feat(resilience): full-jitter exponential backoff helper
lesnik512 Jun 5, 2026
68dd5f0
fix(resilience): backoff uses 2.0 ** to avoid OverflowError on large …
lesnik512 Jun 5, 2026
e2e6a78
chore: gitignore .hypothesis/ cache directory
lesnik512 Jun 5, 2026
3bf2a80
feat(resilience): Retry middleware — status-code retry + exhaustion
lesnik512 Jun 5, 2026
555bc6c
fix(resilience): close coverage gap on unreachable + wire resilience/…
lesnik512 Jun 5, 2026
7fa7a5d
fix(resilience): replace bare assert with -O-safe guard + cross-ref note
lesnik512 Jun 5, 2026
64a2671
feat(resilience): Retry — network/timeout exception retry
lesnik512 Jun 5, 2026
9aac69d
test(resilience): assert sleep count in timeout retry test (symmetry …
lesnik512 Jun 5, 2026
365cd5a
feat(resilience): Retry.attempt_timeout (wall-clock per-attempt cap)
lesnik512 Jun 5, 2026
14a7d5e
docs(resilience): clarify __cause__ assignment + pragma rationale
lesnik512 Jun 5, 2026
a222577
feat(resilience): Retry honors Retry-After header (seconds + HTTP-date)
lesnik512 Jun 5, 2026
2ee664c
fix(resilience): clamp negative integer Retry-After + assert sleep count
lesnik512 Jun 5, 2026
efe1881
test(resilience): Retry budget gate + sharing across instances
lesnik512 Jun 5, 2026
0571c51
test(resilience): Hypothesis property tests for Retry
lesnik512 Jun 5, 2026
9b604a8
feat(api): export Retry, RetryBudget, RetryBudgetExhaustedError, Netw…
lesnik512 Jun 5, 2026
a0b2881
docs(planning): track retry+budget landing + streaming deferred follo…
lesnik512 Jun 5, 2026
1884b53
docs(tests): refresh test_backoff.py docstring post-merge
lesnik512 Jun 5, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@ __pycache__/*
.env
.pytest_cache
.ruff_cache
.hypothesis/
.coverage
htmlcov/
coverage.xml
Expand Down
4 changes: 4 additions & 0 deletions planning/deferred-work.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,10 @@ Items raised in reviews that are real but not actionable now.

## Open

### Retry + streaming bodies (Epic 4 interaction)

- **`Retry` re-invokes `next(request)` with the same `httpx2.Request` on each attempt.** Safe for in-memory bytes/JSON bodies; unsafe for streaming/async-iterable bodies (consumed iterator can't replay). When Epic 4 ships `AsyncClient.stream` (`4-3`), Retry needs to refuse to retry streamed-body requests (or document that callers supply a body factory). Spec: `planning/specs/2026-06-05-retry-and-retry-budget-design.md` §"Open questions".

### Decoder-side

- **`_get_adapter` `lru_cache` is module-global, not per-decoder instance** — keyed by `model` only; two `PydanticDecoder()` instances with different configurations (none today) would share adapters, and the cache survives across tests unless explicitly cleared. Revisit if/when a configurable `PydanticDecoder(mode=..., strict=...)` lands. (`src/httpware/decoders/pydantic.py:12-14`)
Expand Down
4 changes: 3 additions & 1 deletion planning/engineering.md
Original file line number Diff line number Diff line change
Expand Up @@ -121,7 +121,9 @@ Post-pivot, the roadmap has three categories. Topic slugs in `planning/specs/` a

### Surviving (land in subsequent PRs)

- **Epic 3 — Resilience:** `3-1` per-attempt timeout, `3-2` retry, `3-3` `RetryBudget`, `3-4` `RetryBudget` middleware integration, `3-5` `Bulkhead`, `3-6` extension-slot docs.
- **Epic 3 — Resilience:**
- **Shipped in v0.4 slice 1:** `Retry` middleware + Finagle-style `RetryBudget` token bucket + `attempt_timeout=` parameter (folded-in 3-1). See [`planning/specs/2026-06-05-retry-and-retry-budget-design.md`](specs/2026-06-05-retry-and-retry-budget-design.md) and [`planning/plans/2026-06-05-retry-and-retry-budget-plan.md`](plans/2026-06-05-retry-and-retry-budget-plan.md).
- **Remaining:** `3-5` `Bulkhead`, `3-6` extension-slot docs.
- **Epic 4 — Streaming:** `4-3` `AsyncClient.stream` context manager (forwards to `httpx2.AsyncClient.stream`; no `StreamResponse` type).
- **Epic 5 — Observability:** `5-1` Layer 1 middleware hooks, `5-2` wire into resilience middlewares, `5-4` OpenTelemetry middleware (`otel` extra), `5-5` logging policy CI grep.
- **Epic 6 — Ship v1.0:** `6-2` docs site (`mkdocs`), `6-3` benchmarks, `6-5` release flow (Trusted Publishers + Sigstore).
Expand Down
43 changes: 32 additions & 11 deletions planning/plans/2026-06-05-retry-and-retry-budget-plan.md
Original file line number Diff line number Diff line change
Expand Up @@ -63,12 +63,14 @@ mkdir -p src/httpware/middleware/resilience

Then create each file with the contents below. Use the Write tool, not bash heredocs.

`src/httpware/middleware/resilience/__init__.py`:
`src/httpware/middleware/resilience/__init__.py` (docstring-only — re-exports defer to Task 7 so intermediate tasks can `import httpware.middleware.resilience.budget` without tripping an import-time `ImportError` from this `__init__.py`):
```python
"""Resilience primitives: Retry middleware and RetryBudget token bucket."""
"""Resilience primitives: Retry middleware and RetryBudget token bucket.

from httpware.middleware.resilience.budget import RetryBudget
from httpware.middleware.resilience.retry import Retry
Re-exports land in Task 7 once both classes exist; until then this file is
docstring-only so that importing ``httpware.middleware.resilience.budget``
during the intermediate tasks does not trip an import-time ``ImportError``.
"""
```

`src/httpware/middleware/resilience/budget.py`:
Expand Down Expand Up @@ -137,7 +139,11 @@ Expected: FAIL (`ImportError: cannot import name 'NetworkError'`).
Edit `src/httpware/errors.py`. Add a new class immediately after the existing `class TransportError`:
```python
class NetworkError(TransportError):
"""Transient network-layer failure (connect/read/write/pool). Safe to retry."""
"""Transient network-layer failure (connect/read/write/close). Safe to retry.

Pool-acquisition timeouts are NOT under this class; they raise ``TimeoutError``
via ``httpx2.PoolTimeout`` (a ``TimeoutException`` subclass).
"""
```

Run: `uv run pytest tests/test_errors.py::test_network_error_is_transport_error -v`
Expand Down Expand Up @@ -218,7 +224,7 @@ Becomes:

The `httpx2.NetworkError` branch must come BEFORE `httpx2.HTTPError` (HTTPError is the broader base). `httpx2.NetworkError` is httpx's documented base for `ConnectError`, `ReadError`, `WriteError`, `PoolTimeout` — if `httpx2`'s symbol name differs (e.g., `httpx2.exceptions.NetworkError`), use whichever import path mirrors the existing `httpx2.ConnectError` import in `tests/test_error_mapping_terminal.py` (which works via top-level `httpx2`).

If `httpx2.NetworkError` does not exist, fall back to enumerating the transient subset explicitly: `except (httpx2.ConnectError, httpx2.ReadError, httpx2.WriteError, httpx2.PoolTimeout) as exc:`. The plan author has confirmed `httpx2.ConnectError` and `httpx2.ReadTimeout` already work in the existing tests; the enumeration fallback is safe.
If `httpx2.NetworkError` does not exist, fall back to enumerating the transient subset explicitly: `except (httpx2.ConnectError, httpx2.ReadError, httpx2.WriteError, httpx2.CloseError) as exc:`. (`PoolTimeout` is NOT in this list — it inherits from `httpx2.TimeoutException` and is already caught by the timeout branch above.) The plan author has confirmed `httpx2.ConnectError` and `httpx2.ReadTimeout` already work in the existing tests; the enumeration fallback is safe.

- [ ] **Step 6: Run the new terminal-mapping test**

Expand Down Expand Up @@ -265,7 +271,7 @@ git add src/httpware/errors.py src/httpware/client.py tests/test_errors.py tests
git commit -m "feat(errors): add NetworkError(TransportError) for transient httpx2 failures

Refines _terminal so httpx2.NetworkError-family exceptions (ConnectError, ReadError,
WriteError, PoolTimeout) map to httpware.NetworkError. InvalidURL and CookieConflict
WriteError, CloseError) map to httpware.NetworkError. InvalidURL and CookieConflict
stay bare TransportError. Prerequisite for the Retry middleware so it can retry
transient failures without retrying typos."
```
Expand Down Expand Up @@ -1029,15 +1035,30 @@ class Retry:
Run: `uv run pytest tests/test_retry.py -v`
Expected: all PASS.

- [ ] **Step 4: Lint**
- [ ] **Step 4: Wire `Retry` + `RetryBudget` into `resilience/__init__.py`**

Run: `uv run ruff check src/httpware/middleware/resilience/retry.py tests/test_retry.py && uv run ty check src/httpware/middleware/resilience/retry.py`
Now that both classes exist, replace `src/httpware/middleware/resilience/__init__.py` with:
```python
"""Resilience primitives: Retry middleware and RetryBudget token bucket."""

from httpware.middleware.resilience.budget import RetryBudget
from httpware.middleware.resilience.retry import Retry


__all__ = ["Retry", "RetryBudget"]
```

The `__all__` is required to silence ruff F401 ("imported but unused") and matches the pattern used by `httpware/__init__.py` and `httpware/decoders/__init__.py`.

- [ ] **Step 5: Lint**

Run: `uv run ruff check src/httpware/middleware/resilience/ tests/test_retry.py && uv run ty check src/httpware/middleware/resilience/`
Expected: clean. If ruff flags `Callable` / `Awaitable` import paths, adjust per existing project pattern (see `middleware/__init__.py` which uses `from collections.abc import Awaitable, Callable`).

- [ ] **Step 5: Stage and commit**
- [ ] **Step 6: Stage and commit**

```bash
git add src/httpware/middleware/resilience/retry.py tests/test_retry.py
git add src/httpware/middleware/resilience/retry.py src/httpware/middleware/resilience/__init__.py tests/test_retry.py
git commit -m "feat(resilience): Retry middleware — status-code retry + exhaustion

Covers: happy path, 503-then-200, max_attempts exhaustion with PEP-678 note,
Expand Down
2 changes: 1 addition & 1 deletion planning/specs/2026-06-05-retry-and-retry-budget-design.md
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@ The current `AsyncClient._terminal` maps every non-timeout `httpx2.HTTPError` (i
```python
# In src/httpware/errors.py:
class NetworkError(TransportError):
"""Transient network-layer failure (connect / read / write / pool). Safe to retry."""
"""Transient network-layer failure (connect / read / write / close). Safe to retry."""
```

And refines the terminal mapping so that `httpx2`'s transient-network exception family (`httpx2.NetworkError` per httpx convention, or whichever symbols httpx2 exposes for the same hierarchy) raises `httpware.NetworkError` rather than the broader `TransportError`. `InvalidURL` and `CookieConflict` continue to raise `TransportError` directly so they are NOT retried. Existing tests catching `TransportError` keep working (`NetworkError` is a subclass).
Expand Down
7 changes: 7 additions & 0 deletions src/httpware/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,8 +10,10 @@
ConflictError,
ForbiddenError,
InternalServerError,
NetworkError,
NotFoundError,
RateLimitedError,
RetryBudgetExhaustedError,
ServerStatusError,
ServiceUnavailableError,
StatusError,
Expand All @@ -21,6 +23,7 @@
UnprocessableEntityError,
)
from httpware.middleware import Middleware, Next, after_response, before_request, on_error
from httpware.middleware.resilience import Retry, RetryBudget


__all__ = [
Expand All @@ -33,10 +36,14 @@
"ForbiddenError",
"InternalServerError",
"Middleware",
"NetworkError",
"Next",
"NotFoundError",
"RateLimitedError",
"ResponseDecoder",
"Retry",
"RetryBudget",
"RetryBudgetExhaustedError",
"ServerStatusError",
"ServiceUnavailableError",
"StatusError",
Expand Down
3 changes: 3 additions & 0 deletions src/httpware/client.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
from httpware.errors import (
STATUS_TO_EXCEPTION,
ClientStatusError,
NetworkError,
ServerStatusError,
TimeoutError, # noqa: A004
TransportError,
Expand Down Expand Up @@ -110,6 +111,8 @@ async def _terminal(self, request: httpx2.Request) -> httpx2.Response:
raise TimeoutError(str(exc)) from exc
except (httpx2.InvalidURL, httpx2.CookieConflict) as exc:
raise TransportError(str(exc)) from exc
except httpx2.NetworkError as exc:
raise NetworkError(str(exc)) from exc
except httpx2.HTTPError as exc:
raise TransportError(str(exc)) from exc
except RuntimeError as exc:
Expand Down
47 changes: 47 additions & 0 deletions src/httpware/errors.py
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,14 @@ class TransportError(ClientError):
"""Connection / network / protocol failure raised before a response was received."""


class NetworkError(TransportError):
"""Transient network-layer failure (connect/read/write/close). Safe to retry.

Pool-acquisition timeouts are NOT under this class; they raise ``TimeoutError``
via ``httpx2.PoolTimeout`` (a ``TimeoutException`` subclass).
"""


class TimeoutError(ClientError, builtins.TimeoutError): # noqa: A001
"""Client-side timeout (connect / read / write / pool).

Expand Down Expand Up @@ -136,3 +144,42 @@ class ServiceUnavailableError(ServerStatusError):
500: InternalServerError,
503: ServiceUnavailableError,
}


def _reconstruct_budget_exhausted(
cls: "type[RetryBudgetExhaustedError]",
last_response: httpx2.Response | None,
last_exception: BaseException | None,
attempts: int,
) -> "RetryBudgetExhaustedError":
return cls(last_response=last_response, last_exception=last_exception, attempts=attempts)


class RetryBudgetExhaustedError(ClientError):
"""Raised when a retry was needed but the RetryBudget refused to permit it.

Carries the last response and/or exception observed before the budget refused,
plus the number of attempts already completed.
"""

last_response: httpx2.Response | None
last_exception: BaseException | None
attempts: int

def __init__(
self,
*,
last_response: httpx2.Response | None,
last_exception: BaseException | None,
attempts: int,
) -> None:
self.last_response = last_response
self.last_exception = last_exception
self.attempts = attempts
super().__init__(f"retry budget exhausted after {attempts} attempt(s)")

def __reduce__(self) -> tuple[Any, ...]:
return (
_reconstruct_budget_exhausted,
(type(self), self.last_response, self.last_exception, self.attempts),
)
7 changes: 7 additions & 0 deletions src/httpware/middleware/resilience/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
"""Resilience primitives: Retry middleware and RetryBudget token bucket."""

from httpware.middleware.resilience.budget import RetryBudget
from httpware.middleware.resilience.retry import Retry


__all__ = ["Retry", "RetryBudget"]
26 changes: 26 additions & 0 deletions src/httpware/middleware/resilience/_backoff.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
"""Full-jitter exponential backoff helper (private)."""

import random
from collections.abc import Callable


def full_jitter_delay(
attempt_index: int,
*,
base_delay: float,
max_delay: float,
_random_uniform: Callable[[float, float], float] = random.uniform,
) -> float:
"""Return a backoff delay using AWS's "full jitter" formulation.

sleep = uniform(0, min(max_delay, base_delay * 2.0 ** attempt_index))

`attempt_index` is 0 for the first retry, 1 for the second, etc.

Uses ``2.0 **`` (float exponentiation) rather than ``2 **`` so that
``attempt_index >= 1024`` saturates to ``math.inf`` and ``min`` clamps to
``max_delay`` — ``2 ** 1024`` would raise ``OverflowError`` during the
int→float conversion.
"""
ceiling = min(max_delay, base_delay * (2.0**attempt_index))
return _random_uniform(0.0, ceiling)
64 changes: 64 additions & 0 deletions src/httpware/middleware/resilience/budget.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,64 @@
"""Finagle-style token-bucket retry budget.

See planning/specs/2026-06-05-retry-and-retry-budget-design.md for the contract.
No locking: asyncio runs coroutines cooperatively on a single thread, so deque
mutations between await points are atomic with respect to other coroutines on
the same event loop. Cross-thread use is out of scope.
"""

import time
from collections import deque
from collections.abc import Callable


class RetryBudget:
"""Token-bucket budget bounding retry rate to prevent retry storms.

Each request deposits a token; each retry attempts to withdraw one.
Available retries are bounded by `percent_can_retry` of recent deposits,
plus a `min_retries_per_sec * ttl` floor.
"""

def __init__(
self,
*,
ttl: float = 10.0,
min_retries_per_sec: float = 10.0,
percent_can_retry: float = 0.2,
_now: Callable[[], float] = time.monotonic,
) -> None:
self._ttl = ttl
self._min_retries_per_sec = min_retries_per_sec
self._percent_can_retry = percent_can_retry
self._now = _now
self._deposits: deque[float] = deque()
self._withdrawn: deque[float] = deque()

def _purge(self, now: float) -> None:
# Strict `< cutoff` keeps entries at exactly `now - ttl`: window is [now - ttl, now].
cutoff = now - self._ttl
while self._deposits and self._deposits[0] < cutoff:
self._deposits.popleft()
while self._withdrawn and self._withdrawn[0] < cutoff:
self._withdrawn.popleft()

def deposit(self) -> None:
"""Record a request (success or failure attempt). Adds one token."""
now = self._now()
self._purge(now)
self._deposits.append(now)

def try_withdraw(self) -> bool:
"""Atomically attempt to spend one retry token.

Returns True if a retry is permitted, False if the budget is exhausted.
Never blocks.
"""
now = self._now()
self._purge(now)
floor = int(self._min_retries_per_sec * self._ttl)
ceiling = int(len(self._deposits) * self._percent_can_retry) + floor
if len(self._withdrawn) >= ceiling:
return False
self._withdrawn.append(now)
return True
Loading
Loading