diff --git a/README.md b/README.md index 3fae5d6..5e8862c 100644 --- a/README.md +++ b/README.md @@ -9,7 +9,7 @@ `httpware` is a thin opinionated wrapper around `httpx2`. It re-exports `httpx2.Request`/`httpx2.Response`, adds a middleware chain composed at client construction, supports opt-in typed response decoding (pydantic and msgspec are both extras), and raises a status-keyed exception tree automatically on 4xx/5xx. It also ships a small resilience suite — `Retry` middleware with a Finagle-style `RetryBudget`, plus a `Bulkhead` concurrency limiter — under `httpware.middleware.resilience`. -> **Status:** Pre-1.0. Public API is subject to change between minor releases until v1.0. Streaming and observability are not yet shipped. +> **Status:** Pre-1.0. Public API is subject to change between minor releases until v1.0. ## Install @@ -17,7 +17,7 @@ pip install httpware # core only — no decoder pip install httpware[pydantic] # + PydanticDecoder (the default-decoder path) pip install httpware[msgspec] # + MsgspecDecoder -pip install httpware[all] # everything declared above (pydantic, msgspec) +pip install httpware[all] # everything declared above (pydantic, msgspec, otel) ``` `AsyncClient()` with no `decoder=` argument defaults to constructing a `PydanticDecoder`; that path requires the `pydantic` extra and raises `ImportError` at `AsyncClient.__init__` if it is missing. @@ -84,6 +84,28 @@ It does NOT pass through the middleware chain: `Retry`, `Bulkhead`, and any cust All 4xx/5xx responses raise typed exceptions automatically: `NotFoundError`, `ServiceUnavailableError`, `RateLimitedError`, etc. — all subclasses of `httpware.StatusError`. Transport-layer transient failures raise `NetworkError`; the resilience middleware raise `RetryBudgetExhaustedError` and `BulkheadFullError`. Everything inherits `httpware.ClientError`. +## Observability + +`Retry` and `Bulkhead` emit operational events via two channels — stdlib `logging` records (always on) and OpenTelemetry span events (when `opentelemetry-api` is installed). + +Logger names (`httpware.retry`, `httpware.bulkhead`) and event names (`retry.giving_up`, `retry.budget_refused`, `retry.streaming_refused`, `bulkhead.rejected`) are the stable public contract. + +```python +import logging + +# Enable visibility into retry / bulkhead operational events +logging.getLogger("httpware.retry").setLevel(logging.WARNING) +logging.getLogger("httpware.bulkhead").setLevel(logging.WARNING) +``` + +For OTel attribute enrichment on the active span — install the extra: + +```bash +pip install httpware[otel] +``` + +When installed, `_emit_event` calls `trace.get_current_span().add_event(name, attributes=...)` automatically. We never create our own spans; for HTTP-level tracing install `opentelemetry-instrumentation-httpx` separately. + ## 🗒️ [Release notes](https://github.com/modern-python/httpware/releases) ## 📦 [PyPI](https://pypi.org/project/httpware) diff --git a/docs/index.md b/docs/index.md index 925c381..0e50f29 100644 --- a/docs/index.md +++ b/docs/index.md @@ -2,7 +2,7 @@ A Python async HTTP client framework for building resilient service clients. `httpware` is a thin opinionated wrapper around `httpx2` — it re-exports `httpx2.Request`/`httpx2.Response` as the public request/response surface, adds a middleware chain (with a built-in resilience suite: `Retry` + `RetryBudget`, `Bulkhead`), opt-in typed response decoding, and a status-keyed exception tree raised automatically on 4xx/5xx. -> **Status:** Pre-1.0. Public API is subject to change between minor releases until v1.0. Streaming and observability are not yet shipped. +> **Status:** Pre-1.0. Public API is subject to change between minor releases until v1.0. ## Install @@ -82,6 +82,28 @@ It does NOT pass through the middleware chain: `Retry`, `Bulkhead`, and any cust All 4xx/5xx responses raise typed exceptions automatically: `NotFoundError`, `ServiceUnavailableError`, `RateLimitedError`, etc. — all subclasses of `httpware.StatusError`. Transport-layer transient failures raise `NetworkError`; the resilience middleware raise `RetryBudgetExhaustedError` and `BulkheadFullError`. Everything inherits `httpware.ClientError`. +## Observability + +`Retry` and `Bulkhead` emit operational events via two channels — stdlib `logging` records (always on) and OpenTelemetry span events (when `opentelemetry-api` is installed). + +Logger names (`httpware.retry`, `httpware.bulkhead`) and event names (`retry.giving_up`, `retry.budget_refused`, `retry.streaming_refused`, `bulkhead.rejected`) are the stable public contract. + +```python +import logging + +# Enable visibility into retry / bulkhead operational events +logging.getLogger("httpware.retry").setLevel(logging.WARNING) +logging.getLogger("httpware.bulkhead").setLevel(logging.WARNING) +``` + +For OTel attribute enrichment on the active span — install the extra: + +```bash +pip install httpware[otel] +``` + +When installed, `_emit_event` calls `trace.get_current_span().add_event(name, attributes=...)` automatically. We never create our own spans; for HTTP-level tracing install `opentelemetry-instrumentation-httpx` separately. + ## Where to go next - **[Engineering Notes](https://github.com/modern-python/httpware/blob/main/planning/engineering.md)** — design invariants, the three protocol seams, exception contract, module layout, testing patterns, optional-extras pattern. Lives in the repo at `planning/engineering.md`. diff --git a/planning/engineering.md b/planning/engineering.md index cb4fe79..6d80123 100644 --- a/planning/engineering.md +++ b/planning/engineering.md @@ -4,7 +4,7 @@ This doc is the single distilled reference for `httpware` design rationale, prot ## 1. Project intent -`httpware` is a thin opinionated wrapper around `httpx2`. It re-exports `httpx2.Request` and `httpx2.Response` as the public request/response surface and adds three things on top: typed response decoding (via a `ResponseDecoder` protocol; pydantic and msgspec are both opt-in extras as of 0.3.0), a middleware chain composed at client construction, and a status-keyed exception tree raised automatically on 4xx and 5xx. `AsyncClient(decoder=None)` defaults to constructing a `PydanticDecoder` and so requires the `pydantic` extra; callers can supply an explicit `decoder=` argument to escape the default. As of 0.4.0, the package ships a small resilience suite under `httpware.middleware.resilience` — a `Retry` middleware with a Finagle-style `RetryBudget`, plus a `Bulkhead` concurrency limiter — composed via the standard middleware chain. As of 0.5.0, `AsyncClient.stream()` provides a context-manager API for chunked response bodies; it bypasses the middleware chain by design (see planning/specs/2026-06-05-streaming-design.md). +`httpware` is a thin opinionated wrapper around `httpx2`. It re-exports `httpx2.Request` and `httpx2.Response` as the public request/response surface and adds three things on top: typed response decoding (via a `ResponseDecoder` protocol; pydantic and msgspec are both opt-in extras as of 0.3.0), a middleware chain composed at client construction, and a status-keyed exception tree raised automatically on 4xx and 5xx. `AsyncClient(decoder=None)` defaults to constructing a `PydanticDecoder` and so requires the `pydantic` extra; callers can supply an explicit `decoder=` argument to escape the default. As of 0.4.0, the package ships a small resilience suite under `httpware.middleware.resilience` — a `Retry` middleware with a Finagle-style `RetryBudget`, plus a `Bulkhead` concurrency limiter — composed via the standard middleware chain. As of 0.5.0, `AsyncClient.stream()` provides a context-manager API for chunked response bodies; it bypasses the middleware chain by design (see planning/specs/2026-06-05-streaming-design.md). As of 0.6.0, `Retry` and `Bulkhead` emit operational events via stdlib `logging` records (`httpware.retry` / `httpware.bulkhead` loggers) and — when `opentelemetry-api` is installed — OpenTelemetry span events on the active span. The 0.1.0 release attempted to own a full abstraction over the underlying HTTP client. v0.2 walks that back: `httpx2` is part of the public surface. @@ -110,7 +110,7 @@ msgspec = ["msgspec>=0.18"] Each extra's code lives in a single dedicated module (`decoders/pydantic.py`, `decoders/msgspec.py`). The `import` of the extra happens **inside** that module behind an `is__installed` guard from `_internal/import_checker.py` — never at package top level. This way, `import httpware` works cleanly without the extras installed, and the seam stays observable: `grep -rnE 'from pydantic|import pydantic' src/httpware/ | grep -v import_checker` returns exactly one indented line (the guarded import in `decoders/pydantic.py`), and the same is true for `msgspec`. -New extras are added at the same time as the code that uses them — never preemptively. (An `otel` extra existed pre-0.4 but was removed once we noticed it was advertising functionality that didn't exist. Epic 5 will reintroduce it when the OpenTelemetry middleware actually lands.) +New extras are added at the same time as the code that uses them — never preemptively. (An `otel` extra existed pre-0.4 but was removed once we noticed it was advertising functionality that didn't exist. 0.6.0 reintroduces it paired with the code that uses it — `Retry` and `Bulkhead` add events to the active OpenTelemetry span via `trace.get_current_span().add_event(...)`.) Caller-facing pattern: consumers select the implementation by passing it explicitly, e.g., `AsyncClient(decoder=PydanticDecoder())`. There is no auto-detection or implicit registry. @@ -133,7 +133,7 @@ Post-pivot, the roadmap has three categories. Topic slugs in `planning/specs/` a - **Shipped in v0.4 slice 2:** `Bulkhead` middleware (concurrency limiter via `asyncio.Semaphore` with bounded acquire wait). See [`planning/specs/2026-06-05-bulkhead-design.md`](specs/2026-06-05-bulkhead-design.md) and [`planning/plans/2026-06-05-bulkhead-plan.md`](plans/2026-06-05-bulkhead-plan.md). - **Remaining:** `3-6` extension-slot docs. - **Epic 4 — Streaming:** SHIPPED in v0.5 (PR #…): `AsyncClient.stream()` context manager + Retry refuses streamed-body requests. See [`planning/specs/2026-06-05-streaming-design.md`](specs/2026-06-05-streaming-design.md) and [`planning/plans/2026-06-05-streaming-plan.md`](plans/2026-06-05-streaming-plan.md). -- **Epic 5 — Observability:** `5-1` Layer 1 middleware hooks, `5-2` wire into resilience middlewares, `5-4` OpenTelemetry middleware (will declare the `otel` extra at the same time the code lands), `5-5` logging policy CI grep. +- **Epic 5 — Observability:** SHIPPED in v0.6 (PR #…) — re-scoped from the original 4-story plan. `Retry` and `Bulkhead` emit operational events via stdlib `logging` + opt-in OpenTelemetry span events. Stories `5-1` (Layer 1 middleware hooks) and `5-4` (standalone OTel middleware) RETIRED — `opentelemetry-instrumentation-httpx` already covers transport-level tracing; a separate httpware middleware would duplicate it. See [`planning/specs/2026-06-05-observability-design.md`](specs/2026-06-05-observability-design.md) and [`planning/plans/2026-06-05-observability-plan.md`](plans/2026-06-05-observability-plan.md). - **Epic 6 — Ship v1.0:** `6-2` docs site (`mkdocs`), `6-3` benchmarks, `6-5` release flow (Trusted Publishers + Sigstore). - **Carry-forward decoder:** `1-6` msgspec decoder via extras — second `ResponseDecoder` adapter, already implemented; verified surviving in the pivot. - **Middleware protocol:** `2-1` and `2-2` already implemented in the pivot (protocol, chain, phase decorators). diff --git a/planning/plans/2026-06-05-observability-plan.md b/planning/plans/2026-06-05-observability-plan.md new file mode 100644 index 0000000..ae70348 --- /dev/null +++ b/planning/plans/2026-06-05-observability-plan.md @@ -0,0 +1,1048 @@ +# Resilience observability (0.6.0, Epic 5 re-scoped) Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** `Retry` and `Bulkhead` emit four operational-significance events (`retry.giving_up`, `retry.budget_refused`, `retry.streaming_refused`, `bulkhead.rejected`) via two channels: stdlib `logging` records (always on) and `opentelemetry.trace.get_current_span().add_event(...)` when the new `otel` extra is installed. Logger names `httpware.retry` / `httpware.bulkhead` are the public contract. + +**Architecture:** New `_internal/observability.py` with a single `_emit_event(logger, event_name, *, level, message, attributes)` helper. Lazy `from opentelemetry import trace` happens inside an `if import_checker.is_otel_installed:` gate, preserving the optional-extras isolation invariant. `Retry` and `Bulkhead` acquire module-level loggers and call the helper at their event sites. Re-introduces the `otel` extra (PR #24 removed it as YAGNI; this PR brings it back paired with the code that uses it). + +**Tech Stack:** Python 3.11+, stdlib `logging`, optional `opentelemetry-api>=1.20` (just the API, not the SDK). + +**Target branch:** `feat/v0.6-observability`. Create from `main` before Task 1: `git checkout main && git pull && git checkout -b feat/v0.6-observability`. + +**Source spec:** [`planning/specs/2026-06-05-observability-design.md`](../specs/2026-06-05-observability-design.md). Read it before starting — the *why* for each decision lives there. + +--- + +## File structure + +**New files:** +- `src/httpware/_internal/observability.py` — `_emit_event` helper +- `tests/test_observability.py` — unit tests for the helper +- `tests/test_optional_extras_otel_missing.py` — fail-soft tests +- `planning/releases/0.6.0.md` — release notes + +**Modified files:** +- `pyproject.toml` — re-add `otel` extra; include in `all` +- `src/httpware/_internal/import_checker.py` — add `is_otel_installed` +- `src/httpware/middleware/resilience/retry.py` — add `_LOGGER` + 3 `_emit_event` calls +- `src/httpware/middleware/resilience/bulkhead.py` — add `_LOGGER` + 1 `_emit_event` call +- `tests/test_retry.py` — 3 new emission tests +- `tests/test_bulkhead.py` — 1 new emission test +- `tests/test_optional_extras_isolation.py` — add `opentelemetry` isolation check +- `README.md` — add "Observability" section +- `docs/index.md` — mirror the README addition +- `planning/engineering.md` — §1 (mention observability) + §7 (re-add otel extra description) + §8 (retire 5-1/5-4; mark 5-2 shipped) + +**Commit cadence:** one commit per task. Per-task commits keep history reviewable. + +--- + +## Task 1: Branch + `otel` extra + `is_otel_installed` + isolation test + +**Files:** +- Modify: `pyproject.toml` +- Modify: `src/httpware/_internal/import_checker.py` +- Modify: `tests/test_optional_extras_isolation.py` + +- [ ] **Step 1: Create the branch** + +```bash +git checkout main && git pull && git checkout -b feat/v0.6-observability +``` +Expected: switched to a new branch. + +- [ ] **Step 2: Re-add the `otel` extra in `pyproject.toml`** + +Read the current `[project.optional-dependencies]` block. It currently looks like: +```toml +[project.optional-dependencies] +pydantic = ["pydantic>=2.0,<3.0"] +msgspec = ["msgspec>=0.18"] +all = ["httpware[pydantic,msgspec]"] +``` + +Replace with: +```toml +[project.optional-dependencies] +pydantic = ["pydantic>=2.0,<3.0"] +msgspec = ["msgspec>=0.18"] +otel = ["opentelemetry-api>=1.20"] +all = ["httpware[pydantic,msgspec,otel]"] +``` + +Note: just `opentelemetry-api`, NOT `opentelemetry-sdk`. Users supply their own SDK. + +- [ ] **Step 3: Sync deps** + +```bash +just install +``` +Expected: uv installs `opentelemetry-api` (and its `opentelemetry-semantic-conventions` transitive dep). + +- [ ] **Step 4: Add `is_otel_installed` to `import_checker.py`** + +Replace the file content with: +```python +"""Detect optional extras without importing them. Used by adapter modules to gate hard imports.""" + +from importlib.util import find_spec + + +is_msgspec_installed = find_spec("msgspec") is not None +is_pydantic_installed = find_spec("pydantic") is not None +is_otel_installed = find_spec("opentelemetry") is not None +``` + +- [ ] **Step 5: Add isolation test in `tests/test_optional_extras_isolation.py`** + +Append: +```python +def test_importing_httpware_does_not_import_opentelemetry() -> None: + """Fresh subprocess: opentelemetry must NOT appear in sys.modules after `import httpware`. + + opentelemetry-api IS installed in the test environment (via `--all-extras`), so this + test runs in a subprocess with a clean interpreter to verify that nothing + in the httpware import chain pulls opentelemetry in. + """ + result = subprocess.run( + [ + sys.executable, + "-c", + "import httpware; import sys; sys.exit(0 if 'opentelemetry' not in sys.modules else 1)", + ], + check=False, + capture_output=True, + ) + assert result.returncode == 0, ( + f"opentelemetry was loaded transitively by `import httpware`; stdout={result.stdout!r} stderr={result.stderr!r}" + ) +``` + +- [ ] **Step 6: Run the isolation tests** + +```bash +uv run pytest tests/test_optional_extras_isolation.py -v +``` +Expected: all 3 PASS (pydantic, msgspec, opentelemetry). + +- [ ] **Step 7: Lint + full suite** + +```bash +just lint && just test +``` +Expected: clean, 100% coverage maintained. + +- [ ] **Step 8: Commit** + +```bash +git add pyproject.toml src/httpware/_internal/import_checker.py tests/test_optional_extras_isolation.py +git commit -m "chore(deps): re-add otel optional extra paired with the code that uses it + +PR #24 removed the otel extra as YAGNI (it advertised functionality that +didn't exist). 0.6.0 brings it back: structured-logging observability +in Retry / Bulkhead with opt-in OTel attribute enrichment lands in +the next commits. + +otel = ['opentelemetry-api>=1.20'] only — no SDK. Users supply their +own SDK (or use a no-op tracer in tests). Matches how +opentelemetry-instrumentation-httpx declares its dep. + +import_checker gains is_otel_installed alongside the existing flags. +Isolation test extended to verify import httpware does not pull +opentelemetry into sys.modules." +``` + +--- + +## Task 2: `_internal/observability.py` helper + unit tests + +**Files:** +- Create: `src/httpware/_internal/observability.py` +- Create: `tests/test_observability.py` + +- [ ] **Step 1: Write failing tests in `tests/test_observability.py`** + +Create `tests/test_observability.py`: +```python +"""Unit tests for the _emit_event observability helper.""" + +import logging +import sys +from unittest.mock import MagicMock, patch + +import pytest + +from httpware._internal.observability import _emit_event + + +_TEST_LOGGER = logging.getLogger("httpware.test.observability") + + +def test_emit_event_logs_at_warning_with_extra_fields(caplog: pytest.LogCaptureFixture) -> None: + """The helper emits one structured log record at WARNING with attributes accessible on the record.""" + with caplog.at_level(logging.WARNING, logger="httpware.test.observability"): + _emit_event( + _TEST_LOGGER, + "test.event", + level=logging.WARNING, + message="something interesting happened", + attributes={"foo": 1, "bar": "x"}, + ) + + assert len(caplog.records) == 1 + record = caplog.records[0] + assert record.levelno == logging.WARNING + assert record.message == "something interesting happened" + assert record.foo == 1 # ty: ignore[unresolved-attribute] + assert record.bar == "x" # ty: ignore[unresolved-attribute] + + +def test_emit_event_respects_level_parameter(caplog: pytest.LogCaptureFixture) -> None: + """When level=DEBUG is passed, the record is at DEBUG.""" + with caplog.at_level(logging.DEBUG, logger="httpware.test.observability"): + _emit_event( + _TEST_LOGGER, + "test.event", + level=logging.DEBUG, + message="quiet", + attributes={}, + ) + + assert len(caplog.records) == 1 + assert caplog.records[0].levelno == logging.DEBUG + + +def test_emit_event_does_not_import_opentelemetry_when_flag_false() -> None: + """With is_otel_installed=False the helper must not touch opentelemetry.""" + with patch("httpware._internal.import_checker.is_otel_installed", False): + # Confirm the lazy import path is skipped: snapshot sys.modules before/after. + modules_before = set(sys.modules) + _emit_event( + _TEST_LOGGER, + "test.event", + level=logging.WARNING, + message="nope", + attributes={"x": 1}, + ) + # opentelemetry may already be loaded by other tests; allow that, but no NEW load happened + # because the codepath was skipped. The point: no error and no required state change. + assert len(modules_before) >= 0 # noqa: PLR2004 — sanity assertion the with-block ran + + +def test_emit_event_calls_add_event_when_otel_installed() -> None: + """With is_otel_installed=True the helper calls trace.get_current_span().add_event(...).""" + mock_span = MagicMock(name="MockSpan") + with ( + patch("httpware._internal.import_checker.is_otel_installed", True), + patch("opentelemetry.trace.get_current_span", return_value=mock_span), + ): + _emit_event( + _TEST_LOGGER, + "test.event", + level=logging.WARNING, + message="hi", + attributes={"k": "v"}, + ) + + mock_span.add_event.assert_called_once_with("test.event", attributes={"k": "v"}) + + +def test_emit_event_works_when_otel_installed_but_no_active_span() -> None: + """With OTel installed but no tracer configured, get_current_span() returns NonRecordingSpan; + add_event is a documented no-op. No error. + """ + # Real OTel API call (no mocking) — opentelemetry-api is installed via the otel extra. + _emit_event( + _TEST_LOGGER, + "test.event", + level=logging.WARNING, + message="real-otel-but-no-tracer", + attributes={"a": 1}, + ) + # No assertion needed — the absence of an exception IS the assertion. +``` + +Run: `uv run pytest tests/test_observability.py -v` +Expected: FAIL with `ImportError: cannot import name '_emit_event' from 'httpware._internal.observability'`. + +- [ ] **Step 2: Implement the helper** + +Create `src/httpware/_internal/observability.py`: +```python +"""Observability emission helper — structured logging + opt-in OpenTelemetry span events. + +See planning/specs/2026-06-05-observability-design.md for the contract. + +Logger names (``httpware.retry``, ``httpware.bulkhead``) and event names +(``retry.giving_up``, ``bulkhead.rejected``, etc.) are the public observability +surface. They are stable: renames are breaking changes. +""" + +import logging +import typing + +from httpware._internal import import_checker + + +def _emit_event( + logger: logging.Logger, + event_name: str, + *, + level: int, + message: str, + attributes: dict[str, typing.Any], +) -> None: + """Emit one observability event to both channels. + + 1. Always emits a structured log record at ``level`` with ``extra=attributes`` + (so log aggregators that index ``extra`` see structured fields). + 2. If ``opentelemetry-api`` is installed, calls + ``trace.get_current_span().add_event(event_name, attributes=attributes)``. + When no tracer is active, ``get_current_span()`` returns a ``NonRecordingSpan`` + whose ``add_event`` is a documented no-op — so the call is unconditional + behind the install gate. + + The lazy ``from opentelemetry import trace`` inside the if-block preserves + the optional-extras isolation invariant: ``import httpware`` must not pull + ``opentelemetry`` into ``sys.modules`` when the extra is absent. + """ + logger.log(level, message, extra=attributes) + if import_checker.is_otel_installed: + from opentelemetry import trace # noqa: PLC0415 — lazy by design (optional-extras isolation) + trace.get_current_span().add_event(event_name, attributes=attributes) +``` + +- [ ] **Step 3: Run the observability tests** + +```bash +uv run pytest tests/test_observability.py -v +``` +Expected: all PASS. + +- [ ] **Step 4: Lint + full suite** + +```bash +just lint && just test +``` +Expected: clean, 100% coverage. + +- [ ] **Step 5: Stage and commit** + +```bash +git add src/httpware/_internal/observability.py tests/test_observability.py +git commit -m "feat(observability): _emit_event helper for resilience middleware events + +New _internal/observability.py with a single _emit_event helper. Always +emits a structured log record at the requested level; if opentelemetry-api +is installed, calls trace.get_current_span().add_event(name, attributes=...) +on the active span. + +The lazy 'from opentelemetry import trace' inside the if is_otel_installed +gate preserves the optional-extras isolation invariant (import httpware +does not pull opentelemetry when the extra is absent). + +Logger names and event names are the public observability surface; the +helper itself lives in _internal/ so users interact only with the +strings, not Python imports." +``` + +--- + +## Task 3: `Retry` emits 3 operational events + +**Files:** +- Modify: `src/httpware/middleware/resilience/retry.py` +- Modify: `tests/test_retry.py` + +- [ ] **Step 1: Write failing tests in `tests/test_retry.py`** + +Append to `tests/test_retry.py`: +```python +async def test_retry_giving_up_emits_observability_event(caplog: pytest.LogCaptureFixture) -> None: + """When max_attempts is exhausted, emit one WARNING record on httpware.retry.""" + sleeper = _SleepRecorder() + handler = _ResponseSequence([HTTPStatus.SERVICE_UNAVAILABLE] * 3) + client = _client(handler, retry=Retry(_sleep=sleeper, max_attempts=3, base_delay=0.001, max_delay=0.002)) + + with caplog.at_level(logging.WARNING, logger="httpware.retry"): + with pytest.raises(ServiceUnavailableError): + await client.get("https://example.test/x") + + retry_records = [r for r in caplog.records if r.name == "httpware.retry"] + giving_up_records = [r for r in retry_records if r.message.startswith("retry gave up")] + assert len(giving_up_records) == 1 + record = giving_up_records[0] + assert record.levelno == logging.WARNING + assert record.attempts == 3 # ty: ignore[unresolved-attribute] + assert record.method == "GET" # ty: ignore[unresolved-attribute] + assert record.last_status == HTTPStatus.SERVICE_UNAVAILABLE # ty: ignore[unresolved-attribute] + assert record.last_exception_type == "ServiceUnavailableError" # ty: ignore[unresolved-attribute] + + +async def test_retry_budget_refused_emits_observability_event(caplog: pytest.LogCaptureFixture) -> None: + """When the budget refuses a retry, emit one WARNING record on httpware.retry.""" + sleeper = _SleepRecorder() + stingy_budget = RetryBudget(percent_can_retry=0.0, min_retries_per_sec=0.0) + handler = _ResponseSequence([HTTPStatus.SERVICE_UNAVAILABLE, HTTPStatus.SERVICE_UNAVAILABLE]) + client = _client( + handler, + retry=Retry(_sleep=sleeper, budget=stingy_budget, max_attempts=3, base_delay=0.001), + ) + + with caplog.at_level(logging.WARNING, logger="httpware.retry"): + with pytest.raises(RetryBudgetExhaustedError): + await client.get("https://example.test/x") + + retry_records = [r for r in caplog.records if r.name == "httpware.retry"] + budget_records = [r for r in retry_records if "budget" in r.message] + assert len(budget_records) == 1 + record = budget_records[0] + assert record.attempts == 1 # ty: ignore[unresolved-attribute] + assert record.method == "GET" # ty: ignore[unresolved-attribute] + assert record.last_status == HTTPStatus.SERVICE_UNAVAILABLE # ty: ignore[unresolved-attribute] + + +async def test_retry_streaming_refused_emits_observability_event(caplog: pytest.LogCaptureFixture) -> None: + """When the streaming-body marker prevents a retryable retry, emit one WARNING record on httpware.retry. + + Uses an idempotent method (PUT) so we hit the retryable-failure-path streaming-refusal site, + NOT the non-idempotent early-exit sites (which don't emit the event per the spec). + """ + sleeper = _SleepRecorder() + handler = _ResponseSequence([HTTPStatus.SERVICE_UNAVAILABLE, HTTPStatus.SERVICE_UNAVAILABLE]) + client = _client(handler, retry=Retry(_sleep=sleeper, base_delay=0.001, max_delay=0.002)) + + async def streamed_body() -> typing.AsyncIterator[bytes]: + yield b"x" + + with caplog.at_level(logging.WARNING, logger="httpware.retry"): + with pytest.raises(ServiceUnavailableError): + await client.put("https://example.test/x", content=streamed_body()) + + retry_records = [r for r in caplog.records if r.name == "httpware.retry"] + streaming_records = [r for r in retry_records if "stream" in r.message] + assert len(streaming_records) == 1 + record = streaming_records[0] + assert record.method == "PUT" # ty: ignore[unresolved-attribute] + assert record.last_exception_type == "ServiceUnavailableError" # ty: ignore[unresolved-attribute] +``` + +`logging` should already be imported at the top of `tests/test_retry.py` if not, add it. + +Run: `uv run pytest tests/test_retry.py -v -k "emits_observability_event"` +Expected: all 3 FAIL — no events emitted yet. + +- [ ] **Step 2: Add `_LOGGER` constant to `retry.py`** + +In `src/httpware/middleware/resilience/retry.py`, after the existing module-level constants block (around `DEFAULT_IDEMPOTENT_METHODS` / `_MAX_ATTEMPTS_INVALID` / `_STREAMING_BODY_REFUSAL_NOTE`), add: +```python +import logging + +_LOGGER = logging.getLogger("httpware.retry") +``` + +Hoist the `import logging` to the top of the file alongside other stdlib imports if not already present (per project convention — no in-function imports). + +- [ ] **Step 3: Import `_emit_event`** + +Add to the imports block: +```python +from httpware._internal.observability import _emit_event +``` + +- [ ] **Step 4: Emit `retry.giving_up` event in `Retry.__call__`** + +Find the `if is_last:` block (around line 153). Currently: +```python +if is_last: + if last_exc is None: # pragma: no cover — structural invariant from except branch + msg = "Retry: last_exc unset on final attempt — unreachable" + raise AssertionError(msg) + last_exc.add_note(f"httpware: gave up after {attempt + 1} attempts") + raise last_exc +``` + +Insert the emit call after `add_note(...)` and before `raise last_exc`: +```python +if is_last: + if last_exc is None: # pragma: no cover — structural invariant from except branch + msg = "Retry: last_exc unset on final attempt — unreachable" + raise AssertionError(msg) + last_exc.add_note(f"httpware: gave up after {attempt + 1} attempts") + _emit_event( + _LOGGER, + "retry.giving_up", + level=logging.WARNING, + message=f"retry gave up after {attempt + 1} attempts", + attributes={ + "attempts": attempt + 1, + "method": request.method, + "url": str(request.url), + "last_status": last_response.status_code if last_response is not None else None, + "last_exception_type": type(last_exc).__qualname__, + }, + ) + raise last_exc +``` + +- [ ] **Step 5: Emit `retry.budget_refused` event** + +Find the budget exhaustion block (the `if not self.budget.try_withdraw():` site, around line 160). Currently: +```python +if not self.budget.try_withdraw(): + raise RetryBudgetExhaustedError( + last_response=last_response, + last_exception=last_exc, + attempts=attempt + 1, + ) from last_exc +``` + +Insert the emit call BEFORE the raise: +```python +if not self.budget.try_withdraw(): + _emit_event( + _LOGGER, + "retry.budget_refused", + level=logging.WARNING, + message=f"retry budget refused after {attempt + 1} attempts", + attributes={ + "attempts": attempt + 1, + "method": request.method, + "url": str(request.url), + "last_status": last_response.status_code if last_response is not None else None, + }, + ) + raise RetryBudgetExhaustedError( + last_response=last_response, + last_exception=last_exc, + attempts=attempt + 1, + ) from last_exc +``` + +- [ ] **Step 6: Emit `retry.streaming_refused` event at the retryable-failure-path site only** + +Find the streaming-body refusal block in the retryable-failure-path (NOT the early-exit sites). It's the one at around line 144: +```python +# ---- retryable failure path +if request.extensions.get(STREAMING_BODY_MARKER): + if last_exc is None: # pragma: no cover — invariant from except branch + msg = "Retry: streaming-body refusal reached with no last_exc" + raise AssertionError(msg) + last_exc.add_note(_STREAMING_BODY_REFUSAL_NOTE) + raise last_exc +``` + +Insert the emit call after `add_note(...)` and before `raise last_exc`: +```python +# ---- retryable failure path +if request.extensions.get(STREAMING_BODY_MARKER): + if last_exc is None: # pragma: no cover — invariant from except branch + msg = "Retry: streaming-body refusal reached with no last_exc" + raise AssertionError(msg) + last_exc.add_note(_STREAMING_BODY_REFUSAL_NOTE) + _emit_event( + _LOGGER, + "retry.streaming_refused", + level=logging.WARNING, + message="retry refused — request body is a stream that cannot replay", + attributes={ + "method": request.method, + "url": str(request.url), + "last_exception_type": type(last_exc).__qualname__, + }, + ) + raise last_exc +``` + +**IMPORTANT**: do NOT add the emit call at the 3 non-idempotent early-exit sites (lines 111-118, 120-127, 131-138). At those sites the primary reason for not retrying is method-eligibility; the `add_note` call already provides context. The EVENT only fires when streaming was the deciding factor. + +- [ ] **Step 7: Run the new tests** + +```bash +uv run pytest tests/test_retry.py -v -k "emits_observability_event" +``` +Expected: all 3 PASS. + +- [ ] **Step 8: Lint + full suite** + +```bash +just lint && just test +``` +Expected: clean, 100% coverage. + +- [ ] **Step 9: Stage and commit** + +```bash +git add src/httpware/middleware/resilience/retry.py tests/test_retry.py +git commit -m "feat(retry): emit operational events via httpware.retry logger + OTel + +Three event sites: +- retry.giving_up (WARNING): max_attempts exhausted +- retry.budget_refused (WARNING): budget.try_withdraw() refused +- retry.streaming_refused (WARNING): streaming-body marker prevented an + otherwise-retryable retry (retryable-failure-path site only — the 3 + non-idempotent early-exit sites still add the note but do NOT emit + this event, since at those sites method-eligibility is the primary + reason for not retrying). + +All four events have flat, scalar attributes (method, url, attempts, +last_status, last_exception_type) so they index cleanly in log +aggregators and serialize cleanly as OTel attributes." +``` + +--- + +## Task 4: `Bulkhead` emits `bulkhead.rejected` event + +**Files:** +- Modify: `src/httpware/middleware/resilience/bulkhead.py` +- Modify: `tests/test_bulkhead.py` + +- [ ] **Step 1: Write failing test in `tests/test_bulkhead.py`** + +Append: +```python +import logging # add to the top of the file if not present + + +async def test_bulkhead_rejected_emits_observability_event(caplog: pytest.LogCaptureFixture) -> None: + """When acquire_timeout elapses without acquisition, emit one WARNING record on httpware.bulkhead.""" + handler = _SlowHandler(delay=_ACQUIRE_TIMEOUT_LONG) + client = _client( + handler, + bulkhead=Bulkhead(max_concurrent=_MAX_CONCURRENT_1, acquire_timeout=_ACQUIRE_TIMEOUT_SHORT), + ) + + first = asyncio.create_task(client.get("https://example.test/a")) + await asyncio.sleep(0.005) # let first acquire + + with caplog.at_level(logging.WARNING, logger="httpware.bulkhead"): + with pytest.raises(BulkheadFullError): + await client.get("https://example.test/b") + + bulkhead_records = [r for r in caplog.records if r.name == "httpware.bulkhead"] + assert len(bulkhead_records) == 1 + record = bulkhead_records[0] + assert record.levelno == logging.WARNING + assert "rejected" in record.message + assert record.max_concurrent == _MAX_CONCURRENT_1 # ty: ignore[unresolved-attribute] + assert record.acquire_timeout == _ACQUIRE_TIMEOUT_SHORT # ty: ignore[unresolved-attribute] + assert record.method == "GET" # ty: ignore[unresolved-attribute] + + await first # cleanup +``` + +Run: `uv run pytest tests/test_bulkhead.py -v -k "rejected_emits_observability_event"` +Expected: FAIL — no event emitted yet. + +- [ ] **Step 2: Add `_LOGGER` constant + emit import to `bulkhead.py`** + +Add at the top of `src/httpware/middleware/resilience/bulkhead.py` (after the existing `import asyncio`): +```python +import logging +``` + +After the existing module-level constants (`_MAX_CONCURRENT_INVALID`, `_ACQUIRE_TIMEOUT_INVALID`): +```python +_LOGGER = logging.getLogger("httpware.bulkhead") +``` + +Add to the imports block: +```python +from httpware._internal.observability import _emit_event +``` + +- [ ] **Step 3: Emit `bulkhead.rejected` event in `Bulkhead.__call__`** + +Find the `except TimeoutError as exc:` block (where `BulkheadFullError` is raised): +```python +except TimeoutError as exc: + raise BulkheadFullError( + max_concurrent=self._max_concurrent, + acquire_timeout=self._acquire_timeout, + ) from exc +``` + +Insert the emit call BEFORE the raise: +```python +except TimeoutError as exc: + _emit_event( + _LOGGER, + "bulkhead.rejected", + level=logging.WARNING, + message=f"bulkhead rejected — full (max_concurrent={self._max_concurrent}, acquire_timeout={self._acquire_timeout})", + attributes={ + "max_concurrent": self._max_concurrent, + "acquire_timeout": self._acquire_timeout, + "method": request.method, + "url": str(request.url), + }, + ) + raise BulkheadFullError( + max_concurrent=self._max_concurrent, + acquire_timeout=self._acquire_timeout, + ) from exc +``` + +- [ ] **Step 4: Run the new test** + +```bash +uv run pytest tests/test_bulkhead.py -v -k "rejected_emits_observability_event" +``` +Expected: PASS. + +- [ ] **Step 5: Lint + full suite** + +```bash +just lint && just test +``` +Expected: clean, 100% coverage. + +- [ ] **Step 6: Stage and commit** + +```bash +git add src/httpware/middleware/resilience/bulkhead.py tests/test_bulkhead.py +git commit -m "feat(bulkhead): emit rejected event via httpware.bulkhead logger + OTel + +One event site: bulkhead.rejected (WARNING) fires immediately before +BulkheadFullError is raised. Attributes: max_concurrent, acquire_timeout, +method, url. + +acquire_timeout=0 (fail-fast) and acquire_timeout>0 (bounded wait) +both flow through this single emission — the attribute value +distinguishes them at consumer-side." +``` + +--- + +## Task 5: Fail-soft tests for OTel-missing + +**Files:** +- Create: `tests/test_optional_extras_otel_missing.py` + +- [ ] **Step 1: Create the file** + +```python +"""Fail-soft tests for the otel optional-extra (0.6.0). + +opentelemetry-api IS installed in the CI test environment via `--all-extras`. +To simulate the "extra not installed" case, patch +`httpware._internal.import_checker.is_otel_installed = False` for the +duration of the test. + +The contract: observability emission (the structured log record half) must +work regardless of whether opentelemetry-api is available. The OTel half is +silently skipped when the flag is False. +""" + +import logging +from unittest.mock import patch + +import pytest + +from httpware._internal.observability import _emit_event + + +_TEST_LOGGER = logging.getLogger("httpware.test.otel_missing") + + +def test_emit_event_logs_record_without_otel(caplog: pytest.LogCaptureFixture) -> None: + """The structured log record is emitted even when opentelemetry-api is 'missing'.""" + with patch("httpware._internal.import_checker.is_otel_installed", False): + with caplog.at_level(logging.WARNING, logger="httpware.test.otel_missing"): + _emit_event( + _TEST_LOGGER, + "test.event", + level=logging.WARNING, + message="works without otel", + attributes={"x": 1}, + ) + + assert len(caplog.records) == 1 + record = caplog.records[0] + assert record.message == "works without otel" + assert record.x == 1 # ty: ignore[unresolved-attribute] + + +def test_emit_event_does_not_call_opentelemetry_apis_when_flag_false() -> None: + """With is_otel_installed=False, no opentelemetry.trace call is made.""" + with ( + patch("httpware._internal.import_checker.is_otel_installed", False), + patch("opentelemetry.trace.get_current_span") as mock_get_span, + ): + _emit_event( + _TEST_LOGGER, + "test.event", + level=logging.WARNING, + message="silent on otel", + attributes={}, + ) + + mock_get_span.assert_not_called() +``` + +- [ ] **Step 2: Run the tests** + +```bash +uv run pytest tests/test_optional_extras_otel_missing.py -v +``` +Expected: both PASS. + +- [ ] **Step 3: Lint + full suite** + +```bash +just lint && just test +``` +Expected: clean, 100% coverage. + +- [ ] **Step 4: Commit** + +```bash +git add tests/test_optional_extras_otel_missing.py +git commit -m "test(optional): observability emission works without otel extra + +Mirrors the existing test_optional_extras_pydantic_missing.py pattern: +patches httpware._internal.import_checker.is_otel_installed to False +to simulate the 'extra not installed' case. Verifies that the +structured-log half of _emit_event still works and that no +opentelemetry.trace.get_current_span call is attempted." +``` + +--- + +## Task 6: Documentation + release notes + +**Files:** +- Modify: `README.md` +- Modify: `docs/index.md` +- Modify: `planning/engineering.md` +- Create: `planning/releases/0.6.0.md` + +- [ ] **Step 1: Add Observability section to README.md** + +After the existing `## Errors` section and BEFORE the link section (`## 🗒️ [Release notes]`), insert: + +```markdown + +## Observability + +`Retry` and `Bulkhead` emit operational events via two channels — stdlib `logging` records (always on) and OpenTelemetry span events (when `opentelemetry-api` is installed). + +Logger names (`httpware.retry`, `httpware.bulkhead`) and event names (`retry.giving_up`, `retry.budget_refused`, `retry.streaming_refused`, `bulkhead.rejected`) are the stable public contract. + +```python +import logging + +# Enable visibility into retry / bulkhead operational events +logging.getLogger("httpware.retry").setLevel(logging.WARNING) +logging.getLogger("httpware.bulkhead").setLevel(logging.WARNING) +``` + +For OTel attribute enrichment on the active span — install the extra: + +```bash +pip install httpware[otel] +``` + +When installed, `_emit_event` calls `trace.get_current_span().add_event(name, attributes=...)` automatically. We never create our own spans; for HTTP-level tracing install `opentelemetry-instrumentation-httpx` separately. +``` + +- [ ] **Step 2: Update the [all] install line in README.md** + +Find: +```markdown +pip install httpware[all] # everything declared above (pydantic, msgspec) +``` +Replace with: +```markdown +pip install httpware[all] # everything declared above (pydantic, msgspec, otel) +``` + +- [ ] **Step 3: Mirror both additions in `docs/index.md`** + +Same content added at the matching positions. Keep wording verbatim to stay in sync with README. + +- [ ] **Step 4: Update `planning/engineering.md`** + +In §1 (Project intent), append one sentence to the first paragraph (after the streaming sentence added in 0.5.0): + +``` + As of 0.6.0, `Retry` and `Bulkhead` emit operational events via stdlib `logging` records (`httpware.retry` / `httpware.bulkhead` loggers) and — when `opentelemetry-api` is installed — OpenTelemetry span events on the active span. +``` + +In §7 (optional-extras pattern), find the parenthetical: +``` +(An `otel` extra existed pre-0.4 but was removed once we noticed it was advertising functionality that didn't exist. Epic 5 will reintroduce it when the OpenTelemetry middleware actually lands.) +``` +Replace with: +``` +(An `otel` extra existed pre-0.4 but was removed once we noticed it was advertising functionality that didn't exist. 0.6.0 reintroduces it paired with the code that uses it — `Retry` and `Bulkhead` add events to the active OpenTelemetry span via `trace.get_current_span().add_event(...)`.) +``` + +In §8 (Remaining roadmap), find the Epic 5 entry: +``` +- **Epic 5 — Observability:** `5-1` Layer 1 middleware hooks, `5-2` wire into resilience middlewares, `5-4` OpenTelemetry middleware (will declare the `otel` extra at the same time the code lands), `5-5` logging policy CI grep. +``` +Replace with: +``` +- **Epic 5 — Observability:** SHIPPED in v0.6 (PR #...) — re-scoped from the original 4-story plan. `Retry` and `Bulkhead` emit operational events via stdlib `logging` + opt-in OpenTelemetry span events. Stories `5-1` (Layer 1 middleware hooks) and `5-4` (standalone OTel middleware) RETIRED — `opentelemetry-instrumentation-httpx` already covers transport-level tracing; a separate httpware middleware would duplicate it. See [`planning/specs/2026-06-05-observability-design.md`](specs/2026-06-05-observability-design.md) and [`planning/plans/2026-06-05-observability-plan.md`](plans/2026-06-05-observability-plan.md). +``` + +- [ ] **Step 5: Create `planning/releases/0.6.0.md`** + +```markdown +# httpware 0.6.0 — Resilience observability + +**0.6.0 is additive. No breaking changes.** Code written against 0.5.0 continues to work unchanged. + +This release adds operational-event emission to `Retry` and `Bulkhead` via two channels — stdlib `logging` records (always on) and OpenTelemetry span events (opt-in via the `otel` extra). Re-introduces the `otel` extra (PR #24 removed it as YAGNI; this release brings it back paired with the code that uses it). + +## New features + +- **Structured logging on resilience operations.** Acquire `logging.getLogger("httpware.retry")` and `logging.getLogger("httpware.bulkhead")` to see four operational events: + - `retry.giving_up` (WARNING) — max_attempts exhausted; attributes include `attempts`, `method`, `url`, `last_status`, `last_exception_type` + - `retry.budget_refused` (WARNING) — `RetryBudget` refused to permit a retry + - `retry.streaming_refused` (WARNING) — streaming-body marker prevented an otherwise-retryable retry + - `bulkhead.rejected` (WARNING) — `acquire_timeout` elapsed without acquisition; attributes include `max_concurrent`, `acquire_timeout`, `method`, `url` +- **Optional OpenTelemetry attribute enrichment.** Install `httpware[otel]` (which pulls `opentelemetry-api>=1.20`, just the API — you supply the SDK). When installed, the same four events are added to the active span via `trace.get_current_span().add_event(name, attributes=...)`. We never create our own spans — for HTTP-level tracing install `opentelemetry-instrumentation-httpx` separately. + +## Backwards compatibility + +Purely additive: +- All previously-shipping methods behave identically. +- Successful retries and successful bulkhead acquisitions emit nothing — the four events fire only on operational concern. +- Per `engineering.md §2`, httpware never configures handlers, levels, or calls `logging.basicConfig()`. Consumers own their logging configuration. +- The `otel` extra is opt-in — `pip install httpware` continues to work without `opentelemetry-api`. + +## Usage + +```python +import logging +from httpware import AsyncClient, Bulkhead, Retry + +# Enable visibility into retry / bulkhead operational events +logging.getLogger("httpware.retry").setLevel(logging.WARNING) +logging.getLogger("httpware.bulkhead").setLevel(logging.WARNING) + +# Your normal application logging config picks up the records +logging.basicConfig(level=logging.WARNING, format="%(asctime)s %(name)s %(message)s") + +async with AsyncClient( + base_url="https://api.example.com", + middleware=[Bulkhead(max_concurrent=10), Retry()], +) as client: + await client.get("/users/1") + # On a 503 + retry exhaustion you'll see: + # 2026-06-05 12:00:00 httpware.retry retry gave up after 3 attempts +``` + +For OTel span events: + +```bash +pip install httpware[otel] +# Plus your SDK + opentelemetry-instrumentation-httpx for HTTP-level spans +``` + +## What's still ahead + +Epic 5's original `5-1` (hook protocol) and `5-4` (standalone OTel middleware) stories are **retired**, not deferred. Rationale in the spec: `opentelemetry-instrumentation-httpx` already covers transport-level tracing, and a hook system without a built-in consumer is infrastructure for code that doesn't exist. The structured-emission contract we're shipping is already extensible — users plug into standard `logging` handlers without needing httpware-specific hooks. + +This effectively closes Epic 5. Remaining roadmap is Epic 6 (ship v1.0): docs site (mkdocs), benchmarks, Trusted Publishers + Sigstore release flow. + +## References + +- Spec: [`planning/specs/2026-06-05-observability-design.md`](../specs/2026-06-05-observability-design.md) +- Plan: [`planning/plans/2026-06-05-observability-plan.md`](../plans/2026-06-05-observability-plan.md) +- Roadmap: [`planning/engineering.md`](../engineering.md) §8 +``` + +- [ ] **Step 6: Lint** + +```bash +just lint +``` +Expected: clean. + +- [ ] **Step 7: Commit** + +```bash +git add README.md docs/index.md planning/engineering.md planning/releases/0.6.0.md +git commit -m "docs: 0.6.0 release notes + observability docs + +- README + docs/index.md: add 'Observability' section + update [all] + install line to include otel +- planning/engineering.md §1 + §7 + §8: mention observability in + project intent; update otel-extra parenthetical to reflect reintroduction; + mark Epic 5 SHIPPED in roadmap with rationale for retiring 5-1 / 5-4 +- planning/releases/0.6.0.md: new release notes" +``` + +--- + +## Task 7: Final verification + push + +**Files:** none modified; verification only. + +- [ ] **Step 1: Full lint** + +```bash +just lint-ci +``` +Expected: clean. + +- [ ] **Step 2: Full test suite** + +```bash +just test +``` +Expected: 100% coverage. Test count: was 239 (post-streaming). +5 observability tests + 3 retry emission tests + 1 bulkhead emission test + 2 fail-soft tests + 1 isolation test = ~251. + +- [ ] **Step 3: Architecture invariants from `CLAUDE.md`** + +```bash +grep -rE 'httpx2\._' src/httpware/ || echo "PASS: no httpx2 private API" +grep -rE 'from __future__ import annotations' src/httpware/ || echo "PASS: no __future__ annotations" +grep -rE '\bprint\(' src/httpware/ || echo "PASS: no print()" +grep -rE 'logging\.(basicConfig|getLogger)\(\)' src/httpware/ || echo "PASS: no global logging" +grep -rE '# (type|mypy): ignore' src/httpware/ || echo "PASS: no type/mypy ignore" +``` +Each should print PASS. Note: the new code uses `logging.getLogger("httpware.retry")` and `logging.getLogger("httpware.bulkhead")` (with arguments) — the grep checks for `getLogger()` with **no arguments**, so the named loggers don't trip it. + +- [ ] **Step 4: Optional-extras isolation** + +```bash +uv run pytest tests/test_optional_extras_isolation.py -v +``` +Expected: all 3 PASS (msgspec, pydantic, opentelemetry). + +- [ ] **Step 5: mkdocs strict build** + +```bash +uv run --with mkdocs --with mkdocs-material mkdocs build --strict 2>&1 | tail -10 +rm -rf site/ +``` +Expected: 0 warnings. + +- [ ] **Step 6: Push the branch** + +```bash +git push -u origin feat/v0.6-observability +``` + +DO NOT open the PR yet — leave that to `finishing-a-development-branch`. + +--- + +## Out of scope for this plan (per the spec) + +These items are deliberately deferred or retired. Do NOT do them in this PR: + +- **No new spans.** `add_event` augments the existing span; we never call `tracer.start_span()`. +- **No metric instruments** (`Counter`, `Histogram`). Only events/logs. +- **No URL/header redaction at the httpware layer.** `opentelemetry-instrumentation-httpx` and user `logging.Filter`s handle this. +- **No `LogPolicy` middleware or hook protocol** (was Epic 5 story `5-1`). Retired. +- **No public `httpware.observability` namespace.** Logger names + event names ARE the public contract. +- **No retry `attempt_starting` events.** Operational-only event set. +- **No standalone OTel middleware** (was Epic 5 story `5-4`). Retired in favor of `opentelemetry-instrumentation-httpx`. +- **Version bump in `pyproject.toml`.** Tag-driven release; bump not required. diff --git a/planning/releases/0.6.0.md b/planning/releases/0.6.0.md new file mode 100644 index 0000000..e5b367a --- /dev/null +++ b/planning/releases/0.6.0.md @@ -0,0 +1,63 @@ +# httpware 0.6.0 — Resilience observability + +**0.6.0 is additive. No breaking changes.** Code written against 0.5.0 continues to work unchanged. + +This release adds operational-event emission to `Retry` and `Bulkhead` via two channels — stdlib `logging` records (always on) and OpenTelemetry span events (opt-in via the `otel` extra). Re-introduces the `otel` extra (PR #24 removed it as YAGNI; this release brings it back paired with the code that uses it). + +## New features + +- **Structured logging on resilience operations.** Acquire `logging.getLogger("httpware.retry")` and `logging.getLogger("httpware.bulkhead")` to see four operational events: + - `retry.giving_up` (WARNING) — max_attempts exhausted; attributes include `attempts`, `method`, `url`, `last_status`, `last_exception_type` + - `retry.budget_refused` (WARNING) — `RetryBudget` refused to permit a retry + - `retry.streaming_refused` (WARNING) — streaming-body marker prevented an otherwise-retryable retry + - `bulkhead.rejected` (WARNING) — `acquire_timeout` elapsed without acquisition; attributes include `max_concurrent`, `acquire_timeout`, `method`, `url` +- **Optional OpenTelemetry attribute enrichment.** Install `httpware[otel]` (which pulls `opentelemetry-api>=1.20`, just the API — you supply the SDK). When installed, the same four events are added to the active span via `trace.get_current_span().add_event(name, attributes=...)`. We never create our own spans — for HTTP-level tracing install `opentelemetry-instrumentation-httpx` separately. + +## Backwards compatibility + +Purely additive: +- All previously-shipping methods behave identically. +- Successful retries and successful bulkhead acquisitions emit nothing — the four events fire only on operational concern. +- Per `engineering.md §2`, httpware never configures handlers, levels, or calls `logging.basicConfig()`. Consumers own their logging configuration. +- The `otel` extra is opt-in — `pip install httpware` continues to work without `opentelemetry-api`. + +## Usage + +```python +import logging +from httpware import AsyncClient, Bulkhead, Retry + +# Enable visibility into retry / bulkhead operational events +logging.getLogger("httpware.retry").setLevel(logging.WARNING) +logging.getLogger("httpware.bulkhead").setLevel(logging.WARNING) + +# Your normal application logging config picks up the records +logging.basicConfig(level=logging.WARNING, format="%(asctime)s %(name)s %(message)s") + +async with AsyncClient( + base_url="https://api.example.com", + middleware=[Bulkhead(max_concurrent=10), Retry()], +) as client: + await client.get("/users/1") + # On a 503 + retry exhaustion you'll see: + # 2026-06-05 12:00:00 httpware.retry retry gave up after 3 attempts +``` + +For OTel span events: + +```bash +pip install httpware[otel] +# Plus your SDK + opentelemetry-instrumentation-httpx for HTTP-level spans +``` + +## What's still ahead + +Epic 5's original `5-1` (hook protocol) and `5-4` (standalone OTel middleware) stories are **retired**, not deferred. Rationale in the spec: `opentelemetry-instrumentation-httpx` already covers transport-level tracing, and a hook system without a built-in consumer is infrastructure for code that doesn't exist. The structured-emission contract we're shipping is already extensible — users plug into standard `logging` handlers without needing httpware-specific hooks. + +This effectively closes Epic 5. Remaining roadmap is Epic 6 (ship v1.0): docs site (mkdocs), benchmarks, Trusted Publishers + Sigstore release flow. + +## References + +- Spec: [`planning/specs/2026-06-05-observability-design.md`](../specs/2026-06-05-observability-design.md) +- Plan: [`planning/plans/2026-06-05-observability-plan.md`](../plans/2026-06-05-observability-plan.md) +- Roadmap: [`planning/engineering.md`](../engineering.md) §8 diff --git a/planning/specs/2026-06-05-observability-design.md b/planning/specs/2026-06-05-observability-design.md new file mode 100644 index 0000000..2d6bdd0 --- /dev/null +++ b/planning/specs/2026-06-05-observability-design.md @@ -0,0 +1,255 @@ +# Spec: Resilience observability — structured logging + opt-in OTel attribute enrichment (0.6.0, Epic 5) + +**Date:** 2026-06-05 +**Topic slug:** `observability` +**Status:** drafted, awaiting user review +**Target release:** 0.6.0 +**Epic 5 stories rolled in:** Re-scoped from the original 5-1/5-2/5-4/5-5. See "Re-scoping rationale" below. + +## Purpose + +Emit four operational-significance events from `Retry` and `Bulkhead` via two channels: + +1. **Structured `logging` records** (always on, no dependency). Users plug in any log aggregator. +2. **OpenTelemetry `add_event` calls on the active span** (when the `otel` extra is installed). Augments existing spans created by `opentelemetry-instrumentation-httpx`; we never create our own spans. + +The contract is the *event names + attribute keys*. Logger names (`httpware.retry`, `httpware.bulkhead`) and event names (`retry.giving_up`, `bulkhead.rejected`, etc.) are the public observability surface. + +## Re-scoping rationale + +The original Epic 5 (5-1 Layer 1 middleware hooks, 5-2 wire into resilience middlewares, 5-4 OpenTelemetry middleware, 5-5 logging policy CI grep) was assessed against `opentelemetry-instrumentation-httpx` and judged ~70% duplicative: + +- **5-4 OTel middleware** would emit per-request spans with standard semantic conventions — already done by `pip install opentelemetry-instrumentation-httpx` at the transport layer, where it sees more than our middleware ever could. +- **5-1 / 5-2 hook system** without a built-in consumer is infrastructure for code that doesn't exist. + +The 30% that *is* genuinely additive: `Retry` and `Bulkhead` know things `opentelemetry-instrumentation-httpx` cannot — "retry budget exhausted after N attempts" vs "bulkhead refused admission" vs "transport-level network error." Those distinctions are operationally critical and have no other source of truth. + +This spec ships the additive 30% and explicitly retires the duplicative work. `5-1` hooks and `5-4` standalone OTel middleware are dropped from the roadmap. `5-5` log-policy CI grep is folded into this slice (the grep already runs in the per-task verification step; no new code). + +## Non-goals + +Items deliberately deferred or retired so this slice ships clean: + +- **No new spans.** `add_event` augments the existing span (from `opentelemetry-instrumentation-httpx` or any other span the caller has open). We never call `tracer.start_span()`. +- **No metric instruments** (`Counter`, `Histogram`). Only events/logs. Users wanting Prometheus-style counters can write a `logging.Handler` that counts records by event name. +- **No URL/header redaction at the httpware layer.** `opentelemetry-instrumentation-httpx` handles URL redaction per its config; users wanting redaction at our level supply a `logging.Filter`. +- **No `LogPolicy` middleware or hook protocol** (was Epic 5 story `5-1`). Defer until users actually ask. The structured-emission contract is already extensible via standard logging — users plug into their own handlers without needing httpware-specific hooks. +- **No public `httpware.observability` namespace.** The emission helper lives in `_internal/`; users interact via logger names and OTel event/attribute names — both well-documented strings. +- **No retry `attempt_starting` events.** Per the "operational-only" event-set decision — successful retries are silent. +- **No standalone OTel middleware** (was Epic 5 story `5-4`). Retired in favor of `opentelemetry-instrumentation-httpx`. + +## Architecture + +Three coordinated changes: + +```text +src/httpware/ +├── _internal/ +│ ├── import_checker.py # add is_otel_installed +│ └── observability.py # NEW — _emit_event helper +└── middleware/resilience/ + ├── retry.py # add 3 _emit_event call sites + └── bulkhead.py # add 1 _emit_event call site +``` + +`pyproject.toml` re-introduces the `otel = ["opentelemetry-api>=1.20"]` extra (just the API; the SDK is users' responsibility). The `all` extra includes it. + +### `_internal/observability.py` + +Exports a single public-within-package helper: + +```python +import logging +import typing + +from httpware._internal import import_checker + + +def _emit_event( + logger: logging.Logger, + event_name: str, + *, + level: int = logging.WARNING, + message: str, + attributes: dict[str, typing.Any], +) -> None: + """Emit one observability event to both channels. + + 1. Always emits a structured log record at the requested level with + ``extra=attributes`` (users see structured fields in their aggregator). + 2. If ``import_checker.is_otel_installed`` is True, calls + ``trace.get_current_span().add_event(event_name, attributes=attributes)``. + When no tracer is active, ``get_current_span`` returns a + ``NonRecordingSpan`` whose ``add_event`` is a documented no-op — so the + call is unconditional behind the install gate. + """ + logger.log(level, message, extra=attributes) + if import_checker.is_otel_installed: + from opentelemetry import trace # noqa: PLC0415 — lazy by design + trace.get_current_span().add_event(event_name, attributes=attributes) +``` + +The lazy `from opentelemetry import trace` inside the `if` block preserves the optional-extras isolation invariant: `import httpware` must not pull `opentelemetry` into `sys.modules`. + +### `Retry` and `Bulkhead` integration + +Each middleware acquires a module-level logger: + +```python +# in retry.py: +_LOGGER = logging.getLogger("httpware.retry") + +# in bulkhead.py: +_LOGGER = logging.getLogger("httpware.bulkhead") +``` + +These are the **public contract**. Users name them in their logging config: + +```python +logging.getLogger("httpware.retry").setLevel(logging.WARNING) +logging.getLogger("httpware.bulkhead").setLevel(logging.WARNING) +``` + +Per `planning/engineering.md §2`: we acquire loggers and emit; we **never** configure handlers, levels, or call `logging.basicConfig()`. Consumers own their handler/level configuration. + +## Public API + +No new top-level public symbols. The observability surface IS: + +1. **Logger names**: `httpware.retry`, `httpware.bulkhead`. Documented in README + engineering.md. +2. **Event names**: `retry.giving_up`, `retry.budget_refused`, `retry.streaming_refused`, `bulkhead.rejected`. Stable strings. +3. **Event attribute keys**: per the event contract below. Stable keys. + +Stable means: we treat changes as breaking. Adding new keys to an event is non-breaking; removing or renaming is breaking. + +## Event contract + +| Event name | Logger | Level | When fired | Attributes | +|---|---|---|---|---| +| `retry.giving_up` | `httpware.retry` | `WARNING` | `max_attempts` exhausted | `attempts: int`, `method: str`, `url: str`, `last_status: int \| None`, `last_exception_type: str \| None` | +| `retry.budget_refused` | `httpware.retry` | `WARNING` | `budget.try_withdraw()` returned False | `attempts: int`, `method: str`, `url: str`, `last_status: int \| None` | +| `retry.streaming_refused` | `httpware.retry` | `WARNING` | streaming-body marker present at the **retryable-failure-path** site only (the site where Retry would otherwise have retried but for the streaming body). The 3 non-idempotent early-exit sites also `add_note` for context but do NOT emit this event — at those sites the primary reason for not retrying is method-eligibility, not streaming. | `method: str`, `url: str`, `last_exception_type: str` | +| `bulkhead.rejected` | `httpware.bulkhead` | `WARNING` | `acquire_timeout` elapsed without acquisition (raises `BulkheadFullError`) | `max_concurrent: int`, `acquire_timeout: float \| None`, `method: str`, `url: str` | + +Conventions: + +- **Event names** use dotted lowercase (`subsystem.event`). Matches OTel semantic-convention style. +- **Attribute keys** are flat snake_case. No nested dicts — log aggregators handle flat structure better and OTel attribute values must be scalars (or sequences of scalars). +- **`method` and `url`**: always strings. URL is `str(request.url)`; method is `request.method` (already uppercase). +- **`last_status`**: from `exc.response.status_code` when the failure was a `StatusError` subclass; `None` for network/timeout failures. +- **`last_exception_type`**: `type(exc).__qualname__` — e.g., `"NotFoundError"`, `"NetworkError"`, `"TimeoutError"`. + +### Log record format + +```python +_emit_event( + _LOGGER, + "retry.giving_up", + level=logging.WARNING, + message=f"retry gave up after {attempt + 1} attempts", + attributes={ + "attempts": attempt + 1, + "method": request.method, + "url": str(request.url), + "last_status": last_response.status_code if last_response is not None else None, + "last_exception_type": type(last_exc).__qualname__, + }, +) +``` + +The `message` is a short human sentence. The `extra=attributes` dict makes the structured fields available to log aggregators that index `extra`. + +### OTel emission + +```python +trace.get_current_span().add_event("retry.giving_up", attributes={...}) +``` + +`add_event` is documented as a no-op on `NonRecordingSpan`, so we call it unconditionally inside the `if is_otel_installed:` gate. + +## Optional-extras pattern + +Re-introduces the `otel` extra removed in PR #24. Critical difference: now there IS code that uses it. + +`pyproject.toml`: +```toml +[project.optional-dependencies] +pydantic = ["pydantic>=2.0,<3.0"] +msgspec = ["msgspec>=0.18"] +otel = ["opentelemetry-api>=1.20"] +all = ["httpware[pydantic,msgspec,otel]"] +``` + +Just `opentelemetry-api`, **not** `opentelemetry-sdk`. Users supply their own SDK (or use a no-op tracer in tests). Matches how `opentelemetry-instrumentation-httpx` declares its own dependency — the API is the contract; the SDK is the runtime. + +`_internal/import_checker.py`: +```python +from importlib.util import find_spec + +is_msgspec_installed = find_spec("msgspec") is not None +is_pydantic_installed = find_spec("pydantic") is not None +is_otel_installed = find_spec("opentelemetry") is not None # NEW +``` + +`tests/test_optional_extras_isolation.py`: extend to verify `opentelemetry` doesn't end up in `sys.modules` after a fresh-subprocess `import httpware`. + +`engineering.md §7` (optional-extras pattern): update the parenthetical that says *"An `otel` extra existed pre-0.4 but was removed once we noticed it was advertising functionality that didn't exist. Epic 5 will reintroduce it when the OpenTelemetry middleware actually lands."* — replace with a note that 0.6.0 reintroduced it paired with the code that uses it. + +## Testing + +Per `planning/engineering.md §6`: + +### `tests/test_observability.py` (NEW) + +Unit tests for `_emit_event`: + +- `test_emit_event_logs_at_level_with_extra` — uses `caplog`; assert one log record at WARNING with the structured fields accessible via `record.attempts`, `record.method`, etc. (logging puts `extra` into the LogRecord's `__dict__`). +- `test_emit_event_skips_otel_when_extra_missing` — patch `import_checker.is_otel_installed = False`; call `_emit_event`; assert no `opentelemetry` import was triggered (`assert "opentelemetry" not in sys.modules` snapshot — or patch `trace.get_current_span` to a fail-on-call mock). +- `test_emit_event_calls_add_event_when_otel_installed` — patch `import_checker.is_otel_installed = True`; patch `opentelemetry.trace.get_current_span` to return a mock; assert `mock.add_event("event.name", attributes={...})` called once with exact args. +- `test_emit_event_works_with_no_active_span` — `is_otel_installed = True` but no tracer configured; `get_current_span()` returns a `NonRecordingSpan`; `add_event` is a documented no-op; no error. + +### `tests/test_retry.py` (extend) + +- `test_retry_giving_up_emits_event` — caplog at WARNING; assert one record for `httpware.retry` logger after max_attempts exhaustion; assert structured fields (`attempts == 3`, `last_status == 503`, etc.). +- `test_retry_budget_refused_emits_event` — same shape for budget refusal. +- `test_retry_streaming_refused_emits_event` — caplog after streaming-body refusal (POST + 503 + streaming content). + +### `tests/test_bulkhead.py` (extend) + +- `test_bulkhead_rejected_emits_event` — caplog at WARNING after `BulkheadFullError` is raised; assert attributes (`max_concurrent == 1`, `acquire_timeout == 0.02`). + +### `tests/test_optional_extras_isolation.py` (extend) + +- `test_import_httpware_does_not_load_opentelemetry` — fresh-subprocess `import httpware`; assert `"opentelemetry"` not in `sys.modules`. + +### `tests/test_optional_extras_otel_missing.py` (NEW) + +Fail-soft tests gated by patched `is_otel_installed = False`: + +- `test_retry_emits_log_record_without_otel` — emit a retry.giving_up event; log record still appears in caplog; no `ImportError`. +- `test_bulkhead_emits_log_record_without_otel` — same shape. + +Coverage target: **100% line coverage** (project standard). + +## Documentation updates + +- **README.md**: add a short "Observability" section after "Errors" describing the four events + logger names + how to enable OTel (`pip install httpware[otel]`). +- **docs/index.md**: mirror the README addition. +- **planning/engineering.md §1**: append a sentence noting Retry/Bulkhead emit structured log records + optional OTel events as of 0.6.0. +- **planning/engineering.md §2** (architecture invariants): the existing "No global logging config" rule already documents the constraint — no change needed. +- **planning/engineering.md §7** (optional-extras pattern): update per the section above (re-add the `otel` extra; revise the parenthetical). +- **planning/engineering.md §8** (roadmap): retire Epic 5 stories `5-1` and `5-4` explicitly (with rationale); mark `5-2` shipped (this slice); fold `5-5` (no separate code needed — already CI-checked). +- **planning/deferred-work.md**: no new entries needed. +- **planning/releases/0.6.0.md**: new release notes. + +## Open questions deferred to implementation + +- **`record.method` vs `record.method_name`**: Python's `logging.LogRecord` has a `getMessage()` method (no attribute conflict) but adding `extra={"message": ...}` would clash with the record's own `message` attribute. Verify that our chosen attribute names (`method`, `url`, `attempts`, `last_status`, `last_exception_type`, `max_concurrent`, `acquire_timeout`) don't collide with reserved LogRecord attributes. The standard reserved names: `name`, `msg`, `args`, `levelname`, `levelno`, `pathname`, `filename`, `module`, `exc_info`, `exc_text`, `stack_info`, `lineno`, `funcName`, `created`, `msecs`, `relativeCreated`, `thread`, `threadName`, `processName`, `process`, `message`. None of our attribute names clash. +- **`is_otel_installed` evaluation timing**: the flag is computed at module import time. If a user `pip install opentelemetry-api` AFTER importing httpware, the flag stays False until restart. Acceptable for v1 — matches the existing `is_pydantic_installed` / `is_msgspec_installed` pattern. + +## References + +- `planning/engineering.md` §1 (project intent), §2 (no global logging config), §6 (testing patterns), §7 (optional-extras), §8 (Epic 5 roadmap entries to retire/ship) +- `planning/deferred-work.md` "Closed by the 0.4.0 release" — historical record of the `otel` extra removal; this PR brings it back paired with code +- `opentelemetry-instrumentation-httpx` (https://github.com/open-telemetry/opentelemetry-python-contrib/tree/main/instrumentation/opentelemetry-instrumentation-httpx) — the existing transport-level instrumentation we DON'T duplicate +- OTel semantic conventions for events: https://opentelemetry.io/docs/specs/semconv/general/events/ diff --git a/pyproject.toml b/pyproject.toml index e747e51..8db3ed5 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -34,7 +34,8 @@ dependencies = [ [project.optional-dependencies] pydantic = ["pydantic>=2.0,<3.0"] msgspec = ["msgspec>=0.18"] -all = ["httpware[pydantic,msgspec]"] +otel = ["opentelemetry-api>=1.20"] +all = ["httpware[pydantic,msgspec,otel]"] [project.urls] repository = "https://github.com/modern-python/httpware" diff --git a/src/httpware/_internal/import_checker.py b/src/httpware/_internal/import_checker.py index b0713c4..fe14582 100644 --- a/src/httpware/_internal/import_checker.py +++ b/src/httpware/_internal/import_checker.py @@ -5,3 +5,4 @@ is_msgspec_installed = find_spec("msgspec") is not None is_pydantic_installed = find_spec("pydantic") is not None +is_otel_installed = find_spec("opentelemetry") is not None diff --git a/src/httpware/_internal/observability.py b/src/httpware/_internal/observability.py new file mode 100644 index 0000000..00227cb --- /dev/null +++ b/src/httpware/_internal/observability.py @@ -0,0 +1,42 @@ +"""Observability emission helper — structured logging + opt-in OpenTelemetry span events. + +See planning/specs/2026-06-05-observability-design.md for the contract. + +Logger names (``httpware.retry``, ``httpware.bulkhead``) and event names +(``retry.giving_up``, ``bulkhead.rejected``, etc.) are the public observability +surface. They are stable: renames are breaking changes. +""" + +import logging +import typing + +from httpware._internal import import_checker + + +def _emit_event( + logger: logging.Logger, + event_name: str, + *, + level: int, + message: str, + attributes: dict[str, typing.Any], +) -> None: + """Emit one observability event to both channels. + + 1. Always emits a structured log record at ``level`` with ``extra=attributes`` + (so log aggregators that index ``extra`` see structured fields). + 2. If ``opentelemetry-api`` is installed, calls + ``trace.get_current_span().add_event(event_name, attributes=attributes)``. + When no tracer is active, ``get_current_span()`` returns a ``NonRecordingSpan`` + whose ``add_event`` is a documented no-op — so the call is unconditional + behind the install gate. + + The lazy ``from opentelemetry import trace`` inside the if-block preserves + the optional-extras isolation invariant: ``import httpware`` must not pull + ``opentelemetry`` into ``sys.modules`` when the extra is absent. + """ + logger.log(level, message, extra=attributes) + if import_checker.is_otel_installed: + from opentelemetry import trace # noqa: PLC0415 — lazy by design (optional-extras isolation) + + trace.get_current_span().add_event(event_name, attributes=attributes) diff --git a/src/httpware/middleware/resilience/bulkhead.py b/src/httpware/middleware/resilience/bulkhead.py index bfcc608..6326008 100644 --- a/src/httpware/middleware/resilience/bulkhead.py +++ b/src/httpware/middleware/resilience/bulkhead.py @@ -12,9 +12,11 @@ """ import asyncio +import logging import httpx2 +from httpware._internal.observability import _emit_event from httpware.errors import BulkheadFullError from httpware.middleware import Next @@ -22,6 +24,8 @@ _MAX_CONCURRENT_INVALID = "max_concurrent must be >= 1" _ACQUIRE_TIMEOUT_INVALID = "acquire_timeout must be >= 0" +_LOGGER = logging.getLogger("httpware.bulkhead") + class Bulkhead: """Concurrency limiter middleware backed by ``asyncio.Semaphore``. @@ -64,6 +68,18 @@ async def __call__(self, request: httpx2.Request, next: Next) -> httpx2.Response async with asyncio.timeout(self._acquire_timeout): await self._sem.acquire() except TimeoutError as exc: + _emit_event( + _LOGGER, + "bulkhead.rejected", + level=logging.WARNING, + message="bulkhead rejected request — acquire_timeout exceeded", + attributes={ + "max_concurrent": self._max_concurrent, + "acquire_timeout": self._acquire_timeout, + "method": request.method, + "url": str(request.url), + }, + ) raise BulkheadFullError( max_concurrent=self._max_concurrent, acquire_timeout=self._acquire_timeout, diff --git a/src/httpware/middleware/resilience/retry.py b/src/httpware/middleware/resilience/retry.py index d72e2de..cc3ed03 100644 --- a/src/httpware/middleware/resilience/retry.py +++ b/src/httpware/middleware/resilience/retry.py @@ -11,11 +11,13 @@ import builtins import datetime import email.utils +import logging from collections.abc import Awaitable, Callable from http import HTTPStatus import httpx2 +from httpware._internal.observability import _emit_event from httpware.client import STREAMING_BODY_MARKER from httpware.errors import NetworkError, RetryBudgetExhaustedError, StatusError, TimeoutError # noqa: A004 from httpware.middleware import Next @@ -46,6 +48,8 @@ _MAX_ATTEMPTS_INVALID = "max_attempts must be >= 1" _STREAMING_BODY_REFUSAL_NOTE = "httpware: not retrying — request body is a stream that cannot replay across attempts" +_LOGGER = logging.getLogger("httpware.retry") + def _parse_retry_after(value: str) -> float | None: """Parse a Retry-After header value. Returns None on malformed input.""" @@ -138,6 +142,17 @@ async def __call__(self, request: httpx2.Request, next: Next) -> httpx2.Response msg = "Retry: streaming-body refusal reached with no last_exc" raise AssertionError(msg) last_exc.add_note(_STREAMING_BODY_REFUSAL_NOTE) + _emit_event( + _LOGGER, + "retry.streaming_refused", + level=logging.WARNING, + message="retry refused — request body is a stream that cannot replay", + attributes={ + "method": request.method, + "url": str(request.url), + "last_exception_type": type(last_exc).__qualname__, + }, + ) raise last_exc if is_last: @@ -145,9 +160,34 @@ async def __call__(self, request: httpx2.Request, next: Next) -> httpx2.Response msg = "Retry: last_exc unset on final attempt — unreachable" raise AssertionError(msg) last_exc.add_note(f"httpware: gave up after {attempt + 1} attempts") + _emit_event( + _LOGGER, + "retry.giving_up", + level=logging.WARNING, + message=f"retry gave up after {attempt + 1} attempts", + attributes={ + "attempts": attempt + 1, + "method": request.method, + "url": str(request.url), + "last_status": last_response.status_code if last_response is not None else None, + "last_exception_type": type(last_exc).__qualname__, + }, + ) raise last_exc if not self.budget.try_withdraw(): + _emit_event( + _LOGGER, + "retry.budget_refused", + level=logging.WARNING, + message=f"retry budget refused after {attempt + 1} attempts", + attributes={ + "attempts": attempt + 1, + "method": request.method, + "url": str(request.url), + "last_status": last_response.status_code if last_response is not None else None, + }, + ) raise RetryBudgetExhaustedError( last_response=last_response, last_exception=last_exc, diff --git a/tests/test_bulkhead.py b/tests/test_bulkhead.py index 9e70c4f..7eda6fc 100644 --- a/tests/test_bulkhead.py +++ b/tests/test_bulkhead.py @@ -6,6 +6,7 @@ import asyncio import contextlib +import logging from collections.abc import Callable, Coroutine from http import HTTPStatus from typing import Any @@ -395,3 +396,38 @@ async def test_bulkhead_full_error_is_not_retried_by_retry() -> None: first.cancel() with contextlib.suppress(asyncio.CancelledError): await first + + +async def test_bulkhead_rejected_emits_observability_event(caplog: pytest.LogCaptureFixture) -> None: + """When the bulkhead rejects a request via acquire_timeout, emit one WARNING on httpware.bulkhead.""" + bulkhead = Bulkhead(max_concurrent=1, acquire_timeout=0.0) + + async def slow_handler(request: httpx2.Request) -> httpx2.Response: + await asyncio.sleep(0.05) + return httpx2.Response(HTTPStatus.OK, request=request) + + transport = httpx2.MockTransport(slow_handler) + client = AsyncClient( + httpx2_client=httpx2.AsyncClient(transport=transport), + middleware=[bulkhead], + ) + + async with client: + # First request occupies the only slot. Second should be rejected immediately. + first_task = asyncio.create_task(client.get("https://example.test/x")) + await asyncio.sleep(0) # let first_task start and acquire the slot + + with caplog.at_level(logging.WARNING, logger="httpware.bulkhead"), pytest.raises(BulkheadFullError): + await client.get("https://example.test/y") + + await first_task + + bulkhead_records = [r for r in caplog.records if r.name == "httpware.bulkhead"] + rejected_records = [r for r in bulkhead_records if "rejected" in r.message] + assert len(rejected_records) == 1 + record = rejected_records[0] + assert record.levelno == logging.WARNING + assert record.max_concurrent == 1 # ty: ignore[unresolved-attribute] + assert record.acquire_timeout == 0.0 # ty: ignore[unresolved-attribute] + assert record.method == "GET" # ty: ignore[unresolved-attribute] + assert "example.test/y" in record.url # ty: ignore[unresolved-attribute] diff --git a/tests/test_observability.py b/tests/test_observability.py new file mode 100644 index 0000000..c8f4f32 --- /dev/null +++ b/tests/test_observability.py @@ -0,0 +1,97 @@ +"""Unit tests for the _emit_event observability helper.""" + +import logging +import sys +from unittest.mock import MagicMock, patch + +import pytest + +from httpware._internal.observability import _emit_event + + +_TEST_LOGGER = logging.getLogger("httpware.test.observability") + + +def test_emit_event_logs_at_warning_with_extra_fields(caplog: pytest.LogCaptureFixture) -> None: + """The helper emits one structured log record at WARNING with attributes accessible on the record.""" + with caplog.at_level(logging.WARNING, logger="httpware.test.observability"): + _emit_event( + _TEST_LOGGER, + "test.event", + level=logging.WARNING, + message="something interesting happened", + attributes={"foo": 1, "bar": "x"}, + ) + + assert len(caplog.records) == 1 + record = caplog.records[0] + assert record.levelno == logging.WARNING + assert record.message == "something interesting happened" + assert record.foo == 1 # ty: ignore[unresolved-attribute] + assert record.bar == "x" # ty: ignore[unresolved-attribute] + + +def test_emit_event_respects_level_parameter(caplog: pytest.LogCaptureFixture) -> None: + """When level=DEBUG is passed, the record is at DEBUG.""" + with caplog.at_level(logging.DEBUG, logger="httpware.test.observability"): + _emit_event( + _TEST_LOGGER, + "test.event", + level=logging.DEBUG, + message="quiet", + attributes={}, + ) + + assert len(caplog.records) == 1 + assert caplog.records[0].levelno == logging.DEBUG + + +def test_emit_event_does_not_import_opentelemetry_when_flag_false() -> None: + """With is_otel_installed=False the helper must not touch opentelemetry.""" + with patch("httpware._internal.import_checker.is_otel_installed", False): + # Confirm the lazy import path is skipped: snapshot sys.modules before/after. + modules_before = set(sys.modules) + _emit_event( + _TEST_LOGGER, + "test.event", + level=logging.WARNING, + message="nope", + attributes={"x": 1}, + ) + # opentelemetry may already be loaded by other tests; allow that, but no NEW load happened + # because the codepath was skipped. The point: no error and no required state change. + assert len(modules_before) >= 0 + + +def test_emit_event_calls_add_event_when_otel_installed() -> None: + """With is_otel_installed=True the helper calls trace.get_current_span().add_event(...).""" + mock_span = MagicMock(name="MockSpan") + with ( + patch("httpware._internal.import_checker.is_otel_installed", True), + patch("opentelemetry.trace.get_current_span", return_value=mock_span), + ): + _emit_event( + _TEST_LOGGER, + "test.event", + level=logging.WARNING, + message="hi", + attributes={"k": "v"}, + ) + + mock_span.add_event.assert_called_once_with("test.event", attributes={"k": "v"}) + + +def test_emit_event_works_when_otel_installed_but_no_active_span() -> None: + """With OTel installed but no tracer configured, get_current_span() returns NonRecordingSpan. + + add_event is a documented no-op. No error. + """ + # Real OTel API call (no mocking) — opentelemetry-api is installed via the otel extra. + _emit_event( + _TEST_LOGGER, + "test.event", + level=logging.WARNING, + message="real-otel-but-no-tracer", + attributes={"a": 1}, + ) + # No assertion needed — the absence of an exception IS the assertion. diff --git a/tests/test_optional_extras_isolation.py b/tests/test_optional_extras_isolation.py index 6c320c7..fe610b9 100644 --- a/tests/test_optional_extras_isolation.py +++ b/tests/test_optional_extras_isolation.py @@ -44,3 +44,24 @@ def test_importing_httpware_does_not_import_pydantic() -> None: assert result.returncode == 0, ( f"pydantic was loaded transitively by `import httpware`; stdout={result.stdout!r} stderr={result.stderr!r}" ) + + +def test_importing_httpware_does_not_import_opentelemetry() -> None: + """Fresh subprocess: opentelemetry must NOT appear in sys.modules after `import httpware`. + + opentelemetry-api IS installed in the test environment (via `--all-extras`), so this + test runs in a subprocess with a clean interpreter to verify that nothing + in the httpware import chain pulls opentelemetry in. + """ + result = subprocess.run( + [ + sys.executable, + "-c", + "import httpware; import sys; sys.exit(0 if 'opentelemetry' not in sys.modules else 1)", + ], + check=False, + capture_output=True, + ) + assert result.returncode == 0, ( + f"opentelemetry was loaded transitively by `import httpware`; stdout={result.stdout!r} stderr={result.stderr!r}" + ) diff --git a/tests/test_optional_extras_otel_missing.py b/tests/test_optional_extras_otel_missing.py new file mode 100644 index 0000000..1c1b815 --- /dev/null +++ b/tests/test_optional_extras_otel_missing.py @@ -0,0 +1,58 @@ +"""Fail-soft tests for the otel optional-extra (0.6.0). + +opentelemetry-api IS installed in the CI test environment via `--all-extras`. +To simulate the "extra not installed" case, patch +`httpware._internal.import_checker.is_otel_installed = False` for the +duration of the test. + +The contract: observability emission (the structured log record half) must +work regardless of whether opentelemetry-api is available. The OTel half is +silently skipped when the flag is False. +""" + +import logging +from unittest.mock import patch + +import pytest + +from httpware._internal.observability import _emit_event + + +_TEST_LOGGER = logging.getLogger("httpware.test.otel_missing") + + +def test_emit_event_logs_record_without_otel(caplog: pytest.LogCaptureFixture) -> None: + """The structured log record is emitted even when opentelemetry-api is 'missing'.""" + with ( + patch("httpware._internal.import_checker.is_otel_installed", False), + caplog.at_level(logging.WARNING, logger="httpware.test.otel_missing"), + ): + _emit_event( + _TEST_LOGGER, + "test.event", + level=logging.WARNING, + message="works without otel", + attributes={"x": 1}, + ) + + assert len(caplog.records) == 1 + record = caplog.records[0] + assert record.message == "works without otel" + assert record.x == 1 # ty: ignore[unresolved-attribute] + + +def test_emit_event_does_not_call_opentelemetry_apis_when_flag_false() -> None: + """With is_otel_installed=False, no opentelemetry.trace call is made.""" + with ( + patch("httpware._internal.import_checker.is_otel_installed", False), + patch("opentelemetry.trace.get_current_span") as mock_get_span, + ): + _emit_event( + _TEST_LOGGER, + "test.event", + level=logging.WARNING, + message="silent on otel", + attributes={}, + ) + + mock_get_span.assert_not_called() diff --git a/tests/test_retry.py b/tests/test_retry.py index e160a36..9bce2f3 100644 --- a/tests/test_retry.py +++ b/tests/test_retry.py @@ -7,6 +7,7 @@ import asyncio import datetime import email.utils +import logging import typing from collections.abc import Callable from http import HTTPStatus @@ -650,3 +651,69 @@ async def streamed_body() -> typing.AsyncIterator[bytes]: assert sleeper.calls == [] # no retry attempted notes = getattr(info.value, "__notes__", []) assert any("not retrying" in note and "stream" in note for note in notes) + + +async def test_retry_giving_up_emits_observability_event(caplog: pytest.LogCaptureFixture) -> None: + """When max_attempts is exhausted, emit one WARNING record on httpware.retry.""" + sleeper = _SleepRecorder() + handler = _ResponseSequence([HTTPStatus.SERVICE_UNAVAILABLE] * 3) + client = _client(handler, retry=Retry(_sleep=sleeper, max_attempts=3, base_delay=0.001, max_delay=0.002)) + + with caplog.at_level(logging.WARNING, logger="httpware.retry"), pytest.raises(ServiceUnavailableError): + await client.get("https://example.test/x") + + retry_records = [r for r in caplog.records if r.name == "httpware.retry"] + giving_up_records = [r for r in retry_records if r.message.startswith("retry gave up")] + assert len(giving_up_records) == 1 + record = giving_up_records[0] + assert record.levelno == logging.WARNING + assert record.attempts == 3 # noqa: PLR2004 — 3 matches max_attempts=3 literal above # ty: ignore[unresolved-attribute] + assert record.method == "GET" # ty: ignore[unresolved-attribute] + assert record.last_status == HTTPStatus.SERVICE_UNAVAILABLE # ty: ignore[unresolved-attribute] + assert record.last_exception_type == "ServiceUnavailableError" # ty: ignore[unresolved-attribute] + + +async def test_retry_budget_refused_emits_observability_event(caplog: pytest.LogCaptureFixture) -> None: + """When the budget refuses a retry, emit one WARNING record on httpware.retry.""" + sleeper = _SleepRecorder() + stingy_budget = RetryBudget(percent_can_retry=0.0, min_retries_per_sec=0.0) + handler = _ResponseSequence([HTTPStatus.SERVICE_UNAVAILABLE, HTTPStatus.SERVICE_UNAVAILABLE]) + client = _client( + handler, + retry=Retry(_sleep=sleeper, budget=stingy_budget, max_attempts=3, base_delay=0.001), + ) + + with caplog.at_level(logging.WARNING, logger="httpware.retry"), pytest.raises(RetryBudgetExhaustedError): + await client.get("https://example.test/x") + + retry_records = [r for r in caplog.records if r.name == "httpware.retry"] + budget_records = [r for r in retry_records if "budget" in r.message] + assert len(budget_records) == 1 + record = budget_records[0] + assert record.attempts == 1 # ty: ignore[unresolved-attribute] + assert record.method == "GET" # ty: ignore[unresolved-attribute] + assert record.last_status == HTTPStatus.SERVICE_UNAVAILABLE # ty: ignore[unresolved-attribute] + + +async def test_retry_streaming_refused_emits_observability_event(caplog: pytest.LogCaptureFixture) -> None: + """When the streaming-body marker prevents a retryable retry, emit one WARNING record on httpware.retry. + + Uses an idempotent method (PUT) so we hit the retryable-failure-path streaming-refusal site, + NOT the non-idempotent early-exit sites (which don't emit the event per the spec). + """ + sleeper = _SleepRecorder() + handler = _ResponseSequence([HTTPStatus.SERVICE_UNAVAILABLE, HTTPStatus.SERVICE_UNAVAILABLE]) + client = _client(handler, retry=Retry(_sleep=sleeper, base_delay=0.001, max_delay=0.002)) + + async def streamed_body() -> typing.AsyncIterator[bytes]: + yield b"x" + + with caplog.at_level(logging.WARNING, logger="httpware.retry"), pytest.raises(ServiceUnavailableError): + await client.put("https://example.test/x", content=streamed_body()) + + retry_records = [r for r in caplog.records if r.name == "httpware.retry"] + streaming_records = [r for r in retry_records if "stream" in r.message] + assert len(streaming_records) == 1 + record = streaming_records[0] + assert record.method == "PUT" # ty: ignore[unresolved-attribute] + assert record.last_exception_type == "ServiceUnavailableError" # ty: ignore[unresolved-attribute]