feat(v0.6): resilience observability — structured logging + opt-in OTel events #27

Merged
lesnik512 merged 8 commits into main from feat/v0.6-observability on Jun 5, 2026

Conversation

@lesnik512 lesnik512 commented Jun 5, 2026

Member

Summary

Epic 5, re-scoped. Retry and Bulkhead now emit four operational events via two channels — stdlib logging records (always on) and OpenTelemetry span events on the active span (opt-in via the otel extra).

  • Logger names + event names are the public contract: `httpware.retry` / `httpware.bulkhead`; events `retry.giving_up`, `retry.budget_refused`, `retry.streaming_refused`, `bulkhead.rejected`.
  • `otel` extra reintroduced, paired with the code that uses it. PR #24 (chore(deps): drop otel optional extra (YAGNI)) had removed it as YAGNI; 0.6.0 brings it back with `opentelemetry-api>=1.20` only (no SDK; users supply their own).
  • Optional-extras isolation preserved: the lazy `from opentelemetry import trace` lives inside an `if import_checker.is_otel_installed:` gate, and `tests/test_optional_extras_isolation.py` now verifies `import httpware` does not pull `opentelemetry` into `sys.modules`.
  • Out of scope (per spec): no new spans, no metric instruments, no URL redaction, no LogPolicy hooks, no public httpware.observability namespace, no standalone OTel middleware. Original Epic 5 stories 5-1 and 5-4 retired (rationale in spec).

Spec: planning/specs/2026-06-05-observability-design.md
Plan: planning/plans/2026-06-05-observability-plan.md
Release notes: planning/releases/0.6.0.md
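
Since the logger names are the public contract, a consumer would presumably hook into the events with plain stdlib logging. A minimal sketch, assuming the event name arrives as the log message (not confirmed by this PR); `CollectingHandler` and `attach` are hypothetical names, and the final `warning()` call only simulates what httpware itself would emit:

```python
import logging

# Logger names as listed in the summary above.
HTTPWARE_LOGGERS = ("httpware.retry", "httpware.bulkhead")


class CollectingHandler(logging.Handler):
    """Hypothetical in-memory sink; swap for a real handler in production."""

    def __init__(self) -> None:
        super().__init__(level=logging.WARNING)
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


def attach(handler: logging.Handler) -> None:
    """Attach one handler to both resilience loggers."""
    for name in HTTPWARE_LOGGERS:
        logging.getLogger(name).addHandler(handler)


handler = CollectingHandler()
attach(handler)

# Simulated record: in real use the library would log this, not the caller.
logging.getLogger("httpware.bulkhead").warning("bulkhead.rejected")
```

Attaching to the named loggers rather than the root logger keeps the subscription scoped to the two documented names.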

Test Plan

  • `just test` — 251 passed, 100% coverage (was 240, +11 new tests)
  • `just lint-ci` — eof-fixer + ruff format + ruff check + ty all clean
  • Architecture invariants (CI-enforced greps) — no `httpx2._`, no `future` annotations, no `print()`, no global logging, no `# type:`/`# mypy:` ignores
  • Optional-extras isolation — all 3 subprocess tests pass (pydantic, msgspec, opentelemetry)
  • `mkdocs build --strict` — clean
  • Reviewer: confirm `retry.streaming_refused` fires only at the retryable-failure-path site, NOT at the 3 non-idempotent early-exit sites
  • Reviewer: spot-check that the 4 events have flat scalar attributes only (log-aggregator + OTel attribute-conventions friendly)

🤖 Generated with Claude Code

lesnik512 and others added 8 commits June 5, 2026 20:56
…ped)

Re-scopes Epic 5 from the original 4-story plan. The OTel-instrumentation-httpx
package already covers ~70% of what the original 5-4 OTel middleware would
have done at the transport layer. This spec ships only the additive 30%:
Retry/Bulkhead emit 4 operational events via two channels — structured
log records (always on, no dep) + opentelemetry add_event calls on the
active span (when the otel extra is installed).

Re-introduces the otel extra removed in PR #24, now paired with code that
uses it. opentelemetry-api only (no SDK).

4 events: retry.giving_up, retry.budget_refused, retry.streaming_refused,
bulkhead.rejected. All WARNING level. Successful paths emit nothing.

Retires Epic 5 stories 5-1 (hook protocol) and 5-4 (standalone OTel
middleware) with documented rationale.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

7 TDD tasks on feat/v0.6-observability. Task 1 re-adds the otel extra
(opentelemetry-api only) + is_otel_installed flag + isolation test
extension. Task 2 lands the _emit_event helper in _internal/observability.py
with 4 unit tests. Tasks 3-4 wire 3 retry events + 1 bulkhead event +
emission tests. Task 5 adds fail-soft tests for the otel-missing case.
Task 6 syncs README + docs/index.md + engineering.md (§1, §7, §8) +
drafts 0.6.0 release notes. Task 7 verifies + pushes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s it

PR #24 removed the otel extra as YAGNI (it advertised functionality that
didn't exist). 0.6.0 brings it back: structured-logging observability
in Retry / Bulkhead with opt-in OTel attribute enrichment lands in
the next commits.

otel = ['opentelemetry-api>=1.20'] only — no SDK. Users supply their
own SDK (or use a no-op tracer in tests). Matches how
opentelemetry-instrumentation-httpx declares its dep.

import_checker gains is_otel_installed alongside the existing flags.
Isolation test extended to verify import httpware does not pull
opentelemetry into sys.modules.
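
The isolation check described here could look roughly like the following: import the package in a fresh interpreter and inspect `sys.modules` afterwards. This is a sketch of the idea, not the contents of tests/test_optional_extras_isolation.py; the helper name `import_pulls_in` is hypothetical.

```python
import subprocess
import sys


def import_pulls_in(target: str, forbidden: str) -> bool:
    """Return True if importing `target` in a fresh interpreter loads `forbidden`."""
    probe = (
        "import importlib, sys\n"
        f"importlib.import_module({target!r})\n"
        f"prefix = {forbidden!r} + '.'\n"
        f"leaked = any(m == {forbidden!r} or m.startswith(prefix) "
        "for m in list(sys.modules))\n"
        "print('LEAKED' if leaked else 'CLEAN')\n"
    )
    # A subprocess guarantees a clean sys.modules, unaffected by the test runner.
    result = subprocess.run(
        [sys.executable, "-c", probe],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "LEAKED"


# Intended use against this package (requires httpware to be installed):
#     assert not import_pulls_in("httpware", "opentelemetry")
```

Running the probe in a subprocess matters because the test process itself may already have imported opentelemetry for other tests.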
New _internal/observability.py with a single _emit_event helper. Always
emits a structured log record at the requested level; if opentelemetry-api
is installed, calls trace.get_current_span().add_event(name, attributes=...)
on the active span.

The lazy 'from opentelemetry import trace' inside the if is_otel_installed
gate preserves the optional-extras isolation invariant (import httpware
does not pull opentelemetry when the extra is absent).

Logger names and event names are the public observability surface; the
helper itself lives in _internal/ so users interact only with the
strings, not Python imports.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
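
A rough sketch of the two-channel helper this commit describes. The actual `_emit_event` signature is not shown in the PR, so the parameter list, the use of `extra=` to attach attributes, and the `_otel_available` stand-in for `import_checker.is_otel_installed` are all assumptions:

```python
import logging
from importlib.util import find_spec


def _otel_available() -> bool:
    # Stand-in for import_checker.is_otel_installed; the real detection
    # logic is not shown in this PR.
    try:
        return find_spec("opentelemetry.trace") is not None
    except ModuleNotFoundError:
        return False


IS_OTEL_INSTALLED = _otel_available()


def emit_event(logger, level, name, attributes):
    """Log the event, then mirror it onto the active span when OTel is present."""
    # Channel 1: always-on stdlib log record. Passing the attributes through
    # `extra` is an assumption about how the fields are attached.
    logger.log(level, name, extra=attributes)

    # Channel 2: opt-in span event. The import stays inside the gate so
    # opentelemetry.trace is never imported when the extra is absent.
    if IS_OTEL_INSTALLED:
        from opentelemetry import trace

        trace.get_current_span().add_event(name, attributes=attributes)
```

With only opentelemetry-api installed and no SDK configured, `get_current_span()` returns a non-recording span, so the `add_event` call is a no-op.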
Three event sites:
- retry.giving_up (WARNING): max_attempts exhausted
- retry.budget_refused (WARNING): budget.try_withdraw() refused
- retry.streaming_refused (WARNING): streaming-body marker prevented an
  otherwise-retryable retry (retryable-failure-path site only — the 3
  non-idempotent early-exit sites still add the note but do NOT emit
  this event, since at those sites method-eligibility is the primary
  reason for not retrying).

All three events have flat, scalar attributes (method, url, attempts,
last_status, last_exception_type) so they index cleanly in log
aggregators and serialize cleanly as OTel attributes.
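
To illustrate where two of these events would sit in a retry loop, here is a deliberately simplified sketch. Everything except the event names, the logger name, `max_attempts`, and `try_withdraw()` is hypothetical: the `Budget` class, the "5xx is retryable" rule, and the order of the two checks are assumptions, and `retry.streaming_refused` is left out.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("httpware.retry")


@dataclass
class Budget:
    """Hypothetical token budget; only try_withdraw() is named in the PR."""

    tokens: int

    def try_withdraw(self) -> bool:
        if self.tokens <= 0:
            return False
        self.tokens -= 1
        return True


def call_with_retry(send, *, method, url, max_attempts, budget):
    """Call send() until it succeeds or retrying is no longer allowed."""
    attempts = 0
    while True:
        attempts += 1
        status = send()
        if status < 500:  # assumption: only 5xx responses are retryable here
            return status
        # Flat scalar attributes, as the commit message describes.
        attrs = {
            "method": method,
            "url": url,
            "attempts": attempts,
            "last_status": status,
        }
        if attempts >= max_attempts:
            logger.warning("retry.giving_up", extra=attrs)
            return status
        if not budget.try_withdraw():
            logger.warning("retry.budget_refused", extra=attrs)
            return status
```

Successful calls return before any logging happens, matching the "successful paths emit nothing" rule stated in the spec commit.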
…gger + OTel

One event site:
- bulkhead.rejected (WARNING): the acquire_timeout expired before a slot
  became available. Emitted just before raising BulkheadFullError so the
  raise path is unchanged.

Attributes: max_concurrent, acquire_timeout, method, url. Flat scalars so
they index cleanly in log aggregators and serialize cleanly as OTel
attributes.
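
The emit-then-raise ordering can be sketched with an `asyncio.Semaphore`. This is an illustration under assumptions, not the library's Bulkhead: the class shape, the `run` method, and the local `BulkheadFullError` stand-in are hypothetical, while the event name, logger name, and attribute names come from the commit message.

```python
import asyncio
import logging

logger = logging.getLogger("httpware.bulkhead")


class BulkheadFullError(Exception):
    """Local stand-in for the library's exception of the same name."""


class Bulkhead:
    """Hypothetical concurrency limiter with a bounded wait for a slot."""

    def __init__(self, max_concurrent: int, acquire_timeout: float) -> None:
        self.max_concurrent = max_concurrent
        self.acquire_timeout = acquire_timeout
        self._slots = asyncio.Semaphore(max_concurrent)

    async def run(self, send, *, method: str, url: str):
        try:
            await asyncio.wait_for(self._slots.acquire(), self.acquire_timeout)
        except asyncio.TimeoutError:
            # Emit first, then raise, so the raise path itself is unchanged.
            logger.warning(
                "bulkhead.rejected",
                extra={
                    "max_concurrent": self.max_concurrent,
                    "acquire_timeout": self.acquire_timeout,
                    "method": method,
                    "url": url,
                },
            )
            raise BulkheadFullError(url) from None
        try:
            return await send()
        finally:
            self._slots.release()
```

Callers that already catch `BulkheadFullError` see no behavioral difference; the log record is purely additive.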
Mirrors the existing test_optional_extras_pydantic_missing.py pattern:
patches httpware._internal.import_checker.is_otel_installed to False
to simulate the 'extra not installed' case. Verifies that the
structured-log half of _emit_event still works and that no
opentelemetry.trace.get_current_span call is attempted.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
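
The fail-soft test pattern described in this commit might be sketched as follows. To stay self-contained, `import_checker` and `otel_trace` are stand-in objects rather than the real `httpware._internal.import_checker` and `opentelemetry.trace`, and `emit_event` is a simplified hypothetical helper:

```python
import logging
import types
from unittest import mock

# Stand-ins so the sketch runs without httpware or opentelemetry installed.
import_checker = types.SimpleNamespace(is_otel_installed=True)
otel_trace = mock.Mock()  # plays the role of opentelemetry.trace


def emit_event(logger, name, attributes):
    """Simplified two-channel emit, gated on the import_checker flag."""
    logger.warning(name, extra=attributes)
    if import_checker.is_otel_installed:
        otel_trace.get_current_span().add_event(name, attributes=attributes)


def test_log_survives_without_otel():
    records = []
    handler = logging.Handler()
    handler.emit = records.append  # capture records in memory
    log = logging.getLogger("httpware.retry")
    log.addHandler(handler)
    try:
        # Simulate the "extra not installed" case by patching the flag.
        with mock.patch.object(import_checker, "is_otel_installed", False):
            emit_event(log, "retry.giving_up", {"attempts": 3})
    finally:
        log.removeHandler(handler)

    # The structured-log half still works...
    assert [r.getMessage() for r in records] == ["retry.giving_up"]
    # ...and the OTel half was never touched.
    otel_trace.get_current_span.assert_not_called()
```

Patching the flag instead of uninstalling the package keeps the test runnable in an environment where the otel extra is present.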
- README + docs/index.md: add 'Observability' section + update [all]
  install line to include otel; drop stale 'streaming and observability
  not shipped' status note
- planning/engineering.md §1 + §7 + §8: mention observability in
  project intent; update otel-extra parenthetical to reflect reintroduction;
  mark Epic 5 SHIPPED in roadmap with rationale for retiring 5-1 / 5-4
- planning/releases/0.6.0.md: new release notes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@lesnik512 lesnik512 self-assigned this Jun 5, 2026
@lesnik512 lesnik512 merged commit 7cf653b into main Jun 5, 2026
5 checks passed
@lesnik512 lesnik512 deleted the feat/v0.6-observability branch June 5, 2026 18:43