feat(v0.6): resilience observability — structured logging + opt-in OTel events#27
Merged
…ped)

Re-scopes Epic 5 from the original 4-story plan. The opentelemetry-instrumentation-httpx package already covers ~70% of what the original 5-4 OTel middleware would have done at the transport layer. This spec ships only the additive 30%: Retry and Bulkhead emit 4 operational events via two channels:

- structured log records (always on, no dependency)
- opentelemetry add_event calls on the active span (when the otel extra is installed)

Re-introduces the otel extra removed in PR #24, now paired with code that uses it. opentelemetry-api only (no SDK).

4 events: retry.giving_up, retry.budget_refused, retry.streaming_refused, bulkhead.rejected. All WARNING level. Successful paths emit nothing.

Retires Epic 5 stories 5-1 (hook protocol) and 5-4 (standalone OTel middleware) with documented rationale.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
7 TDD tasks on feat/v0.6-observability.

- Task 1 re-adds the otel extra (opentelemetry-api only), adds the is_otel_installed flag, and extends the isolation test.
- Task 2 lands the _emit_event helper in _internal/observability.py with 4 unit tests.
- Tasks 3-4 wire 3 retry events and 1 bulkhead event, plus emission tests.
- Task 5 adds fail-soft tests for the otel-missing case.
- Task 6 syncs README, docs/index.md, and engineering.md (§1, §7, §8), and drafts the 0.6.0 release notes.
- Task 7 verifies and pushes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s it

PR #24 removed the otel extra as YAGNI (it advertised functionality that didn't exist). 0.6.0 brings it back: structured-logging observability in Retry / Bulkhead, with opt-in OTel attribute enrichment, lands in the next commits.

otel = ['opentelemetry-api>=1.20'] only, no SDK. Users supply their own SDK (or use a no-op tracer in tests). This matches how opentelemetry-instrumentation-httpx declares its dependency.

import_checker gains is_otel_installed alongside the existing flags. The isolation test is extended to verify that import httpware does not pull opentelemetry into sys.modules.
New _internal/observability.py with a single _emit_event helper. It always emits a structured log record at the requested level; if opentelemetry-api is installed, it also calls trace.get_current_span().add_event(name, attributes=...) on the active span.

The lazy 'from opentelemetry import trace' inside the 'if is_otel_installed' gate preserves the optional-extras isolation invariant (import httpware does not pull opentelemetry when the extra is absent).

Logger names and event names are the public observability surface; the helper itself lives in _internal/ so users interact only with the strings, not Python imports.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
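The two-channel behaviour described above can be sketched roughly as follows. This is an illustration under assumptions, not the code from this PR: the function name `emit_event`, the `event_attributes` key passed through `extra`, and the dropping of `None` values before the span call are all hypothetical choices.

```python
import importlib.util
import logging
from typing import Any, Mapping

# Assumption: stands in for import_checker.is_otel_installed.
is_otel_installed = importlib.util.find_spec("opentelemetry") is not None


def emit_event(
    logger: logging.Logger,
    level: int,
    name: str,
    attributes: Mapping[str, Any],
) -> None:
    """Emit one operational event on the log channel and, if possible, the span."""
    attrs = dict(attributes)

    # Channel 1: structured log record, always emitted.
    # Nesting under one key avoids clashing with built-in LogRecord fields.
    logger.log(level, name, extra={"event_attributes": attrs})

    # Channel 2: span event, only when opentelemetry-api is available.
    if is_otel_installed:
        # Lazy import: nothing from opentelemetry loads unless the gate passes.
        from opentelemetry import trace

        # Assumption: drop None values, which are not valid OTel attribute values.
        span_attrs = {k: v for k, v in attrs.items() if v is not None}
        trace.get_current_span().add_event(name, attributes=span_attrs)
```

A call such as `emit_event(logging.getLogger("httpware.retry"), logging.WARNING, "retry.giving_up", {"method": "GET", "attempts": 3})` would then produce one WARNING record and, with the extra installed and a span active, one span event.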
Three event sites:

- retry.giving_up (WARNING): max_attempts exhausted
- retry.budget_refused (WARNING): budget.try_withdraw() refused
- retry.streaming_refused (WARNING): streaming-body marker prevented an otherwise-retryable retry (retryable-failure-path site only; the 3 non-idempotent early-exit sites still add the note but do NOT emit this event, since at those sites method-eligibility is the primary reason for not retrying)

All three events have flat, scalar attributes (method, url, attempts, last_status, last_exception_type) so they index cleanly in log aggregators and serialize cleanly as OTel attributes.
…gger + OTel

One event site:

- bulkhead.rejected (WARNING): the acquire_timeout expired before a slot became available. Emitted just before raising BulkheadFullError so the raise path is unchanged.

Attributes: max_concurrent, acquire_timeout, method, url. Flat scalars so they index cleanly in log aggregators and serialize cleanly as OTel attributes.
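The ordering described above (emit, then raise) can be illustrated with a small asyncio sketch. Everything here is a stand-in: `BulkheadSketch`, the local `BulkheadFullError` class, the `event_attributes` key, and the use of `asyncio.Semaphore` are assumptions made for illustration, not the library's implementation.

```python
import asyncio
import logging

logger = logging.getLogger("httpware.bulkhead")  # logger name assumed from the PR summary


class BulkheadFullError(Exception):
    """Local stand-in for the library's exception of the same name."""


class BulkheadSketch:
    def __init__(self, max_concurrent: int, acquire_timeout: float) -> None:
        self._slots = asyncio.Semaphore(max_concurrent)
        self.max_concurrent = max_concurrent
        self.acquire_timeout = acquire_timeout

    async def run(self, method: str, url: str, send):
        try:
            # Wait for a free slot, but no longer than acquire_timeout seconds.
            await asyncio.wait_for(self._slots.acquire(), self.acquire_timeout)
        except asyncio.TimeoutError:
            # Emit first, then raise, so the raise path itself is unchanged.
            logger.warning(
                "bulkhead.rejected",
                extra={"event_attributes": {
                    "max_concurrent": self.max_concurrent,
                    "acquire_timeout": self.acquire_timeout,
                    "method": method,
                    "url": url,
                }},
            )
            raise BulkheadFullError("no slot became available in time") from None
        try:
            return await send()
        finally:
            self._slots.release()
```

Because the event is emitted inside the `except` branch, a successful acquire logs nothing, consistent with the statement that successful paths emit nothing.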
Mirrors the existing test_optional_extras_pydantic_missing.py pattern: patches httpware._internal.import_checker.is_otel_installed to False to simulate the 'extra not installed' case. Verifies that the structured-log half of _emit_event still works and that no opentelemetry.trace.get_current_span call is attempted.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
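A self-contained sketch of this fail-soft testing pattern, using `unittest.mock`. The `import_checker` and `otel_bridge` namespaces, `emit_event`, and `check_fail_soft` below are hypothetical stand-ins built for the example; the real test patches httpware._internal.import_checker directly and may assert differently.

```python
import logging
import types
from unittest import mock

# Hypothetical stand-in for the module that holds the availability flag.
import_checker = types.SimpleNamespace(is_otel_installed=True)


def _add_span_event(name, attributes):
    from opentelemetry import trace  # only reached when the gate passes

    trace.get_current_span().add_event(name, attributes=attributes)


# Hypothetical indirection so the span call can be replaced with a spy.
otel_bridge = types.SimpleNamespace(add_span_event=_add_span_event)


def emit_event(logger, level, name, attributes):
    logger.log(level, name, extra={"event_attributes": attributes})
    if import_checker.is_otel_installed:
        otel_bridge.add_span_event(name, attributes)


def check_fail_soft():
    """Simulate 'extra not installed' and report (log_record_count, span_called)."""
    records = []

    class _Capture(logging.Handler):
        def emit(self, record):
            records.append(record)

    logger = logging.getLogger("sketch.fail_soft")
    logger.propagate = False
    handler = _Capture()
    logger.addHandler(handler)
    try:
        with mock.patch.object(import_checker, "is_otel_installed", False), \
                mock.patch.object(otel_bridge, "add_span_event") as span_spy:
            emit_event(logger, logging.WARNING, "retry.giving_up", {"attempts": 3})
    finally:
        logger.removeHandler(handler)
    return len(records), span_spy.called
```

With the flag patched to False, the check is expected to see one log record and no span call, which is the fail-soft property the commit describes.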
- README + docs/index.md: add 'Observability' section + update [all] install line to include otel; drop stale 'streaming and observability not shipped' status note
- planning/engineering.md §1 + §7 + §8: mention observability in project intent; update otel-extra parenthetical to reflect reintroduction; mark Epic 5 SHIPPED in roadmap with rationale for retiring 5-1 / 5-4
- planning/releases/0.6.0.md: new release notes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Epic 5, re-scoped.
- `Retry` and `Bulkhead` now emit four operational events via two channels: stdlib `logging` records (always on) and OpenTelemetry span events on the active span (opt-in via the `otel` extra).
- `httpware.retry` / `httpware.bulkhead`; events `retry.giving_up`, `retry.budget_refused`, `retry.streaming_refused`, `bulkhead.rejected`.
- `otel` extra reintroduced, paired with the code that uses it. PR #24 (chore(deps): drop otel optional extra (YAGNI)) had removed it as YAGNI; 0.6.0 brings it back with `opentelemetry-api>=1.20` only (no SDK; users supply their own).
- `from opentelemetry import trace` lives inside an `if import_checker.is_otel_installed:` gate, and `tests/test_optional_extras_isolation.py` now verifies `import httpware` does not pull `opentelemetry` into `sys.modules`.
- `httpware.observability` namespace, no standalone OTel middleware. Original Epic 5 stories `5-1` and `5-4` retired (rationale in spec).

Spec: planning/specs/2026-06-05-observability-design.md
Plan: planning/plans/2026-06-05-observability-plan.md
Release notes: planning/releases/0.6.0.md
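Since logger names and event names are the public surface, a consumer could subscribe to the events with a plain `logging.Handler`. The sketch below assumes that `httpware.retry` and `httpware.bulkhead` are logger names and that the event name is carried as the log message; neither detail is confirmed here, and the final line only simulates an emission so the example runs without httpware installed.

```python
import logging

EVENT_NAMES = {
    "retry.giving_up",
    "retry.budget_refused",
    "retry.streaming_refused",
    "bulkhead.rejected",
}


class EventCounter(logging.Handler):
    """Count resilience events by name (assumes the name is the log message)."""

    def __init__(self) -> None:
        super().__init__(level=logging.WARNING)
        self.counts = {}

    def emit(self, record: logging.LogRecord) -> None:
        name = record.getMessage()
        if name in EVENT_NAMES:
            self.counts[name] = self.counts.get(name, 0) + 1


counter = EventCounter()
for logger_name in ("httpware.retry", "httpware.bulkhead"):
    logging.getLogger(logger_name).addHandler(counter)

# Simulated emission standing in for the library's own call.
logging.getLogger("httpware.retry").warning("retry.giving_up")
```

Because all four events are WARNING level, a handler set to WARNING sees every one of them without further configuration.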
Test Plan
🤖 Generated with Claude Code