feat(#163): Add baseline observability #276
Merged
mvillmow added a commit that referenced this pull request on Jun 29, 2026
Telemachy CI pins setup-pixi to v0.67.2, which can only read pixi lock-format v6 and fails with "Lock-file version 7 is newer than supported; Maximum supported version: 6" on any v7 lock. Open PRs #263/#271/#273/#276 intentionally ship v7 locks (multi-platform macOS/Windows + large mcp/otel dep trees), so their entire pipeline dies at `pixi install`.

Bump all 9 pixi-version pins (_required.yml x7, release.yml x2) to v0.70.2, matching sibling repo Agamemnon, which already runs v0.70.2 on main with v7 locks.

v0.70.2 is backward-compatible: `pixi install --locked` against main's current v6 lock succeeds (warn-only, lock not rewritten), so this does not turn main red.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: mvillmow <4211002+mvillmow@users.noreply.github.com>
43f8a7e to aa13850
…acing

Implement issue #163: Add observability to ProjectTelemachy with:
- Correlation IDs: Every log record carries per-execution workflow_id via contextvars for end-to-end tracing across async boundaries
- Structured logging: JSON or plain-text formatters via LOG_FORMAT setting
- Prometheus metrics: Workflow completion, task outcomes, HTTP latency exposed via /metrics endpoint (opt-in via METRICS_ENABLED)
- OpenTelemetry tracing: Spans for each workflow phase via get_tracer() lazy factory; console exporter only (OTLP planned follow-up)

Architecture:
- New telemetry.py: logging filters, JSON/plain formatters, metrics, tracing setup with idempotent initialization and thread safety
- config.py: New observability settings with validation (single source)
- cli.py: _setup_logging rewritten to attach filter, set formatters, start tracing/metrics before httpx clients instantiated
- executor.py: Contextvars set/reset in _run, spans around each phase, metrics for workflow/task outcomes
- agamemnon_client.py: Endpoint label normalization, per-attempt metrics

Tests (26 new):
- Correlation ID propagation into gather children
- Log record defaults for missing filter
- Metrics idempotency and thread safety
- Metrics increments on success/failure
- Task terminal state transitions
- Tracer factory behavior
- Endpoint label normalization

License audit: Added 4 new packages (prometheus-client, opentelemetry-*), all Apache-2.0 compatible with MIT distribution.

All 70 tests pass with 78.41% coverage (target 75%). Lint and type checking clean (ruff, mypy).

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Signed-off-by: mvillmow <4211002+mvillmow@users.noreply.github.com>
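For readers who have not seen the diff, the correlation-ID and structured-logging items above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in telemetry.py: the names `workflow_id_var`, `WorkflowIdFilter`, `JsonFormatter`, `build_logger`, and `run_demo` are hypothetical, and the real module may be organized differently.

```python
import contextvars
import io
import json
import logging

# Hypothetical name; the PR's telemetry.py may use a different identifier.
workflow_id_var = contextvars.ContextVar("workflow_id", default="-")


class WorkflowIdFilter(logging.Filter):
    """Copy the current workflow_id from the context onto every record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.workflow_id = workflow_id_var.get()
        return True


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # getattr with a default keeps the formatter from crashing
            # when the filter was never attached.
            "workflow_id": getattr(record, "workflow_id", "-"),
        }
        return json.dumps(payload)


def build_logger(stream: io.StringIO) -> logging.Logger:
    """Attach the filter and formatter to a handler writing to `stream`."""
    handler = logging.StreamHandler(stream)
    handler.addFilter(WorkflowIdFilter())
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("telemachy.sketch")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def run_demo() -> dict:
    """Log one line inside a workflow context and return the parsed record."""
    stream = io.StringIO()
    logger = build_logger(stream)
    token = workflow_id_var.set("wf-123")
    try:
        logger.info("provisioning started")
    finally:
        # Reset so the ID does not leak into later, unrelated work.
        workflow_id_var.reset(token)
    return json.loads(stream.getvalue())
```

The set/reset pair mirrors the "Contextvars set/reset in _run" item: the ID is visible to everything logged during the run and restored to its default afterwards.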
No follow-up items discovered during implementation that qualify under strict scope rules (core defects, security findings, safety hazards, or critical bugs). Contextvars cleanup verified, thread safety confirmed, idempotency guaranteed, and test coverage exceeds target.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Signed-off-by: mvillmow <4211002+mvillmow@users.noreply.github.com>
Signed-off-by: mvillmow <4211002+mvillmow@users.noreply.github.com>
Summary
Implement issue #163: Add baseline observability to ProjectTelemachy with correlation IDs, structured logging, Prometheus metrics, and OpenTelemetry tracing.
Changes
New observability layer (telemetry.py)
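- Correlation IDs: every log record carries a per-execution `workflow_id` via contextvars for end-to-end tracing across async boundaries
- Structured logging: JSON or plain-text formatters via the `LOG_FORMAT` setting; safe formatters that don't crash without the filter
- Prometheus metrics: workflow completion, task outcomes, and HTTP latency exposed via the `/metrics` endpoint (opt-in)
- OpenTelemetry tracing: spans for each workflow phase via the `get_tracer()` lazy factory; console exporter only (OTLP planned)
Architecture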
- `telemetry.py`: Logging filters, JSON/plain formatters, metrics singletons, tracing setup with idempotent initialization and thread safety
- `config.py`: New observability settings (`LOG_FORMAT`, `METRICS_ENABLED`, `METRICS_PORT`, `OTEL_ENABLED`, `OTEL_SERVICE_NAME`, `OTEL_EXPORTER`) with single-source validation
- `cli.py`: `_setup_logging` rewritten to attach filter, set formatters, start tracing/metrics before httpx clients instantiated
- `executor.py`: Contextvars set/reset in `_run`, spans around each phase (provision, teams, monitor, teardown), metrics for workflow/task outcomes
- `agamemnon_client.py`: Endpoint label normalization to control cardinality, per-attempt metrics
Testing
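One of the behaviours the commit message lists under tests is correlation ID propagation into gather children. A rough sketch of how such a check could look, assuming a context variable named `workflow_id_var` (hypothetical, as are `child`, `run_workflow`, and `run_two_workflows`), is:

```python
import asyncio
import contextvars

# Hypothetical name; stands in for whatever telemetry.py defines.
workflow_id_var = contextvars.ContextVar("workflow_id", default="-")


async def child(name: str) -> str:
    """Report the workflow_id this child task observes."""
    await asyncio.sleep(0)  # yield so the children actually interleave
    return f"{name}:{workflow_id_var.get()}"


async def run_workflow(workflow_id: str) -> list:
    """Set the ID, fan out to children with gather, then reset."""
    token = workflow_id_var.set(workflow_id)
    try:
        # gather wraps each coroutine in a task, and each task copies
        # the context that is current at creation time.
        return await asyncio.gather(child("a"), child("b"))
    finally:
        workflow_id_var.reset(token)


async def run_two_workflows() -> list:
    """Run two workflows concurrently to show their IDs stay separate."""
    return await asyncio.gather(run_workflow("wf-1"), run_workflow("wf-2"))
```

Because each task runs in its own copy of the context, the two workflows do not overwrite each other's ID even though they share one event loop.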
Documentation
- `.env.example` with observability settings
- `CLAUDE.md` with new environment variables and observability subsection
- `docs/license-audit.md` with 4 new packages (all Apache-2.0, compatible with MIT distribution)

Closes #163
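The endpoint label normalization listed for `agamemnon_client.py` exists to keep metric label cardinality bounded. The PR text does not show the rules it applies, so the following is only an assumed approach: replace numeric and UUID path segments with a placeholder and drop the query string. The function name `normalize_endpoint`, the `{id}` placeholder, and the example paths are all hypothetical.

```python
import re

# Assumed patterns for "high-cardinality" path segments.
_UUID_RE = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)
_NUMERIC_RE = re.compile(r"^\d+$")


def normalize_endpoint(path: str) -> str:
    """Collapse variable path segments so the label set stays small."""
    path = path.split("?", 1)[0]  # query strings never belong in a label
    segments = []
    for segment in path.strip("/").split("/"):
        if _UUID_RE.match(segment) or _NUMERIC_RE.match(segment):
            segments.append("{id}")
        else:
            segments.append(segment)
    return "/" + "/".join(segments)
```

With a scheme like this, requests for `/v1/tasks/41` and `/v1/tasks/42` share the single label value `/v1/tasks/{id}` instead of creating one time series per task.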