
[AIG-632] Add OpenTelemetry tracing #45

Merged
blackms merged 4 commits into main from codex/aig-632-opentelemetry-tracing on May 29, 2026

@blackms (Owner) commented May 28, 2026

Summary

  • Add opt-in OpenTelemetry tracing configuration and helper APIs.
  • Instrument agent spawn/execution, handoff/task assignment, LLM calls, MCP tool calls, memory store/search, review-loop phases, and consensus checkpoints/decisions.
  • Support observability.otel (with an endpoint alias) alongside observability.tracing.otlpEndpoint for programmatic and backward-compatible configuration.
  • Document OTLP/console exporter setup, Jaeger docker-compose, Honeycomb, Datadog, Phoenix, and privacy constraints for emitted span attributes.

Validation

  • npm run typecheck
  • npx vitest run tests/unit/observability/tracing.test.ts tests/unit/config.test.ts tests/unit/consensus-service.test.ts tests/unit/memory-agent-scoping.test.ts
  • npm run test:unit
  • npx vitest run --config vitest.integration.config.ts --reporter=dot
  • npm run build
  • npm run lint (passes with existing warnings outside this change)
  • git diff --check

Summary by CodeRabbit

  • New Features

    • Opt-in OpenTelemetry tracing for orchestration: agent lifecycle, LLM calls, memory ops, tool execution, review/consensus flows
    • Exposed tracing helpers for initialization, shutdown, and traced sync/async blocks
    • Configurable OTLP/HTTP export with common backend examples
  • Documentation

    • New observability guide with configuration, examples, and privacy guidance
  • Tests

    • Added unit tests for tracing helpers and observability config normalization
  • Chores

    • Added OpenTelemetry runtime dependencies and config schema support


@coderabbitai (Bot) commented May 28, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 631c7c0b-b8f4-4c50-8efd-113b099297df

📥 Commits

Reviewing files that changed from the base of the PR and between 4dc6754 and ace90a5.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (4)
  • package.json
  • src/observability/tracing.ts
  • src/utils/config.ts
  • tests/unit/config.test.ts
🚧 Files skipped from review as they are similar to previous changes (4)
  • package.json
  • tests/unit/config.test.ts
  • src/utils/config.ts
  • src/observability/tracing.ts

📝 Walkthrough

Walkthrough

This PR adds comprehensive opt-in OpenTelemetry tracing to aistack: a new observability module and re-exports, config types and Zod schemas, OpenTelemetry dependencies, span wrappers (sync/async), and instrumentation across agent lifecycle, review loops, memory operations, consensus gates, and MCP tool handlers, plus docs and tests.

Changes

OpenTelemetry Tracing Implementation

| Layer | File(s) | Summary |
| --- | --- | --- |
| Tracing infrastructure and OpenTelemetry helpers | src/observability/tracing.ts, src/observability/index.ts | Core OpenTelemetry SDK setup with singleton lifecycle (initializeTracing, shutdownTracing), lazy module loading, exporter selection (console or OTLP HTTP), attribute sanitization, and traceAsync/traceSync wrappers. |
| Observability configuration and types | src/types.ts, src/utils/config.ts | Adds observability to AgentStackConfig, new ObservabilityConfig/TracingConfig/OTelConfig types, and Zod schemas that normalize observability.tracing and observability.otel with defaults (service name/version, exporter, OTLP endpoint/headers, sampling). |
| Public API re-exports | src/index.ts, src/observability/index.ts | Re-exports tracing helpers/types: initializeTracing, shutdownTracing, traceAsync, traceSync, isTracingEnabled, and sanitizeSpanAttributes. |
| Agent lifecycle tracing | src/agents/spawner.ts | Wraps spawnAgent in traceSync and executeAgent in traceAsync; instruments provider chat calls to capture model and token usage attributes while preserving prior status/flow. |
| Review loop phase tracing | src/coordination/review-loop.ts | Instruments start(), generateInitialCode(), performReview(), and fixCode() with traceAsync, recording verdicts, issue counts, agent durations, and LLM model attributes while keeping semaphore concurrency. |
| Memory operations tracing | src/memory/index.ts | Wraps store, storeShared, and search in traceAsync spans, recording namespace/agent context, shared flag, vector search usage, and result counts while preserving access validation and deduplication logic. |
| Consensus and handoff tracing | src/tasks/consensus-service.ts, src/mcp/tools/task-tools.ts | Adds traceSync around consensus methods (requiresConsensus, createCheckpoint, submitDecision) and traceAsync around MCP task handlers (task_create, task_assign) to record decision/handoff metadata. |
| MCP tool call tracing | src/mcp/server.ts | Runs the selected tool handler inside traceAsync and sets the mcp.tool.success span attribute on success. |
| Documentation and test coverage | docs/OBSERVABILITY.md, package.json, tests/unit/config.test.ts, tests/unit/observability/tracing.test.ts | Adds user-facing observability docs, OpenTelemetry runtime deps, and unit tests for config defaults/normalization and tracing helper behavior (disabled-mode execution, attribute sanitization). |
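The tracing.ts row mentions attribute sanitization, and the PR description cites privacy constraints on emitted span attributes. As a rough illustration only, here is a minimal TypeScript sketch of what such a sanitizer could do, built on the OpenTelemetry rule that attribute values must be strings, booleans, numbers, or homogeneous arrays of those. The name sanitizeSpanAttributesSketch and the 256-character truncation limit are assumptions for this example; this is not the PR's sanitizeSpanAttributes.

```typescript
// Hypothetical sketch, not the PR's sanitizeSpanAttributes implementation.
type Primitive = string | number | boolean;
type AttributeValue = Primitive | Primitive[];

// Assumed limit, for illustration only.
const MAX_STRING_LENGTH = 256;

function isPrimitive(value: unknown): value is Primitive {
  return (
    typeof value === "string" ||
    typeof value === "boolean" ||
    (typeof value === "number" && Number.isFinite(value))
  );
}

// Truncate long strings so large prompts or payloads never end up in a span.
function clip(value: Primitive): Primitive {
  return typeof value === "string" && value.length > MAX_STRING_LENGTH
    ? value.slice(0, MAX_STRING_LENGTH)
    : value;
}

function sanitizeSpanAttributesSketch(
  input: Record<string, unknown>,
): Record<string, AttributeValue> {
  const out: Record<string, AttributeValue> = {};
  for (const [key, value] of Object.entries(input)) {
    if (isPrimitive(value)) {
      out[key] = clip(value);
    } else if (Array.isArray(value) && value.length > 0 && value.every(isPrimitive)) {
      // OpenTelemetry array attributes must be homogeneous: keep only single-type arrays.
      const firstType = typeof value[0];
      if (value.every((item) => typeof item === firstType)) {
        out[key] = value.map(clip);
      }
    }
    // null, undefined, NaN/Infinity, objects, functions, and empty or mixed arrays are dropped.
  }
  return out;
}
```

Dropping unsupported values (rather than stringifying objects) keeps accidental payload dumps out of exported spans, which is the safer default when privacy is a stated concern.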

Sequence Diagram

sequenceDiagram
  participant Client
  participant traceAsync as traceAsync(config,name,attrs,fn)
  participant NodeSDK
  participant Tracer
  participant Span
  participant Exporter
  Client->>traceAsync: invoke traced operation
  traceAsync->>traceAsync: check if tracing enabled
  alt Tracing Enabled
    traceAsync->>NodeSDK: ensure initialized
    NodeSDK->>Tracer: get tracer
    Tracer->>Span: start span with sanitized attrs
    Span->>Client: execute callback within span context
    Client-->>Span: return result or throw error
    Span->>Exporter: record/export span
    Span-->>traceAsync: end span
  else Tracing Disabled
    traceAsync->>Client: execute callback directly (no span)
  end
  traceAsync-->>Client: return result or propagate error
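The diagram's control flow can be sketched without the OpenTelemetry SDK. SpanLike, TracerLike, TracingSettings, and traceAsyncSketch below are hypothetical stand-ins: the real @opentelemetry/api uses the SpanStatusCode enum and context propagation (for example through startActiveSpan), both of which this sketch omits. It only shows the disabled short-circuit, exception recording, error propagation, and the span always being ended.

```typescript
// Minimal stand-ins for the span/tracer surface this sketch needs.
// These are assumptions for illustration, not the @opentelemetry/api types.
type AttrValue = string | number | boolean;

interface SpanLike {
  setAttribute(key: string, value: AttrValue): void;
  recordException(error: unknown): void;
  setStatus(status: { code: "OK" | "ERROR"; message?: string }): void;
  end(): void;
}

interface TracerLike {
  startSpan(name: string, attributes: Record<string, AttrValue>): SpanLike;
}

interface TracingSettings {
  enabled: boolean; // hypothetical opt-in flag
  tracer?: TracerLike; // hypothetical injected tracer
}

async function traceAsyncSketch<T>(
  settings: TracingSettings,
  name: string,
  attrs: Record<string, AttrValue>,
  fn: (span?: SpanLike) => Promise<T>,
): Promise<T> {
  // Tracing disabled: execute the callback directly, no span is created.
  if (!settings.enabled || !settings.tracer) {
    return fn();
  }
  const span = settings.tracer.startSpan(name, attrs);
  try {
    const result = await fn(span);
    span.setStatus({ code: "OK" });
    return result;
  } catch (error) {
    span.recordException(error);
    span.setStatus({
      code: "ERROR",
      message: error instanceof Error ? error.message : String(error),
    });
    throw error; // propagate the original error to the caller
  } finally {
    span.end(); // the span ends on both the success and the error path
  }
}
```

Ending the span in `finally` matches the diagram's "end span" step on either branch. A handler such as the MCP tool call could set an attribute like mcp.tool.success through the span it receives before returning.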

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • blackms/aistack#37: Related edits to agent lifecycle and memory paths that may interact with tracing instrumentation.
  • blackms/aistack#20: Changes to MCP task tools and task handoff logic that overlap with traced handlers here.
  • blackms/aistack#16: Prior modifications to agent execution paths that are closely related to the new agent tracing hooks.

Poem

🐰 I hopped through code to leave a trace,
Each span a breadcrumb in the race,
From agent spawn to memory's keep,
Observability dreams no longer sleep!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title '[AIG-632] Add OpenTelemetry tracing' clearly and concisely summarizes the main change: adding OpenTelemetry tracing support throughout the codebase. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

tests/unit/config.test.ts

Parsing error: "parserOptions.project" has been provided for @typescript-eslint/parser.
The file was not found in any of the provided project(s): tests/unit/config.test.ts



@blackms marked this pull request as ready for review May 29, 2026 23:11
@blackms (Owner, Author) commented May 29, 2026

@coderabbitai review

@coderabbitai (Bot) commented May 29, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai (Bot) left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
package.json (1)

58-62: ⚖️ Poor tradeoff

Consider making OTel SDK packages opt-in via optionalDependencies + lazy import.

Tracing is disabled by default, but @opentelemetry/sdk-node (and its transitive tree) is a sizable hard dependency shipped to every consumer of this published library. The repo already uses optionalDependencies for similarly heavy/opt-in features (@e2b/code-interpreter, @xenova/transformers). If src/observability/tracing.ts imports these lazily only when tracing is enabled, moving the SDK packages to optionalDependencies would keep @opentelemetry/api (lightweight, always-safe) as the only hard dep.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@package.json` around lines 58 - 62, Move the heavy OpenTelemetry SDK packages
from "dependencies" to "optionalDependencies" in package.json and change
src/observability/tracing.ts to lazily import the SDK only when tracing is
enabled; keep `@opentelemetry/api` as a regular dependency, update package.json
entries "`@opentelemetry/exporter-trace-otlp-http`", "`@opentelemetry/resources`",
"`@opentelemetry/sdk-node`", and "`@opentelemetry/sdk-trace-base`" to
optionalDependencies, then modify functions in tracing.ts that initialize
tracing (e.g., the tracer initialization/bootstrap function) to use dynamic
import(...) for the SDK modules and guard execution behind the existing
tracing-enabled flag so consumers who don't opt into optional deps won't
download or execute the heavy SDK.
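A minimal sketch of the lazy-loading pattern this nitpick describes, assuming an ESM or transpiled-CommonJS build where `import()` accepts a runtime string. loadOptionalModule, maybeLoadSdk, and the TracingToggle shape are hypothetical names; the only real package referenced is @opentelemetry/sdk-node, and it is only resolved when tracing is enabled.

```typescript
// Hypothetical helper: resolve an optional dependency only when it is needed.
async function loadOptionalModule<T = unknown>(specifier: string): Promise<T | null> {
  try {
    // import() defers resolution until this line runs, so consumers who never
    // enable tracing never load the module (it can live in optionalDependencies).
    return (await import(specifier)) as T;
  } catch {
    // Not installed or failed to load: let the caller degrade to no-op tracing.
    return null;
  }
}

interface TracingToggle {
  enabled: boolean; // hypothetical config shape
}

async function maybeLoadSdk(config: TracingToggle): Promise<unknown | null> {
  if (!config.enabled) return null; // never touch the SDK when tracing is off
  return loadOptionalModule("@opentelemetry/sdk-node");
}
```

Returning `null` instead of throwing lets the caller log once and fall back to no-op spans when the optional packages were skipped at install time. Some bundlers cannot follow a non-literal `import()` specifier, so a real build might need literal module names.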
tests/unit/config.test.ts (1)

277-315: ⚡ Quick win

Consider adding a case for when both otel and tracing blocks are supplied.

Current cases cover otel-only and tracing-only, but not precedence when both are present — exactly the path affected by the merge concern flagged in src/utils/config.ts. A combined-block assertion would lock in the intended precedence.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/config.test.ts` around lines 277 - 315, Add a unit test in
tests/unit/config.test.ts that calls validateConfig with both
observability.tracing and observability.otel present and assert the intended
precedence behavior (i.e., supply conflicting values like tracing.exporter =
'otlp' with tracing.otlpEndpoint = 'http://tracing:4318/v1/traces' and
otel.endpoint = 'http://otel:4318/v1/traces' and then expect validateConfig to
be valid and that the resulting merged config uses the tracing values);
reference validateConfig, observability.tracing and observability.otel in the
test so the merge path in src/utils/config.ts is exercised and locked in.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/observability/tracing.ts`:
- Around line 95-134: Wrap the synchronous setup in initializeTracing with a
try/catch and add a terminal failure flag (e.g., tracingFailed) checked at the
top to short-circuit future calls; specifically, check tracingFailed early and
return false, then try resolving resolveTracingConfig, creating the exporter via
createSpanExporter, constructing the NodeSDK and calling sdk.start() inside the
try block, and on any thrown error log the error, call shutdownTracing() (or
shutdown any partially-initialized sdk), set tracingFailed = true and ensure
sdkStarted remains false, then return false so we don't repeatedly retry or spam
logs; keep existing sdkStarted and shutdownRegistered logic but only set them
after successful startup.
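The suggested failure handling can be sketched as follows. To keep the example testable, the sdkStarted/tracingFailed flags live in a closure rather than at module scope, and SdkLike is an assumed minimal surface (a `start()` plus a promise-returning `shutdown()`, roughly the shape NodeSDK exposes) rather than the real SDK type. createTracingInitializer is a hypothetical name.

```typescript
// Assumed minimal SDK surface for this sketch.
interface SdkLike {
  start(): void;
  shutdown(): Promise<void>;
}

function createTracingInitializer(createSdk: () => SdkLike) {
  let sdkStarted = false;
  let tracingFailed = false; // terminal flag: once set, never retry

  return function initialize(enabled: boolean): boolean {
    if (!enabled || tracingFailed) return false;
    if (sdkStarted) return true; // idempotent after a successful start

    let sdk: SdkLike | undefined;
    try {
      // Stands in for resolving config, building the exporter, and constructing the SDK.
      sdk = createSdk();
      sdk.start();
      sdkStarted = true; // only flip after a successful start
      return true;
    } catch (error) {
      console.error("Tracing initialization failed; continuing without tracing:", error);
      tracingFailed = true;
      // Best-effort cleanup of a partially initialized SDK; never let it throw.
      void sdk?.shutdown().catch(() => undefined);
      return false;
    }
  };
}
```

Because `tracingFailed` short-circuits every later call, a misconfigured exporter produces one log line instead of a retry and error on every traced operation.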

In `@src/utils/config.ts`:
- Around line 267-276: ObservabilityConfigSchema currently spreads
TracingConfigSchema.parse({}) first, causing Tracing defaults to overwrite
explicit observability.otel values; change the transform to merge values so otel
and explicit tracing win and then run TracingConfigSchema.parse on that merged
object — e.g., build a mergedTracing object using ...(config.otel ?? {}) then
...(config.tracing ?? {}) (so otel and tracing override defaults), and return
tracing: TracingConfigSchema.parse(mergedTracing) while keeping the outer
transform structure; update the transform in ObservabilityConfigSchema
accordingly to ensure defaults are applied by TracingConfigSchema.parse only
after merging.
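A dependency-free sketch of the merge-then-default order this comment asks for. The field names (enabled, exporter, otlpEndpoint, serviceName), the default values, and normalizeObservability are assumptions; the default endpoint is the conventional OTLP/HTTP traces URL and may not match the PR's default. The loop that skips `undefined` values plays the role of TracingConfigSchema.parse filling in defaults only for keys neither block sets.

```typescript
// Hypothetical shapes; the real TracingConfig/OTelConfig fields may differ.
interface TracingConfig {
  enabled: boolean;
  exporter: "console" | "otlp";
  otlpEndpoint: string;
  serviceName: string;
}

interface ObservabilityInput {
  tracing?: Partial<TracingConfig>;
  otel?: Partial<Omit<TracingConfig, "otlpEndpoint">> & { endpoint?: string };
}

const TRACING_DEFAULTS: TracingConfig = {
  enabled: false,
  exporter: "otlp",
  otlpEndpoint: "http://localhost:4318/v1/traces",
  serviceName: "aistack", // assumed default
};

function normalizeObservability(input: ObservabilityInput): { tracing: TracingConfig } {
  const otel: NonNullable<ObservabilityInput["otel"]> = input.otel ?? {};
  const { endpoint, ...otelRest } = otel;
  // Map the otel alias onto the tracing field name before merging.
  const fromOtel: Partial<TracingConfig> = { ...otelRest, otlpEndpoint: endpoint };

  // Start from defaults, then apply otel, then explicit tracing (later layers win).
  // Undefined values never overwrite, so defaults only fill genuine gaps.
  const tracing: TracingConfig = { ...TRACING_DEFAULTS };
  for (const layer of [fromOtel, input.tracing ?? {}]) {
    for (const [key, value] of Object.entries(layer)) {
      if (value !== undefined) {
        (tracing as unknown as Record<string, unknown>)[key] = value;
      }
    }
  }
  return { tracing };
}
```

For example, with otel.endpoint = 'http://otel:4318/v1/traces' and tracing.otlpEndpoint = 'http://tracing:4318/v1/traces' (the conflicting values proposed in the config.test.ts nitpick), the normalized otlpEndpoint is the tracing value, while an otel-only config keeps its endpoint instead of being reset to the default.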

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 01042444-9d8d-45b1-926d-8a3d4a6d081f

📥 Commits

Reviewing files that changed from the base of the PR and between f096996 and 4dc6754.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (15)
  • docs/OBSERVABILITY.md
  • package.json
  • src/agents/spawner.ts
  • src/coordination/review-loop.ts
  • src/index.ts
  • src/mcp/server.ts
  • src/mcp/tools/task-tools.ts
  • src/memory/index.ts
  • src/observability/index.ts
  • src/observability/tracing.ts
  • src/tasks/consensus-service.ts
  • src/types.ts
  • src/utils/config.ts
  • tests/unit/config.test.ts
  • tests/unit/observability/tracing.test.ts

Comment thread src/observability/tracing.ts
Comment thread src/utils/config.ts Outdated
@blackms merged commit 89c8319 into main May 29, 2026
6 checks passed
@blackms deleted the codex/aig-632-opentelemetry-tracing branch May 29, 2026 23:29
