DataDog · rochdev · May 29, 2026 · May 12, 2026 · May 20, 2026 · May 20, 2026
diff --git a/.agents/skills/llmobs-integration/SKILL.md b/.agents/skills/llmobs-integration/SKILL.md
@@ -1,65 +1,31 @@
 ---
 name: llmobs-integration
 description: |
-  This skill should be used when the user asks to "add LLMObs support", "create an LLMObs plugin",
-  "instrument an LLM library", "add LLM Observability", "add llmobs", "add llm observability",
-  "instrument chat completions", "instrument streaming", "instrument embeddings",
-  "instrument agent runs", "instrument orchestration", "instrument LLM",
-  "LLMObsPlugin", "LlmObsPlugin", "getLLMObsSpanRegisterOptions", "setLLMObsTags",
-  "tagLLMIO", "tagEmbeddingIO", "tagRetrievalIO", "tagTextIO", "tagMetrics", "tagMetadata",
-  "tagSpanTags", "tagPrompt", "LlmObsCategory", "LlmObsSpanKind",
-  "span kind llm", "span kind workflow", "span kind agent", "span kind embedding",
-  "span kind tool", "span kind retrieval",
-  "openai llmobs", "anthropic llmobs", "genai llmobs", "google llmobs",
-  "langchain llmobs", "langgraph llmobs", "ai-sdk llmobs",
-  "llm span", "llmobs span event", "model provider", "model name",
-  "CompositePlugin llmobs", "llmobs tracing", "VCR cassettes",
-  or needs to build, modify, or debug an LLMObs plugin for any LLM library in dd-trace-js.
+  Use when adding, debugging, or modifying LLMObs plugins for an LLM library
+  in dd-trace-js. Triggers: "add LLMObs support", "instrument chat
+  completions / streaming / embeddings / agent runs / orchestration / tool
+  calls / retrieval", "LLMObsPlugin", "getLLMObsSpanRegisterOptions",
+  "setLLMObsTags", "LlmObsCategory", "LlmObsSpanKind", any "<provider> llmobs"
+  phrase (provider: "openai" / "anthropic" / "genai" / "google" / "langchain" /
+  "langgraph" / "ai-sdk"), "VCR cassettes".
 ---
 
 # LLM Observability Integration Skill
 
-## Purpose
-
-This skill helps you create LLMObs plugins that instrument LLM library operations and emit proper span events for LLM observability in dd-trace-js. Supported operation types include:
-
-- **Chat completions** — standard request/response LLM calls
-- **Streaming chat completions** — streamed token-by-token responses
-- **Embeddings** — vector embedding generation
-- **Agent runs** — autonomous LLM agent execution loops
-- **Orchestration** — multi-step workflow and graph execution (langgraph, etc.)
-- **Tool calls** — tool/function invocations
-- **Retrieval** — vector DB / RAG operations
-
-## When to Use
-
-- Creating a new LLMObs plugin for an LLM library
-- Adding LLMObs support to an existing tracing integration
-- Understanding LLMObsPlugin architecture and patterns
-- Determining how to instrument a new LLM package
+This skill covers creating LLMObs plugins that instrument LLM library operations and emit span events. Supported operations: chat completions (streaming and non-streaming), embeddings, agent runs, orchestration (workflows / graphs), tool calls, retrieval (RAG / vector DB).
 
 ## Core Concepts
 
 ### 1. LLMObsPlugin Base Class
 
-All LLMObs plugins extend the `LLMObsPlugin` base class, which provides the core instrumentation framework.
-
-**Key responsibilities:**
-- **Span registration**: Define span metadata (model provider, model name, span kind)
-- **Tag extraction**: Extract and tag LLM-specific data (messages, metrics, metadata)
-- **Context management**: Handle span lifecycle and parent context
+All LLMObs plugins extend `LLMObsPlugin`. Two methods must be implemented:
 
-**Required methods to implement:**
-- `getLLMObsSpanRegisterOptions(ctx)` - Returns span registration options (modelProvider, modelName, kind, name)
-- `setLLMObsTags(ctx)` - Extracts and tags LLM data (input/output messages, metrics, metadata)
+- `getLLMObsSpanRegisterOptions(ctx)` — returns `{ modelProvider, modelName, kind, name }`.
+- `setLLMObsTags(ctx)` — extracts and tags input / output messages, token metrics, and model metadata.
 
-**Plugin lifecycle:**
-1. `start(ctx)` - Registers span with LLMObs, captures context
-2. Operation executes (chat completion call)
-3. `asyncEnd(ctx)` - Calls `setLLMObsTags()` to extract and tag data
-4. `end(ctx)` - Restores parent context
+Lifecycle: `start(ctx)` registers the span and captures context; the wrapped operation runs; `asyncEnd(ctx)` calls `setLLMObsTags()`; `end(ctx)` restores the parent context.
 
-See [references/plugin-architecture.md](references/plugin-architecture.md) for complete implementation details.
+See [references/plugin-architecture.md](references/plugin-architecture.md) for the full implementation surface.
 
 ### 2. Package Category System
 
@@ -166,15 +132,6 @@ See [references/message-extraction.md](references/message-extraction.md) for pro
 
 See [references/plugin-architecture.md](references/plugin-architecture.md) for step-by-step implementation guide.
 
-## Common Patterns
-
-Based on category:
-
-- **LLM_CLIENT**: Messages in array, straightforward extraction from `result.choices[0]` or equivalent
-- **MULTI_PROVIDER**: Handle multiple provider formats with provider detection logic
-- **ORCHESTRATION**: May use `'workflow'` span kind instead of `'llm'`, focus on lifecycle events
-- **INFRASTRUCTURE**: Protocol-specific instrumentation, may not have traditional messages
-
 ## Plugin Registration
 
 All plugins must export an array:
@@ -192,12 +149,3 @@ For detailed information, see:
 - [references/category-detection.md](references/category-detection.md) - Package classification heuristics and detection process
 - [references/message-extraction.md](references/message-extraction.md) - Provider-specific message format patterns
 - [references/reference-implementations.md](references/reference-implementations.md) - Working plugin examples (Anthropic, Google GenAI)
-
-## Key Principles
-
-1. **Category determines approach** - Always detect category first using decision tree
-2. **Use enum values** - Reference `LlmObsCategory` and `LlmObsSpanKind` enums from models
-3. **Standard message format** - Always convert to `[{content, role}]` format
-4. **Complete metadata** - Extract all available model parameters and token metrics
-5. **Error handling** - Handle failures gracefully (empty messages on error)
-6. **Test strategy follows category** - VCR for clients, pure functions for orchestration
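The lifecycle described in the rewritten `llmobs-integration` skill can be sketched as follows. This is a minimal illustration under stated assumptions, not the dd-trace-js implementation: `SketchLLMObsPlugin`, `ExampleChatPlugin`, `example-provider`, `example-model`, and the shape of `ctx` are all hypothetical. Only the two method names, the `{ modelProvider, modelName, kind, name }` return shape, and the `start` / `asyncEnd` / `end` ordering come from the skill text.

```javascript
'use strict'

// Hypothetical stand-in for LLMObsPlugin; it only models the call ordering.
class SketchLLMObsPlugin {
  constructor () {
    this.active = null // span currently treated as the parent
  }

  start (ctx) {
    // Register the span from the subclass's options and remember the parent.
    ctx.parent = this.active
    ctx.span = { options: this.getLLMObsSpanRegisterOptions(ctx), tags: {} }
    this.active = ctx.span
  }

  asyncEnd (ctx) {
    // Runs after the wrapped operation settles, so ctx.result may be set.
    this.setLLMObsTags(ctx)
  }

  end (ctx) {
    // Restore whatever was active before this span started.
    this.active = ctx.parent
  }
}

class ExampleChatPlugin extends SketchLLMObsPlugin {
  getLLMObsSpanRegisterOptions (ctx) {
    return {
      modelProvider: 'example-provider', // hypothetical provider name
      modelName: ctx.args.model,
      kind: 'llm',
      name: 'example.chat'
    }
  }

  setLLMObsTags (ctx) {
    const choice = ctx.result?.choices?.[0]
    // Normalize both sides to the [{ content, role }] message shape.
    ctx.span.tags.input = ctx.args.messages.map(({ content, role }) => ({ content, role }))
    ctx.span.tags.output = choice
      ? [{ content: choice.message.content, role: choice.message.role }]
      : [{ content: '', role: '' }] // assumed form of an "empty message" on error
  }
}

const plugin = new ExampleChatPlugin()
const ctx = {
  args: { model: 'example-model', messages: [{ role: 'user', content: 'hi' }] }
}
plugin.start(ctx)
ctx.result = { choices: [{ message: { role: 'assistant', content: 'hello' } }] }
plugin.asyncEnd(ctx)
plugin.end(ctx)
console.log(ctx.span)
```

Keeping tag extraction in `setLLMObsTags` means the error path (no `ctx.result`) still yields a span with an empty output message instead of dropping the span.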
diff --git a/.agents/skills/llmobs-testing/SKILL.md b/.agents/skills/llmobs-testing/SKILL.md
@@ -1,56 +1,27 @@
 ---
 name: llmobs-testing
 description: |
-  This skill should be used when the user asks to "write LLMObs tests", "add tests for LLM Observability",
-  "test an LLMObs plugin", "llmobs test", "llmobs spec", "test llm observability",
-  "assertLlmObsSpanEvent", "useLlmObs", "getEvents",
-  "MOCK_STRING", "MOCK_NOT_NULLISH", "MOCK_NUMBER", "MOCK_OBJECT",
-  "VCR cassette", "record cassette", "replay cassette", "vcr proxy", "llmobs cassette",
-  "test chat completions", "test streaming", "test embeddings", "test agent runs",
-  "test orchestration", "test workflow", "llmobs span event",
-  "LLMObs test strategy", "LlmObsCategory test",
-  "LLM_CLIENT test", "MULTI_PROVIDER test", "ORCHESTRATION test", "INFRASTRUCTURE test",
-  "span kind llm test", "span kind workflow test",
-  "inputMessages", "outputMessages", "token metrics", "llmobs span validation",
-  "cassette not generated", "re-record cassette", "127.0.0.1:9126",
-  or needs to write, modify, or debug tests for any LLMObs plugin in dd-trace-js.
+  Use when writing, modifying, or debugging tests for an LLMObs plugin in
+  dd-trace-js. Triggers: "write LLMObs tests", "test an LLMObs plugin",
+  "assertLlmObsSpanEvent", "useLlmObs", "getEvents", any MOCK_* matcher
+  ("MOCK_STRING" / "MOCK_NOT_NULLISH" / "MOCK_NUMBER" / "MOCK_OBJECT"),
+  "VCR cassette", "vcr proxy", "127.0.0.1:9126", any LlmObsCategory test
+  ("LLM_CLIENT" / "MULTI_PROVIDER" / "ORCHESTRATION" / "INFRASTRUCTURE").
 ---
 
 # LLM Observability Testing Skill
 
-## ⚠️ CRITICAL: Read This First ⚠️
+## Determine the package category first
 
-**BEFORE writing any test, you MUST determine the package category.**
+**Before writing any test, determine the package's `LlmObsCategory`.** Category picks the test strategy (VCR or not), the span kind, and the test structure. The wrong category produces tests that pass against the wrong contract — VCR cassettes for a workflow library produce empty recordings; pure-function tests for an HTTP-call wrapper miss the network surface entirely.
 
-**The category determines EVERYTHING:**
-- Whether to use VCR or not
-- What spanKind to use
-- What test structure to follow
-- What examples to study
+Quick check:
 
-**IF YOU USE THE WRONG CATEGORY STRATEGY, THE TEST WILL FAIL.**
+- Direct HTTP calls to an LLM provider? → `LLM_CLIENT` or `MULTI_PROVIDER` — VCR.
+- Workflow / graph orchestration with state? → `ORCHESTRATION` — no VCR, pure functions, real LLM as the orchestration node.
+- Protocol / server implementation? → `INFRASTRUCTURE` — mock server.
 
-**Categories are defined in the `LlmObsCategory` enum.**
-
-**Quick check:**
-- Does package make HTTP calls to LLM APIs? → `LLM_CLIENT` or `MULTI_PROVIDER` (use VCR)
-- Does package orchestrate workflows/graphs? → `ORCHESTRATION` (NO VCR, pure functions)
-- Does package implement protocols/servers? → `INFRASTRUCTURE` (mock servers)
-
-**See [references/category-strategies.md](references/category-strategies.md) for FORBIDDEN vs REQUIRED patterns per category.**
-
----
-
-## Purpose
-
-This skill helps you write comprehensive LLMObs tests that validate span events, messages, tokens, and metadata using category-appropriate strategies.
-
-## When to Use
-
-- Writing tests for a new LLMObs plugin (ALWAYS check category first)
-- Understanding category-specific test strategies
-- Learning VCR cassettes (for LLM_CLIENT/MULTI_PROVIDER only)
-- Learning assertion patterns for LLMObs spans
+See [references/category-strategies.md](references/category-strategies.md) for the FORBIDDEN-vs-REQUIRED matrix per category.
 
 ## Core Testing Concepts
 
@@ -98,52 +69,13 @@ See [references/vcr-cassettes.md](references/vcr-cassettes.md) for recording pro
 
 ### 3. Category-Specific Test Strategies
 
-Test strategy is determined by the `LlmObsCategory` enum.
-
-#### LlmObsCategory.LLM_CLIENT & LlmObsCategory.MULTI_PROVIDER
-
-**Strategy:** VCR with real API calls via proxy
-
-**Characteristics:**
-- Use VCR proxy baseURL
-- Record cassettes with real API keys
-- Tests make actual HTTP calls (recorded once)
-- Validate LLM-specific data (messages, tokens, model info)
-
-**Span kind:** Usually `'llm'` for chat completions
+The category-determination block at the top maps category to strategy. Non-obvious bits per category:
 
-See [references/category-strategies.md](references/category-strategies.md) for detailed patterns.
+- **LLM_CLIENT / MULTI_PROVIDER**: VCR proxy baseURL is `http://127.0.0.1:9126/vcr/{provider}`. Span kind: `'llm'`. Cassettes record once with real API keys; CI replays them.
+- **ORCHESTRATION**: Span kind: `'workflow'` or `'agent'`, never `'llm'`. No VCR, no real API calls: the orchestrator itself doesn't make HTTP calls; it coordinates libraries that do. Mock LLM responses as plain return values from the node so the test exercises the workflow execution, not the provider API.
+- **INFRASTRUCTURE**: Mock server, protocol-specific validation, no VCR.
 
-#### LlmObsCategory.ORCHESTRATION
-
-**Strategy:** Pure function tests, NO VCR, NO real API calls
-
-**Characteristics:**
-- NO VCR cassettes
-- NO HTTP calls to LLM providers
-- Use library's native APIs with mock/test LLM responses
-- Focus on workflow lifecycle, not API calls
-- **CRITICAL:** Still test with actual LLM as orchestration node (not mocked completely)
-
-**Span kind:** Usually `'workflow'` or `'agent'`, NOT `'llm'`
-
-**Example concept:**
-- LangGraph invokes nodes that call LLMs
-- LangGraph itself doesn't make HTTP calls
-- Test LangGraph's workflow execution, not the underlying LLM API
-
-See [references/category-strategies.md](references/category-strategies.md) for orchestration test patterns.
-
-#### LlmObsCategory.INFRASTRUCTURE
-
-**Strategy:** Mock server tests
-
-**Characteristics:**
-- Mock server implementation
-- Protocol-specific validation
-- NO VCR
-
-See [references/category-strategies.md](references/category-strategies.md) for infrastructure test patterns.
+See [references/category-strategies.md](references/category-strategies.md) for per-category patterns.
 
 ### 4. Assertion Patterns
 
@@ -221,38 +153,6 @@ On errors, validate:
 - Error object exists: `error: MOCK_OBJECT`
 - Span still created (not dropped)
 
-## Common Patterns by Category
-
-### LLM_CLIENT / MULTI_PROVIDER Pattern
-- Use VCR proxy baseURL
-- Test chat completions with various parameters
-- Validate real API response structure
-- Test streaming (if supported)
-- Test error responses
-
-### ORCHESTRATION Pattern
-- NO VCR
-- Test workflow lifecycle methods (invoke, stream, run)
-- Use mock LLM responses within workflow
-- Focus on workflow span, not LLM spans
-- Validate workflow-specific metadata (state, nodes, edges)
-
-### INFRASTRUCTURE Pattern
-- Mock server setup
-- Protocol-specific validation
-- Connection/transport testing
-
-## Best Practices
-
-1. **Use MOCK_* for non-deterministic values** - Output text, token counts, error objects
-2. **Use exact values for inputs** - You control input messages and parameters
-3. **Always validate spanKind** - Required for every span
-4. **Match category to test strategy** - VCR for clients, pure functions for orchestration
-5. **Test error paths** - Verify empty outputs and error objects on failures
-6. **Group by method** - Organize tests by instrumented method
-7. **Load modules fresh** - Use beforeEach() to avoid state leakage
-8. **Cover edge cases** - Empty messages, missing metadata, streaming
-
 ## References
 
 For detailed information, see:
@@ -261,11 +161,3 @@ For detailed information, see:
 - [references/vcr-cassettes.md](references/vcr-cassettes.md) - VCR recording process, cassette management, troubleshooting
 - [references/assertion-helpers.md](references/assertion-helpers.md) - Complete assertLlmObsSpanEvent API, matchers, patterns
 - [references/category-strategies.md](references/category-strategies.md) - Detailed test strategies for each LlmObsCategory
-
-## Key Principles
-
-1. **Category determines strategy** - Always check `LlmObsCategory` to pick test approach
-2. **Orchestrators don't use VCR** - They don't make direct API calls
-3. **Use matchers for variance** - Real API responses vary, use MOCK_* matchers
-4. **Validate message format** - Always check `{content, role}` structure
-5. **Test with real behavior** - For orchestrators, use actual LLM as node (not fully mocked)
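The category-to-strategy mapping in the rewritten `llmobs-testing` skill can be expressed as a small lookup. This is a sketch only: `STRATEGIES`, `testStrategyFor`, and the `harness` labels are hypothetical names introduced for illustration, and the category keys are plain strings rather than the real `LlmObsCategory` enum. The span kinds and the VCR proxy URL pattern are taken from the skill text; the skill does not state a span kind for `INFRASTRUCTURE`, so none is listed.

```javascript
'use strict'

// Hypothetical lookup table; keys mirror the category names used in the skill.
const STRATEGIES = new Map([
  ['LLM_CLIENT', { harness: 'vcr', spanKinds: ['llm'] }],
  ['MULTI_PROVIDER', { harness: 'vcr', spanKinds: ['llm'] }],
  ['ORCHESTRATION', { harness: 'pure-functions', spanKinds: ['workflow', 'agent'] }],
  ['INFRASTRUCTURE', { harness: 'mock-server', spanKinds: [] }] // span kind not stated in the skill
])

function testStrategyFor (category, provider) {
  const strategy = STRATEGIES.get(category)
  if (!strategy) throw new Error(`Unknown LlmObsCategory: ${category}`)

  // Only VCR-backed categories point the client at the local proxy.
  const baseURL = strategy.harness === 'vcr'
    ? `http://127.0.0.1:9126/vcr/${provider}`
    : undefined

  return { ...strategy, baseURL }
}

console.log(testStrategyFor('LLM_CLIENT', 'openai'))
console.log(testStrategyFor('ORCHESTRATION'))
```

Resolving the strategy before writing any test makes the "wrong category" failure mode explicit: an unknown category throws instead of silently falling back to VCR.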
diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS
@@ -201,6 +201,7 @@
 /packages/dd-trace/src/remote_config/ @DataDog/apm-sdk-capabilities-js
 /packages/dd-trace/test/remote_config/ @DataDog/apm-sdk-capabilities-js
 /packages/dd-trace/src/baggage.js @DataDog/apm-sdk-capabilities-js
+/packages/dd-trace/test/baggage.spec.js @DataDog/apm-sdk-capabilities-js
 /packages/dd-trace/src/sampler.js @DataDog/apm-sdk-capabilities-js
 /packages/dd-trace/test/sampler.spec.js @DataDog/apm-sdk-capabilities-js
 /packages/dd-trace/src/priority_sampler.js @DataDog/apm-sdk-capabilities-js
@@ -238,6 +239,8 @@
 /.github/actions/dd-sts-app-key/action.yml @Datadog/lang-platform-js
 /.github/actions/dd-sts-api-key/action.yml @Datadog/lang-platform-js
 /.github/actions/push_to_test_optimization/ @DataDog/ci-app-libraries
+/.github/playwright/ @DataDog/ci-app-libraries
+/.github/selenium/ @DataDog/ci-app-libraries
 /.github/actions/upload-node-reports/action.yml @Datadog/lang-platform-js
 /.github/chainguard @DataDog/sdlc-security
 /.github/codeql_config.yml @DataDog/sdlc-security
@@ -259,6 +262,7 @@
 /.github/workflows/serverless.yml @DataDog/serverless-aws @DataDog/apm-serverless
 /.github/workflows/llmobs.yml @DataDog/ml-observability
 /.github/workflows/openfeature.yml @DataDog/feature-flagging-and-experimentation-sdk
+/.github/workflows/pr-title.yml @DataDog/lang-platform-js
 /.github/workflows/profiling.yml @DataDog/profiling-js
 /.github/workflows/system-tests.yml @DataDog/asm-js
 /.github/workflows/test-optimization.yml @DataDog/ci-app-libraries

@@ -16,40 +16,21 @@ inputs:
 runs:
   using: composite
   steps:
-    - name: Verify coverage output
-      shell: bash
-      run: node scripts/verify-coverage.js --flags "${{ inputs.flags }}"
-
-    # `master-coverage` is the flag .codecov.yml gates codecov/patch on. Attach
-    # it only on PRs targeting master so release-branch PRs auto-pass.
-    - name: Compute Codecov flags
-      id: codecov-flags
-      shell: bash
-      env:
-        JOB_FLAGS: ${{ inputs.flags }}
-        EVENT_NAME: ${{ github.event_name }}
-        BASE_REF: ${{ github.base_ref }}
-      run: |
-        flags="$JOB_FLAGS"
-        if [ "$EVENT_NAME" = "pull_request" ] && [ "$BASE_REF" = "master" ]; then
-          flags="${flags:+$flags,}master-coverage"
-        fi
-        echo "value=$flags" >> "$GITHUB_OUTPUT"
-
-    - name: Upload coverage to Codecov
-      uses: codecov/codecov-action@57e3a136b779b570ffcdbf80b3bdc90e7fab3de2 # v6.0.0
-      with:
-        flags: ${{ steps.codecov-flags.outputs.value }}
-
-    - name: Install datadog-ci
-      if: always()
-      uses: ./.github/actions/datadog-ci
-
-    - name: Upload coverage to Datadog
-      if: always()
+    # Retry once after a 60-second pause to work around transient failures
+    # (e.g. flaky Codecov upload network calls).
+    - id: attempt
+      uses: ./.github/actions/coverage/upload
       continue-on-error: true
+      with:
+        flags: ${{ inputs.flags }}
+        report-dir: ${{ inputs.report-dir }}
+        dd_api_key: ${{ inputs.dd_api_key }}
+    - if: steps.attempt.outcome == 'failure'
       shell: bash
-      run: datadog-ci coverage upload ${FLAGS:+--flags "$FLAGS"} .
-      env:
-        DD_API_KEY: ${{ inputs.dd_api_key }}
-        FLAGS: ${{ inputs.flags }}
+      run: sleep 60
+    - if: steps.attempt.outcome == 'failure'
+      uses: ./.github/actions/coverage/upload
+      with:
+        flags: ${{ inputs.flags }}
+        report-dir: ${{ inputs.report-dir }}
+        dd_api_key: ${{ inputs.dd_api_key }}
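The retry logic in the composite action above reduces to "try, and on failure wait and try once more". A control-flow model of it, as a rough sketch: `uploadWithOneRetry`, `upload`, and `wait` are hypothetical names standing in for the `./.github/actions/coverage/upload` step and the `sleep 60` step, and the model is synchronous only to keep the ordering easy to read.

```javascript
'use strict'

// Control-flow model of the composite action; not part of the repository.
function uploadWithOneRetry (upload, wait) {
  try {
    // First attempt: a failure is tolerated, like `continue-on-error: true`.
    return upload()
  } catch {
    // Mirrors the two steps guarded by `steps.attempt.outcome == 'failure'`.
    wait(60)
    // Second attempt: a failure here propagates and fails the job.
    return upload()
  }
}

let calls = 0
const waits = []
const result = uploadWithOneRetry(
  () => {
    calls += 1
    if (calls === 1) throw new Error('transient network error')
    return 'uploaded'
  },
  seconds => waits.push(seconds)
)
console.log(result, calls, waits)
```

The guards check `outcome` and not `conclusion` because `outcome` reports the step result before `continue-on-error` is applied, so it still reads `failure` when the first attempt fails.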