Merged
125 commits
ae012f5
refactor(span)!: gate addLink(spanContext, attributes) legacy overloa…
BridgeAR May 12, 2026
d091e46
chore(deps): bump the runtime-minor-and-patch-dependencies group acro…
dependabot[bot] May 20, 2026
6afb8b4
chore: update dependabot and support ranges (#8337)
BridgeAR May 20, 2026
047ed18
ci: avoid Yarn quarantine for Datadog packages (#8577)
IlyasShabi May 20, 2026
6e4f40f
fix(test): retry topic creation on UNKNOWN_TOPIC_OR_PARTITION in kafk…
wconti27 May 20, 2026
9105fdd
ci: fix node version cache path on Windows (#8578)
rochdev May 21, 2026
b7fed1c
fix(electron): guard find() result and increase startApp timeout (#8559)
rochdev May 21, 2026
033eca8
chore(eslint): require messages on boolean test assertions (#8537)
watson May 21, 2026
855bdc4
[test optimization] normalize seed suffix in test names in `jest` (#8…
juan-fernandez May 21, 2026
490c42e
bump native-iast-taint-tracking (#8591)
IlyasShabi May 21, 2026
4ff6367
bump datadog/pprof (#8565)
IlyasShabi May 21, 2026
ab2fbf3
perf(http-server): reuse request ctx and cache config in plugin start…
BridgeAR May 21, 2026
8fe430d
ci(verify-tests): flag specs no CI invocation reaches (#8543)
BridgeAR May 21, 2026
870590a
ci: structured retry and longer network-timeout for CI installs (#8566)
BridgeAR May 21, 2026
77c3bcd
ci: only run SSI tests on master, release proposals, and labeled PRs …
rochdev May 21, 2026
d91d86c
fix(electron): increase assertSomeTraces timeout for IPC window tests…
rochdev May 21, 2026
9c1d8a3
ci: work around actions/cache windows flakiness (#8584)
rochdev May 21, 2026
40952a1
chore(deps): bump the ai-and-llm group across 1 directory with 8 upda…
dependabot[bot] May 22, 2026
e71fec2
chore(deps): bump uuid from 9.0.1 to 14.0.0 in /benchmark/sirun/start…
dependabot[bot] May 22, 2026
0afd4e8
chore(deps-dev): bump the dev-minor-and-patch-dependencies group acro…
dependabot[bot] May 22, 2026
fc01ded
ci(test-optimization): build versioned Playwright Docker image in GHC…
rochdev May 22, 2026
6676a1f
test(profiling): bump OOM extension size to 20MB for Node 22+ headroo…
szegedi May 22, 2026
aae2548
feat(azure/cosmos): add Azure CosmosDB integration (#7943)
rithikanarayan May 22, 2026
6df730a
fix(ci): restore azure-cosmos lint and fix electron packaging on Node…
rochdev May 22, 2026
60701e5
add openai error type (#8605)
sabrenner May 22, 2026
9249fde
fix(ci): fix azure-functions cosmosdb test regressions (#8610)
rochdev May 22, 2026
18df39a
fix(ci): replace setup-bun with npm install to avoid GitHub rate limi…
rochdev May 22, 2026
2630d70
chore(deps): bump qs from 6.15.1 to 6.15.2 in /benchmark/sirun/startu…
dependabot[bot] May 22, 2026
5ae0a52
chore(ci): update dd-sts-action to v1.0.3 (#8603)
rochdev May 22, 2026
6ca9d60
ci(coverage): patch istanbul-lib-coverage's getLineCoverage in postin…
BridgeAR May 22, 2026
041a3ed
chore(deps): bump qs from 6.15.0 to 6.15.2 (#8612)
dependabot[bot] May 22, 2026
f4e4950
fix(ci): always write flakiness report and fire Slack notification (#…
rochdev May 22, 2026
2022ea4
chore(log): drop the orphaned StructuredLogPlugin subclass (#8579)
BridgeAR May 23, 2026
2871997
chore(deps): bump the test-versions group across 1 directory with 2 u…
dependabot[bot] May 25, 2026
5362cd5
test(test-optimization): replace nock with direct stub in git_metadat…
rochdev May 25, 2026
a184d67
chore: deactivate eslint-require-boolean-assert-message (#8620)
BridgeAR May 25, 2026
edecbd9
chore(deps): bump the ai-and-llm group across 1 directory with 10 upd…
dependabot[bot] May 25, 2026
69f7de1
fix(eslint): skip autofix on ${} in string literal (#8627)
watson May 25, 2026
1178938
[test optimization] support playwright 1.60 with rewriter hooks (#8590)
juan-fernandez May 26, 2026
1dedf58
chore(deps): bump the ai-and-llm group across 1 directory with 4 upda…
dependabot[bot] May 26, 2026
adcb076
perf(plugin): drop per-publish storage lookup and handler rest-spread…
BridgeAR May 26, 2026
3010f4d
chore(test): bump mongodb to 7.2.0 and mongoose to 9.6.2 (#8533)
BridgeAR May 26, 2026
961b130
fix(graphql): fix field-type tag, release contexts WeakMap, and more …
BridgeAR May 26, 2026
86ce8b0
fix(aws-sdk): hook @smithy/core/client.Client.send for >=3.1046 clien…
BridgeAR May 26, 2026
f77c597
chore(ci) update one-pipeline (#8636)
gh-worker-campaigns-3e9aa4[bot] May 26, 2026
a91929a
chore(deps): bump the test-optimization group across 1 directory with…
dependabot[bot] May 26, 2026
6007d91
test-optimization(feat): Add cypress command spans (analog to playwri…
cbasitodx May 26, 2026
b2fbf7b
chore(deps): bump @datadog/datadog-ci from 5.16.0 to 5.17.0 in /.gith…
dependabot[bot] May 26, 2026
ffd74c3
feat(oracledb): inject DBM SQL comment (#8481)
bojbrook May 26, 2026
69b15c6
feat(opentracing): tag accessor API on span context + lint rule (#8491)
bengl May 26, 2026
e261ee0
perf(shimmer): reuse name and length descriptor literals (#8515)
BridgeAR May 26, 2026
3931a6e
perf(propagation): cheap extract on carriers without propagation cont…
BridgeAR May 26, 2026
9e9dc8f
perf(router): consolidate per-request state, drop redundant ALS read …
BridgeAR May 26, 2026
b295ab7
perf(mongodb): fast path sanitiseAndStringify for flat-primitive filt…
BridgeAR May 26, 2026
5c5920d
feat(kafkajs): instrument producer.sendBatch (#8403)
BridgeAR May 26, 2026
d7abcff
feat(dns): instrument dns.promises API (#8404)
BridgeAR May 26, 2026
b84aaca
perf(span): fast-path setTag for the common non-sampling case (#8640)
pabloerhard May 26, 2026
04c00ae
perf(profiler): skip redundant setContext under AsyncContextFrame (#8…
szegedi May 26, 2026
bea0e25
test(http2): avoid port reuse in server tests (#8641)
rochdev May 26, 2026
e3554e0
test(http2): fix flaky cancelled-request span assertion (#8642)
rochdev May 26, 2026
908cc03
ci(node): replace version cache with pinned versions from test/plugin…
rochdev May 26, 2026
7fbb53f
add workflow to validate pull request title and sync labels (#8196)
rochdev May 27, 2026
8366855
feat(dbm): add dynamic_service propagation mode (#8592)
amarziali May 27, 2026
4ab33ec
chore(deps): bump the serverless group across 1 directory with 13 upd…
dependabot[bot] May 27, 2026
a52d6ba
chore(deps-dev): bump the dev-minor-and-patch-dependencies group acro…
dependabot[bot] May 27, 2026
137e1db
chore(deps): bump oxc-parser from 0.130.0 to 0.132.0 in the runtime-m…
dependabot[bot] May 27, 2026
d820757
chore(deps): bump the gh-actions-packages group across 3 directories …
dependabot[bot] May 27, 2026
8b8ba72
chore(deps-dev): bump eslint-plugin-jsdoc from 62.9.0 to 63.0.0 (#8648)
dependabot[bot] May 27, 2026
78c9b37
fix(plugins): scope extractIp per-plugin instead of module-level (#8508)
BridgeAR May 27, 2026
78031ac
fix(dbm): rename _dd.dbm.propagation_hash to _dd.propagated_hash (#8643)
tlhunter May 27, 2026
850440d
fix(ci): add unzip to Playwright Docker image (#8615)
rochdev May 27, 2026
8fb9e63
ci(playwright): install libatomic for Node 26 (#8657)
juan-fernandez May 27, 2026
b466fbe
feat: add Node.js 26 support (#8429)
BridgeAR May 27, 2026
a529f60
test(profiling): stabilize Poisson sampling filter spec (#8659)
szegedi May 27, 2026
268a1e0
chore(release): replace semver-major exclusion with only-land-on-next…
rochdev May 27, 2026
554574f
perf(fastify): fast-path addHook wrapper when no parser channels have…
BridgeAR May 27, 2026
f433e44
perf(span): write tags directly on _tags in setTag and addTags (#8507)
BridgeAR May 27, 2026
6a5ba3c
chore: update protobufjs, ttlcache, and code-transformer (#8656)
BridgeAR May 27, 2026
534246b
perf(graphql): tighten resolver execute hot path (#8498)
BridgeAR May 27, 2026
4e89777
feat(http,http2): apply http.endpoint and queryStringObfuscation to c…
BridgeAR May 27, 2026
efefbad
feat(tracing): stamp manual spans through span.finish() resolution (#…
pabloerhard May 27, 2026
ca3f8c6
perf(plugins/util/web): trim request-lifecycle helper work (#8492)
BridgeAR May 27, 2026
5456cee
fix(llmobs): cover every LLMObs span registration with OTel bridge ta…
ZStriker19 May 27, 2026
5cb6e85
perf(pino): inject dd into the JSON line, skip the Proxy view (#8501)
BridgeAR May 27, 2026
ee38a8d
ci: pin all Windows runners to windows-2022 (#8675)
rochdev May 27, 2026
4fa0a61
fix(ci): install Playwright browser dependencies (#8671)
juan-fernandez May 28, 2026
03903ee
[test optimization] report TIA line coverage totals in jest (#8541)
juan-fernandez May 28, 2026
fb2ae63
[test optimization] report ITR line coverage totals in mocha (#8450)
juan-fernandez May 28, 2026
3a2e0f2
[test optimization] prevent payload loss (#8658)
cbasitodx May 28, 2026
1f28a09
chore(ci): Download authanywhere binary over https (#8688)
rithikanarayan May 28, 2026
e8279cf
[test optimization] report ITR line coverage totals in cucumber (#8452)
juan-fernandez May 28, 2026
b19c179
feat(aiguard): evaluate openai SDK calls automatically (#8053)
avara1986 May 28, 2026
a33c7a1
ci(pr-title): allow `bench` as a Conventional Commits type (#8683)
BridgeAR May 28, 2026
afb169c
bench(encode): make the encoding bench reflect a real Node.js HTTP re…
BridgeAR May 28, 2026
1728bf7
ci(test-optimization): install Chrome in Docker image for Selenium te…
rochdev May 28, 2026
5a4505d
fix(oracledb): keep caller SQL when tracing is suppressed (#8685)
BridgeAR May 28, 2026
921d2cf
ci: install gpg before Codecov upload to fix intermittent failures (#…
rochdev May 28, 2026
033efdc
fix(debugger): generalize @-prefix ref desugaring (#8628)
watson May 28, 2026
a0e5385
ci(profiling): capture Windows crash dumps via WER LocalDumps (#8593)
szegedi May 28, 2026
65976ab
feat(nats): experimental support for @nats-io/nats-core / @nats-io/tr…
tlhunter May 28, 2026
7b5dde0
docs(llmobs): drop restated category rules from the LLMObs skills (#8…
BridgeAR May 28, 2026
65ba153
fix(ts): add interface DatabaseInstrumentation into v5 ts file (#8690)
pabloerhard May 28, 2026
96bdfeb
chore(ci): fold codeowners-audit and verify-exercised-tests into npm …
BridgeAR May 28, 2026
e90008c
feat(openfeature): add FFE span enrichment for APM traces (#8343)
sameerank May 28, 2026
6bbb774
test(debugger): fix zombie processes causing flaky redact tests on No…
rochdev May 28, 2026
b0f2d58
fix(ci): cancel running workflows on all-green timeout, reduce retrie…
rochdev May 28, 2026
1378e05
ci: simplify pr-title workflow triggers and condition (#8695)
rochdev May 28, 2026
498829d
test(appsec): drain preload span before RASP SSRF axios tests (#8652)
rochdev May 28, 2026
79d1fcd
fix(ci): rerun only failed jobs for cancelled workflows in all-green …
rochdev May 28, 2026
1b6824b
ci(test-optimization): fix flaky cypress@latest before-hook timeout (…
rochdev May 28, 2026
b3c0579
chore(deps): update @apm-js-collab/code-transformer to 0.13.0 (#8631)
rochdev May 28, 2026
e597254
fix(hono): set resource name for single-handler routes (#8100)
wconti27 May 28, 2026
7a3e0d9
docs(types): add missing properties into v5 ts file (#8692)
pabloerhard May 28, 2026
008031a
ci: add retry with 60s delay to coverage, dd-sts-api-key, and node ac…
rochdev May 28, 2026
f69fb12
feat(cypress): report TIA line coverage totals in cypress (#8453)
juan-fernandez May 29, 2026
f4eddad
fix(jest): gate coverage backfill by jest version (#8700)
juan-fernandez May 29, 2026
b336e43
fix(jest): report coverage metric without skipped suites (#8702)
juan-fernandez May 29, 2026
ed6fa39
chore(cypress): bump latest test version (#8701)
juan-fernandez May 29, 2026
0d83b45
fix(mongodb): unify obfuscateQuery sanitizer and speed up query taggi…
BridgeAR May 29, 2026
a8993f2
perf(format): split addTag into typed helpers to kill throwaway {} al…
BridgeAR May 29, 2026
dd3af5e
perf(encode): consolidate the msgpack hot path (#8504)
BridgeAR May 29, 2026
0924965
revert: feat(http,http2): apply http.endpoint and queryStringObfuscat…
BridgeAR May 29, 2026
51fb46d
chore(deps): bump axios from 1.15.2 to 1.16.0 in /integration-tests/w…
dependabot[bot] May 29, 2026
ce8167c
ci(project): remove supported integrations push jobs (#8707)
juan-fernandez May 29, 2026
19c06ca
v5.105.0
BridgeAR May 29, 2026
78 changes: 13 additions & 65 deletions .agents/skills/llmobs-integration/SKILL.md
@@ -1,65 +1,31 @@
---
name: llmobs-integration
description: |
This skill should be used when the user asks to "add LLMObs support", "create an LLMObs plugin",
"instrument an LLM library", "add LLM Observability", "add llmobs", "add llm observability",
"instrument chat completions", "instrument streaming", "instrument embeddings",
"instrument agent runs", "instrument orchestration", "instrument LLM",
"LLMObsPlugin", "LlmObsPlugin", "getLLMObsSpanRegisterOptions", "setLLMObsTags",
"tagLLMIO", "tagEmbeddingIO", "tagRetrievalIO", "tagTextIO", "tagMetrics", "tagMetadata",
"tagSpanTags", "tagPrompt", "LlmObsCategory", "LlmObsSpanKind",
"span kind llm", "span kind workflow", "span kind agent", "span kind embedding",
"span kind tool", "span kind retrieval",
"openai llmobs", "anthropic llmobs", "genai llmobs", "google llmobs",
"langchain llmobs", "langgraph llmobs", "ai-sdk llmobs",
"llm span", "llmobs span event", "model provider", "model name",
"CompositePlugin llmobs", "llmobs tracing", "VCR cassettes",
or needs to build, modify, or debug an LLMObs plugin for any LLM library in dd-trace-js.
Use when adding, debugging, or modifying LLMObs plugins for an LLM library
in dd-trace-js. Triggers: "add LLMObs support", "instrument chat
completions / streaming / embeddings / agent runs / orchestration / tool
calls / retrieval", "LLMObsPlugin", "getLLMObsSpanRegisterOptions",
"setLLMObsTags", "LlmObsCategory", "LlmObsSpanKind", any provider tag
("openai" / "anthropic" / "genai" / "google" / "langchain" / "langgraph" /
"ai-sdk" llmobs), "VCR cassettes".
---

# LLM Observability Integration Skill

## Purpose

This skill helps you create LLMObs plugins that instrument LLM library operations and emit proper span events for LLM observability in dd-trace-js. Supported operation types include:

- **Chat completions** — standard request/response LLM calls
- **Streaming chat completions** — streamed token-by-token responses
- **Embeddings** — vector embedding generation
- **Agent runs** — autonomous LLM agent execution loops
- **Orchestration** — multi-step workflow and graph execution (langgraph, etc.)
- **Tool calls** — tool/function invocations
- **Retrieval** — vector DB / RAG operations

## When to Use

- Creating a new LLMObs plugin for an LLM library
- Adding LLMObs support to an existing tracing integration
- Understanding LLMObsPlugin architecture and patterns
- Determining how to instrument a new LLM package
This skill covers creating LLMObs plugins that instrument LLM library operations and emit span events. Supported operations: chat completions (streaming and non-streaming), embeddings, agent runs, orchestration (workflows / graphs), tool calls, retrieval (RAG / vector DB).

## Core Concepts

### 1. LLMObsPlugin Base Class

All LLMObs plugins extend the `LLMObsPlugin` base class, which provides the core instrumentation framework.

**Key responsibilities:**
- **Span registration**: Define span metadata (model provider, model name, span kind)
- **Tag extraction**: Extract and tag LLM-specific data (messages, metrics, metadata)
- **Context management**: Handle span lifecycle and parent context
All LLMObs plugins extend `LLMObsPlugin`. Two methods must be implemented:

**Required methods to implement:**
- `getLLMObsSpanRegisterOptions(ctx)` - Returns span registration options (modelProvider, modelName, kind, name)
- `setLLMObsTags(ctx)` - Extracts and tags LLM data (input/output messages, metrics, metadata)
- `getLLMObsSpanRegisterOptions(ctx)` — returns `{ modelProvider, modelName, kind, name }`.
- `setLLMObsTags(ctx)` — extracts and tags input / output messages, token metrics, and model metadata.

**Plugin lifecycle:**
1. `start(ctx)` - Registers span with LLMObs, captures context
2. Operation executes (chat completion call)
3. `asyncEnd(ctx)` - Calls `setLLMObsTags()` to extract and tag data
4. `end(ctx)` - Restores parent context
Lifecycle: `start(ctx)` registers the span and captures context; the wrapped operation runs; `asyncEnd(ctx)` calls `setLLMObsTags()`; `end(ctx)` restores the parent.

See [references/plugin-architecture.md](references/plugin-architecture.md) for complete implementation details.
See [references/plugin-architecture.md](references/plugin-architecture.md) for the full implementation surface.

### 2. Package Category System

@@ -166,15 +132,6 @@ See [references/message-extraction.md](references/message-extraction.md) for pro

See [references/plugin-architecture.md](references/plugin-architecture.md) for step-by-step implementation guide.

## Common Patterns

Based on category:

- **LLM_CLIENT**: Messages in array, straightforward extraction from `result.choices[0]` or equivalent
- **MULTI_PROVIDER**: Handle multiple provider formats with provider detection logic
- **ORCHESTRATION**: May use `'workflow'` span kind instead of `'llm'`, focus on lifecycle events
- **INFRASTRUCTURE**: Protocol-specific instrumentation, may not have traditional messages

## Plugin Registration

All plugins must export an array:
@@ -192,12 +149,3 @@ For detailed information, see:
- [references/category-detection.md](references/category-detection.md) - Package classification heuristics and detection process
- [references/message-extraction.md](references/message-extraction.md) - Provider-specific message format patterns
- [references/reference-implementations.md](references/reference-implementations.md) - Working plugin examples (Anthropic, Google GenAI)

## Key Principles

1. **Category determines approach** - Always detect category first using decision tree
2. **Use enum values** - Reference `LlmObsCategory` and `LlmObsSpanKind` enums from models
3. **Standard message format** - Always convert to `[{content, role}]` format
4. **Complete metadata** - Extract all available model parameters and token metrics
5. **Error handling** - Handle failures gracefully (empty messages on error)
6. **Test strategy follows category** - VCR for clients, pure functions for orchestration
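
The contract this skill file describes (two required methods plus the `start` / `asyncEnd` / `end` lifecycle) can be illustrated with a minimal sketch. It assumes a hypothetical stand-in base class, `SketchLLMObsPlugin`, rather than the real `LLMObsPlugin` from dd-trace-js, whose constructor and helpers are not shown in this diff. The provider name, span name, `ctx` shape, and the empty-output fallback are likewise assumptions for illustration.

```javascript
// Hypothetical stand-in for the real LLMObsPlugin base class. It models only
// the lifecycle order described in the skill file, not the real implementation.
class SketchLLMObsPlugin {
  start (ctx) {
    // Register the span with the options supplied by the subclass.
    ctx.registered = this.getLLMObsSpanRegisterOptions(ctx)
  }

  asyncEnd (ctx) {
    // Tag after the wrapped operation has produced a result or an error.
    ctx.tags = this.setLLMObsTags(ctx)
  }

  end (ctx) {
    // The real plugin restores the parent context here; the sketch only records it.
    ctx.parentRestored = true
  }
}

// Illustrative subclass; 'example-provider' and 'example.chat' are made-up values.
class ExampleChatPlugin extends SketchLLMObsPlugin {
  getLLMObsSpanRegisterOptions (ctx) {
    return {
      modelProvider: 'example-provider',
      modelName: ctx.args.model,
      kind: 'llm',
      name: 'example.chat'
    }
  }

  setLLMObsTags (ctx) {
    const choice = ctx.result?.choices?.[0]
    return {
      // Normalize to the [{ content, role }] message format.
      inputMessages: ctx.args.messages.map(({ content, role }) => ({ content, role })),
      // Assumed fallback: a single empty message when the call failed.
      outputMessages: choice
        ? [{ content: choice.message.content, role: choice.message.role }]
        : [{ content: '', role: '' }]
    }
  }
}

// Usage: drive the lifecycle by hand with a fake context object.
const plugin = new ExampleChatPlugin()
const ctx = {
  args: { model: 'example-model', messages: [{ role: 'user', content: 'hi' }] },
  result: { choices: [{ message: { role: 'assistant', content: 'hello' } }] }
}
plugin.start(ctx)
plugin.asyncEnd(ctx)
plugin.end(ctx)
```

In the real plugin the tracer's channel subscriptions drive these calls; the manual calls above exist only to make the ordering visible.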
144 changes: 18 additions & 126 deletions .agents/skills/llmobs-testing/SKILL.md
@@ -1,56 +1,27 @@
---
name: llmobs-testing
description: |
This skill should be used when the user asks to "write LLMObs tests", "add tests for LLM Observability",
"test an LLMObs plugin", "llmobs test", "llmobs spec", "test llm observability",
"assertLlmObsSpanEvent", "useLlmObs", "getEvents",
"MOCK_STRING", "MOCK_NOT_NULLISH", "MOCK_NUMBER", "MOCK_OBJECT",
"VCR cassette", "record cassette", "replay cassette", "vcr proxy", "llmobs cassette",
"test chat completions", "test streaming", "test embeddings", "test agent runs",
"test orchestration", "test workflow", "llmobs span event",
"LLMObs test strategy", "LlmObsCategory test",
"LLM_CLIENT test", "MULTI_PROVIDER test", "ORCHESTRATION test", "INFRASTRUCTURE test",
"span kind llm test", "span kind workflow test",
"inputMessages", "outputMessages", "token metrics", "llmobs span validation",
"cassette not generated", "re-record cassette", "127.0.0.1:9126",
or needs to write, modify, or debug tests for any LLMObs plugin in dd-trace-js.
Use when writing, modifying, or debugging tests for an LLMObs plugin in
dd-trace-js. Triggers: "write LLMObs tests", "test an LLMObs plugin",
"assertLlmObsSpanEvent", "useLlmObs", "getEvents", any MOCK_* matcher
("MOCK_STRING" / "MOCK_NOT_NULLISH" / "MOCK_NUMBER" / "MOCK_OBJECT"),
"VCR cassette", "vcr proxy", "127.0.0.1:9126", any LlmObsCategory test
("LLM_CLIENT" / "MULTI_PROVIDER" / "ORCHESTRATION" / "INFRASTRUCTURE").
---

# LLM Observability Testing Skill

## ⚠️ CRITICAL: Read This First ⚠️
## Determine the package category first

**BEFORE writing any test, you MUST determine the package category.**
**Before writing any test, determine the package's `LlmObsCategory`.** Category picks the test strategy (VCR or not), the span kind, and the test structure. The wrong category produces tests that pass against the wrong contract — VCR cassettes for a workflow library produce empty recordings; pure-function tests for an HTTP-call wrapper miss the network surface entirely.

**The category determines EVERYTHING:**
- Whether to use VCR or not
- What spanKind to use
- What test structure to follow
- What examples to study
Quick check:

**IF YOU USE THE WRONG CATEGORY STRATEGY, THE TEST WILL FAIL.**
- Direct HTTP calls to an LLM provider? → `LLM_CLIENT` or `MULTI_PROVIDER` — VCR.
- Workflow / graph orchestration with state? → `ORCHESTRATION` — no VCR, pure functions, real LLM as the orchestration node.
- Protocol / server implementation? → `INFRASTRUCTURE` — mock server.

**Categories are defined in the `LlmObsCategory` enum.**

**Quick check:**
- Does package make HTTP calls to LLM APIs? → `LLM_CLIENT` or `MULTI_PROVIDER` (use VCR)
- Does package orchestrate workflows/graphs? → `ORCHESTRATION` (NO VCR, pure functions)
- Does package implement protocols/servers? → `INFRASTRUCTURE` (mock servers)

**See [references/category-strategies.md](references/category-strategies.md) for FORBIDDEN vs REQUIRED patterns per category.**

---

## Purpose

This skill helps you write comprehensive LLMObs tests that validate span events, messages, tokens, and metadata using category-appropriate strategies.

## When to Use

- Writing tests for a new LLMObs plugin (ALWAYS check category first)
- Understanding category-specific test strategies
- Learning VCR cassettes (for LLM_CLIENT/MULTI_PROVIDER only)
- Learning assertion patterns for LLMObs spans
See [references/category-strategies.md](references/category-strategies.md) for the FORBIDDEN-vs-REQUIRED matrix per category.

## Core Testing Concepts

@@ -98,52 +69,13 @@ See [references/vcr-cassettes.md](references/vcr-cassettes.md) for recording pro

### 3. Category-Specific Test Strategies

Test strategy is determined by the `LlmObsCategory` enum.

#### LlmObsCategory.LLM_CLIENT & LlmObsCategory.MULTI_PROVIDER

**Strategy:** VCR with real API calls via proxy

**Characteristics:**
- Use VCR proxy baseURL
- Record cassettes with real API keys
- Tests make actual HTTP calls (recorded once)
- Validate LLM-specific data (messages, tokens, model info)

**Span kind:** Usually `'llm'` for chat completions
The category-determination block at the top maps category to strategy. Non-obvious bits per category:

See [references/category-strategies.md](references/category-strategies.md) for detailed patterns.
- **LLM_CLIENT / MULTI_PROVIDER**: VCR proxy baseURL is `http://127.0.0.1:9126/vcr/{provider}`. Span kind: `'llm'`. Cassettes record once with real API keys; CI replays them.
- **ORCHESTRATION**: Span kind: `'workflow'` or `'agent'`, never `'llm'`. No VCR, no real API calls — the orchestrator itself doesn't make HTTP calls, it coordinates libraries that do. Mock LLM responses as plain return values from the node so the test exercises the workflow execution, not the provider API.
- **INFRASTRUCTURE**: Mock server, protocol-specific validation, no VCR.

#### LlmObsCategory.ORCHESTRATION

**Strategy:** Pure function tests, NO VCR, NO real API calls

**Characteristics:**
- NO VCR cassettes
- NO HTTP calls to LLM providers
- Use library's native APIs with mock/test LLM responses
- Focus on workflow lifecycle, not API calls
- **CRITICAL:** Still test with actual LLM as orchestration node (not mocked completely)

**Span kind:** Usually `'workflow'` or `'agent'`, NOT `'llm'`

**Example concept:**
- LangGraph invokes nodes that call LLMs
- LangGraph itself doesn't make HTTP calls
- Test LangGraph's workflow execution, not the underlying LLM API

See [references/category-strategies.md](references/category-strategies.md) for orchestration test patterns.

#### LlmObsCategory.INFRASTRUCTURE

**Strategy:** Mock server tests

**Characteristics:**
- Mock server implementation
- Protocol-specific validation
- NO VCR

See [references/category-strategies.md](references/category-strategies.md) for infrastructure test patterns.
See [references/category-strategies.md](references/category-strategies.md) for per-category patterns.

### 4. Assertion Patterns

@@ -221,38 +153,6 @@ On errors, validate:
- Error object exists: `error: MOCK_OBJECT`
- Span still created (not dropped)

## Common Patterns by Category

### LLM_CLIENT / MULTI_PROVIDER Pattern
- Use VCR proxy baseURL
- Test chat completions with various parameters
- Validate real API response structure
- Test streaming (if supported)
- Test error responses

### ORCHESTRATION Pattern
- NO VCR
- Test workflow lifecycle methods (invoke, stream, run)
- Use mock LLM responses within workflow
- Focus on workflow span, not LLM spans
- Validate workflow-specific metadata (state, nodes, edges)

### INFRASTRUCTURE Pattern
- Mock server setup
- Protocol-specific validation
- Connection/transport testing

## Best Practices

1. **Use MOCK_* for non-deterministic values** - Output text, token counts, error objects
2. **Use exact values for inputs** - You control input messages and parameters
3. **Always validate spanKind** - Required for every span
4. **Match category to test strategy** - VCR for clients, pure functions for orchestration
5. **Test error paths** - Verify empty outputs and error objects on failures
6. **Group by method** - Organize tests by instrumented method
7. **Load modules fresh** - Use beforeEach() to avoid state leakage
8. **Cover edge cases** - Empty messages, missing metadata, streaming

## References

For detailed information, see:
@@ -261,11 +161,3 @@ For detailed information, see:
- [references/vcr-cassettes.md](references/vcr-cassettes.md) - VCR recording process, cassette management, troubleshooting
- [references/assertion-helpers.md](references/assertion-helpers.md) - Complete assertLlmObsSpanEvent API, matchers, patterns
- [references/category-strategies.md](references/category-strategies.md) - Detailed test strategies for each LlmObsCategory

## Key Principles

1. **Category determines strategy** - Always check `LlmObsCategory` to pick test approach
2. **Orchestrators don't use VCR** - They don't make direct API calls
3. **Use matchers for variance** - Real API responses vary, use MOCK_* matchers
4. **Validate message format** - Always check `{content, role}` structure
5. **Test with real behavior** - For orchestrators, use actual LLM as node (not fully mocked)
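
The category-to-strategy mapping in this skill file can be summarized as a small lookup, sketched here under stated assumptions. The category names, span kinds, and VCR proxy URL pattern come from the text above. The object shape, the `harness` labels, and the function names `vcrBaseUrl` and `pickStrategy` are hypothetical and are not part of dd-trace-js.

```javascript
// Illustrative lookup only. spanKinds for INFRASTRUCTURE is left null because
// the skill file does not state one.
const STRATEGY_BY_CATEGORY = {
  LLM_CLIENT: { useVcr: true, spanKinds: ['llm'], harness: 'vcr-proxy' },
  MULTI_PROVIDER: { useVcr: true, spanKinds: ['llm'], harness: 'vcr-proxy' },
  ORCHESTRATION: { useVcr: false, spanKinds: ['workflow', 'agent'], harness: 'pure-functions' },
  INFRASTRUCTURE: { useVcr: false, spanKinds: null, harness: 'mock-server' }
}

// Build the proxy base URL following the http://127.0.0.1:9126/vcr/{provider} pattern.
function vcrBaseUrl (provider) {
  return `http://127.0.0.1:9126/vcr/${provider}`
}

// Return the strategy for a category, adding a baseURL only when VCR applies.
function pickStrategy (category, provider) {
  const strategy = STRATEGY_BY_CATEGORY[category]
  if (!strategy) throw new Error(`Unknown LlmObsCategory: ${category}`)
  return strategy.useVcr
    ? { ...strategy, baseURL: vcrBaseUrl(provider) }
    : { ...strategy }
}

// Usage: a client library gets a VCR base URL, an orchestrator does not.
const clientStrategy = pickStrategy('LLM_CLIENT', 'openai')
const orchestrationStrategy = pickStrategy('ORCHESTRATION')
```

Throwing on an unknown category keeps a misclassified package from silently falling back to a default strategy, which is the failure mode the skill file warns about.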
4 changes: 4 additions & 0 deletions .github/CODEOWNERS
@@ -201,6 +201,7 @@
/packages/dd-trace/src/remote_config/ @DataDog/apm-sdk-capabilities-js
/packages/dd-trace/test/remote_config/ @DataDog/apm-sdk-capabilities-js
/packages/dd-trace/src/baggage.js @DataDog/apm-sdk-capabilities-js
/packages/dd-trace/test/baggage.spec.js @DataDog/apm-sdk-capabilities-js
/packages/dd-trace/src/sampler.js @DataDog/apm-sdk-capabilities-js
/packages/dd-trace/test/sampler.spec.js @DataDog/apm-sdk-capabilities-js
/packages/dd-trace/src/priority_sampler.js @DataDog/apm-sdk-capabilities-js
@@ -238,6 +239,8 @@
/.github/actions/dd-sts-app-key/action.yml @Datadog/lang-platform-js
/.github/actions/dd-sts-api-key/action.yml @Datadog/lang-platform-js
/.github/actions/push_to_test_optimization/ @DataDog/ci-app-libraries
/.github/playwright/ @DataDog/ci-app-libraries
/.github/selenium/ @DataDog/ci-app-libraries
/.github/actions/upload-node-reports/action.yml @Datadog/lang-platform-js
/.github/chainguard @DataDog/sdlc-security
/.github/codeql_config.yml @DataDog/sdlc-security
@@ -259,6 +262,7 @@
/.github/workflows/serverless.yml @DataDog/serverless-aws @DataDog/apm-serverless
/.github/workflows/llmobs.yml @DataDog/ml-observability
/.github/workflows/openfeature.yml @DataDog/feature-flagging-and-experimentation-sdk
/.github/workflows/pr-title.yml @DataDog/lang-platform-js
/.github/workflows/profiling.yml @DataDog/profiling-js
/.github/workflows/system-tests.yml @DataDog/asm-js
/.github/workflows/test-optimization.yml @DataDog/ci-app-libraries
51 changes: 16 additions & 35 deletions .github/actions/coverage/action.yml
@@ -16,40 +16,21 @@ inputs:
runs:
using: composite
steps:
- name: Verify coverage output
shell: bash
run: node scripts/verify-coverage.js --flags "${{ inputs.flags }}"

# `master-coverage` is the flag .codecov.yml gates codecov/patch on. Attach
# it only on PRs targeting master so release-branch PRs auto-pass.
- name: Compute Codecov flags
id: codecov-flags
shell: bash
env:
JOB_FLAGS: ${{ inputs.flags }}
EVENT_NAME: ${{ github.event_name }}
BASE_REF: ${{ github.base_ref }}
run: |
flags="$JOB_FLAGS"
if [ "$EVENT_NAME" = "pull_request" ] && [ "$BASE_REF" = "master" ]; then
flags="${flags:+$flags,}master-coverage"
fi
echo "value=$flags" >> "$GITHUB_OUTPUT"

- name: Upload coverage to Codecov
uses: codecov/codecov-action@57e3a136b779b570ffcdbf80b3bdc90e7fab3de2 # v6.0.0
with:
flags: ${{ steps.codecov-flags.outputs.value }}

- name: Install datadog-ci
if: always()
uses: ./.github/actions/datadog-ci

- name: Upload coverage to Datadog
if: always()
# Retry once on failure to work around transient issues (e.g. flaky
# Codecov upload network calls).
- id: attempt
uses: ./.github/actions/coverage/upload
continue-on-error: true
with:
flags: ${{ inputs.flags }}
report-dir: ${{ inputs.report-dir }}
dd_api_key: ${{ inputs.dd_api_key }}
- if: steps.attempt.outcome == 'failure'
shell: bash
run: datadog-ci coverage upload ${FLAGS:+--flags "$FLAGS"} .
env:
DD_API_KEY: ${{ inputs.dd_api_key }}
FLAGS: ${{ inputs.flags }}
run: sleep 60
- if: steps.attempt.outcome == 'failure'
uses: ./.github/actions/coverage/upload
with:
flags: ${{ inputs.flags }}
report-dir: ${{ inputs.report-dir }}
dd_api_key: ${{ inputs.dd_api_key }}
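
The retry shape of the new composite action (attempt once with `continue-on-error`, wait 60 seconds on failure, then run the upload a second time and let that result stand) can be expressed as a short sketch. This is an illustration of the control flow only, not code from the repository. `uploadWithOneRetry`, its `upload` callback, and the injectable `sleep` option are hypothetical names.

```javascript
// Run `upload` once; if it rejects, wait `delayMs` and run it exactly one more
// time. A second failure propagates to the caller, mirroring the final step in
// the action, which has no continue-on-error.
async function uploadWithOneRetry (upload, options = {}) {
  const {
    delayMs = 60000,
    sleep = ms => new Promise(resolve => setTimeout(resolve, ms))
  } = options

  try {
    return await upload()
  } catch (firstError) {
    // First failure is swallowed on purpose, like continue-on-error: true.
    await sleep(delayMs)
    return upload()
  }
}

// Usage: a fake upload that fails once, then succeeds; the delay is zeroed out.
let attempts = 0
const flakyUpload = async () => {
  attempts++
  if (attempts === 1) throw new Error('transient network failure')
  return 'uploaded'
}
uploadWithOneRetry(flakyUpload, { delayMs: 0 }).then(result => {
  console.log(result, 'after', attempts, 'attempts')
})
```

A single fixed-delay retry keeps the worst case bounded at two uploads plus one wait, at the cost of not recovering from outages longer than the delay.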