init #11411

Draft
mhlidd wants to merge 20 commits into master from mhlidd/otlp_runtime_metrics_follow_up

@mhlidd mhlidd commented May 18, 2026

What Does This Do

Follow-up to the parent PR for maximo/otlp-runtime-metrics that expands the OTLP JVM runtime metrics surface and gates Development-status metrics behind a new opt-out flag.

New config

  • dd.metrics.otel.experimental.enabled (default: true) — mirrors OTel's otel.instrumentation.runtime-telemetry.emit-experimental-metrics. When false, only metrics marked Stable in the OTel JVM semantic conventions are emitted; Development-status metrics are suppressed.
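
The gating described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the class `ExperimentalGateSketch` and the helper `metricsToRegister` are hypothetical, and reading the flag as a plain system property is an assumption made only to keep the example self-contained. The metric names are the ones listed in this description.

```java
import java.util.ArrayList;
import java.util.List;

public class ExperimentalGateSketch {

  // Hypothetical helper: returns the metric names that would be registered
  // for a given value of the experimental flag.
  static List<String> metricsToRegister(boolean experimentalEnabled) {
    List<String> names = new ArrayList<>();

    // Stable metric from this PR: registered regardless of the flag.
    names.add("jvm.memory.used_after_last_gc");

    // Development-status metrics: registered only when the flag is on.
    if (experimentalEnabled) {
      names.add("jvm.memory.init");
      names.add("jvm.buffer.memory.used");
      names.add("jvm.system.cpu.utilization");
      names.add("jvm.system.cpu.load_1m");
      names.add("jvm.file_descriptor.count");
    }
    return names;
  }

  public static void main(String[] args) {
    // Assumption: the flag is read here as a system property defaulting to true.
    // The real tracer resolves it through its own Config class.
    boolean enabled =
        Boolean.parseBoolean(
            System.getProperty("dd.metrics.otel.experimental.enabled", "true"));
    System.out.println(metricsToRegister(enabled));
  }
}
```

With the flag set to false, only the Stable name remains in the returned list, which is the opt-out behavior the flag is meant to provide.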

Metrics added or reclassified (all under the datadog.jvm.runtime scope, OTel-native names)

| Metric | OTel status | When emitted |
| --- | --- | --- |
| jvm.memory.used_after_last_gc | Stable | Always (moved into the always-on memory group) |
| jvm.memory.init | Development | Only when experimental flag is on |
| jvm.buffer.memory.used / limit / count | Development | Only when experimental flag is on |
| jvm.system.cpu.utilization | Development | Only when experimental flag is on |
| jvm.system.cpu.load_1m | Development | Only when experimental flag is on |
| jvm.file_descriptor.count / limit | Development | Only when experimental flag is on, and only on Unix-like JVMs (UnixOperatingSystemMXBean) |
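
As an illustration of the Unix-only condition for the file descriptor metrics, the check might look like the sketch below. The class `FileDescriptorSketch` and the method `openFileDescriptors` are hypothetical names; only `com.sun.management.UnixOperatingSystemMXBean` and its `getOpenFileDescriptorCount()` accessor come from the JDK. The `>= 0` guard is an assumption based on the guard style mentioned in this description.

```java
import com.sun.management.UnixOperatingSystemMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.util.OptionalLong;

public class FileDescriptorSketch {

  // Returns the open file descriptor count, or empty when the bean is not
  // a Unix-flavored bean (for example on Windows JVMs).
  static OptionalLong openFileDescriptors(OperatingSystemMXBean bean) {
    if (bean instanceof UnixOperatingSystemMXBean) {
      long count = ((UnixOperatingSystemMXBean) bean).getOpenFileDescriptorCount();
      // Assumed guard: skip negative values rather than recording them.
      if (count >= 0) {
        return OptionalLong.of(count);
      }
    }
    return OptionalLong.empty();
  }

  public static void main(String[] args) {
    OptionalLong count =
        openFileDescriptors(ManagementFactory.getOperatingSystemMXBean());
    if (count.isPresent()) {
      System.out.println("open file descriptors: " + count.getAsLong());
    } else {
      System.out.println("file descriptor metrics unavailable on this JVM");
    }
  }
}
```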

Value-guard alignment with OTel reference implementation

  • jvm.memory.limit and jvm.memory.init now skip recording only when getMax() / getInit() returns the documented -1 sentinel. The previous guard recorded only values > 0, which incorrectly also skipped legitimate 0 values.
  • All other per-metric guards (>= 0, null checks) match the corresponding callbacks in io.opentelemetry.instrumentation.runtimetelemetry.internal.*.
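
The difference between the old and new guards can be seen in a small sketch. The class `MemoryLimitGuardSketch` and the method names `oldGuard` and `sentinelGuard` are hypothetical and not taken from the PR; the JDK calls (`MemoryPoolMXBean.getUsage()`, `MemoryUsage.getMax()`) are real, and `getMax()` is documented to return -1 when the maximum is undefined.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class MemoryLimitGuardSketch {

  // Previous behavior as described above: record only strictly positive values.
  static boolean oldGuard(long value) {
    return value > 0;
  }

  // New behavior as described above: skip only the documented -1 sentinel.
  static boolean sentinelGuard(long value) {
    return value != -1;
  }

  public static void main(String[] args) {
    for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
      MemoryUsage usage = pool.getUsage();
      if (usage == null) {
        continue; // getUsage() may return null for an invalid pool
      }
      long max = usage.getMax();
      if (sentinelGuard(max)) {
        System.out.println(pool.getName() + " limit=" + max);
      } else {
        System.out.println(pool.getName() + " limit undefined, skipped");
      }
    }
  }
}
```

Under the old guard a value of 0 is dropped; under the sentinel guard it is recorded, and only -1 is skipped.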

Misc

  • Extracted sunOsBean() helper to remove duplicated instanceof OperatingSystemMXBean cast logic between registerCpuMetrics() and the new registerSystemCpuMetrics().
  • Added debug logs when an MXBean isn't available so it's obvious why a metric didn't show up.
  • Test coverage extended to assert all newly added metric names are registered, with platform-conditional checks for the Unix-only file descriptor metrics.
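
A helper along the lines of sunOsBean() might look like the following sketch. The body is an assumption, since the PR description names the helper but does not show it; the class `SunOsBeanSketch` is hypothetical and `System.err` stands in for the tracer's debug logger.

```java
import java.lang.management.ManagementFactory;

public class SunOsBeanSketch {

  // Returns the com.sun.management view of the OS bean, or null when the
  // running JVM does not provide it. Centralizing the instanceof check and
  // cast avoids repeating it in each register method.
  static com.sun.management.OperatingSystemMXBean sunOsBean(
      java.lang.management.OperatingSystemMXBean bean) {
    if (bean instanceof com.sun.management.OperatingSystemMXBean) {
      return (com.sun.management.OperatingSystemMXBean) bean;
    }
    // Stand-in for a debug log explaining why CPU metrics are missing.
    System.err.println("com.sun.management.OperatingSystemMXBean not available");
    return null;
  }

  public static void main(String[] args) {
    com.sun.management.OperatingSystemMXBean os =
        sunOsBean(ManagementFactory.getOperatingSystemMXBean());
    if (os != null) {
      // getProcessCpuLoad() returns a negative value when the load is unavailable.
      System.out.println("process cpu load: " + os.getProcessCpuLoad());
    }
  }
}
```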

Motivation

The parent PR established the OTLP JVM runtime metrics pipeline but only emitted a subset of the OTel JVM semantic conventions. This follow-up brings the surface in line with what opentelemetry-java-instrumentation's runtime-telemetry library emits, and adds the standard experimental-metrics opt-out so users who want only the Stable subset (smaller cardinality, fewer dashboard surprises) can disable Development metrics without losing the integration entirely.

Aligning the value guards with OTel's reference implementation prevents two real-world divergences:

  1. Without the 0 vs -1 fix, uncapped non-heap pools (where getMax() == 0 on some JVM/version combos) would silently produce no jvm.memory.limit data point — they should publish 0 to indicate "no limit observed."
  2. The experimental gate ensures dashboards built against OTel's stable-only output won't differ between OTel SDK collection and DD-agent collection.

Additional Notes

  • No change to JMXFetch behavior beyond passing the new flag through JvmOtlpRuntimeMetrics.start(...). The OTLP_JMX_CONFIG-skip path is unchanged.
  • The OTel-spec env var otel.instrumentation.runtime-telemetry.emit-experimental-metrics is captured in OtelEnvironmentConfigSource so an unmodified OTel-style config picks up the flag automatically.

Contributor Checklist

Jira ticket: [PROJ-IDENT]

mhlidd commented May 18, 2026

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 62d9b50d1d

capture(
METRICS_OTEL_EXPERIMENTAL_ENABLED,
getOtelProperty(
"otel.instrumentation.runtime-telemetry.emit-experimental-metrics",

P2: Honor the documented OTel experimental flag

For users configuring OTLP metrics with the documented OpenTelemetry runtime option (OTEL_INSTRUMENTATION_RUNTIME_TELEMETRY_EMIT_EXPERIMENTAL_TELEMETRY, see https://docs.datadoghq.com/opentelemetry/integrations/runtime_metrics/), this lookup never sees the value because it checks otel.instrumentation.runtime-telemetry.emit-experimental-metrics instead. Since Config falls back to DEFAULT_METRICS_OTEL_EXPERIMENTAL_ENABLED = true, setting the documented OTel flag to false still emits the experimental JVM metrics gated by this change; map the telemetry property (or support both aliases) so the opt-out works.
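
One way to support both aliases, as the suggestion above proposes, is sketched below. This is not the tracer's actual lookup: the class `ExperimentalFlagAliasSketch`, the `resolve` method, the use of a plain `Map` as the property source, and the precedence order are all assumptions for illustration. The dotted `emit-experimental-telemetry` key is assumed to be the property form of the environment variable quoted in the review.

```java
import java.util.HashMap;
import java.util.Map;

public class ExperimentalFlagAliasSketch {

  // Key currently looked up by the PR.
  static final String METRICS_KEY =
      "otel.instrumentation.runtime-telemetry.emit-experimental-metrics";
  // Assumed property form of the documented environment variable.
  static final String TELEMETRY_KEY =
      "otel.instrumentation.runtime-telemetry.emit-experimental-telemetry";

  // Returns the first explicitly set alias, or true (the default named in the
  // review) when neither alias is present.
  static boolean resolve(Map<String, String> props) {
    for (String key : new String[] {METRICS_KEY, TELEMETRY_KEY}) {
      String value = props.get(key);
      if (value != null) {
        return Boolean.parseBoolean(value.trim());
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    props.put(TELEMETRY_KEY, "false");
    System.out.println(resolve(props)); // prints "false"
  }
}
```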

mhlidd (Contributor Author) replied:

docs need to be updated.

Base automatically changed from maximo/otlp-runtime-metrics to master May 19, 2026 18:23

datadog-prod-us1-5 Bot commented May 19, 2026

Pipelines

⚠️ Warnings

🚦 98 Pipeline jobs failed

DataDog/apm-reliability/dd-trace-java | agent_integration_tests

236 tests failed due to IllegalArgumentException thrown in ConfigHelper.java:162

DataDog/apm-reliability/dd-trace-java | test_base: [11, 2/4]

632 tests failed due to IllegalArgumentException at ConfigHelper.java:162

DataDog/apm-reliability/dd-trace-java | test_base: [17, 3/4]

IllegalArgumentException thrown in multiple tests due to invalid parameters causing significant failures.

Commit SHA: 52e3a26
