
benchmarks: reduce validation timeout to 4s #5797

Draft
p-datadog wants to merge 1 commit into master from reduce-benchmark-validation-timeout

Conversation

@p-datadog
Member

What does this PR do?

Tightens expect_in_fork(timeout_seconds: ...) in every validate_benchmarks_spec.rb to 4 seconds. Six files touched; current values range from no explicit timeout (defaults to 10s in expect_in_fork) up to 20s.
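To illustrate what a `timeout_seconds:` ceiling does, here is a minimal sketch of a fork-and-wait helper. `run_in_fork_with_timeout` and its `poll_interval:` parameter are hypothetical names used only for illustration; this is not the actual `expect_in_fork` from `spec/support/synchronization_helpers.rb`, which also captures STDOUT/STDERR and raises on failure.

```ruby
# Hypothetical stand-in for a fork-based helper; NOT the real expect_in_fork.
# Runs the block in a child process and reports :ok, :failed, or :timed_out.
def run_in_fork_with_timeout(timeout_seconds:, poll_interval: 0.05)
  pid = fork do
    begin
      yield
      exit!(0) # skip at_exit handlers inherited from the parent
    rescue Exception # rubocop:disable Lint/RescueException
      exit!(1)
    end
  end

  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout_seconds

  loop do
    # With WNOHANG, wait2 returns nil while the child is still running.
    _, status = Process.wait2(pid, Process::WNOHANG)
    return(status.success? ? :ok : :failed) if status

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      Process.kill("KILL", pid)
      Process.wait(pid) # reap the killed child
      return :timed_out
    end

    sleep poll_interval
  end
end

fast_result = run_in_fork_with_timeout(timeout_seconds: 4) { sleep 0.1 }
slow_result = run_in_fork_with_timeout(timeout_seconds: 0.2) { sleep 5 }
puts "fast child: #{fast_result}, slow child: #{slow_result}"
```

Under this sketch, lowering the ceiling never changes how long a passing run takes; it only changes which slow runs get reported as timeouts.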

Motivation:

Observed validation durations on master @ 07e4d7bc92 (Unit Tests run 26178282614, Ruby 3.4): max is 2.41s (tracing_trace), with most under 1s. The 10–20s timeouts leave roughly 4–8x headroom over that maximum, which is far too loose to catch bitrot quickly. A 4s cap is ~1.5–2x over the slowest legitimate validation; it tightens the bitrot signal and still gives ample margin.
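As a quick check of the headroom arithmetic, the sketch below divides each timeout by the 2.41s maximum quoted above. The constant and method names are illustrative, and ratios relative to the typical sub-1s validations would be correspondingly larger.

```ruby
# Slowest validation quoted in the PR description (tracing_trace on Ruby 3.4).
SLOWEST_OBSERVED_SECONDS = 2.41

# Ratio of a timeout to the slowest observed validation, rounded to one decimal.
def headroom(timeout_seconds, slowest: SLOWEST_OBSERVED_SECONDS)
  (timeout_seconds / slowest).round(1)
end

[10, 20, 4].each do |timeout|
  puts "#{timeout}s timeout: #{headroom(timeout)}x"
end
# prints:
# 10s timeout: 4.1x
# 20s timeout: 8.3x
# 4s timeout: 1.7x
```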

symbol_database_baseline_matrix regularly runs 6–8s, but it is being removed in a separate PR (#5795), so it does not need to fit under the new ceiling here. If #5795 lands first there will be no conflict (the SymDB spec file is deleted by #5795); if this PR lands first, the SymDB validation will start timing out, which is the desired signal that the spec needs to go.

Change log entry

None.

Additional Notes:

Draft — needs CI to confirm the 4s ceiling holds across all Rubies. Some benchmarks may run slower on Ruby 2.5/2.6 than on 3.4; if any cross 4s on those, the per-benchmark timeout can be selectively raised.
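One way the selective raise could look is a default ceiling plus a small override table. This is only a sketch: the constant names, the `validation_timeout_for` method, and the `some_slow_benchmark` entry are all hypothetical placeholders, not names from the repository.

```ruby
DEFAULT_VALIDATION_TIMEOUT_SECONDS = 4

# Hypothetical overrides; the benchmark name and value are placeholders.
VALIDATION_TIMEOUT_OVERRIDES = {
  "some_slow_benchmark" => 6,
}.freeze

# Returns the override for a benchmark if one exists, else the default ceiling.
def validation_timeout_for(benchmark)
  VALIDATION_TIMEOUT_OVERRIDES.fetch(benchmark, DEFAULT_VALIDATION_TIMEOUT_SECONDS)
end

puts validation_timeout_for("tracing_trace")       # no override, uses the default
puts validation_timeout_for("some_slow_benchmark") # uses the placeholder override
```

In a spec, the result would be passed along as `expect_in_fork(timeout_seconds: validation_timeout_for(benchmark))`, keeping the exceptions visible in one place.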

How to test the change?

CI will run the validate specs across the full Ruby matrix. Watch for any expect_in_fork timeouts and adjust per-benchmark if needed.

Tighten the expect_in_fork timeout for all validate_benchmarks_spec.rb
tests to 4 seconds. Recent observed validation durations on master are
below 2.5s (excluding the SymDB baseline_matrix outlier at 6-7s, which
is being removed separately). A 4s ceiling is ~2x headroom over the
slowest legitimate validation, catching bitrot regressions faster while
keeping a safe margin.

- spec/validate_benchmarks_spec.rb: add explicit timeout_seconds: 4
- spec/datadog/di/validate_benchmarks_spec.rb: 10/20 → 4 (single value, drops the conditional)
- spec/datadog/tracing/validate_benchmarks_spec.rb: 20 → 4
- spec/datadog/profiling/validate_benchmarks_spec.rb: 15 → 4
- spec/datadog/error_tracking/validate_benchmarks_spec.rb: add explicit timeout_seconds: 4
- spec/datadog/symbol_database/validate_benchmarks_spec.rb: add explicit timeout_seconds: 4
p-datadog added the "AI Generated" label on May 20, 2026
dd-octo-sts (bot) added the "dev/testing" label on May 20, 2026
@datadog-datadog-prod-us1-2

datadog-datadog-prod-us1-2 Bot commented May 20, 2026


⚠️ Warnings

🚦 21 Pipeline jobs failed

Test Nix | Test Nix (aarch64-linux, 24.05) (GitHub Actions)

🔧 Fix in code. 1 failure in benchmark test: RuntimeError: Failure or timeout in `expect_in_fork`, STDOUT: `Current pid is 9471`, STDERR: ``

Test macOS | Test (macos-15, 3.0) (GitHub Actions)

🔄 Retry job. This looks flaky and may succeed on retry. 2 test failures in 'Symbol Database benchmarks'. Error: Wait time exhausted in 'expect_in_fork'.

Test macOS | Test (macos-15, 3.1) (GitHub Actions)

🔄 Retry job. This looks flaky and may succeed on retry. 3 failures in Symbol Database benchmarks due to timeout in `expect_in_fork` after 40 attempts.

View all 21 failed jobs.

🧪 1 Test failed in 1 job

Unit Tests | junit (GitHub Actions)

All test failures are known flaky — job may pass on retry.

❄️ Known flaky: Symbol Database benchmarks symbol_database_baseline_matrix runs without raising errors from rspec
Failure or timeout in `expect_in_fork`, STDOUT: `Current pid is 25133
`, STDERR: ``

Failure/Error: raise "Failure or timeout in `expect_in_fork`#{crash_note}, STDOUT: `#{stdout}`, STDERR: `#{stderr}`", cause: e

RuntimeError:
  Failure or timeout in `expect_in_fork`, STDOUT: `Current pid is 25133
  `, STDERR: ``
./spec/support/synchronization_helpers.rb:67:in 'SynchronizationHelpers#expect_in_fork'
./spec/datadog/symbol_database/validate_benchmarks_spec.rb:17:in 'block (4 levels) in <top (required)>'
...

Not introduced in this PR.

ℹ️ Info

No other issues found (see more)

❄️ No new flaky tests detected

🔗 Commit SHA: a0c4188

@ivoanjo
Member

ivoanjo commented May 21, 2026

Uhhh... be careful with this one. In practice we've seen tight timeouts like this be a source of flakiness, and lowering the timeout doesn't actually make the tests any slower or faster, since the timeout is never supposed to be hit.

I would suggest instead making sure the only-for-validation run of the test itself is faster, since that's what would speed up CI.
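A sketch of what a faster only-for-validation run might look like: the benchmark script picks much shorter timing settings when a validation flag is set. The `VALIDATE_BENCHMARK` variable name, the `benchmark_settings` method, and the specific time/warmup values are assumptions for illustration, not confirmed details of the repository's benchmarks.

```ruby
# Assumed convention: the spec sets VALIDATE_BENCHMARK=true before loading
# a benchmark script, and the script shrinks its run accordingly.
def benchmark_settings(env = ENV)
  if env["VALIDATE_BENCHMARK"] == "true"
    # Validation only needs to prove the benchmark still runs.
    { time: 0.01, warmup: 0 }
  else
    # Illustrative full-length settings for a real measurement run.
    { time: 10, warmup: 2 }
  end
end

puts benchmark_settings("VALIDATE_BENCHMARK" => "true").inspect
puts benchmark_settings({}).inspect
```

With this shape, the time spent in validation is governed by the benchmark's own settings rather than by the timeout, which only acts as a backstop.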

@p-datadog
Member Author

SymDB benchmarks were taking 7 seconds to validate, which then also caused validation test failures after merge. This PR is intended to verify that the SymDB benchmarks validate in about the same time as the existing benchmarks (which top out at ~2.7 seconds in the run I looked at).
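For checking a validation's duration against the ceiling locally, a rough sketch is below. `measure_seconds` and `within_ceiling?` are hypothetical helper names, and the `sleep` stands in for loading a real benchmark.

```ruby
CEILING_SECONDS = 4.0

# Returns the wall-clock seconds the block took, using a monotonic clock.
def measure_seconds
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

def within_ceiling?(elapsed, ceiling: CEILING_SECONDS)
  elapsed < ceiling
end

# Placeholder workload; a real check would load the benchmark script here.
elapsed = measure_seconds { sleep 0.05 }
verdict = within_ceiling?(elapsed) ? "ok" : "too slow"
puts format("validation took %.2fs (ceiling %.1fs): %s", elapsed, CEILING_SECONDS, verdict)
```

By this check, the ~2.7 second validations mentioned above would pass and a 7 second one would not.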
