fix(aggregate): clamp window duration to minimum of one second #1772

Merged: gh-worker-dd-mergequeue-cf854d[bot] merged 5 commits into main from tobz/fix-subsecond-aggregate-window-panic on May 29, 2026

@tobz tobz commented May 29, 2026

Summary

This PR fixes a bug where sub-second window durations would cause the Aggregate transform to panic during sample insertion.

Since we take the window duration as an actual `Duration` field, we allow sub-second aggregation windows. However, we use whole seconds when calculating which bucket to drop a sample into. A sub-second window truncates to zero whole seconds, so the bucket math divides by zero and the transform panics.

In this PR, we've moved entirely to whole seconds for the window duration and backed that up by switching to `NonZeroU64`. This solves both the fractional-second problem and the "less than one second" problem at once.
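
As a rough illustration of the failure mode and the fix described above, here is a minimal sketch. It is not the code from this PR: `legacy_bucket_start`, `clamp_window_secs`, and `bucket_start` are hypothetical names, and the bucket formula (align a timestamp down to a multiple of the window) is an assumption about how whole-second bucketing typically works.

```rust
use std::num::NonZeroU64;
use std::time::Duration;

/// Hypothetical "before" shape: the window is a `Duration`, but bucketing
/// uses whole seconds. `Duration::as_secs` truncates, so 500 ms becomes 0.
/// `checked_rem` returns `None` where a plain `%` would panic.
fn legacy_bucket_start(timestamp_secs: u64, window: Duration) -> Option<u64> {
    let window_secs = window.as_secs();
    timestamp_secs
        .checked_rem(window_secs)
        .map(|rem| timestamp_secs - rem)
}

/// Hypothetical "after" shape: clamp the configured whole-second value to a
/// minimum of one, so the result is provably non-zero.
fn clamp_window_secs(configured_secs: u64) -> NonZeroU64 {
    NonZeroU64::new(configured_secs.max(1)).expect("value is at least 1")
}

/// Aligns a timestamp down to the start of its bucket. The divisor comes
/// from a `NonZeroU64`, so the remainder can never divide by zero.
fn bucket_start(timestamp_secs: u64, window_secs: NonZeroU64) -> u64 {
    timestamp_secs - (timestamp_secs % window_secs.get())
}
```

With these definitions, `legacy_bucket_start(ts, Duration::from_millis(500))` yields `None` (the case that would have panicked), while `bucket_start(ts, clamp_window_secs(0))` is well defined because the window is clamped to one second.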

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

How did you test this PR?

Existing tests: unit, integration, correctness, etc.

References

DADP-2

@tobz tobz requested a review from a team as a code owner May 29, 2026 19:48
@tobz tobz added the type/bug Bug fixes. label May 29, 2026
@dd-octo-sts dd-octo-sts Bot added area/components Sources, transforms, and destinations. transform/aggregate Aggregate transform. area/docs Reference documentation. labels May 29, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f7ef81589a

Comment thread lib/saluki-components/src/transforms/aggregate/mod.rs Outdated
Comment thread docs/agent-data-plane/configuration/dogstatsd.md Outdated
Comment thread lib/saluki-components/src/transforms/aggregate/mod.rs Outdated

pr-commenter Bot commented May 29, 2026

Binary Size Analysis (Agent Data Plane)

Baseline: 7e8ab3a · Comparison: 2bad6b8 · diff
Analysis Configuration: stripped binaries · Pass/Fail Threshold: +5%
Sizes: 37.86 MiB (baseline) vs 37.92 MiB (comparison)
Size Change: +66.88 KiB (+0.17%)

✅ Binary size difference within threshold
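
The reported percentage can be reproduced from the figures above. The sketch below is only an illustration of that arithmetic: the function names are hypothetical, and treating the threshold as inclusive is an assumption, since the report does not say how an exact +5% change is handled.

```rust
/// Size change as a percentage of the baseline. Both arguments are in KiB.
fn size_change_percent(baseline_kib: f64, delta_kib: f64) -> f64 {
    delta_kib / baseline_kib * 100.0
}

/// Assumed pass rule: the change passes if it does not exceed the threshold.
fn within_threshold(change_pct: f64, threshold_pct: f64) -> bool {
    change_pct <= threshold_pct
}
```

For example, `size_change_percent(37.86 * 1024.0, 66.88)` is roughly 0.17, which is well under the 5% threshold.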

Changes by Module
| Module | File Size | Symbols |
| --- | --- | --- |
| figment | +80.71 KiB | 28 |
| piecemeal | +14.06 KiB | 16 |
| datadog_protos::trace_piecemeal_include::datadog | -13.54 KiB | 12 |
| saluki_components::common::datadog | -12.87 KiB | 28 |
| core | -9.60 KiB | 559 |
| serde_core | +5.14 KiB | 55 |
| [sections] | +3.61 KiB | 5 |
| tonic_prost | -3.17 KiB | 6 |
| http_body_util | -2.37 KiB | 59 |
| saluki_components::encoders::datadog | +2.27 KiB | 23 |
| async_compression | +1.88 KiB | 7 |
| anon.fc3ad682caf7b2880497e7067237e7df.215.llvm.7916934013044133348 | +1.77 KiB | 1 |
| anon.7ddb290166b94c74242026ab25815f7e.214.llvm.8073753518458178497 | -1.77 KiB | 1 |
| saluki_core::data_model::event | -1.33 KiB | 4 |
| saluki_components::sources::dogstatsd | +1.32 KiB | 10 |
| anon.26d85bf119f7216053f17bbbaef4187a.22.llvm.17223761884900750783 | -1.32 KiB | 1 |
| anon.15f69504f5f89847f8b5bae22fe5029b.162.llvm.8017536521535783053 | +1.32 KiB | 1 |
| anon.9bbec571a13e52a171268c52b419281a.101.llvm.10432929176499966693 | -1.26 KiB | 1 |
| anon.39d62031b870c797faa8180c679e11e6.95.llvm.10874588374297762649 | -1.26 KiB | 1 |
| anon.16371e64e5582ace117c914db6f1e9cc.111.llvm.1587808802558188714 | +1.26 KiB | 1 |

Detailed Symbol Changes
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  +0.5% +19.8Ki  +0.5% +14.7Ki    [4384 Others]
  [NEW] +12.3Ki  [NEW] +12.1Ki    _<figment::value::magic::RelativePathBuf as figment::value::magic::Magic>::deserialize_from::ha091f686e9e8b2f0
  [NEW] +11.7Ki  [NEW] +11.5Ki    _<figment::value::magic::Tagged<T> as figment::value::magic::Magic>::deserialize_from::h77de1a22118a46b1
  [NEW] +9.80Ki  [NEW] +9.64Ki    _<figment::value::de::ConfiguredValueDe<I> as serde_core::de::Deserializer>::deserialize_struct::h9d925db3b6777b7f
  [NEW] +6.18Ki  [NEW] +6.05Ki    figment::value::de::_<impl figment::value::value::Value>::deserialize_from::h9560f60e882663df
  [NEW] +5.88Ki  [NEW] +5.79Ki    matchit::router::Router<T>::insert::h65258e54e686a8f4
  [NEW] +5.73Ki  [NEW] +5.58Ki    _<figment::value::magic::RelativePathBuf as figment::value::magic::Magic>::deserialize_from::h0ae9cd1d860bd4db
  [NEW] +5.33Ki  [NEW] +5.18Ki    _<figment::value::magic::RelativePathBuf as figment::value::magic::Magic>::deserialize_from::ha3e379ed637e2cfa
  [NEW] +4.81Ki  [NEW] +4.66Ki    _<figment::value::magic::Tagged<T> as figment::value::magic::Magic>::deserialize_from::he0c39d6f6ce5e6d6
  [NEW] +4.66Ki  [NEW] +4.50Ki    _<figment::value::de::ConfiguredValueDe<I> as serde_core::de::Deserializer>::deserialize_any::h12a72a2c8408cdae
  [NEW] +4.62Ki  [NEW] +4.45Ki    _<core::slice::iter::Iter<T> as core::iter::traits::iterator::Iterator>::fold::h1210d20d3cee0b37
  [NEW] +4.39Ki  [NEW] +4.24Ki    _<figment::value::magic::Tagged<T> as figment::value::magic::Magic>::deserialize_from::h71f00a009e9e98f2
  [NEW] +4.33Ki  [NEW] +4.17Ki    _<figment::value::de::ConfiguredValueDe<I> as serde_core::de::Deserializer>::deserialize_struct::h466cf63f96e46884
  [NEW] +4.07Ki  [NEW] +3.91Ki    _<figment::value::de::ConfiguredValueDe<I> as serde_core::de::Deserializer>::deserialize_struct::h68640b8378e6d297
  [NEW] +3.94Ki  [NEW] +3.79Ki    core::ops::function::impls::_<impl core::ops::function::FnMut<A> for &mut F>::call_mut::hf357fe4921885f86
  [DEL] -3.79Ki  [DEL] -3.68Ki    core::iter::traits::iterator::Iterator::try_fold::h099a41e4a0fb80af
  [DEL] -4.32Ki  [DEL] -4.18Ki    _<core::marker::PhantomData<T> as serde_core::de::DeserializeSeed>::deserialize::hb427eaca4ecb0104
  [DEL] -4.61Ki  [DEL] -4.45Ki    core::ops::function::impls::_<impl core::ops::function::FnMut<A> for &mut F>::call_mut::h6f18fe8453bdd4e1
  [DEL] -5.80Ki  [DEL] -5.71Ki    matchit::tree::Node<T>::insert::h6a0346f5d77d70d0
  [DEL] -6.15Ki  [DEL] -6.00Ki    datadog_protos::trace_piecemeal_include::datadog::trace::AgentPayloadBuilder<S>::add_tracer_payloads::hf02c88fe8ced87a4
 -96.4% -15.9Ki -97.0% -15.9Ki    saluki_components::common::datadog::apm::ApmConfig::from_configuration::hefa485e47c90916b
  +0.2% +66.9Ki  +0.2% +60.3Ki    TOTAL

pr-commenter Bot commented May 29, 2026

Regression Detector (Agent Data Plane)

Run ID: 2daeb04d-80e2-4f2b-85e4-e883f3f85b9f
Baseline: 7e8ab3a7 · Comparison: 2bad6b82 · diff

Optimization Goals: ✅ No significant changes detected

Fine details of change detection per experiment (35)

Experiments configured erratic: true are tagged (ignored) and skipped when determining which experiments regressed or improved. Experiments which are detected as erratic at runtime are tagged (erratic) to flag that the run's sample dispersion was high, but their regression / improvement signal still counts.

| experiment | goal | Δ mean % | links |
| --- | --- | --- | --- |
| dsd_uds_1mb_3k_contexts_cpu (erratic) | cpu | ⚪ +8.36 | metrics profiles logs |
| otlp_ingest_traces_ottl_filtering_5mb_cpu (erratic) | cpu | ⚪ +2.99 | metrics profiles logs |
| dsd_uds_100mb_3k_contexts_cpu (erratic) | cpu | ⚪ +2.06 | metrics profiles logs |
| otlp_ingest_metrics_5mb_memory | memory | ⚪ +1.72 | metrics profiles logs |
| otlp_ingest_metrics_5mb_cpu (erratic) | cpu | ⚪ +1.12 | metrics profiles logs |
| dsd_uds_512kb_3k_contexts_cpu (erratic) | cpu | ⚪ +0.38 | metrics profiles logs |
| dsd_uds_500mb_3k_contexts_memory | memory | ⚪ +0.30 | metrics profiles logs |
| quality_gates_rss_idle | memory | ⚪ +0.21 | metrics profiles logs |
| dsd_uds_100mb_3k_contexts_memory | memory | ⚪ +0.20 | metrics profiles logs |
| otlp_ingest_traces_ottl_transform_5mb_memory | memory | ⚪ +0.20 | metrics profiles logs |
| quality_gates_rss_dsd_heavy | memory | ⚪ +0.13 | metrics profiles logs |
| dsd_uds_10mb_3k_contexts_throughput | throughput | ⚪ -0.00 | metrics profiles logs |
| dsd_uds_512kb_3k_contexts_throughput | throughput | ⚪ +0.00 | metrics profiles logs |
| dsd_uds_1mb_3k_contexts_throughput | throughput | ⚪ +0.00 | metrics profiles logs |
| dsd_uds_100mb_3k_contexts_throughput | throughput | ⚪ +0.01 | metrics profiles logs |
| otlp_ingest_logs_5mb_throughput (ignored) | throughput | ⚪ +0.01 | metrics profiles logs |
| otlp_ingest_metrics_5mb_throughput | throughput | ⚪ +0.01 | metrics profiles logs |
| otlp_ingest_traces_ottl_filtering_5mb_throughput | throughput | ⚪ +0.05 | metrics profiles logs |
| otlp_ingest_traces_ottl_transform_5mb_throughput | throughput | ⚪ +0.06 | metrics profiles logs |
| dsd_uds_10mb_3k_contexts_memory | memory | ⚪ -0.07 | metrics profiles logs |
| otlp_ingest_traces_5mb_throughput | throughput | ⚪ +0.11 | metrics profiles logs |
| otlp_ingest_traces_ottl_filtering_5mb_memory | memory | ⚪ -0.16 | metrics profiles logs |
| otlp_ingest_traces_5mb_memory | memory | ⚪ -0.18 | metrics profiles logs |
| dsd_uds_500mb_3k_contexts_throughput | throughput | ⚪ +0.20 | metrics profiles logs |
| dsd_uds_1mb_3k_contexts_memory | memory | ⚪ -0.25 | metrics profiles logs |
| quality_gates_rss_dsd_low | memory | ⚪ -0.36 | metrics profiles logs |
| quality_gates_rss_dsd_medium | memory | ⚪ -0.43 | metrics profiles logs |
| quality_gates_rss_dsd_ultraheavy | memory | ⚪ -0.43 | metrics profiles logs |
| dsd_uds_512kb_3k_contexts_memory | memory | ⚪ -0.48 | metrics profiles logs |
| otlp_ingest_logs_5mb_memory (ignored) | memory | ⚪ -0.52 | metrics profiles logs |
| dsd_uds_500mb_3k_contexts_cpu (erratic) | cpu | ⚪ -0.75 | metrics profiles logs |
| otlp_ingest_logs_5mb_cpu (ignored) | cpu | ⚪ -1.44 | metrics profiles logs |
| otlp_ingest_traces_ottl_transform_5mb_cpu (erratic) | cpu | ⚪ -1.84 | metrics profiles logs |
| otlp_ingest_traces_5mb_cpu (erratic) | cpu | ⚪ -1.97 | metrics profiles logs |
| dsd_uds_10mb_3k_contexts_cpu (erratic) | cpu | ⚪ -3.52 | metrics profiles logs |

Bounds Checks: ✅ Passed (5)

| experiment | check | replicates | observed | links |
| --- | --- | --- | --- | --- |
| quality_gates_rss_dsd_heavy | memory_usage | 10/10 | ✅ 126 MiB ≤ 140 MiB | metrics profiles logs |
| quality_gates_rss_dsd_low | memory_usage | 10/10 | ✅ 40.1 MiB ≤ 50 MiB | metrics profiles logs |
| quality_gates_rss_dsd_medium | memory_usage | 10/10 | ✅ 60.4 MiB ≤ 75 MiB | metrics profiles logs |
| quality_gates_rss_dsd_ultraheavy | memory_usage | 10/10 | ✅ 183 MiB ≤ 200 MiB | metrics profiles logs |
| quality_gates_rss_idle | memory_usage | 10/10 | ✅ 26.8 MiB ≤ 40 MiB | metrics profiles logs |

Explanation

A change is flagged as a regression when |Δ mean %| > 5.00% in the regressing direction for its optimization goal AND SMP marks the experiment as a regression (is_regression: true). Improvements use the matching criteria for the improving direction. Experiments configured erratic: true (tagged (ignored)) are skipped outright; experiments detected as erratic at runtime (tagged (erratic)) still count, since that flag describes sample dispersion rather than directional certainty. The Δ mean % cell is colored accordingly: 🟢 = improvement, 🔴 = regression, ⚪ = neutral. Reduction in CPU or memory is an improvement; reduction in ingress throughput is a regression.
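
The rule in the explanation above can be sketched as a small classifier. This is an illustration only, not the detector's implementation: the type and field names are hypothetical, and `smp_is_improvement` is an assumed counterpart to the `is_regression` flag that the text names.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Goal {
    Cpu,
    Memory,
    Throughput,
}

#[derive(Debug, PartialEq)]
enum Verdict {
    Improvement,
    Regression,
    Neutral,
    Skipped,
}

struct Experiment {
    goal: Goal,
    delta_mean_pct: f64,
    configured_erratic: bool, // tagged (ignored) in the table
    smp_is_regression: bool,
    smp_is_improvement: bool, // assumed flag, not named in the report
}

const THRESHOLD_PCT: f64 = 5.0;

fn classify(e: &Experiment) -> Verdict {
    if e.configured_erratic {
        return Verdict::Skipped;
    }
    if e.delta_mean_pct.abs() <= THRESHOLD_PCT {
        return Verdict::Neutral;
    }
    // An increase is bad for CPU and memory, but good for throughput.
    let increase_is_bad = matches!(e.goal, Goal::Cpu | Goal::Memory);
    let got_worse = (e.delta_mean_pct > 0.0) == increase_is_bad;
    if got_worse && e.smp_is_regression {
        Verdict::Regression
    } else if !got_worse && e.smp_is_improvement {
        Verdict::Improvement
    } else {
        Verdict::Neutral
    }
}
```

Under this reading, a CPU experiment at +8.36% stays neutral when SMP does not mark it as a regression, which is consistent with the ⚪ shown for `dsd_uds_1mb_3k_contexts_cpu`.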

| `aggregate_flush_open_windows` | Flush open windows on stop | |
| `aggregate_passthrough_idle_flush_timeout` | Passthrough buffer flush delay | |
| `aggregate_window_duration` | Aggregation window size | |
| `aggregate_window_duration_seconds` | Aggregation window size | |

@jszwedko jszwedko May 30, 2026

Nit, whitespace is misaligned now.

Labels

area/components Sources, transforms, and destinations. area/docs Reference documentation. mergequeue-status: done transform/aggregate Aggregate transform. type/bug Bug fixes.