feat(trace_buffer)!: flush based on size of chunks in bytes (#1953)
paullegranddc wants to merge 1 commit into main
Conversation
# Motivation

When we buffer data, we want to put a bound on the memory we are going to use. Spans can contain a varying amount of data depending on the workload instrumented. For this reason, it is better to trigger flushes and limit the trace buffer based on the amount of data stored, not the number of spans.

This estimate doesn't take into account that many payloads contain duplicate strings that will be interned if a user uses v0.5 or v1 encoding, but accounting for that would make the size much harder to compute.

# Changes

* Add a `BufferSize` trait, with a `byte_size` method returning the number of bytes each item in the queue allocates
* Use this method to compute the total size of each trace chunk
* Use the total size in bytes in the queue to flush and limit the size of the buffer
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6c4702b7e5
```rust
use libdd_trace_utils::span::v04::AttributeAnyValue;

// trace_id(16) + span_id(8) + parent_id(8) + start(8) + duration(8) + error(4)
let mut size: usize = 52;
```
Count per-span container memory in `byte_size`

For traces made of very small spans, this base size only accounts for encoded primitive fields and then adds string payload lengths, but each queued `SpanBytes` also owns the `BytesString` fields plus the `HashMap`/`Vec` headers even when they are empty. With the new default 5 MB cap, minimal spans are treated as ~52 bytes, so the buffer can accept roughly 96k spans before dropping, far exceeding the old 10k-span guard and the intended memory bound. Please include at least the fixed in-memory size of the span/containers (and ideally collection capacities) in this estimate.
Codecov Report

```
@@            Coverage Diff            @@
##             main    #1953     +/-  ##
=========================================
- Coverage   71.85%   71.78%   -0.07%
=========================================
  Files         434      434
  Lines       70454    70643    +189
=========================================
+ Hits        50624    50713     +89
- Misses      19830    19930    +100
```
Artifact Size Benchmark Report (targets measured):

* aarch64-alpine-linux-musl
* aarch64-unknown-linux-gnu
* libdatadog-x64-windows
* libdatadog-x86-windows
* x86_64-alpine-linux-musl
* x86_64-unknown-linux-gnu
yannham left a comment:
Do you have an idea of the impact of the new `byte_size`? It might not be on the hot path, but it's a bit sad that we basically have to traverse each span deeply to compute the estimate, with potential for a bit of pointer chasing. I suppose one possibility would be to encode the span and measure the actual size in bytes, but that happens at a later point, so we can't really rely on encoding when we decide whether to flush?

Another solution would be to make spans record their size as they go. I'm not sure any of this is required, but I just wanted to mention the topic.
Yes, it is actually quite cheap. We only do pointer chasing on the `HashMap`s, as the lengths are otherwise in the fat pointer metadata. Of course, if it's a trace chunk with 100 spans, that'd be a lot more expensive.
Is 30% a typo? Sounds like a lot to me 😛