Update client-side stats to use light weight Hashtable#11382
Conversation
Standalone classes for swapping the consumer-side LRUCache<MetricKey, AggregateMetric> with a multi-key Hashtable in the next commit. No call sites use them yet. - AggregateEntry extends Hashtable.Entry, holds the canonical MetricKey, the mutable AggregateMetric, and copies of the 13 raw SpanSnapshot fields for matches(). The 64-bit lookup hash is computed via chained LongHashingUtils.addToHash calls (no varargs, no boxing of short/boolean). - AggregateTable wraps a Hashtable.Entry[] from Hashtable.Support.create. findOrInsert(SpanSnapshot) walks the bucket comparing raw fields, falling back to MetricKeys.fromSnapshot on a true miss. On cap overrun, it scans for an entry with hitCount==0 and unlinks it; if none, it returns null and the caller drops the data point. - MetricKeys.fromSnapshot extracts the canonicalization logic (DDCache lookups + UTF8 encoding) from Aggregator.buildMetricKey, so the helper can be called from AggregateTable on miss. This also commits Hashtable and LongHashingUtils (added earlier, previously uncommitted) and lifts Hashtable.Entry / Hashtable.Support visibility so client code outside datadog.trace.util can build higher-arity tables -- the case the javadoc describes but the original visibility didn't actually support. Specifically: Entry is now public abstract with a protected ctor; keyHash, next(), and setNext() are public; Support's create / clear / bucketIndex / bucketIterator / mutatingBucketIterator methods are public. Tests: AggregateTableTest covers hit, miss, distinct-by-spanKind, peer-tag identity (including null vs non-null), cap overrun with stale victim, cap overrun with no victim (returns null), expungeStaleAggregates, forEach, clear, and that the canonical MetricKey is built at insert. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace LRUCache<MetricKey, AggregateMetric> with the AggregateTable added
in the prior commit. The hot path in Drainer.accept becomes:
AggregateMetric aggregate = aggregates.findOrInsert(snapshot);
if (aggregate != null) {
aggregate.recordOneDuration(snapshot.tagAndDuration);
dirty = true;
} else {
healthMetrics.onStatsAggregateDropped();
}
On the steady-state hit path the lookup is a 64-bit hash compute + bucket
walk + matches(snapshot) -- no MetricKey allocation, no SERVICE_NAMES /
SPAN_KINDS / PEER_TAGS_CACHE lookups. The canonical MetricKey is now built
once per unique key at insert time, in MetricKeys.fromSnapshot.
Behavioral change in the cap-overrun path
-----------------------------------------
The old LRUCache evicted least-recently-used: at cap, a new insert would
push out the oldest entry regardless of whether it was live or stale.
AggregateTable instead scans for a hitCount==0 entry to recycle, and drops
the new key if none exists. Practical impact: in the common case where
the table holds a stable set of recurring keys, an unrelated burst of new
keys is dropped (and reported via onStatsAggregateDropped) rather than
evicting the established keys. The existing test that asserted "service0
evicted in favor of service10" is updated to assert the new semantics.
The other cap-related test ("should not report dropped aggregate when
evicted entry was already flushed") still passes unchanged: after report()
clears all entries to hitCount=0, the next wave of inserts recycles them.
Threading fix
-------------
ConflatingMetricsAggregator.disable() used to call aggregator.clearAggregates()
and inbox.clear() directly from the Sink's IO event thread, racing with the
aggregator thread mid-write. The race was tolerable for LinkedHashMap; it
is not for AggregateTable (chain corruption can NPE or loop). disable()
now offers a ClearSignal to the inbox so the aggregator thread itself
performs the table clear and the inbox.clear(). Adds one SignalItem
subclass + one branch in Drainer.accept; preserves the single-writer
invariant for AggregateTable end-to-end.
Removed: LRUCache import, AggregateExpiry inner class, the static
buildMetricKey / materializePeerTags / encodePeerTag helpers (now in
MetricKeys).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MetricKey existed for two reasons -- the prior LRUCache key role (now handled by AggregateTable's Hashtable.Entry mechanics) and as the labels argument to MetricWriter.add. The first is gone; the second is the only thing keeping MetricKey alive. Fold its UTF8-encoded label fields onto AggregateEntry, change MetricWriter.add to take AggregateEntry directly, and delete MetricKey + MetricKeys. What AggregateEntry now holds ----------------------------- - 10 UTF8BytesString label fields (resource, service, operationName, serviceSource, type, spanKind, httpMethod, httpEndpoint, grpcStatusCode, and a List<UTF8BytesString> peerTags for serialization). - 3 primitives (httpStatusCode, synthetic, traceRoot). - AggregateMetric (the value being accumulated). - The raw String[] peerTagPairs is retained alongside the encoded peerTags -- matches() compares it positionally against the snapshot's pairs; the encoded form is only consumed by the writer. matches(SpanSnapshot) compares the entry's UTF8 forms to the snapshot's raw String / CharSequence fields via content-equality (UTF8BytesString.toString() returns the underlying String in O(1)). This closes a latent bug in the prior raw-vs-raw matches(): if one snapshot delivered a tag value as String and a later snapshot delivered the same content as UTF8BytesString, the old Objects.equals would return false and the table would split into two entries. Content-equality matching collapses them into one. Consolidated caches ------------------- The static UTF8 caches that used to live partly on MetricKey (RESOURCE_CACHE, OPERATION_CACHE, SERVICE_SOURCE_CACHE, TYPE_CACHE, KIND_CACHE, HTTP_METHOD_CACHE, HTTP_ENDPOINT_CACHE, GRPC_STATUS_CODE_CACHE, SERVICE_CACHE) and partly on ConflatingMetricsAggregator (SERVICE_NAMES, SPAN_KINDS, PEER_TAGS_CACHE) are all now on AggregateEntry. The split was duplicating work -- SERVICE_NAMES and SERVICE_CACHE both cached service-name to UTF8BytesString. One cache per field now. API change: MetricWriter.add ---------------------------- Was: add(MetricKey key, AggregateMetric aggregate) Now: add(AggregateEntry entry) The aggregate lives on the entry. Single-arg. SerializingMetricWriter reads the same UTF8 fields off AggregateEntry that it previously read off MetricKey; the wire format is byte-identical. Test impact ----------- AggregateEntry.of(...) takes the same 13 positional args new MetricKey(...) took, so test diffs are mostly mechanical: new MetricKey(args) -> AggregateEntry.of(args) writer.add(key, _) -> writer.add(entry) ValidatingSink in SerializingMetricWriterTest now iterates List<AggregateEntry> directly. ConflatingMetricAggregatorTest's Spock matchers (~36 sites) rely on AggregateEntry.equals comparing the 13 label fields (not the aggregate) so the mock matches by labels regardless of the aggregate state at call time; post-invocation closures verify aggregate state. Benchmarks (2 forks x 5 iter x 15s) ----------------------------------- The change is consumer-thread only; producer publish() is unchanged. SimpleSpan bench: 3.123 +- 0.025 us/op (prior: 3.119 +- 0.018) DDSpan bench: 2.412 +- 0.022 us/op (prior: 2.463 +- 0.041) Both within noise -- the win is structural (one less class, one less allocation per miss, one fewer cache layer) rather than benchmarked. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
050a998 to
3738c85
Compare
The label fields and the mutable counters/histograms are 1:1 with each entry; carrying them on a separate object meant one extra allocation per unique key plus an indirection on every hot-path update. Merging them puts the counters directly on AggregateEntry, drops the entry.aggregate hop, and consolidates ERROR_TAG / TOP_LEVEL_TAG onto the same class the consumer uses to decode them. AggregateTable.findOrInsert now returns AggregateEntry. Callers in Aggregator and SerializingMetricWriter updated. Migrated AggregateMetricTest.groovy to AggregateEntryTest.java per project policy. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a context-passing forEach(T, BiConsumer) overload to AggregateTable, mirroring TagMap's pattern. Aggregator.report now hands the writer in as context to a static BiConsumer so no fresh Consumer is allocated each report cycle. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Now that Hashtable.Support exposes the parameterized forEach helpers, AggregateTable's own forEach methods can drop their duplicated loop body and the (AggregateEntry) cast. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- findOrInsert: walks via Support.bucket(buckets, keyHash) instead of Hashtable.Entry + intermediate cast; bucketIndex is only computed on the miss path now. - evictOneStale / expungeStaleAggregates: chain variables typed as AggregateEntry from the head down, leveraging Entry.next()'s generic inference, so the per-iteration getHitCount() checks drop their (AggregateEntry) cast. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| private static final long DEFAULT_SLEEP_MILLIS = 10; | ||
|
|
||
| /** Non-capturing -- the writer arrives via the forEach context arg. */ | ||
| private static final BiConsumer<MetricWriter, AggregateEntry> WRITE_AND_CLEAR = |
There was a problem hiding this comment.
The JVM is pretty good at reusing non-capturing lambdas, I think we can forego the static member until a profile proves it necessary.
| @Override | ||
| public void accept(InboxItem item) { | ||
| if (item instanceof SignalItem) { | ||
| if (item == ClearSignal.CLEAR) { |
There was a problem hiding this comment.
ClearSignal was introduced to avoid a thread-safety issue with the prior implementation
- Constructor sizing now uses Support.MAX_RATIO_NUMERATOR / _DENOMINATOR instead of an open-coded * 4 / 3. - findOrInsert delegates the chain-head splice to Support.insertHeadEntry. - evictOneStale and expungeStaleAggregates both rewritten in terms of Support.mutatingTableIterator. Drops the bespoke head-vs-mid-chain branching that read as more complicated than the operation actually is. Net -28 lines in AggregateTable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
|
||
| /** Unlink the first entry whose {@code getHitCount() == 0}. */ | ||
| private boolean evictOneStale() { | ||
| for (MutatingTableIterator<AggregateEntry> it = Support.mutatingTableIterator(buckets); |
There was a problem hiding this comment.
I'd prefer to call this "iter" rather than "it"
- AggregateTable: switch to Support.create(maxAggregates, Support.MAX_RATIO) now that the load-factor scaling is a Support concern. - AggregateTable: replace open-coded "keyHash == X && matches(s)" with a new AggregateEntry.matches(long keyHash, SpanSnapshot) overload that bundles the hash gate. - AggregateTable: rename local iterator var "it" -> "iter". - Aggregator: drop WRITE_AND_CLEAR static field, inline as a non-capturing lambda; the JIT reuses non-capturing lambdas, no need for the static until a profile says otherwise. - Aggregator: comment the ClearSignal branch with the thread-safety rationale (single-writer invariant for AggregateTable). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Picks up the Support.insertHeadEntry(buckets, long keyHash, entry) overload added on the util-hashtable branch; saves the redundant Support.bucketIndex(buckets, keyHash) hop at the call site. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ntry Use javax.annotation.Nullable (the codebase's convention -- see DDSpan, TagInterceptor, ScopeContext, etc.) on the four nullable label fields (serviceSource, httpMethod, httpEndpoint, grpcStatusCode), their getters, and the corresponding parameters of AggregateEntry.of. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Support.MAX_RATIO and the scaled create(int, float) overload already convey the 75% load-factor intent at the call site -- the inline comment was duplicating their self-documentation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Style nit -- the equals() method had eight fully-qualified references to java.util.Objects.equals; add the import and drop the qualifier. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two design-review trade-offs that won't change in this PR but should be explicit at the call sites: - AggregateTable.evictOneStale: O(N) scan per call (vs LRUCache's O(1)), acceptable because the new policy drops the *new* key on cap-overrun rather than evicting an established one -- so eviction is expected to be rare. Cursor-caching is the future optimization if a workload runs persistently at cap. - ConflatingMetricsAggregator.disable: single inbox.offer(CLEAR) is best-effort. If the inbox is full the clear is dropped, but the system self-heals (supportsMetrics() is already false, the next report-sink-rejection retries disable). Worst case is one extra cycle of stale data, not a leak. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d2e4477f78
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| new AggregateMetric().recordDurations(10, new AtomicLongArray(1, 1, 200, 2, 3, 4, 5, 6, 7, 8, 9)) | ||
| ) | ||
| def entry1 = AggregateEntry.of("resource1", "service1", "operation1", null, "sql", 0, false, true, "xyzzy", [UTF8BytesString.create("grault:quux")], null, null, null) | ||
| entry1.aggregate.recordDurations(5, new AtomicLongArray(2, 1, 2, 250, 4, 5)) |
There was a problem hiding this comment.
Record durations directly on AggregateEntry
AggregateEntry does not expose an aggregate property, so calling entry1.aggregate.recordDurations(...) raises a MissingPropertyException in Groovy at runtime. This causes the integration test to fail before writer.finishBucket() and stops it from validating the sink notification path. Call recordDurations(...) on the AggregateEntry itself instead.
Useful? React with 👍 / 👎.
| )) >> { AggregateEntry e -> | ||
| e.getHitCount() == 1 && e.getTopLevelCount() == 1 && e.getDuration() == 100 |
There was a problem hiding this comment.
Use argument constraints instead of stub closures for add() checks
This >> { ... } closure is now a stubbed response, not an argument matcher, so the boolean expression is ignored for this void method and no assertion is enforced on hitCount/duration contents. As a result, these tests can pass even when aggregates are wrong. Use a Spock argument constraint (e.g. writer.add({ AggregateEntry e -> ... })) to keep the validation behavior.
Useful? React with 👍 / 👎.
What Does This Do
Replaces the MetricKey based HashMap with a new AggregateTable based on the light Hashtable
Motivation
By using the light Hashtable, I'm able to avoid the biggest source of allocation in client-side stats: MetricKey
Hashtable provides utilities for searching the entries without constructing a new composite key object
First, the components are hashed together to find the corresponding bucket
Then the bucket can be traversed to see if the entries match the key components
Additionally, any amount of data / metadata can be stored in the entry as well
The end result is that both MetricKey and AggregateMetrics can be merged into a single class AggregateEntry that is only constructed when there's no existing matching entry
Additional Notes
Stacked on top of #11409 (
dougqh/util-hashtable) — review that first; the diff shown here is only the work that's new beyond that PR.Restructures the consumer-side aggregate store. Three logical commits, intended to be reviewed in order:
1. Add
AggregateTable+AggregateEntrybacked byHashtableIntroduces a multi-key hash table that lets the consumer thread look up the {labels → counters} entry directly from a
SpanSnapshot's raw fields — noMetricKeyallocation per snapshot, no per-snapshot UTF8 cache lookups, no CHM operations. Hot-path lookup iskeyHash compute→Hashtable.Support.bucket→ bucket walk →matches(keyHash, snapshot)→ returned entry has the counters to mutate in place.This commit is standalone — no call sites yet, only the new classes + unit tests for hit/miss/cap-overrun/expunge/clear behavior.
2. Swap
Aggregatorto useAggregateTable+ routedisable()clear through aClearSignalReplaces
LRUCache<MetricKey, AggregateMetric>withAggregateTableinAggregator. Drops theAggregateExpirylistener — drop reporting (onStatsAggregateDropped) moves to the cap-overrun path insideDrainer.accept.Threading fix bundled here:
ConflatingMetricsAggregator.disable()used to callaggregator.clearAggregates()andinbox.clear()directly from the Sink's IO callback thread, racing with the aggregator thread. That race was tolerable forLinkedHashMap(worst case = corrupted internal state right before everything got cleared anyway); it's not tolerable forHashtable(chain corruption can NPE or loop).disable()now offers aClearSignalto the inbox so the aggregator thread itself performs the clear — preserves the single-writer invariant forAggregateTableend-to-end. The offer is best-effort; the system self-heals on a subsequent downgrade cycle if the inbox happens to be full (commented at the call site).Cap-overrun semantic change: the old
LRUCacheevicted least-recently-used in O(1).AggregateTableinstead scans for ahitCount==0entry to recycle (O(N) worst-case), and drops the new key if none exists. Practical impact: in steady state, an unrelated burst of new keys gets dropped (and reported viaonStatsAggregateDropped) rather than evicting established keys. The cost trade-off is commented at the eviction site — eviction is expected rare because the cap is sized to the working set; cursor-caching is the future option if a workload runs persistently at cap. The existing test that asserted "service0 evicted in favor of service10" is updated to assert the new semantics; the other cap-related test ("evicted entry was already flushed") still passes unchanged.3. Fold
MetricKey+AggregateMetricintoAggregateEntryMetricKeyexisted for two reasons — being theLRUCachekey (replaced byAggregateTable's Hashtable mechanics) and being the labels arg toMetricWriter.add(the only thing left).AggregateMetricwas the counter/histogram counterpart. Folds both onto a singleAggregateEntry(10 UTF8 label fields + 3 primitives + counters + histograms), changesMetricWriter.add(MetricKey, AggregateMetric)→add(AggregateEntry), and deletesMetricKey.java+MetricKeys.java+AggregateMetric.java.The 12 UTF8 caches that used to be split between
MetricKey(9) andConflatingMetricsAggregator(3, with overlap) are consolidated onAggregateEntry. One cache per field type now.Latent bug fix: the prior
matches(SpanSnapshot)usedObjects.equalson raw fields. If the same logical key was delivered once asStringand once asUTF8BytesString(differentCharSequenceimpls of identical content),Objects.equalsreturned false and the table would split into two entries for the same key. The newmatchesuses content-equality (UTF8BytesString.toString()returns the underlyingStringin O(1)), collapsing them correctly.Test impact:
AggregateEntry.of(...)mirrors the priornew MetricKey(...)positional args, so test diffs are mostly mechanical. About 56 test sites migrated acrossConflatingMetricAggregatorTest,SerializingMetricWriterTest, andMetricsIntegrationTest.Review polish
Follow-up commits address review feedback:
Hashtable.Support.create(maxAggregates, Support.MAX_RATIO)+Support.bucket+Support.insertHeadEntry(buckets, keyHash, entry)+Support.mutatingTableIteratorto delegate to the helpers added on Add Hashtable and LongHashingUtils utilities #11409 — drops ~50 lines of bespoke bucket-array code.BiConsumerconstant (the JIT reuses non-capturing lambdas).AggregateEntry.matches(long keyHash, SpanSnapshot)overload that pre-checks the hash, so chain walks read as one call.@Nullable(javax.annotation) annotations on the four nullable label fields + their getters +of(...)parameters.Objects.equalsimport inAggregateEntry.equals()(no more fully-qualified refs).evictOneStale(O(N) scan rationale) anddisable()(best-effort offer rationale).Benchmarks
2 forks × 5 iter × 15s, producer publish() latency:
All within noise — this PR is a consumer-side refactor, so producer publish() shouldn't move much. The win is structural (one less class, no per-miss MetricKey allocation, no double-cache lookups, smaller per-entry footprint) plus higher consumer throughput that lets the inbox keep up at higher sustained producer rates before
onStatsInboxFullfires.Net code delta: +1280 / −903 = +377 lines across 16 files. The growth is dominated by new test coverage (
AggregateTableTest,AggregateEntryTest) plus the consolidated UTF8 caches landing onAggregateEntry; the production-code core (lessMetricKey+MetricKeys+AggregateMetricminusAggregateEntry's additions) is roughly flat.Test plan
./gradlew :dd-trace-core:test --tests 'datadog.trace.common.metrics.*'passes (incl. the newAggregateTableTestandAggregateEntryTest)./gradlew :dd-trace-core:compileJava :dd-trace-core:compileTestGroovy :dd-trace-core:compileJmhJava :dd-trace-core:compileTraceAgentTestGroovyall green./gradlew spotlessCheckcleanstats.dropped_aggregatessemantics at high cardinality (especially the new "drop new on cap overrun" path vs. the old "evict LRU" path)🤖 Generated with Claude Code