com.cinchapi.common.profile.Benchmark is widely used across Cinchapi projects for measuring server-side and end-to-end timing. cinchapi/concourse alone has 17+ benchmark tests built on it, plus several test classes in concourse-server (e.g. TMapsTest, DatabaseTest, SegmentTest, StoresTest, ByteableCollectionsTest, CompiledInfingramTest).
The current API exposes only the arithmetic mean (average(int)) and total elapsed (run(int)), which limits its usefulness in noise-prone CI environments where outliers dominate single-shot runs and a single mean can be misleading. Concrete CI evidence: the same Concourse code produced 1952 - 4557 ops/sec (2.3x spread) for OpsPerSecondTest.testVerifyAndSwap across four CircleCI shards on the same commit.
This issue tracks improvements needed to make Benchmark a credible regression-detection primitive for the cross-version benchmark suite in cinchapi/concourse.
No median, no percentiles, no min/max, no stddev, no outlier rejection, no throughput-window mode, no rich result object.
Gaps blocking better testing
Statistical aggregation — median (p50), p95, p99, min, max, stddev. A single mean is fragile under GC pauses on shared CI infrastructure; we routinely see 2-2.5x variance on the same code across CI shards.
Throughput-window mode — for tests that measure ops/sec over a fixed time window rather than latency over a fixed iteration count (e.g. Concourse's TransactionThroughputTest, OpsPerSecondTest, AbstractTransporterThroughputTest). These tests roll their own loop with System.currentTimeMillis() today.
Rich result object — a BenchmarkResult (or similar) carrying min/max/mean/median/percentiles/iterations/totalElapsed in one object so callers do not have to run twice to print two stats.
Outlier trimming — drop top/bottom k samples before aggregating (trimmed mean), a cheap defense against single-GC-pause outliers.
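The outlier-trimming gap above can be illustrated with a minimal sketch. It assumes samples are elapsed times held in a `long[]`; the class `TrimmedMeanSketch` and its `trimmedMean` method are hypothetical names for illustration and are not part of the existing Benchmark API.

```java
import java.util.Arrays;

// Hypothetical helper, not part of com.cinchapi.common.profile.Benchmark.
final class TrimmedMeanSketch {

    /**
     * Mean of the samples that remain after dropping the `low` smallest
     * and the `high` largest values.
     */
    static double trimmedMean(long[] samples, int low, int high) {
        if (low < 0 || high < 0 || low + high >= samples.length) {
            throw new IllegalArgumentException(
                    "trim counts must leave at least one sample");
        }
        long[] sorted = samples.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        double sum = 0;
        for (int i = low; i < sorted.length - high; i++) {
            sum += sorted[i];
        }
        return sum / (sorted.length - low - high);
    }

    public static void main(String[] args) {
        // 250 stands in for a single GC-pause outlier.
        long[] elapsedMs = {12, 11, 13, 250, 12};
        System.out.println("plain mean:   " + trimmedMean(elapsedMs, 0, 0));
        System.out.println("trimmed mean: " + trimmedMean(elapsedMs, 1, 1));
    }
}
```

With one sample dropped from each end, the 250 ms outlier no longer contributes to the aggregate, which is the behavior a `trimOutliers(1, 1)` setting would presumably aim for.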
Proposed API sketch
Existing API stays compatible. New chainable configuration on the builder:
BenchmarkResult result = Benchmark.measure(() -> ...)
    .in(TimeUnit.MILLISECONDS)
    .warmups(3)
    .iterations(10)
    .reportPercentiles(50, 95, 99)
    .trimOutliers(1, 1) // drop top 1 and bottom 1
    .run(); // returns BenchmarkResult
New throughput-window mode for ops/sec measurement:
BenchmarkResult result = Benchmark.measure(() -> ...)
    .in(TimeUnit.SECONDS)
    .warmups(2)
    .runFor(Duration.ofSeconds(10));
// result.throughput() => ops/sec
// result.iterations() => total ops completed
// result.totalElapsed() => actual wall time
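One way the throughput-window mode could work internally is sketched below. This is an assumption about a possible implementation, not the library's code: `ThroughputWindowSketch`, its nested `Window` holder, and the static `runFor` helper are hypothetical names, and the sketch uses `System.nanoTime()` rather than `System.currentTimeMillis()` because the former is monotonic.

```java
import java.time.Duration;

// Hypothetical sketch of a fixed-time-window loop; not the proposed Benchmark API.
final class ThroughputWindowSketch {

    /** Minimal immutable holder for one measured window. */
    static final class Window {
        final long iterations;
        final long elapsedNanos;

        Window(long iterations, long elapsedNanos) {
            this.iterations = iterations;
            this.elapsedNanos = elapsedNanos;
        }

        /** Operations per second over the actual wall time. */
        double throughput() {
            return iterations / (elapsedNanos / 1_000_000_000.0);
        }
    }

    static Window runFor(Runnable action, int warmups, Duration window) {
        if (window.isZero() || window.isNegative()) {
            throw new IllegalArgumentException("window must be positive");
        }
        // Warmups run before the clock starts and are never counted.
        for (int i = 0; i < warmups; i++) {
            action.run();
        }
        long limit = window.toNanos();
        long start = System.nanoTime();
        long iterations = 0;
        long elapsed;
        do {
            action.run();
            iterations++;
            elapsed = System.nanoTime() - start;
        } while (elapsed < limit);
        // elapsed is the actual wall time, which may slightly exceed the window.
        return new Window(iterations, elapsed);
    }

    public static void main(String[] args) {
        Window w = runFor(() -> Math.sqrt(42.0), 2, Duration.ofMillis(100));
        System.out.printf("%d ops in %d ns (%.0f ops/sec)%n",
                w.iterations, w.elapsedNanos, w.throughput());
    }
}
```

Reporting the actual elapsed time instead of the requested window keeps the throughput figure honest when the last operation overruns the deadline.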
BenchmarkResult exposes:
min(), max(), mean(), median()
percentile(double p) for arbitrary percentile
stddev()
iterations(), totalElapsed(), throughput() (when runFor was used)
samples() for raw per-iteration samples so callers can do their own analysis
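The accessors listed above could be backed by a small immutable class along the following lines. This is a sketch under stated assumptions, not the proposed implementation: the name `BenchmarkResultSketch` is hypothetical, `percentile` uses the nearest-rank method and returns a `long`, `median()` averages the two middle values for an even sample count, and `stddev()` is the sample standard deviation (divisor n - 1). The issue does not specify any of these choices.

```java
import java.util.Arrays;

// Hypothetical sketch; the real BenchmarkResult may differ in every detail.
final class BenchmarkResultSketch {

    private final long[] samples; // in recorded order
    private final long[] sorted;  // ascending copy used for order statistics

    BenchmarkResultSketch(long[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("at least one sample is required");
        }
        this.samples = samples.clone();
        this.sorted = samples.clone();
        Arrays.sort(this.sorted);
    }

    long min() { return sorted[0]; }

    long max() { return sorted[sorted.length - 1]; }

    double mean() {
        double sum = 0;
        for (long s : sorted) {
            sum += s;
        }
        return sum / sorted.length;
    }

    double median() {
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /** Nearest-rank percentile for p in [0, 100]; p = 0 returns the minimum. */
    long percentile(double p) {
        if (p < 0 || p > 100) {
            throw new IllegalArgumentException("p must be in [0, 100]");
        }
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Sample standard deviation; 0 when only one sample exists. */
    double stddev() {
        if (sorted.length < 2) {
            return 0;
        }
        double mean = mean();
        double squares = 0;
        for (long s : sorted) {
            squares += (s - mean) * (s - mean);
        }
        return Math.sqrt(squares / (sorted.length - 1));
    }

    int iterations() { return samples.length; }

    /** Defensive copy so callers cannot mutate the result. */
    long[] samples() { return samples.clone(); }
}
```

Sorting once in the constructor lets every order statistic be read in constant time, and the defensive copies keep the object immutable as the plan requires.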
Implementation Plan
Introduce BenchmarkResult with the fields above. Keep it immutable.
Add iterations(int), reportPercentiles(int...), trimOutliers(int low, int high), runFor(Duration) to the builder.
Update the abstract-class instance API (Benchmark subclassing path) to support the same configuration via setter methods or by routing through the builder internally.
Make warmup runs explicitly excluded from the result aggregation (today warmups(int) exists on the builder but its semantics relative to average(int) are not documented).
Add unit tests that:
Verify percentile math on a known sample distribution
Verify runFor(Duration) runs at least N iterations within tolerance and reports the right throughput
Verify warmup samples are excluded from the result
Verify trimmed-mean math drops the requested top/bottom k samples
Update Javadoc to describe the new contract end-to-end.
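The warmup-exclusion step in the plan above can be sketched as follows. The class `WarmupExclusionSketch` and its `sample` method are hypothetical and only illustrate the intended contract: warmup runs execute the action but contribute no sample.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the warmup contract; not the existing Benchmark code.
final class WarmupExclusionSketch {

    /**
     * Runs the action `warmups` times untimed, then `iterations` times timed,
     * and returns one elapsed-time sample per timed iteration in `unit`.
     */
    static long[] sample(Runnable action, int warmups, int iterations,
            TimeUnit unit) {
        for (int i = 0; i < warmups; i++) {
            action.run(); // executed, but deliberately not recorded
        }
        long[] samples = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            action.run();
            samples[i] = unit.convert(System.nanoTime() - start,
                    TimeUnit.NANOSECONDS);
        }
        return samples;
    }

    public static void main(String[] args) {
        long[] samples = sample(() -> Math.sqrt(42.0), 3, 10,
                TimeUnit.NANOSECONDS);
        System.out.println("recorded samples: " + samples.length);
    }
}
```

A unit test for this contract can count invocations of the action and check that the total equals warmups plus iterations while the sample array holds only the timed iterations.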
Current API surface
Benchmark.run() returns single-sample elapsed (long)
Benchmark.run(int n) returns sum of n elapsed times (long)
Benchmark.average(int n) returns arithmetic mean (double)
measure(Runnable).in(TimeUnit).warmups(int).async()?.run() / run(n) / average(n)
Acceptance Criteria
BenchmarkResult exposes min, max, mean, median, p95, p99, stddev, iterations, totalElapsed, throughput, samples
iterations(int), reportPercentiles(int...), trimOutliers(int low, int high), runFor(Duration)
run(), run(int), average(int) continue to compile and behave unchanged
Out of scope (file separately if pursued)
@Param)
Caller using this work
cinchapi/concourse will depend on this in its perf-test hardening effort.