kv0/enc=false/nodes=3/cpu=32 and a few related variants regressed on Feb 4th.

This regression is only observed on AWS and not on GCE. It's also only observed on the 32 vCPU variant and not on the 8 vCPU variant.
I've determined that this was a result of #94165. When I run the benchmark and switch the kv.raft_log.non_blocking_synchronization.enabled cluster setting midway through to disable async storage writes, throughput increases.

Interestingly, log commit latency also drops.

One possibility is that fsyncs on AWS instance stores (with nobarrier) are fast enough that we are exceeding the Pebble SyncConcurrency of 512. This would cause the async log writes to become synchronous. I don't know why this is worse than the non-async storage write configuration. One thought is that we might be observing the overhead of the asynchronous write path (two goroutine hops) without the benefit (because we're still blocking before entering it) and so we see the throughput regression.
Another thought is that async storage writes trade reduced batching of writes on the same range for reduced interference between those writes. In a system where the p50 fsync latency is 0.10ms, is this the right trade-off?
Next steps:
- experiment with a larger SyncConcurrency
- experiment with nobarrier disabled
- experiment with EBS volumes
Jira issue: CRDB-24341