Problem
The KV cache benchmark displays storage_throughput_tokens_per_sec under the label Storage Throughput. The metric is calculated as:
total_tokens_generated / total_storage_io_latency
That name is confusing because elsewhere storage clearly refers to the disk/NVMe tier, e.g. tier_storage_kv_bytes_read_gb, tier_storage_kv_bytes_written_gb, tier_storage_read_bandwidth_gbps, and tier_storage_write_bandwidth_gbps.
As a result, readers naturally interpret Storage Throughput as raw disk/NVMe throughput, but it is really token throughput through the cache I/O path. CPU RAM hits can increase this metric even while raw disk pressure drops, which makes the discovery note hard to understand:
Storage Throughput shows only 1.1x at cpu_mem=0GB but 2.2x at cpu_mem=4GB
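To make the effect concrete, here is a minimal sketch of the arithmetic. The formula is the one quoted above; the two runs and all of their numbers are hypothetical, chosen only to show the direction of the effect, and are not benchmark output.

```python
def storage_throughput_tokens_per_sec(total_tokens_generated, total_storage_io_latency):
    """Formula quoted in this issue: tokens divided by accumulated cache I/O latency (seconds)."""
    return total_tokens_generated / total_storage_io_latency


tokens = 10_000  # hypothetical token count, identical in both runs

# Hypothetical run A: no CPU RAM tier, so every cache access goes to NVMe.
run_a = {"io_latency_s": 50.0, "nvme_bytes_read_gb": 40.0}

# Hypothetical run B: some accesses hit CPU RAM, so the accumulated
# cache I/O latency drops AND fewer bytes are read from NVMe.
run_b = {"io_latency_s": 25.0, "nvme_bytes_read_gb": 15.0}

tput_a = storage_throughput_tokens_per_sec(tokens, run_a["io_latency_s"])
tput_b = storage_throughput_tokens_per_sec(tokens, run_b["io_latency_s"])

print(f"run A: {tput_a:.1f} tokens/s, NVMe read {run_a['nvme_bytes_read_gb']} GB")
print(f"run B: {tput_b:.1f} tokens/s, NVMe read {run_b['nvme_bytes_read_gb']} GB")
```

In this sketch the token metric doubles while the NVMe tier does less work, which is the opposite of what the label Storage Throughput suggests.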
Proposed fix
Rename the displayed/JSON metric to something like:
or, if keeping backwards compatibility, add the new name as an alias and mark storage_throughput_tokens_per_sec deprecated in docs/output.
Reserve Storage Throughput wording for actual disk-tier metrics:
tier_storage_read_bandwidth_gbps
tier_storage_write_bandwidth_gbps
tier_storage_kv_bytes_read_gb
tier_storage_kv_bytes_written_gb
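The backwards-compatible option could be sketched roughly as follows. This assumes the benchmark results are available as a plain dict before being written to JSON; NEW_KEY is a deliberate placeholder rather than a proposed name, and add_alias and the deprecated_keys field are hypothetical, not part of the existing benchmark.

```python
OLD_KEY = "storage_throughput_tokens_per_sec"
NEW_KEY = "new_metric_name_tokens_per_sec"  # placeholder only; the real name is still to be decided


def add_alias(results: dict) -> dict:
    """Return a copy of results that carries the metric under both keys.

    The old key is kept so existing consumers keep working, and a
    hypothetical 'deprecated_keys' entry points readers at the new key.
    """
    out = dict(results)
    if OLD_KEY in out:
        out[NEW_KEY] = out[OLD_KEY]
        # Copy the nested dict so the caller's input is never mutated.
        deprecated = dict(out.get("deprecated_keys", {}))
        deprecated[OLD_KEY] = f"deprecated, use {NEW_KEY}"
        out["deprecated_keys"] = deprecated
    return out


example = add_alias({OLD_KEY: 123.4})
print(example)
```

Emitting both keys for a release or two would let downstream scripts migrate before the old key is removed.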
Why this matters
The current name makes it look like increasing cpu_mem makes the disk faster. What is actually happening is that CPU RAM reduces cache I/O latency, so tokens / cache_io_latency increases. Raw disk saturation should be judged from tier storage bandwidth/bytes or iostat, not this token metric.