Split KV Cache readme and fmean instead of sum by dslik · Pull Request #482 · mlcommons/storage

dslik · 2026-06-19T22:22:35Z

This PR splits the KV cache readme into two parts, one that provides basic instructions on how to run the benchmark for the purpose of a submission, and a second document that provides the detailed architecture, design and open parameters.

This PR also replaces the summation of results across the three runs with the fmean calculation across the three runs, fixing an inaccuracy in the calculated results.

github-actions · 2026-06-19T22:22:45Z

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

FileSystemGuy

@dslik I didn't see that the unit tests were failing....

FileSystemGuy · 2026-06-19T23:53:19Z

@dslik Here's Claude's review notes of the unit tests on this PR:

PR #482 has 8 unit-test failures, grouped into three distinct issues. The diff is small (4 lines in kvcache.py, 2
lines in mlperf_wrapper.py).

OPTION_PARAMS → WORKLOAD_PARAMS rename left tests stale (5 failures)

The PR renamed the module-level constant OPTION_PARAMS to WORKLOAD_PARAMS in
kv_cache_benchmark/mlperf_wrapper.py:22 (and its sole use at line 135), but didn't update
tests/unit/test_mlperf_wrapper.py, which still references the old name:

AttributeError: module 'mlperf_wrapper' has no attribute 'OPTION_PARAMS'

Affected tests: test_option_keys_are_1_2_3, test_option1_model_is_8b, test_option3_model_is_70b,
test_generation_mode_always_none, test_option2_cpu_mem_gb_is_4. All trivially fixable by renaming the references
in the test file.

fmean over flattened (trial × rank) under-counts bandwidth (2 failures)

The PR changes mlpstorage_py/benchmarks/kvcache.py:400-403 from sum(...) to fmean(...) for read/write bandwidth
and avg/storage throughput. But all_read_bw etc. are populated by the nested for trial_dir / for rank_idx loops at
lines 370-371, so they hold one value per (trial × rank) — not per trial.

The two failing tests assert the old "additive across ranks" semantics:

test_sums_bandwidth_across_ranks: assert 2.0 == 4.0 (sum of 4 ranks of 1.0 each → 4.0; fmean → 1.0)
test_aggregates_across_multiple_trials: assert 1.0 == 2.0

The PR description says the goal is "fmean across the three runs," but the code does fmean across the flattened
(trial × rank) list, which is not the same thing. Bandwidth across ranks is genuinely additive (each rank touches
a disjoint slice of storage); only the across-trials reduction should switch from sum to mean. The correct fix is
two-stage: sum across ranks within each trial, then fmean across trials.

fmean([]) raises where sum([]) returned 0 (1 failure)

test_none_p95_when_no_successful_reads
statistics.StatisticsError: fmean requires at least one data point

When every rank/trial result file is missing, all_read_bw ends up empty. The old sum([]) returned 0 silently;
fmean([]) raises. Needs an if all_read_bw else 0.0 guard (or compute only when populated). The P95 path at line
404 already has this guard (max(...) if all_p95_latency else None) — it should be replicated for the four fmean()
calls.

Side note on the PR's stated goal

The PR description says it "replaces the summation of results across the three runs with the fmean calculation
across the three runs, fixing an inaccuracy" — but P95 was never summed (it was max()), and the PR doesn't touch
the P95 line. So the original concern that triggered this PR (P95 aggregation) is not addressed by these changes.
The PR is actually changing bandwidth/throughput aggregation, which is a different metric — and arguably the
previous sum() was correct there.

Worth raising with the author whether (a) only the across-trials dimension should change (sum→mean across trials,
sum stays across ranks), and (b) whether P95 also needs to change from max() to something else.

dslik · 2026-06-20T01:39:11Z

That's a pretty bad code smell for the test to be reaching into a local scope variable and creating a dependency like this, but, looks like that's a Python thing. I'll update the tests.

Claude is completely wrong in it's "side note", I hope its code is better than its reasoning here.

The variable name obscures its shape, so I'll fix that issue too.

dslik · 2026-06-20T02:07:54Z

@FileSystemGuy All tests passing now.

dslik added 6 commits April 21, 2026 16:13

Minimal changes to get KV Cache benchmark running

8fe853f

Merge branch 'mlcommons:main' into main

9b9e4c9

Revert KV tests in wrapper to the three agreed-upon tests

70bcffa

Merge branch 'main' of https://github.com/dslik/storage

7644718

Split readme, fmean results

ceed104

Merge branch 'mlcommons:main' into main

7c3d2d6

dslik requested a review from a team June 19, 2026 22:22

dslik added the KVCache TF label Jun 19, 2026

FileSystemGuy approved these changes Jun 19, 2026

View reviewed changes

FileSystemGuy requested changes Jun 19, 2026

View reviewed changes

Merge branch 'main' into main

3e9abd5

dslik added 3 commits June 19, 2026 18:43

Update constant name in tests

b85c27b

Fixed summation/mean, split shape

1b9d60b

Fixed tests to work with fmean

ed9155a

dslik requested a review from FileSystemGuy June 20, 2026 02:09

Merge branch 'main' into main

b047fcc

FileSystemGuy approved these changes Jun 20, 2026

View reviewed changes

FileSystemGuy merged commit 645e428 into mlcommons:main Jun 20, 2026
3 checks passed

github-actions Bot locked and limited conversation to collaborators Jun 20, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Split KV Cache readme and fmean instead of sum#482

Split KV Cache readme and fmean instead of sum#482
FileSystemGuy merged 11 commits into
mlcommons:mainfrom
dslik:main

dslik commented Jun 19, 2026

Uh oh!

github-actions Bot commented Jun 19, 2026 •

edited

Loading

Uh oh!

FileSystemGuy left a comment

Uh oh!

FileSystemGuy commented Jun 19, 2026

Uh oh!

dslik commented Jun 20, 2026

Uh oh!

dslik commented Jun 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

dslik commented Jun 19, 2026

Uh oh!

github-actions Bot commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

FileSystemGuy left a comment

Choose a reason for hiding this comment

Uh oh!

FileSystemGuy commented Jun 19, 2026

Uh oh!

dslik commented Jun 20, 2026

Uh oh!

dslik commented Jun 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

github-actions Bot commented Jun 19, 2026 •

edited

Loading