Split KV Cache readme and fmean instead of sum #482
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
FileSystemGuy
left a comment
@dslik I didn't see that the unit tests were failing...
@dslik Here are Claude's review notes on the unit tests for this PR:

PR #482 has 8 unit-test failures, grouped into three distinct issues. The diff is small (4 lines in kvcache.py, 2

The PR renamed the module-level constant OPTION_PARAMS to WORKLOAD_PARAMS in
AttributeError: module 'mlperf_wrapper' has no attribute 'OPTION_PARAMS'
Affected tests: test_option_keys_are_1_2_3, test_option1_model_is_8b, test_option3_model_is_70b,

The PR changes mlpstorage_py/benchmarks/kvcache.py:400-403 from sum(...) to fmean(...) for read/write bandwidth
The two failing tests assert the old "additive across ranks" semantics:
test_sums_bandwidth_across_ranks: assert 2.0 == 4.0 (sum of 4 ranks of 1.0 each → 4.0; fmean → 1.0)
The PR description says the goal is "fmean across the three runs," but the code does fmean across the flattened

test_none_p95_when_no_successful_reads
When every rank/trial result file is missing, all_read_bw ends up empty. The old sum([]) returned 0 silently;

Side note on the PR's stated goal
The PR description says it "replaces the summation of results across the three runs with the fmean calculation
Worth raising with the author whether (a) only the across-trials dimension should change (sum→mean across trials,
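For readers following along, the difference between the two aggregations mentioned in the notes above can be reproduced with the standard library alone. This is a minimal sketch, not the code in kvcache.py: the sample values and the `safe_fmean` helper are assumptions made up for illustration. It shows that `sum` and `statistics.fmean` disagree on the same per-rank samples, and that `sum([])` returns 0 while `fmean([])` raises `StatisticsError`.

```python
from statistics import StatisticsError, fmean

# Hypothetical per-rank bandwidth samples (GB/s): four ranks at 1.0 each.
per_rank_bw = [1.0, 1.0, 1.0, 1.0]

total_bw = sum(per_rank_bw)   # additive semantics: 4.0
mean_bw = fmean(per_rank_bw)  # averaging semantics: 1.0


def safe_fmean(values, default=None):
    """Hypothetical helper: fmean(values), or `default` for empty input."""
    try:
        return fmean(values)
    except StatisticsError:
        # fmean refuses to average zero data points.
        return default


empty_sum = sum([])           # 0, with no error raised
empty_mean = safe_fmean([])   # None, because fmean([]) raised
```

Whether an empty result set should map to 0, None, or an error is a design choice for the benchmark; the sketch only makes the behavioral difference visible.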
That's a pretty bad code smell for the test to be reaching into a local scope variable and creating a dependency like this, but it looks like that's a Python thing. I'll update the tests. Claude is completely wrong in its "side note"; I hope its code is better than its reasoning here. The variable name obscures its shape, so I'll fix that issue too.
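The coupling described here can be reproduced in isolation. The sketch below uses a throwaway module object as a stand-in; the module name `wrapper_sketch`, the `lookup` helper, and the dictionary contents are all hypothetical and are not taken from mlperf_wrapper. It shows why a test that reads a module-level name directly fails with AttributeError as soon as that name is renamed.

```python
import types

# Hypothetical stand-in for a module whose constant has been renamed.
wrapper = types.ModuleType("wrapper_sketch")
wrapper.WORKLOAD_PARAMS = {"example_key": "example_value"}  # placeholder data


def lookup(module, name):
    """Return the attribute, or the error text if the name is gone."""
    try:
        return getattr(module, name)
    except AttributeError as exc:
        return f"AttributeError: {exc}"


old_name_result = lookup(wrapper, "OPTION_PARAMS")    # old name: error text
new_name_result = lookup(wrapper, "WORKLOAD_PARAMS")  # new name: the dict
```

Python has no private module scope, so any test can bind to any module-level name; the only fix is to update the tests (as done here) or keep an alias for the old name.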
@FileSystemGuy All tests passing now.
This PR splits the KV cache readme into two documents: one with basic instructions on how to run the benchmark for a submission, and a second that covers the detailed architecture, design, and open parameters.
This PR also replaces the summation of results across the three runs with the fmean calculation across the three runs, fixing an inaccuracy in the calculated results.
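To make the effect of the change concrete, here is a small sketch with made-up numbers. The 3 runs x 2 ranks layout, the values, and the variable names are assumptions for illustration and do not describe the actual data shape in kvcache.py. It shows how the same samples produce different headline figures depending on whether they are summed or averaged, and over which dimension.

```python
from statistics import fmean

# Hypothetical bandwidth samples (GB/s): three runs, two ranks per run.
runs = [
    [1.0, 1.0],  # run 1
    [2.0, 2.0],  # run 2
    [3.0, 3.0],  # run 3
]

flat = [bw for run in runs for bw in run]
run_totals = [sum(run) for run in runs]  # [2.0, 4.0, 6.0]

sum_of_everything = sum(flat)            # 12.0: grows with the run count
mean_of_run_totals = fmean(run_totals)   # 4.0: per-run total, averaged
mean_of_everything = fmean(flat)         # 2.0: per-sample average
```

Summing across repeated runs scales the reported figure with the number of runs, which is the kind of inaccuracy an average across runs avoids.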