Summary
When the benchmark runs, it should leave a per-run code-integrity artifact (a .code-hash file at minimum, optionally a source snapshot) inside the run's results directory. Today nothing is written, so a results directory in isolation gives no way to verify which version of mlpstorage_py produced it.
Background
Two pieces of related work exist:
On main: mlpstorage_py/submission_checker/checks/training_checks.py:303-308 contains the verifier side of this check, currently a stub returning True.
PR Submission checker: complete Rules.md coverage + 'mlpstorage validate' subcommand #432 (FileSystemGuy-rules-validator) adds mlpstorage_py/submission_checker/tools/code_checksum.py::compute_code_tree_md5 and a compute_code_checksum.py CLI, implementing the Rules.md §2.1.6 (codeDirectoryContents) and §3.6.1 (trainingClosedSubmissionChecksum) algorithm. Combined with the REFERENCE_CHECKSUMS constant (commit 71a0966), this enables the submission checker to verify at validation time that a code tree handed to it matches the authoritative checksum.
What's missing: the benchmark runtime never writes a hash or snapshot during a run. A user inspecting <results_dir>/training/<model>/<command>/<datetime>/ after the fact cannot tell which code version produced it without out-of-band knowledge.
Repro
Run any benchmark to completion and inspect the results directory:
No .code-hash, no source snapshot, no commit SHA in the metadata.
Suggested behavior
In mlpstorage_py/benchmarks/base.py:Benchmark.__init__ (or near write_metadata), after the results directory is reserved, write one or more of:
A .code-hash file containing the same MD5 digest that PR Submission checker: complete Rules.md coverage + 'mlpstorage validate' subcommand #432's compute_code_tree_md5 produces over mlpstorage_py/.
A code-snapshot/ directory containing a copy of mlpstorage_py/ (and kv_cache_benchmark/, vdb_benchmark/ as applicable).
code_commit_sha and code_tree_md5 fields in the metadata JSON.
The most minimal version is the first: one MD5 in a small file, no copying. That is enough for the submission checker to cross-check <.code-hash from run dir> == compute_code_tree_md5(<code tree at submission time>) and reject results whose code tree was modified after the run.
Coordination
Reuse compute_code_tree_md5 from PR Submission checker: complete Rules.md coverage + 'mlpstorage validate' subcommand #432 to guarantee the algorithm matches.
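
To make the minimal option concrete, here is a rough sketch of the writer side and the matching cross-check. Everything named here is an assumption for illustration: `tree_md5_stand_in` is a hypothetical placeholder, not the Rules.md algorithm, and `write_code_hash` and `code_hash_matches` are made-up helper names. In the real change, `compute_code_tree_md5` from PR #432 would be passed as `hash_fn` so that the runtime and the checker agree on the digest.

```python
import hashlib
import tempfile
from pathlib import Path


def tree_md5_stand_in(code_root: Path) -> str:
    """Hypothetical stand-in for compute_code_tree_md5 (PR #432).

    Hashes the relative path and bytes of every *.py file in sorted order.
    This is NOT the Rules.md algorithm; it only lets the sketch run.
    """
    digest = hashlib.md5()
    for path in sorted(code_root.rglob("*.py")):
        if not path.is_file():
            continue
        digest.update(path.relative_to(code_root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
    return digest.hexdigest()


def write_code_hash(run_dir: Path, code_root: Path, hash_fn=tree_md5_stand_in) -> Path:
    """Write <run_dir>/.code-hash containing the code-tree digest."""
    run_dir.mkdir(parents=True, exist_ok=True)
    hash_file = run_dir / ".code-hash"
    hash_file.write_text(hash_fn(code_root) + "\n", encoding="utf-8")
    return hash_file


def code_hash_matches(run_dir: Path, code_root: Path, hash_fn=tree_md5_stand_in) -> bool:
    """Return True only if .code-hash exists and equals the current tree digest."""
    hash_file = run_dir / ".code-hash"
    if not hash_file.is_file():
        return False
    recorded = hash_file.read_text(encoding="utf-8").strip()
    return recorded == hash_fn(code_root)


if __name__ == "__main__":
    # Tiny demo on a throwaway tree; paths are illustrative only.
    with tempfile.TemporaryDirectory() as tmp:
        code_root = Path(tmp) / "mlpstorage_py"
        code_root.mkdir()
        (code_root / "main.py").write_text("print('hello')\n", encoding="utf-8")
        run_dir = Path(tmp) / "results" / "run"
        write_code_hash(run_dir, code_root)
        print("matches after run:", code_hash_matches(run_dir, code_root))
```

Taking the hash function as a parameter keeps the file format (one hex digest, one line) independent of the algorithm, so swapping in the function from PR #432 would not require touching the writer or the check.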