Benchmark runtime should record a code snapshot/hash in each result dir #445

@FileSystemGuy

Summary

When the benchmark runs, it should leave a per-run code-integrity artifact (a .code-hash file at minimum, optionally a source snapshot) inside the run's results directory. Today nothing is written, so a results directory in isolation gives no way to verify which version of mlpstorage_py produced it.

Background

Two pieces of related work exist:

  1. On main: mlpstorage_py/submission_checker/checks/training_checks.py:303-308 contains:

    def closed_submission_checksum(self):
        """
        For CLOSED submissions, verify code directory MD5 checksum.
        """
        # TODO
        return True

    This is the verifier side, currently a stub returning True.

  2. PR #432, "Submission checker: complete Rules.md coverage + 'mlpstorage validate' subcommand" (branch FileSystemGuy-rules-validator), adds mlpstorage_py/submission_checker/tools/code_checksum.py::compute_code_tree_md5 and a compute_code_checksum.py CLI. Together they implement the algorithm from Rules.md §2.1.6 (codeDirectoryContents) and §3.6.1 (trainingClosedSubmissionChecksum). Combined with the REFERENCE_CHECKSUMS constant (commit 71a0966), this lets the submission checker verify at validation time that a code tree handed to it matches the authoritative checksum.
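The exact Rules.md §2.1.6 algorithm behind compute_code_tree_md5 is not reproduced in this issue. Purely as an illustration, here is one common way to get a deterministic MD5 over a directory tree: hash each file, then hash the sorted list of (relative path, file hash) records so the result does not depend on filesystem iteration order. The function name tree_md5_sketch and the excluded names (__pycache__, .git, .pyc) are assumptions, not taken from PR #432.

```python
import hashlib
import os
from pathlib import Path
from typing import Union

# Hypothetical exclusions; the real Rules.md §2.1.6 algorithm may differ.
EXCLUDED_DIRS = {"__pycache__", ".git"}
EXCLUDED_SUFFIXES = {".pyc"}


def _file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 of one file, read in chunks to bound memory."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_md5_sketch(root: Union[str, os.PathLike]) -> str:
    """Deterministic MD5 over a directory tree (illustrative sketch only).

    Each regular file contributes a "<relative posix path>\\0<file md5>\\n"
    record. Records are sorted before the final hash, so the digest does
    not depend on os.walk ordering. Empty directories do not contribute.
    """
    root = Path(root)
    records = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIRS]
        for name in filenames:
            path = Path(dirpath) / name
            if path.suffix in EXCLUDED_SUFFIXES:
                continue
            rel = path.relative_to(root).as_posix()
            records.append(f"{rel}\0{_file_md5(path)}\n")
    combined = hashlib.md5()
    for record in sorted(records):
        combined.update(record.encode("utf-8"))
    return combined.hexdigest()
```

Hashing per-file digests rather than raw concatenated bytes keeps file boundaries unambiguous, and the sort makes the digest stable across machines and filesystems.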

What's missing: the benchmark runtime never writes a hash or snapshot during a run. A user inspecting <results_dir>/training/<model>/<command>/<datetime>/ after the fact cannot tell which code version produced it without out-of-band knowledge.

Repro

Run any benchmark to completion and inspect the results directory:

$ ./mlpstorage whatif training unet3d datagen file \
    --num-processes 2 --data-dir /tmp/data --results-dir /tmp/results \
    --params dataset.num_files_train=10
$ ls /tmp/results/training/unet3d/datagen/*/
training_<DATETIME>_metadata.json
dlio.log
training_datagen.stderr.log
training_datagen.stdout.log

No .code-hash, no source snapshot, no commit SHA in the metadata.

Suggested behavior

In mlpstorage_py/benchmarks/base.py:Benchmark.__init__ (or near write_metadata), after the results directory is reserved, write one or more of:

The minimal option is the first: a single MD5 in a small file, with no copying. That is enough for the submission checker to cross-check <.code-hash from run dir> == compute_code_tree_md5(<code tree at submission time>) and reject results whose code tree was modified after the run.
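A minimal sketch of both halves of that cross-check, assuming the tree-hash function is passed in as a callable. In practice that would be compute_code_tree_md5 from PR #432, assuming it takes a directory path and returns a hex digest; its real signature is not shown in this issue. The names write_code_hash and verify_code_hash and the one-line file format are hypothetical. The writer would be called once the results directory is reserved in Benchmark.__init__; the verifier is the kind of logic that could replace the closed_submission_checksum stub.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable

CODE_HASH_FILENAME = ".code-hash"  # file name taken from the issue text


def write_code_hash(run_dir: Path, code_root: Path,
                    tree_hash: Callable[[Path], str]) -> Path:
    """Record the code-tree hash inside a reserved run directory (hypothetical helper).

    Writes to a temporary file in the same directory, then renames it into
    place, so an interrupted run never leaves a truncated .code-hash behind.
    """
    digest = tree_hash(code_root)
    target = run_dir / CODE_HASH_FILENAME
    fd, tmp_name = tempfile.mkstemp(dir=run_dir, prefix=".code-hash.")
    try:
        with os.fdopen(fd, "w", encoding="ascii") as f:
            f.write(digest + "\n")
        os.replace(tmp_name, target)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target


def verify_code_hash(run_dir: Path, code_root: Path,
                     tree_hash: Callable[[Path], str]) -> bool:
    """Checker side (hypothetical helper): does the code tree match the recorded hash?"""
    hash_file = run_dir / CODE_HASH_FILENAME
    if not hash_file.is_file():
        return False  # no artifact in the run dir, so nothing to verify against
    recorded = hash_file.read_text(encoding="ascii").strip().lower()
    return recorded == tree_hash(code_root).lower()
```

Treating a missing .code-hash as a failure rather than a pass is a deliberate choice here: it is the opposite of the current stub, which returns True unconditionally.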

Coordination
