Benchmark runtime should record a code snapshot/hash in each result dir #445

## Summary

When the benchmark runs, it should leave a per-run code-integrity artifact (a `.code-hash` file at minimum, optionally a source snapshot) inside the run's results directory. Today no such artifact is written, so a results directory on its own gives no way to verify which version of `mlpstorage_py` produced it.

## Background

Two pieces of related work exist:

1. **On `main`**: `mlpstorage_py/submission_checker/checks/training_checks.py:303-308` contains:

   ```python
   def closed_submission_checksum(self):
       """
       For CLOSED submissions, verify code directory MD5 checksum.
       """
       # TODO
       return True
   ```

   This is the *verifier* side, currently a stub that returns `True`.

2. **PR #432** (`FileSystemGuy-rules-validator`) adds `mlpstorage_py/submission_checker/tools/code_checksum.py::compute_code_tree_md5` and a `compute_code_checksum.py` CLI, implementing the Rules.md §2.1.6 (`codeDirectoryContents`) and §3.6.1 (`trainingClosedSubmissionChecksum`) algorithm. Combined with the `REFERENCE_CHECKSUMS` constant (commit `71a0966`), this enables the submission checker to verify *at validation time* that a code tree handed to it matches the authoritative checksum.

What's missing: **the benchmark runtime never writes a hash or snapshot during a run**. A user inspecting `<results_dir>/training/<model>/<command>/<datetime>/` after the fact cannot tell which code version produced it without out-of-band knowledge.

## Repro

Run any benchmark to completion and inspect the results directory:

```
$ ./mlpstorage whatif training unet3d datagen file \
    --num-processes 2 --data-dir /tmp/data --results-dir /tmp/results \
    --params dataset.num_files_train=10
$ ls /tmp/results/training/unet3d/datagen/*/
training_<DATETIME>_metadata.json
dlio.log
training_datagen.stderr.log
training_datagen.stdout.log
```

No `.code-hash`, no source snapshot, no commit SHA in the metadata.
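
To make the check above repeatable, here is a minimal sketch of a script that reports which of the artifacts proposed in this issue are present in a run directory. The function name `find_code_integrity_artifacts` is hypothetical, and the sketch assumes the metadata file matches `*_metadata.json` and holds a JSON object with top-level keys; neither assumption has been verified against the real metadata schema.

```python
import json
from pathlib import Path


def find_code_integrity_artifacts(run_dir: Path) -> dict:
    """Report which proposed code-integrity artifacts exist in run_dir.

    Hypothetical helper: the artifact names are the ones proposed in this
    issue, and the metadata layout (top-level JSON object) is an assumption.
    """
    found = {
        ".code-hash": (run_dir / ".code-hash").is_file(),
        "code-snapshot/": (run_dir / "code-snapshot").is_dir(),
        "code_commit_sha": False,
    }
    for meta in run_dir.glob("*_metadata.json"):
        try:
            data = json.loads(meta.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            # Unreadable or malformed metadata counts as "not recorded".
            continue
        if isinstance(data, dict) and "code_commit_sha" in data:
            found["code_commit_sha"] = True
    return found
```

On a results directory produced today, every value in the returned dictionary would be expected to be `False`.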

## Suggested behavior

In `mlpstorage_py/benchmarks/base.py:Benchmark.__init__` (or near `write_metadata`), after the results directory is reserved, write one or more of:

- A `.code-hash` file containing the same MD5 digest that PR #432's `compute_code_tree_md5` produces over `mlpstorage_py/`.
- A `code-snapshot/` directory containing a copy of `mlpstorage_py/` (and `kv_cache_benchmark/`, `vdb_benchmark/` as applicable).
- A `code_commit_sha` and `code_tree_md5` field in the metadata JSON.

The minimal version is the first option: one MD5 digest in a small file, with no copying. That is enough for the submission checker to cross-check `<.code-hash from run dir> == compute_code_tree_md5(<code tree at submission time>)` and reject results whose code tree was modified after the run.
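
A rough sketch of both halves of that cross-check follows, under stated assumptions. The signature of PR #432's `compute_code_tree_md5` is not shown in this issue, so the sketch takes the hash function as a parameter and ships a hypothetical stand-in, `stand_in_tree_md5`, which is *not* the Rules.md algorithm. `write_code_hash` and `code_hash_matches` are likewise illustrative names, not existing functions.

```python
import hashlib
from pathlib import Path

CODE_HASH_FILENAME = ".code-hash"  # artifact name proposed in this issue


def stand_in_tree_md5(code_dir: Path) -> str:
    """Hypothetical placeholder for PR #432's compute_code_tree_md5.

    NOT the Rules.md algorithm: it hashes each file's relative path and
    bytes in sorted order, only so that this sketch runs on its own.
    """
    digest = hashlib.md5()
    files = (p for p in code_dir.rglob("*") if p.is_file())
    for path in sorted(files, key=lambda p: p.relative_to(code_dir).as_posix()):
        digest.update(path.relative_to(code_dir).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
    return digest.hexdigest()


def write_code_hash(run_dir: Path, code_dir: Path, hash_fn=stand_in_tree_md5) -> Path:
    """Producer side: record the code-tree digest inside the run directory."""
    run_dir.mkdir(parents=True, exist_ok=True)
    out = run_dir / CODE_HASH_FILENAME
    out.write_text(hash_fn(code_dir) + "\n", encoding="utf-8")
    return out


def code_hash_matches(run_dir: Path, code_dir: Path, hash_fn=stand_in_tree_md5) -> bool:
    """Checker side: True only if the recorded digest equals the tree's digest now."""
    recorded = run_dir / CODE_HASH_FILENAME
    if not recorded.is_file():
        return False  # no artifact recorded, so nothing can be verified
    return recorded.read_text(encoding="utf-8").strip() == hash_fn(code_dir)
```

In a real implementation, `hash_fn` would be the function from PR #432, so that the producer and the verifier cannot drift apart.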

## Coordination

- Likely belongs in the same release as PR #432 so the verifier and the producer match.
- Should reuse `compute_code_tree_md5` from PR #432 to guarantee the algorithm matches.
- Independent of the accumulation work in PRs #441 / #442 / #443; this gap surfaced while end-to-end testing #441's results directory layout.
