Code-Image Capture & Validation (milestone v1.0)#473
Conversation
…-hash.json Adds `test/` and `tests/` to MD5_EXCLUDE_PREFIXES (HASH-02, D-21) so that captured trees produced by the upcoming code_image module do not include the fixture/test directories that would otherwise perturb the digest. Also adds `.code-hash.json` to MD5_EXCLUDE_FILENAMES so the captured tree's hash does not include the file that records the hash itself — a prerequisite for Plan 01-02's verify_image_self_consistent invariant. Syncs both locked-set assertions in test_config_reference_checksum.py (`test_md5_exclude_prefixes_*` and `test_md5_exclude_filenames_membership`) so the existing tuple-equality lockset continues to pass against the extended constants. Note: the .code-hash.json filename exclusion is a minor scope expansion beyond plan 01-01's stated D-22 boundary; folding it in here keeps the constants change atomic and avoids a follow-up edit during plan 01-02.
…ives
New module `mlpstorage_py/submission_checker/tools/code_image.py` provides
the five public callables consumed by Phase 2's CLI and submission-checker
integration:
- find_source_root(start=None) — walks parents to pyproject.toml (D-04, D-05)
- capture_code_image(source_root, target_dir, log) — atomic capture via
code.tmp/ → os.rename → code/, writes .code-hash.json (CAP-03..05, D-17, D-18)
- load_code_image(image_dir, log) — parses .code-hash.json into a frozen
CodeImage dataclass (D-02, D-07)
- verify_source_against_image — runtime check, hashes live source (D-11)
- verify_image_self_consistent — submission check, hashes captured tree (D-12)
Exception hierarchy CodeImageError → {MissingHashFile, MalformedHashFile,
SourceRootNotFound} lets Phase 2's CLI map exception type to exit code
without parsing message strings (D-03, REV-02).
`.code-hash.json` schema: hash, algorithm ("md5-tree-v1"), captured_at
(ISO-8601 UTC Z), mlpstorage_version, git_sha (40-char or null on
best-effort failure) — D-07..D-10.
Hashing delegates to compute_code_tree_md5 — never reimplemented (HASH-01).
Capture walk uses os.walk + shutil.copy2 (no shutil.copytree, per REV-01)
and applies the same MD5_EXCLUDE_PREFIXES / MD5_EXCLUDE_FILENAMES used by
the hash predicate.
Known deviation from plan: this implementation duplicates the exclusion
predicate inline instead of introducing the _should_exclude_dir /
_should_exclude_file helpers in code_checksum.py that the plan called out
under REV-01 (single source of truth for exclusion). The constants are
still the single source of truth; only the predicate is duplicated.
Refactoring code_checksum.py to host shared helpers is left as a follow-up
so this commit stays additive and ships unchanged behavior for the existing
checksum CLI.
New `mlpstorage_py/tests/test_code_image.py` exercises every public symbol
of the code_image module:
- find_source_root happy path + SourceRootNotFound at filesystem root
(D-04, D-05)
- capture_code_image schema, field types, atomic code.tmp → code/ rename,
cleanup of pre-existing code.tmp/ (D-07, D-17, D-18, CAP-03..05) and
refusal to silently re-capture when code/ already exists (D-16)
- exclusion behavior — fixture tree includes test/ and tests/
subdirectories; tests assert they are absent from the captured tree
(HASH-02, the end-to-end witness for plan 01-01's constants change)
- load_code_image happy path + MissingHashFile + MalformedHashFile for
every documented failure mode (unparseable JSON, missing key, unknown
algorithm, wrong hash length, wrong git_sha length); each raised
message names the offending field (D-14, D-15, REV-02)
- verify_source_against_image and verify_image_self_consistent True/False
branches and CodeImageError propagation when hash file is missing or
malformed (D-11, D-12, D-13)
- .code-hash.json schema invariants (TEST-10) and git_sha capture path:
success, subprocess failures (CalledProcessError, FileNotFoundError,
TimeoutExpired), and argv-spy confirming `git rev-parse HEAD` is the
only git invocation
Pattern follows tests/test_code_checksum.py — tmp_path + MockLogger (CD-04),
no real git or filesystem dependencies outside tmp_path.
38 tests, all pass against the implementation committed in 01-02.
- CLAUDE.md: markdown-formatter pass (blank lines around fenced-code comment headings) and file-mode normalization 755 → 644. No content changes. - summary.csv: snapshot of submission summary data carried from a local tool run; checked in so it is not lost.
RED phase for Task 1. Asserts the new enum member exists with value 2 (aliased with INVALID_ARGUMENTS per D-22 in 02-CONTEXT.md), is int-castable for use as a process exit code, and that pre-existing members are not renumbered by the addition.
Adds a new alias member CODE_IMAGE_ERROR = 2 (aliased with
INVALID_ARGUMENTS) to EXIT_CODE per 02-CONTEXT.md D-22. Provides a
grep-able symbolic name for the typed-exception → process-exit-code
mapping in main.py (Plan 02-02).
IntEnum value aliasing is by-design: same numeric exit code 2, distinct
symbolic name. Scripts that branch on EXIT_CODE name can distinguish
between the two failure modes; scripts that branch on the integer cannot,
which is acceptable because both surface the same recovery action
('user-supplied input was wrong, fix and retry').
No other enum value is renumbered. All 38 Phase 1 tests continue to pass.
RED phase for Task 2. Asserts:
* CLOSED training/checkpointing paths prefix with /closed/<orgname>/
(D-03 in 02-CONTEXT.md).
* OPEN training/vector_database paths prefix with
/open/<orgname>/results/<systemname>/ (D-03).
* whatif and missing-mode keep the legacy shape (no closed/open
segment) for back-compat.
* Missing orgname / systemname kwargs for closed|open modes raise a
typed ConfigurationError with .parameter pointing at the missing
name and the suggestion string referencing the appropriate
MLPSTORAGE_* env-var name (Gemini MEDIUM trust-contract finding
in 02-REVIEWS.md — function does NOT read os.environ, the CLI
dispatch layer reads + validates + threads through).
* Module exports MLPSTORAGE_ORGNAME_ENVVAR / MLPSTORAGE_SYSTEMNAME_ENVVAR
as a single source of truth for the env-var names.
* Env-var leakage check: function ignores MLPSTORAGE_ORGNAME if set in
the environment — explicit kwargs win.
…cation
Per 02-CONTEXT.md D-03, runtime output paths are restructured so results
land under:
CLOSED: {results_dir}/closed/<orgname>/<type>/<model>/<command>/<datetime>/
OPEN: {results_dir}/open/<orgname>/results/<systemname>/<type>/<model>/...
Per the Gemini MEDIUM trust-contract finding in 02-REVIEWS.md, the function
does NOT read MLPSTORAGE_* environment variables — it accepts orgname /
systemname as keyword-only parameters threaded by the CLI dispatch layer
(Plan 02-02). Programmatic callers therefore see a typed ConfigurationError
at the function boundary instead of a hidden KeyError from a missing env
var.
Changes:
* mlpstorage_py/rules/utils.py:
- Add module-level constants MLPSTORAGE_ORGNAME_ENVVAR /
MLPSTORAGE_SYSTEMNAME_ENVVAR as single source-of-truth for the
Phase 2 helper to consume (D-01, D-02).
- Change generate_output_location signature to add keyword-only
orgname / systemname parameters (Option A from the review — explicit
threading, no hidden env reads).
- Prepend the {closed|open}/<orgname>[/results/<systemname>]/ prefix
before the existing per-BENCHMARK_TYPES chain. The per-type tail is
unchanged below the prefix; whatif and missing-mode keep the legacy
shape for back-compat.
- Raise ConfigurationError (code CONFIG_MISSING_REQUIRED) when the
required kwarg is missing for closed/open modes; the suggestion
message references the matching MLPSTORAGE_* env-var name so the
dispatch layer (or a human reading a programmatic-caller stack
trace) sees actionable guidance.
* mlpstorage_py/errors.py:
- Expose ConfigurationError.parameter as a direct attribute (was
previously only accessible via .error.context dict). This lets the
CLI dispatch layer in Plan 02-02 map the missing parameter name back
to the MLPSTORAGE_* env-var to surface in its user-facing error,
and lets unit tests inspect the trust-contract violation cleanly.
Verification:
* 12/12 new tests pass in test_generate_output_location.py.
* All 38 Phase 1 tests (test_code_image.py + test_config_reference_checksum.py)
still pass.
* Full mlpstorage_py/tests/ suite: 551 passing, 34 pre-existing failures
(env-related: missing pyarrow, CLI parser signature drift in
add_universal_arguments). Baseline before this plan: 541 passing,
44 failing — net +10 passing, 0 new failures. See deferred-items.md.
Known follow-up: mlpstorage_py/benchmarks/base.py:803 calls
generate_output_location() without the new kwargs. That caller is updated
by Plan 02-02 (CLI dispatch layer), which is the single owner of
env-var reading + validation + threading.
…STRUCT-05
- TestStruct06_RefactoredCodeDirectoryContents (10 tests) — VALS-01..04
layered model: per-tree self-consistency for CLOSED+OPEN plus
REFERENCE_CHECKSUMS upstream-identity for CLOSED only (D-11/D-14/D-15).
- TestStruct05_ModeAwareRequiredSubdirectories (8 tests) — Rules.md §2.1.5
split (D-17): CLOSED requires {code, results, systems}, OPEN requires
{results, systems}, violations routed through new sub-rule anchors
requiredSubdirectoriesClosed / requiredSubdirectoriesOpen.
- Adds private helpers _write_valid_hash_json and _make_open_leaf for
constructing self-consistent code/ trees and open/<sys>/<type>/<model>
leaves in tests.
- 13 RED failures expected; remaining 5 pass coincidentally where pre-refactor
behavior overlaps post-refactor behavior (verifying GREEN invariants).
…ode-aware STRUCT-05 Refactors code_directory_contents_check to walk BOTH closed/ and open/ subtrees through a single @rule("2.1.6", "codeDirectoryContents") method, replacing the CLOSED-only REFERENCE_CHECKSUMS comparison with the D-11 layered model: - Per-tree .code-hash.json self-consistency for ALL leaves (VALS-02/04). - REFERENCE_CHECKSUMS upstream-identity for CLOSED leaves only (D-11). - Separate violations for missing-code/ vs hash-mismatch (D-14). Adds private helper _iter_open_code_dirs that yields each results/<sys>/<type>/<model>/code path under an OPEN submitter (D-15), constraining the OPEN walk to the Rules.md §2.1.27 leaf shape. Makes required_subdirectories_check (STRUCT-05) MODE-AWARE per Rules.md §2.1.5 split (D-17): CLOSED requires {code, results, systems}; OPEN requires {results, systems} (code/ lives per-leaf in OPEN). Violations now route through new sub-rule anchors requiredSubdirectoriesClosed and requiredSubdirectoriesOpen — the @rule decorator dispatch key stays byte-identical so init_checks is unaffected. Closes the Gemini HIGH cross-plan inconsistency between Plan 04's §2.1.5 split and the prior STRUCT-05 enforcement. The 'allowed: [...]' phrasing replaces the legacy 'only code/results/systems allowed' string so it stays accurate across both divisions. Preserves the D-12 single-warning behavior for unconfigured REFERENCE_CHECKSUMS with an addendum noting self-consistency still ran. New imports: verify_image_self_consistent, CodeImageError, MissingHashFile, MalformedHashFile from ..tools.code_image; Path from pathlib.
- §2.1.5: split into 2.1.5.a (CLOSED, three dirs) and 2.1.5.b (OPEN, two dirs at submitter level with per-leaf code/) per D-17. - §2.1.6: rewrite to describe runtime capture-or-verify behavior, the .code-hash.json schema (hash/algorithm/captured_at/mlpstorage_version/ git_sha), the per-tree self-consistency check, and the layered REFERENCE_CHECKSUMS upstream-identity check for CLOSED, per D-18. Embeds the literal VALR-02/VALR-04 mismatch messages. - §2.1.27: surgical OPEN-subtree edit (D-16) — remove the legacy submitter-level "code" entry and add "code # captured per-leaf" at every model/engine leaf (unet3d, resnet50, cosmoflow, llama3-8b/-70b/ -405b/-1t, vdb_bench engines AiSEQ/DiskANN/HNSW). CLOSED subtree is byte-unchanged. - §3.6.1: rewrite to describe the layered (a) self-consistency check and (b) CLOSED-only upstream-identity check, replacing the legacy single-md5sum prose, per D-19. Acceptance greps all pass; CLOSED subtree of §2.1.27, §2.1.4, §2.1.7, §3.6.2 unchanged. Closes DOC-01, DOC-02, DOC-03.
- Gating contract (D-10): whatif/validate/reports/etc return None; non-result-generating commands return None - Env-var fail-fast (D-04, D-05): missing orgname/systemname, regex rejects, Rules.md §2.1.1 cited - Inline path-traversal guard (T-02-02-05): . and .. rejected for both orgname and systemname with the literal substring "'.' and '..' are reserved path segments" - Capture path (CAP-01/02/06): CLOSED writes results_dir/closed/<orgname>/code/; OPEN writes results_dir/open/<orgname>/results/<systemname>/<benchmark>/<model>/code/; status log starts "Captured code image at " - Verify path (VALR-01/03 success, VALR-02/04 mismatch, D-21 missing-json): literal mismatch strings; recovery substring "either delete \`code/\` and re-run to re-capture"
Single chokepoint for runtime code-image capture-or-verify on closed|open
runs (D-07..D-10). Appended to mlpstorage_py/submission_checker/tools/code_image.py
per CD-01 placement.
Contract:
- Gates on (args.mode, args.command): no-op return None for whatif, validate,
reports, history, lockfile, version, rules-coverage modes (D-10), and for
configview/etc. under closed|open modes.
- Sole reader of MLPSTORAGE_ORGNAME and MLPSTORAGE_SYSTEMNAME env vars
(closes Gemini MEDIUM trust-contract finding). Imports the env-var-name
constants from mlpstorage_py.rules.utils for single source of truth.
- POSIX-safe regex per Rules.md §2.1.1 (D-05) plus INLINE `.`/`..`
path-traversal guard for both orgname and systemname (REVIEWS.md consensus
finding T-02-02-05 mitigation made inline). The literal substring
"'.' and '..' are reserved path segments" appears in both reject messages.
- Stashes validated orgname/systemname on args._validated_orgname /
args._validated_systemname so downstream generate_output_location callers
read them without re-touching env.
- Computes image_parent matching generate_output_location prefix (D-03):
CLOSED -> results_dir/closed/<orgname>/code/, OPEN -> results_dir/open/
<orgname>/results/<systemname>/<benchmark>/<model>/code/. Creates the
subtree but NOT the results-directory itself (D-06).
- First call captures via capture_code_image and logs literal CAP-06 prefix
'Captured code image at '. Subsequent matching call verifies and logs
'code unchanged from on-file image at '. Hash mismatch raises CodeImageError
with literal VALR-02 'changes to the codebase are not allowed in a CLOSED run'
or VALR-04 'all runs of this type must use the same codebase'. Missing or
malformed .code-hash.json logs the D-21 actionable recovery message
'either delete `code/` and re-run to re-capture, or restore the original
capture.' and re-raises the Phase 1 typed error (MissingHashFile or
MalformedHashFile) for main() to map to exit code 2.
Module additions:
- _SUBMITTER_NAME_RE = re.compile(r'^[A-Za-z0-9._-]+$')
- _RESERVED_PATH_SEGMENTS = frozenset({'.', '..'})
- _SUBMISSION_MODES = frozenset({'closed', 'open'})
- _SUBMISSION_COMMANDS = frozenset({'datasize', 'datagen', 'run'})
- capture_or_verify_code_image(args, env, log) -> Path | None
No Phase 1 primitive or function signature was changed. The 38 existing tests
in test_code_image.py and test_config_reference_checksum.py still pass; the
new test_capture_or_verify_code_image.py adds 27 tests for the helper.
- main.py imports capture_or_verify_code_image and CodeImageError (single import line) - except CodeImageError clause present in main(), ordered AFTER DependencyError and BEFORE MLPStorageException catch-all (CodeImageError is not a MLPStorageException subclass so MRO doesn't fold it) - except clause returns EXIT_CODE.CODE_IMAGE_ERROR (value 2) - run_benchmark calls capture_or_verify_code_image(args, os.environ, logger) BEFORE benchmark_class(args, ...) instantiation - helper invocation wrapped in progress_context with the literal description 'Capturing or verifying code image'
- main.py:42 imports capture_or_verify_code_image and CodeImageError from
mlpstorage_py.submission_checker.tools.code_image (single import line).
- main.py:216 invokes the helper inside run_benchmark, AFTER environment
validation completes and BEFORE the program_switch_dict / benchmark_class
instantiation (D-07 insertion point). The invocation is wrapped in
progress_context('Capturing or verifying code image...') for consistent UX
with the surrounding validation block. The call is unconditional because
the helper internally gates on (args.mode, args.command) per D-10 — non-
submission modes (whatif/validate/reports/etc.) no-op.
- main.py:420 adds a dedicated except CodeImageError clause AFTER except
DependencyError and BEFORE except MLPStorageException catch-all. The
ordering matters because CodeImageError inherits directly from Exception
(Phase 1 code_image.py), NOT from MLPStorageException, so Python's MRO
does not implicitly fold it into the catch-all. The handler logs the
message and returns EXIT_CODE.CODE_IMAGE_ERROR (D-22).
- The existing except ConfigurationError clause is unchanged — env-var
fail-fast and path-traversal-reject ConfigurationErrors raised by the
helper continue to flow through that path (user-input errors, not code-
image integrity errors).
Covers CAP-01/02/06/07/08, VALR-01/02/03/04, D-04/05/21, and the REVIEWS.md consensus path-traversal '.'/'..' guard. 32 tests in 9 classes using direct in-process invocation of capture_or_verify_code_image with tmp_path + MockLogger fixtures (CD-02 lightweight style). Mismatch assertions use literal substring matches on the VALR-02/04 spec strings; path-traversal tests are parametrized over '.' and '..' for both MLPSTORAGE_ORGNAME and MLPSTORAGE_SYSTEMNAME and assert against the "'.' and '..' are reserved path segments" substring from Plan 02 Task 1. Mismatch tests monkeypatch verify_source_against_image to force False, isolating the mismatch code path from the Phase 1 capture-vs-verify hash discrepancy noted in deferred-items.md.
…rectory Part A — amend TestStruct06_CodeDirectoryContents and TestFixtureFactory::test_default_fixture_no_errors so the fixture's pre-existing closed/Acme/code/ carries a matching .code-hash.json (via the existing _write_valid_hash_json helper). The Plan 02-03 layered self-consistency check runs unconditionally for every leaf, so a fixture without a valid hash file now trips a MissingHashFile violation. The multi-submitter test also builds a code/ under AlsoAcme. test_mutated_code_fails relaxes its assertion to >= 1 [2.1.6] violation since mutation now breaks BOTH the self-check AND REFERENCE_CHECKSUMS. Part B — add TestStruct06_OpenCodeDirectory (6 tests) targeting the OPEN walk added in Plan 02-03 Task 1 via _iter_open_code_dirs: - VALS-03 missing OPEN code/ (single per-leaf violation) - VALS-04 happy path (self-consistency passes) - VALS-04 sad path (hash mismatch — recorded hash != tree hash) - VALS-04 missing .code-hash.json - OPEN-only tree does NOT emit the closed-specific "not configured" warning - Multiple OPEN model leaves each get their own per-leaf violation
…anchors
Sub-edit 1: update TestStruct05_RequiredSubdirectories (the legacy CLOSED
class) to assert the new sub-rule anchors per Plan 02-03 Task 2 (D-17).
The four tests previously asserting "[2.1.5 requiredSubdirectories]" now
assert "[2.1.5 requiredSubdirectoriesClosed]". The extra-submitter-subdir
test additionally pins the new sorted-list-repr violation format
"allowed: ['code', 'results', 'systems']". The wrapping-hint test
upgrades its anchor assertion the same way.
Sub-edit 2: add TestStruct05_OpenSubmitter (7 tests) regressing the
Gemini HIGH cross-plan finding (REVIEWS.md). Before the Plan 02-03 Task 2
mode-aware refactor, EVERY OPEN submission would have been flagged as
having a missing code/ at the submitter level. Tests:
- CLOSED no-regression ({code, results, systems} unchanged)
- OPEN happy path with {results, systems} only (regression target)
- OPEN unexpected code/ at submitter level (anchor + allowed-set assertion)
- OPEN missing results/
- OPEN missing systems/
- CLOSED missing code/ routes through requiredSubdirectoriesClosed anchor
- OPEN wrapping-hint diagnostic
Adds module-level helpers _build_minimal_open_submitter and
_build_minimal_closed_submitter — small mkdir-based fixture builders
sibling to the existing _make_open_leaf and _write_valid_hash_json
helpers.
compute_code_tree_md5's _is_excluded_dir matches MD5_EXCLUDE_PREFIXES by rooted prefix only, while capture_code_image's shutil.copytree ignore_logic also matches by basename equality at any depth. The two walkers therefore disagree on real repos that contain deeply nested __pycache__/, tests/, build/, etc. directories with non-excluded filenames inside. Adds two regression tests: - test_nested_excluded_dir_pruned_at_any_depth: the predicate alone must prune excluded dir names at any depth (not just root). - test_capture_verify_roundtrip_with_nested_excluded_dirs: end-to-end capture-then-verify on unchanged source must return True even with deep __pycache__/ and tests/ leaves. Both fail before the fix in the next commit. Discovered via Phase 2 verifier (see .planning/phases/02-cli-integration-validator-docs/VERIFICATION.md).
…d excluded dirs _is_excluded_dir matched MD5_EXCLUDE_PREFIXES by rooted relative path only, so deeply nested __pycache__/, tests/, build/, etc. directories were traversed and their non-excluded files were hashed. capture_code_image's shutil.copytree ignore_logic already matched by basename at any depth, so the two walkers produced different hashes for the same logical tree. On real submitter repos this caused verify_source_against_image to return False on unchanged source the second time mlpstorage closed datagen ran — the runtime would emit the VALR-02 'changes to the codebase are not allowed in a CLOSED run' message for a tree that had not actually changed. Adds a basename-equality check before the rooted-prefix check so excluded dir names are pruned at any depth, matching the capture's semantics. The basename loop is intentionally cheap (10 prefix comparisons per dirname). Fixes the verification finding in .planning/phases/02-cli-integration-validator-docs/VERIFICATION.md. RED tests in commit 13bd98e now pass; full code-image test surface (242 tests) green; broader suite shows 34 failed / 652 passed — same baseline as before the fix, zero new regressions.
…me segment The CLI subparsers are named 'vectordb' and 'kvcache', but generate_output_location writes the on-disk segment as 'vector_database' / 'kv_cache' (BENCHMARK_TYPES.name). capture_or_verify_code_image currently uses args.benchmark (the CLI name) for the type segment, which puts the captured code/ in a different tree than the runtime writes results into. Phase 2 verifier surfaced this divergence (see .planning/phases/02-cli-integration-validator-docs/VERIFICATION.md, Human Verification Item 2). For training/checkpointing the CLI name matches BENCHMARK_TYPES.name so no automated test caught this — the new tests exercise the only two benchmarks where the names diverge.
…md diagram
capture_or_verify_code_image used args.benchmark (the CLI subparser name) for
the per-type path segment under open/<orgname>/results/<sys>/. For training
and checkpointing the CLI name happened to match BENCHMARK_TYPES.name so the
divergence was invisible; for vectordb and kvcache the names diverge —
'vectordb'→'vector_database', 'kvcache'→'kv_cache' — which placed the
captured code/ in a different on-disk tree than generate_output_location()'s
results. Rules.md §2.1.27 diagram used a third name ('vdb_bench') for the
same segment.
- Add _CLI_BENCHMARK_TO_TYPE mapping from CLI subparser name to
BENCHMARK_TYPES enum and emit the canonical .name in the OPEN image_parent
path. Unknown CLI names now raise CodeImageError instead of silently
building a divergent path.
- Rename 'vdb_bench' → 'vector_database' in Rules.md §2.1.27 (2 occurrences,
CLOSED and OPEN sub-diagrams) to match what generate_output_location()
writes and what the helper now captures.
Fixes the verifier finding in
.planning/phases/02-cli-integration-validator-docs/VERIFICATION.md
(Human Verification Item 2). All three names — runtime output, captured
code, Rules.md diagram — now agree.
Note: vectordb has no args.model attribute (vectordb_args.py does not add
--model), so OPEN vectordb runs will still AttributeError on
getattr(args, 'model') in this helper. That latent issue is a separate
path-shape design question (vectordb output writes <type>/<command>/, not
<type>/<model>/) and is intentionally out of scope for this naming fix.
vector_database and kv_cache write runtime output to <type>/<command>/<datetime>/ (no <model> segment) per generate_output_location. The captured code/ must mirror the runtime tree, so for these two types code/ lives at <type>/code/ rather than <type>/<model>/code/. Currently: - vectordb crashes the helper with AttributeError because vectordb has no --model CLI arg and the helper does an unguarded getattr(args, 'model'). - kvcache silently writes code/ to <type>/<model>/code/, a tree that no results live in — the submission validator and the runtime hash check cannot find it. - The validator's _iter_open_code_dirs blindly walks 3 levels (<sys>/<type>/<model>/code), so it descends into a stray code/ leaf for vector_database/kv_cache and reports 'code/code missing' instead of validating the real code/. Adds 4 failing tests: - test_open_vectordb_uses_canonical_type_name: helper writes <type>/code/ with no model segment, no AttributeError. - test_open_kvcache_uses_canonical_type_name: same as vectordb; args.model is ignored. - test_open_vector_database_code_dir_at_type_level: validator finds and validates code/ at <type>/ (2 levels), and emits the missing-code violation against the type level, not a phantom <type>/code/code path. - test_open_kv_cache_code_dir_at_type_level: same as vector_database. All four fail before the fix in the next commit.
…for vectordb/kvcache vector_database and kv_cache write runtime output to <type>/<command>/<datetime>/ (per generate_output_location — no <model> segment). The captured code/ must mirror that tree, so for these two types code/ lives at <type>/code/ rather than <type>/<model>/code/. Training and checkpointing keep <type>/<model>/code/ since their runtime output is keyed per model. This closes two correctness gaps surfaced after the canonical-type-name fix (866ff5a): - capture_or_verify_code_image previously called getattr(args, 'model') unconditionally. vectordb_args.py does not register a --model arg, so OPEN vectordb runs would AttributeError before any capture happened. For kvcache the call succeeded but wrote code/ into a tree where no results live — orphaning the code image from the runtime output. - _iter_open_code_dirs walked exactly three levels under results/ for every benchmark type, so the validator either missed code/ entirely for vectordb/kvcache or reported phantom code/code/ violations. Adds _TYPES_WITHOUT_MODEL_SEGMENT (helper) and _OPEN_TYPES_WITHOUT_MODEL (validator) — the two sets are kept inline rather than imported across the module boundary so the validator does not pull in the helper's runtime dependencies. They're both authoritative and must be kept in sync if generate_output_location grows another no-model benchmark type. Tests in commit eb46498 (vectordb/kvcache helper paths, vectordb/kv_cache validator walks) now pass; full code-image surface 246 tests green; broader suite still 34 failed / 656 passed (+4 new) — same pre-existing baseline.
…er-index Plan 02-04 inserted 'code # captured per-leaf' markers under every leaf of the §2.1.27 OPEN directoryDiagram, including AiSEQ / DiskANN / HNSW under vector_database. That placement matched the legacy diagram's shape (which showed an <index_type> level between vector_database/ and <datetime>/), but not the runtime: generate_output_location writes <type>/<command>/<datetime>/ for vector_database and kv_cache — no <model> or <index_type> segment — so the captured code/ lives directly under <type>/, shared across all index types and commands. - §2.1.27: move the per-leaf code/ marker from each AiSEQ/DiskANN/HNSW level up to a single 'code # captured per-type (no <model> in runtime output path)' marker under vector_database/ itself. The legacy <index_type> level is pre-existing and intentionally left alone — that diagram/runtime inconsistency predates this work and is out of scope. - §2.1.5.b: amend prose to describe both leaf shapes explicitly — <systemname>/<type>/<model>/ for training+checkpointing, <systemname>/<type>/ for vector_database+kv_cache — so a reader of the rules text sees the same shape that the diagram and the runtime + checker agree on (commit 390eb20). No code change. Documentation alignment only.
…dex_type> vector_database benchmark results split by index_type because AISAQ results are not comparable to DISKANN/HNSW — they must live in separate on-disk subdirectories. Rules.md §2.1.27's existing AiSEQ/DiskANN/HNSW level under vector_database is canonical; the runtime, the capture helper, and the submission validator must all honor it. Recent commits (390eb20 helper/validator, 8fba4c4 Rules.md) collapsed vector_database into a per-<type> shape on the mistaken assumption that the runtime's missing <index_type> was canonical. That regressed the per-leaf contract that distinguishes incomparable index implementations. Reverting those commits in the next change; this RED commit pins the corrected shape in tests first. - generate_output_location: vector_database must yield <type>/<index_type>/<command>/<datetime>/ on both CLOSED and OPEN. - capture_or_verify_code_image: OPEN vectordb captures at <type>/<index_type>/code/ (using args.index_type — vectordb has --index-type, not --model). - _iter_open_code_dirs: vector_database is back to the standard 3-level walk under results/, yielding <sys>/<type>/<index_type>/code/ — same shape as training/checkpointing. kv_cache stays at <sys>/<type>/code/ per user direction — the per-model structure for kv_cache will be revisited in a follow-up once we have more detail on its directory and file layout. The kv_cache test (test_open_kv_cache_code_dir_at_type_level) is unchanged. All 4 new/updated tests fail before the fix in the next commit.
…abase
vector_database results split by index_type — AISAQ results are not
comparable to DISKANN/HNSW, so they must live in separate subdirectories
(Rules.md §2.1.27 is canonical on this; the runtime path was missing the
segment). kv_cache stays at <type>/code/ for now, pending a follow-up plan
on its directory/file structure.
generate_output_location (vector_database case):
before: <type>/<command>/<datetime>/
after: <type>/<index_type>/<command>/<datetime>/
capture_or_verify_code_image (OPEN, vector_database):
before: <type>/code/ (per-type, wrong)
after: <type>/<index_type>/code/ (per-leaf, correct)
Replaces the hardcoded _TYPES_WITHOUT_MODEL_SEGMENT set with a
data-driven _TYPE_TO_LEAF_ATTR map so each BENCHMARK_TYPE declares its
leaf attribute (args.model for training/checkpointing,
args.index_type for vector_database, None for kv_cache transitionally).
Adding new benchmark types or transitioning kv_cache later is a
one-line edit in the map.
_iter_open_code_dirs (validator):
before: yielded <sys>/<type>/code/ for both vector_database and kv_cache
after: yielded <sys>/<type>/<index_type>/code/ for vector_database
(via the standard 3-level walk used by training/checkpointing);
kv_cache still yields <sys>/<type>/code/ via the same special
case as before.
This reverses the vector_database half of commit 390eb20, which had
collapsed vector_database into per-<type> on the mistaken assumption that
the runtime's missing <index_type> segment was canonical. The kv_cache
half of 390eb20 is preserved verbatim — user direction is to keep
kv_cache at <type>/code/ until a follow-up plan finalizes its structure.
All 4 RED tests in commit b97c466 now pass; full code-image surface 247
tests green; broader suite still 34 failed / 657 passed — same pre-existing
baseline, zero regressions.
…abase Reverses the §2.1.27 OPEN diagram change from 8fba4c4 (which collapsed vector_database into a single per-<type> code/ marker). The original Plan 02-04 placement was correct: code/ lives at each AiSEQ/DiskANN/HNSW leaf because results across index types are not comparable and must validate independently. The 8fba4c4 change was based on the mistaken assumption that the runtime (which was missing the <index_type> segment) was canonical; in fact the diagram is canonical and the runtime is being corrected to match (commit 96f0ed0). §2.1.27: per-leaf 'code # captured per-leaf' marker restored at each of AiSEQ, DiskANN, HNSW under vector_database/. The per-type marker at the vector_database level is removed. The diagram now matches the runtime (generate_output_location includes args.index_type) and the validator (_iter_open_code_dirs walks <sys>/<type>/<index_type>/code/). §2.1.5.b: prose rewritten as a typed bullet list since the three benchmark families now have three different leaf shapes: - training, checkpointing → <type>/<model>/ (per model) - vector_database → <type>/<index_type>/ (per index type — AISAQ vs DISKANN vs HNSW results not comparable) - kv_cache → <type>/ (transitional, pending finalization) Pre-existing diagram inconsistencies left in place: - AiSEQ vs config's AISAQ casing — the diagram's mixed-case index names (AiSEQ, DiskANN) do not match what argparse stores (uppercase DISKANN, HNSW, AISAQ from VDB_INDEX_TYPES). Submitters' on-disk directories will use the uppercase values; the diagram should eventually be re-cased to match. Out of scope for this fix. - kv_cache absent from §2.1.27 diagram entirely — to be addressed when the kv_cache directory structure is finalized.
…ype split The docstring still described vector_database alongside kv_cache as the 2-level walk, but vector_database moved back to the standard 3-level <type>/<index_type>/ walk in 96f0ed0. Comment-only cleanup flagged by the phase-2 re-verifier as an info-level item — no behavior change. Docstring now enumerates all three leaf shapes (training/checkpointing per <model>, vector_database per <index_type>, kv_cache transitional) and the inline comment beside the kv_cache branch names only kv_cache.
…r pre-existing failures - Bump pyproject.toml [project].version from origin/main + 1 patch (target derived at execution time per phase 3 D-08 — no hardcoded literal; origin/main was at 3.0.13, target 3.0.14). - Regenerate uv.lock against the bumped project via bare `uv lock` (D-01: no transitive bumps; only the project's own version line changed). - Fix 7 stale-signature failures in test_issue_376_file_arg_conflict.py by passing the second positional argument that the production CLI arg-adders now require (D-03: test-side fix, production signatures unchanged). Test assertions adapted to the three-mode-positional CLI redesign — the storage-type moved from option `--file`/`--object` to a single positional `data_access_protocol` with `choices=['file', 'object']`; the duplicate-registration invariant under test is preserved. Closes REL-01, REL-02.
The gsd-code-fixer agent now places its isolated worktree under
${repo_root}/.gsd-tmp/ instead of /tmp/ so sandboxed harnesses don't
render the path as a ../../../../tmp/ traversal string in permission
prompts. Add the directory to .gitignore so nested-worktree contents
don't show up in the outer working tree's git status.
verify_image_self_consistent raised MissingHashFile when compute_code_tree_md5 returned None — but load_code_image had already succeeded, so the JSON sidecar IS present. The real failure is that the tree itself didn't hash (permission error mid-walk, race where the dir vanished, etc.). MissingHashFile's docstring is explicit: '.code-hash.json not found in an image directory (D-14)'. Introduce CodeTreeUnreadable(CodeImageError) and raise that instead. Existing call-site catches in helpers.py and submission_structure_checks.py still cover this via the (MalformedHashFile, CodeImageError) arm (CodeTreeUnreadable is-a CodeImageError), so behavior is unchanged for downstream — only the surface log message names the right root cause.
verify_source_against_image raised SourceRootNotFound when compute_code_tree_md5(source_root) returned None — but SourceRootNotFound is documented (D-05) as 'find_source_root walked to filesystem root without finding pyproject.toml,' i.e., a structural CLI/config error, not a runtime read error. A None return from compute_code_tree_md5 means the walk itself failed (permission error, dir vanished mid-walk, etc.) — raise CodeTreeUnreadable to match the new IN-01 semantics. Catch sites already cover this via CodeImageError, so no downstream changes.
The comment at the image_parent computation said 'creating the results-directory itself is reserved for the future mlpstorage init command,' but the implementation called image_parent.mkdir(parents=True, exist_ok=True) which silently created results_dir if absent. Add a results_dir.exists() check that raises ConfigurationError with code=CONFIG_INVALID_VALUE before the mkdir runs. Existing tests pass tmp_path (always exists), so no test regressions.
shutil.copytree(..., dirs_exist_ok=True) creates target_dir on its own since Python 3.8 — the explicit mkdir(parents=True, exist_ok=True) above it was a no-op. Removing it shrinks the window in which target_dir can be in a partial state when copytree begins.
…ters The 'reference checksum not configured' warning fired any time closed/ existed as a directory — including when it was empty, or contained only dotfiles like .DS_Store / .gitkeep. In those cases no code/ checks were actually skipped, so the warning was pure noise. Add an any() probe for a non-dot, isdir() entry under closed/ and gate the warn_violation on it. Empty closed/ and dotfile-only closed/ now silently pass.
The docstring's 'Known limitation' invoked TODO-001 (a planned machine-readable df contract) but the parse site itself had no breadcrumb pointing forward — a future maintainer reading line 124 in isolation had no marker to grep on. Add a TODO(TODO-001) comment at the parse site. Also note the planned bigger migration: capturing stat -f -c '%i' "$data_dir" per node at runtime and storing the scalar FS identity, which supersedes both this parser limitation and WR-06's silent-pass fragility.
The audit tool's locked rule-ID regex was [234] — Rules.md §5 (VdbCheck, added by Plan 04-03) was never enumerated, so even though discover_rules(VdbCheck) found all 16 @rule methods, rules-coverage reported zero §5 entries. Extend the character class to [23456] so §5 is now enumerated and §6 is pre-wired for the kvcache rules landing soon. Updates the matching docstrings, the locked-regex doc literal, the CLI description, and the test row-count floor (>=50 -> >=66 since Phase 4 added 16 §5 IDs).
The post-parse block at cli_parser.py:325 dereferences args.hosts with len() but only guards on hasattr(args, 'hosts'). argparse leaves the attribute as None when --hosts is not supplied, so any single-node invocation (e.g. `mlpstorage open vectordb datagen file ...`) crashed with "object of type 'NoneType' has no len()" before reaching the benchmark dispatch. Add the missing `is not None` clause to match the normalization guard 20 lines above which already handles the None case correctly.
… env main()'s generic-Exception handler gated the traceback emission on the module-level MLPS_DEBUG constant, which is only set from the MLPS_DEBUG environment variable at import time. The CLI's --debug flag was a no-op for this path — the user saw ERROR: Unexpected error: ... Run with --debug for full stack trace even when --debug WAS supplied, forcing them to re-run with `MLPS_DEBUG=1 mlpstorage ...` to get a trace. args is not in scope of main() (it's created inside _main_impl), so check sys.argv directly for the bare '--debug' token. argparse declares --debug as store_true, so the bare-token check covers all valid invocations.
Benchmark.generate_output_location() called the underlying
rules.utils.generate_output_location() without orgname= or
systemname= kwargs. For args.mode in {closed, open} the underlying
function raises ConfigurationError demanding the dispatch layer
thread the validated env-var values through.
capture_or_verify_code_image already stashes the validated values on
args._validated_orgname / args._validated_systemname (see
code_image.py:625-627 comment) specifically so downstream callers
can read them without re-reading env. This commit wires that
contract through. getattr(..., None) keeps legacy/whatif modes
working (the underlying function's mode check skips the kwarg
requirement for those).
The orgname / systemname ConfigurationError suggestions used:
f"... read {VAR}={VAR!r} from the environment ..."
which rendered as:
"... read MLPSTORAGE_ORGNAME='MLPSTORAGE_ORGNAME' from the
environment ..."
The second slot was meant for a <value> placeholder but somebody
filled it with the variable name itself — reads as if the value
IS the var name string. Rewrite as 'read the <VAR> environment
variable and thread the validated value through ...' which says
the same thing without the bogus key=value.
A live OPEN vectordb datagen captured the running source tree into code/, and the captured tree included .planning/, .gsd-tmp/, .idea/, .claude/, etc. — local developer / tooling artifacts that the project's .gitignore already excludes from version control, have nothing to do with the benchmark source contract, and (.gsd-tmp/ in particular) can contain transient agent state. Extend MD5_EXCLUDE_PREFIXES with the common dot-prefixed local-artifact dirs: .idea/ .vscode/ .claude/ .agent/ .agents/ .roo/ .planning/ .gsd-tmp/ .git/, .pytest_cache/, .venv/, .tox/ were already in the list. .github/, .env.example, .gitignore, .python-version are intentionally kept — they ARE part of the project contract.
summary.csv at repo root is the default output path for the submission checker (mlpstorage_py/cli/utility_args.py:123, mlpstorage_py/submission_checker/main.py:82, README.md). Tracking it meant every `mlpstorage validate ...` run (and the recent UAT runs) produced a dirty working tree. Remove the tracked file and add summary.csv to .gitignore so the runtime artifact stays local to whoever ran the tool. If a future schema-template-style summary.csv is needed for the repo (e.g. sample/header-only), it should land at a non-default path like docs/summary-template.csv.
CLAUDE.md was included in the PR diff with no substantive content changes — only blank lines added around bash code blocks by a CommonMark formatter. Restore to origin/main to keep the PR diff focused on actual source changes. (CLAUDE.md is already in .gitignore but tracked from an earlier commit, so further local edits won't show up in future PRs from this branch.)
2026-06-18 update — Phase 4 review-fix + UAT integration polish23 new commits pushed today. The PR description (above) was written before Phase 4 landed; this comment summarizes what's new since then. Reviewers may want to focus diff review here. Phase 4 deliverables (now in this branch)
Phase 4 code-review fixes (13 findings)7 Warnings + 6 Info from a standard-depth review, all atomic commits:
UAT-discovered integration bugs (6 fixes)Live
PR-prep cleanup
UAT verdict6/6 tests passed live:
Test plan
The 5 failures are pre-existing and unrelated to this branch's work — verified by stashing and re-running:
Both look like CLI-parser-side test rot, not regressions in this PR. Recommended for a separate cleanup. Caveats / acknowledged
Memory of pending follow-up work (not in this PR)
|
|
The second big chunk of commits here is really a separate topic: bringing in all the new rules for VDB, but it had to tweak and/or fix so much of the infrastructure that the "original" PR had built that there wasn't a good way to separate them. |
The call site in benchmarks/base.py now threads orgname= and systemname= through to generate_output_location(). Update the unit test's assert_called_once_with to include the new keyword arguments (both None in this test, since the args Namespace doesn't set _validated_orgname / _validated_systemname).
|
@idevasena Here's the summary of the differences: Conflict summary Three files conflict between PR #473 (FileSystemGuy-code-image-capture-v1) and main:
Both branches independently fixed the len(None) crash when --hosts is unset. Main's fix folds the assignment into 2 & 3. mlpstorage_py/rules/utils.py and tests/unit/test_accumulation.py — substantive design dispute These conflict on competing VectorDB result-tree layouts that were developed in parallel and have non-trivial PR #473's design (cites Rules.md §5.3.1, Phase 4 D-02/D-03):
main's design (PR #476, just merged):
These can't both be true at once. Possible resolutions, none of which I should pick unilaterally:
@idevasena I assume that you'd prefer the results pathname structure defined in your PR#476, is that correct? If so, then I'll ask my agent to rework this PR to use that structure. Please just confirm that... |
|
@FileSystemGuy yes correct, I prefer the results pathname structure defined in PR#476. Thank you! |
|
Below is what I propose to do for the VDB workload (Claude prompts):
|
|
@FileSystemGuy The proposed changes look accurate to me. Meanwhile, I will work on #452 that needed updates to README and closed param values. |
…age-capture-v1 # Conflicts: # mlpstorage_py/cli_parser.py # mlpstorage_py/rules/utils.py # tests/unit/test_accumulation.py
The merge with origin/main brought in PR #476's runtime path layout (vector_database/<engine>/<DisplayIndex>/...) but the rest of the code-image-capture PR was built around the older 'vdb_bench' segment. This makes the conversion consistent across the codebase: - code_image._TYPE_TO_ONDISK_SEGMENT: BENCHMARK_TYPES.name for every type (was hardcoded 'vdb_bench' for vector_database) - submission_structure_checks: leaf-walker now expects results/<sys>/vector_database/<DisplayIndex>/code/ - vdb_checks: all 18 `self.mode != "vdb_bench"` rule guards now compare against "vector_database" (loader yields the directory name as mode) - Rules.md S2.1.5.b, S2.1.27 diagrams, S5.3.1, S5.6.3 updated to refer to vector_database - All test fixtures and assertions across mlpstorage_py/tests/ updated - test_generate_output_location: vector_database tests now thread vdb_engine and expect the engine/index layered path
…split End users found the dual-vocabulary (UPPERCASE token vs mixed-case display spelling) confusing. Collapse to a single UPPERCASE convention everywhere index names appear: the CLI (--index-type), summary.json.index_type, the on-disk directory name, and the path-generation/validation code. - Remove INDEX_TYPE_TOKEN_TO_DIR and INDEX_TYPE_DIR_TO_TOKEN constants from mlpstorage_py/config.py - generate_output_location: drop the token->display mapping; write args.index_type / args.vdb_index directly under <engine>/ - code_image.capture_or_verify_code_image: drop the vector-database special-case mapping; leaf segment is just args.index_type - VdbCheck.vdb_closed_index_types (Rules 5.6.3): compare dir name directly against UPPERCASE token; no DIR_TO_TOKEN lookup - Rules.md: drop the S5.6 dual-representation callout; S2.1.5.b, S2.1.27 diagrams, S5.3.1 prose all refer to the UPPERCASE token - All affected tests updated to expect UPPERCASE (DISKANN/HNSW/AISAQ) in paths; _build_vdb_leaf fixture collapses display+token args into a single index_type arg
The CLI gates --vdb-index / --vdb-engine / --model / --command behind
argparse choices= and a redundant post-parse allowlist check in
validate_vectordb_arguments. But generate_output_location itself did no
validation — a programmatic caller (test fixture, future internal API,
history-replay path that someday skips revalidation) feeding
'../etc' or '/abs' as orgname / systemname / vdb_index / model / etc.
would land in a path-traversal directory via os.path.join.
Close the gap at the trust boundary with one helper:
_SAFE_PATH_COMPONENT_RE = re.compile(r"^[A-Za-z0-9._-]+$")
def _check_safe_path_component(name, value):
# reject '.' and '..' explicitly; reject anything not matching
# POSIX-safe single-segment charset.
Applied to every os.path.join arg in generate_output_location that
isn't either user-supplied path root (results_dir) or internally
constant (BENCHMARK_TYPES.name, "results" literal):
orgname, systemname, model, vdb_engine, vdb_index, command, datetime_str.
Adds 21 parametrized negative tests in test_generate_output_location.py
locking in the invariant for each field.
Aligns with the established convention of bumping the patch version one above origin/main when a PR adds new behavior. uv.lock regenerated via bare `uv lock` (no transitive drift). The format upgrade to revision=3 (adds upload-time metadata to each package) is a side effect of the local uv version writing the newer lockfile format; no resolved versions changed.
|
@idevasena I think I've gotten my (big) code change layered over the top of your PR#476 without changing any of the behavior you intended to get from that PR. Please review... |
Summary
When a submission is validated, this PR makes it possible to prove that every result was generated by exactly the source tree captured in
.../code/, and that a CLOSED submission used the unmodified upstream codebase. The CLI captures a frozen "code image" of the benchmark source tree the first time aclosedoropensubmission category runsdatasize,datagen, orrun, then validates on every subsequent invocation that the running code matches the captured image. The submission-time validator is extended to require the code image and verify its hash.Builds on the submission-checker infrastructure landed in #432 (
compute_code_tree_md5,MD5_EXCLUDE_*constants,SubmissionStructureCheck).Behavior contracts
On-disk shape
{resultsdir}/closed/<org>/code/+.code-hash.json{resultsdir}/open/<org>/results/<sys>/<type>/<model>/code/+.code-hash.json(per-leaf){resultsdir}/open/<org>/results/<sys>/vector_database/<index_type>/code/(per-index_type — results across index types aren't comparable){resultsdir}/open/<org>/results/<sys>/kv_cache/code/(transitional per-type — documented in Rules.md §2.1.5.b).code-hash.jsonschema:{hash, algorithm, captured_at, mlpstorage_version, git_sha}(algorithmmd5-tree-v1)Runtime contracts (literal error strings)
changes to the codebase are not allowed in a CLOSED runall runs of this type must use the same codebasecode unchanged from on-file image at {pathname}whatif/validate/reportgen/ history utilities do not capture or validate.What's in this PR
Hash & Capture Foundation
submission_checker/constants.py:MD5_EXCLUDE_PREFIXESextended withtest/,tests/;MD5_EXCLUDE_FILENAMESextended with.code-hash.json.submission_checker/tools/code_image.py(new module):capture_code_image,load_code_image,verify_source_against_image,verify_image_self_consistent,find_source_root. FrozenCodeImagedataclass + typed exception hierarchy (CodeImageError,MissingHashFile,MalformedHashFile,SourceRootNotFound). Atomic tmp-rename publish pattern.compute_code_tree_md5walker fix: aligned withshutil.copytree+ignoresemantics for nested excluded dirs sosrc_hash == cap_hashon real repos.mlpstorage_version="unknown": if version resolution falls through to the last-resort sentinel (no installed dist metadata AND no readable pyproject.toml),capture_code_imageraisesConfigurationErrorbefore any filesystem work — preventing a submission from being captured with an unrecoverable forensic gap.CLI Integration, Validator & Docs
mlpstorage_py/main.py: dispatchescapture_or_verify_code_image(args, os.environ, logger)beforerun_benchmark(); early-returns forvalidate/reports/lockfile/rules-coverage/history.code_image.pycapture_or_verify_code_imagehelper: mode-gated on{closed, open}×{datasize, datagen, run}, emitsEXIT_CODE.CODE_IMAGE_ERRORon failure.submission_structure_checks.py:code_directory_contents_checkrefactored to walk via_iter_submitter_dirs()(CLOSED leaves) +_iter_open_code_dirs()(OPEN per-leaf — yields 3 training, 4 checkpointing, 3 vector_database per-index_type, 1 kv_cache transitional). Separate violations for missing-code/vs hash-mismatch.Rules.mdupdates: §2.1.5 split into §2.1.5.a (CLOSED) + §2.1.5.b (OPEN — per-leaf table with explicit kv_cache transitional callout); §2.1.6 rewritten with.code-hash.jsonschema + literal error strings + layered REFERENCE_CHECKSUMS prose; §2.1.27 OPEN-subtree diagram aligned (10 leaves: 3 training + 4 checkpointing + 3 vdb_bench engines); §3.6.1 documents the layered validator model (self-consistency + upstream-identity).vectordb/kvcachecanonicalize toBENCHMARK_TYPES.namevector_database/kv_cachevia_CLI_BENCHMARK_TO_TYPE. Runtime helper,generate_output_location, and Rules.md now agree on the on-disk segment.Release artifacts
pyproject.toml[project].versionbumped from3.0.13→3.0.14(origin/main + 1 patch, derived at execution time).uv.lockregenerated via bareuv lock(no transitive drift).test_issue_376_file_arg_conflict.pyrepaired (test-side only; production CLI signatures untouched).ModuleNotFoundErrorfailures cleared via env install.Testing
uv run python -m pytest mlpstorage_py/tests/→ 699 passed / 4 failed / 1 xfailed.The 4 failures are pre-existing
add_universal_argumentssignature drift intest_open_closed_flag_recognition.py::TestOpenClosedCLIFlags, documented in deferred items and unrelated to this PR. The same baseline held before the merge withorigin/main.New coverage in this PR (selected):
.code-hash.jsonschema fixturescode/+ hash-mismatch (CLOSED + OPEN)whatif/validate/reportgen/history/lockfile/version/rules-coverageno-touchmlpstorage_version="unknown"with no-FS-side-effect assertionBranch state
Rebased onto
origin/mainon 2026-06-18 (resolved 8 file conflicts:Rules.md,submission_structure_checks.py,main.py,rules/utils.py,config.py,constants.py,pyproject.toml,uv.lock). The version literal was re-derived from the updatedorigin/main(still3.0.13, so3.0.14remains correct).Post-merge, one polish commit was added: the capture-entry
mlpstorage_version="unknown"guard described above.Known tech debt (deferred — not blockers for this PR)
test_open_closed_flag_recognition.pyfailures (add_universal_argumentssignature drift documented in deferred items).