feat(training): model zoo — declarative variant specs + train driver (L4488c) #231

Merged
cipher813 merged 2 commits into main from feat/l4488c-model-zoo on Jun 2, 2026

Conversation

@cipher813
Owner

L4488c — item 3 of the model-rotation scaffolding arc (L4488); the experiment/model-spec layer (SOTA pillar 2).

A model spec is a config OVERLAY over the existing training knobs. Running a spec trains ONE variant and registers it as a challenger (capture-gap; challenger-first never overwrites the champion) — which the shadow runner (#228) shadows and the net-of-cost scorer (L4488b) ranks. This is the layer that lets a variety of models be rotated in/out as experiments are run.

Thin by design (not a generic ML platform)

Reuses train_handler.main() unchanged, applying a spec's overrides around the call via a save/restore context — the knobs are module-level cfg constants read via cfg.X at call time (verified: no bare from config import …). No config-object refactor of the trainer.

What

  • training/model_zoo.py: resolve_spec / spec_overrides (allowlisted; restores on exception and removes previously-absent attrs) / train_spec / train_all_active (sequential; one spec's failure never aborts the rest) + CLI (--spec / --all-active / --list). Override allowlist = {FORWARD_DAYS, RESIDUAL_MOMENTUM_ENABLED, XSEC_DEMEAN_ALPHA_ENABLED, MODEL_VERSION_LABEL}; disallowed keys fail loud.
  • meta_trainer: manifest/feature_list/summary version -> MODEL_VERSION_LABEL (default v3.0-meta), so each spec registers under its own version_id ({label}-{date}-{fp}) and shows up as a distinct challenger on the leaderboard.
  • config: MODEL_SPECS (list, default []) + MODEL_VERSION_LABEL; sample.yaml documents the model_specs schema.
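
For readers who have not opened the diff, the save/restore overlay described above can be sketched roughly as follows. This is a minimal illustration, not the code in training/model_zoo.py: the allowlist keys come from this PR, but the function signature, the `_ABSENT` sentinel, the `SimpleNamespace` stand-in for the config module, and the values 21 and "v3.0-60d" are all assumptions made for the example.

```python
import types
from contextlib import contextmanager

# Allowlist as stated in this PR; everything else below is an illustrative assumption.
ALLOWED_OVERRIDES = frozenset({
    "FORWARD_DAYS",
    "RESIDUAL_MOMENTUM_ENABLED",
    "XSEC_DEMEAN_ALPHA_ENABLED",
    "MODEL_VERSION_LABEL",
})

_ABSENT = object()  # sentinel: the attribute did not exist before the overlay


@contextmanager
def spec_overrides(cfg, overrides):
    """Temporarily apply allowlisted overrides to a config-like object."""
    bad = sorted(set(overrides) - ALLOWED_OVERRIDES)
    if bad:
        # Fail loud before touching any attribute.
        raise ValueError(f"disallowed override keys: {bad}")
    saved = {key: getattr(cfg, key, _ABSENT) for key in overrides}
    try:
        for key, value in overrides.items():
            setattr(cfg, key, value)
        yield cfg
    finally:
        # Runs on normal exit and on exception.
        for key, old in saved.items():
            if old is _ABSENT:
                if hasattr(cfg, key):
                    delattr(cfg, key)  # remove attrs that were previously absent
            else:
                setattr(cfg, key, old)


# Usage: a stand-in config object and a simulated training failure.
cfg = types.SimpleNamespace(FORWARD_DAYS=21)  # 21 is an assumed default
seen_inside = None
try:
    with spec_overrides(cfg, {"FORWARD_DAYS": 60, "MODEL_VERSION_LABEL": "v3.0-60d"}):
        seen_inside = (cfg.FORWARD_DAYS, cfg.MODEL_VERSION_LABEL)
        raise RuntimeError("simulated training failure")
except RuntimeError:
    pass

restored = (cfg.FORWARD_DAYS, hasattr(cfg, "MODEL_VERSION_LABEL"))
```

Validating the keys before saving anything means a rejected spec leaves the config untouched, which is one plausible reading of "fail loud".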

Documented limitation

A horizon (FORWARD_DAYS) override additionally needs any import-time-derived constant verified to read cfg.X at call time — checked when the 60d variant lands (L4488d). The CLI runs one spec at a time (~one full train each), so the operator paces experiments rather than looping all in one Saturday SF.
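
The horizon caveat is easiest to see with a toy case. In the sketch below, `EMBARGO_DAYS` and `embargo_days()` are hypothetical names invented for illustration (no such constant is named in this PR), and the default of 21 is assumed; the point is only that a value derived at import time keeps the old horizon, while one derived at call time picks up the override.

```python
import types

# Stand-in for the config module; the default of 21 is an assumption.
cfg = types.SimpleNamespace(FORWARD_DAYS=21)

# Hypothetical import-time-derived constant: computed once, then frozen.
EMBARGO_DAYS = cfg.FORWARD_DAYS + 5


def embargo_days():
    """Hypothetical call-time equivalent: re-reads cfg.FORWARD_DAYS on each call."""
    return cfg.FORWARD_DAYS + 5


cfg.FORWARD_DAYS = 60  # what a 60d spec overlay would set before training

stale = EMBARGO_DAYS    # still derived from the old horizon
fresh = embargo_days()  # reflects the override
print(stale, fresh)     # prints: 26 65
```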

Tests

+8 (resolve active/retired/missing; allowlist; save/restore incl. exception + absent-attr; train_spec applies overrides + defaults label; all-active skips retired + continues on failure). Updated the feature_list source-pin for the spec-driven version. Suite 1363→1371.
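
The "skips retired + continues on failure" behaviour exercised by these tests could look something like the sketch below. The spec shape (`name` and `status` keys), the injected `train_one` callable, and the result format are assumptions for illustration; the real train_all_active may differ.

```python
def train_all_active(specs, train_one):
    """Train each active spec in order; record failures without aborting the rest."""
    results = {}
    for spec in specs:
        if spec.get("status", "active") != "active":
            continue  # retired specs are skipped
        name = spec["name"]
        try:
            results[name] = ("ok", train_one(spec))
        except Exception as exc:  # one spec's failure must not stop the others
            results[name] = ("failed", repr(exc))
    return results


def fake_train(spec):
    """Stand-in for a full training run; fails for one hypothetical spec."""
    if spec["name"] == "nonlinear-blender":
        raise RuntimeError("simulated failure")
    return f"trained {spec['name']}"


specs = [
    {"name": "residual-momentum", "status": "active"},
    {"name": "old-baseline", "status": "retired"},
    {"name": "nonlinear-blender", "status": "active"},
    {"name": "60d-target", "status": "active"},
]
results = train_all_active(specs, fake_train)
```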

Next: L4488d — seed the zoo (residual-momentum, 60d-target, nonlinear-blender) → populates the challenger track and settles the horizon call via L4488b's net-of-cost leaderboard.

cipher813 and others added 2 commits June 2, 2026 13:27
…(L4488c)

Item 3 of the model-rotation scaffolding arc (L4488); the experiment/model-spec
layer (SOTA pillar 2). A "model spec" is a config OVERLAY over the existing
training knobs; running a spec trains ONE variant and registers it as a
CHALLENGER (capture-gap; challenger-first never overwrites the champion), which
the shadow runner shadows and the net-of-cost scorer (L4488b) ranks. Lets a
variety of models be rotated in/out as experiments are run.

Deliberately a THIN spec-overlay, NOT a generic ML platform: reuses
train_handler.main() unchanged, applying a spec's overrides around the call via
a save/restore context (the knobs are module-level cfg constants read via cfg.X
at call time — verified no bare imports). No config-object refactor of the
trainer.

- training/model_zoo.py: resolve_spec / spec_overrides (allowlisted save+restore,
  restores on exception, removes previously-absent attrs) / train_spec /
  train_all_active (sequential; one spec's failure never aborts the rest) + CLI
  (--spec / --all-active / --list). Override allowlist = {FORWARD_DAYS,
  RESIDUAL_MOMENTUM_ENABLED, XSEC_DEMEAN_ALPHA_ENABLED, MODEL_VERSION_LABEL};
  disallowed keys fail loud.
- meta_trainer: manifest/feature_list/summary version -> MODEL_VERSION_LABEL
  (default v3.0-meta) so each spec registers under its own version_id
  ({label}-{date}-{fp}) — distinct challengers on the leaderboard.
- config: MODEL_SPECS (list, default []) + MODEL_VERSION_LABEL; sample.yaml
  documents the model_specs schema.

Limitation documented: a horizon (FORWARD_DAYS) override needs any
import-time-derived constant verified to read cfg.X at call time — checked when
the 60d variant lands (L4488d). CLI runs one spec at a time (each ~one full
train) so the operator paces experiments rather than looping all in one SF.

Tests: +8 (resolve active/retired/missing; allowlist; save/restore incl.
exception + absent-attr; train_spec applies overrides + defaults label;
all-active skips retired + continues on failure). Updated the feature_list
source-pin for the spec-driven version. Suite 1363 -> 1371.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ining:, broke YAML)

The L4488c model-zoo doc block + top-level model_version_label were inserted
BETWEEN batch_size and learning_rate inside the training: mapping, so CI (which
copies predictor.sample.yaml -> predictor.yaml) failed to parse it (block-end
expected at learning_rate). Local suite passed because the real gitignored
predictor.yaml was used, not the sample. Moved the block to a proper top-level
location after shadow_versions:. Sample now parses.
@cipher813 cipher813 merged commit 4b60740 into main Jun 2, 2026
1 check passed
@cipher813 cipher813 deleted the feat/l4488c-model-zoo branch June 2, 2026 20:34