feat(model_zoo): weekly rotation scheduler — train the N stalest specs per run (L4488g)#234
Merged
Merged
Conversation
…s per run (L4488g) Operator decision (2026-06-02): keep a zoo of N challenger specs but train only 2-3 each Saturday to bound compute. This adds the rotation MECHANISM (scheduler + config + tests); it does NOT touch the live Saturday SF — nothing auto-fires, `--weekly-rotation` is runnable on demand and the SF wiring + per-spec leaderboard stay as follow-ups (designed against a real run, per the accepted Plan B). - select_rotation_specs(specs, registered_versions, budget): pure policy — picks the `budget` STALEST active specs by their newest registered version date (never-registered = maximally stale → trained first; id tiebreak for determinism). Maps a registered version back to its spec via model_version == the spec's label (manifest["version"]=MODEL_VERSION_LABEL, set by train_spec; verified meta_trainer.py:3537). - train_weekly_rotation(...): trains the selected specs (registry read best-effort + injectable for tests; one spec's failure never aborts the rest). Registry-unreachable → id-ordered fallback, logged, never fatal. - MODEL_ZOO_WEEKLY_BUDGET config (default 3) + sample.yaml knob (validated parses — CI copies the sample). - CLI: `--weekly-rotation [--budget N]`. Design rule it embodies (from the ROADMAP L4488g re-spec): rotation only controls version FRESHNESS — a frozen version keeps shadow-scoring daily regardless of retrain, so it never starves the GATE 2 maturity clock; the zoo refreshes round-robin over ~ceil(active/N) weeks. Champion retrain is separate + always-on; this budget is the challenger zoo on top. Tests: 6 new (label resolution, never-trained-first, oldest-version-first via newest-per-spec, budget cap, retired excluded, rotation trains only the budget + continues past failure). Full predictor suite: 1382 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Implements the rotation mechanism for the model zoo per your 2026-06-02 decision: keep a zoo of N challenger specs but train only 2–3 each Saturday to keep compute tenable.
Scope (the option you picked): scheduler + config + tests only. This does not touch the live Saturday SF — nothing auto-fires;
--weekly-rotationis runnable on demand. The SF/EventBridge wiring + per-spec leaderboard aggregation remain follow-ups, to be designed against a real run (the Plan B you accepted).How
select_rotation_specs(specs, registered_versions, budget)— pure, unit-testable policy. Picks thebudgetstalest active specs by their newest registered version date; a never-registered spec is maximally stale (trained first); id tiebreak for determinism. Maps a registered version back to its spec viamodel_version == the spec's label(manifest["version"] = MODEL_VERSION_LABEL, set bytrain_spec— verifiedmeta_trainer.py:3537).train_weekly_rotation(...)— trains the selected specs. Registry read is best-effort (+ injectable for tests); registry-unreachable falls back to id-ordered selection (logged, never fatal). One spec's failure never aborts the rest.MODEL_ZOO_WEEKLY_BUDGETconfig (default 3) +sample.yamlknob (validated it parses — CI copies the sample).python -m training.model_zoo --weekly-rotation [--budget N].The design rule it embodies
Rotation only controls version freshness — a frozen version keeps shadow-scoring daily regardless of whether its spec was retrained this week, so it never starves the GATE 2 maturity clock. The zoo refreshes round-robin over ~
ceil(active/N)weeks. The champion retrain is separate + always-on; this budget is the challenger zoo on top. (Full spec: ROADMAPL4488g.)Tests
6 new: label resolution (overrides → declared →
spec-<id>), never-trained-first, oldest-version-first via newest-per-spec, budget cap, retired excluded, rotation trains only the budget + continues past failure. Full predictor suite: 1382 passed.Not in this PR (follow-ups)
--weekly-rotationinto a parallel, off-critical-path Saturday spot (production behavior + weekly compute commit).--all-activeafter 6/6 first).🤖 Generated with Claude Code