Spectral-rollout loss + cold-start training for poly Ouroboros #5
Open
jmxpearson wants to merge 31 commits into
Hand-port from PR #4 (arneodo-parameterization). Adds drive_lowpass_ms (default 0.0) and keep_const (default False) to the polynomial Ouroboros only, with matching zero-phase Gaussian _lowpass / _lowpass_weights helpers applied to omega, gamma, and the kernel weights inside forward and get_funcs. Skips the Arneodo class entirely. model/kernels.py: gate the (0,0) constant-term zero on fullPolyModule.keep_const. train/train.py: persist parameterization / drive_lowpass_ms / keep_const in the checkpoint and read them back in load_model. Verified end-to-end (forward + get_funcs, both keep_const settings) on a tiny n_layers=2 poly model with drive_lowpass_ms=1.0. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Hand-port from PR #4 (arneodo-parameterization). Adds the closed-loop polynomial Ouroboros integrator, the autonomy-based validation metric (spec_corr - amp_pen - pitch_pen, plus bounded_frac), and the deployed generate-with-amplitude-rescale helper. Stripped the is_poly branch and integrate_model_autonomous reference -- this PR is poly-only. Verified end-to-end on a random-init poly model over a 50 ms segment: integration runs, autonomy_score returns finite breakdown, generate_autonomous returns bounded waveform. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
scripts/stage_finch_blk445_syllC.py is a verbatim port from PR #4 that symlinks the Mooney/Muscimol blk445 directory layout (segs/<day>/syllable_C + <day>/double_denoised) into ~/ouroboros_data/blk445_syllC/day{day}/ with the {stem}.wav / {stem}.txt convention the segmented-audio loader expects. Idempotent. Verified: 738/624/871 paired files staged for days 84/85/86 (empty annotations skipped, no missing wavs). examples/_voc_windows.py is a new module that hoists the int16-fixed load_voc_windows / load_voc_windows_coldstart helpers out of the legacy run_lambda_pipeline so the new spectral entry point can reuse them without dragging in the rest of that pipeline. Verified against day 85 (sr=44100, 3 coldstart segs OK, 3 mid-voc segs OK). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
data/load_data.py:
- ONSET=0, OFFSET=1, MID=2 category constants.
- get_audio_training_edge_weighted: builds three pools per (wav, txt) pair (onset-aligned + silence_prefix_ms prefix; offset-aligned + silence_suffix_ms tail; non-overlapping mid-syllable windows inside [on+edge_ms, off-edge_ms]), then samples to max_segs with the requested category ratio. int16 normalize.
- get_segmented_audio_edge_weighted: convenience wrapper that pairs the audio/seg directories like get_segmented_audio.
data/data_utils.py:
- aud_neur_ds now optionally carries per-example category labels; __getitem__ returns a 4-tuple (x, dxdt, d2x, category) when categories are set.
- _stratified_split + get_loaders_edge: per-category stratified train/val/test split so val/test are never starved of ONSET examples even at low onset ratio.
Verified on day 85 (sr=44100): 300 segs sampled with the (0.4, 0.4, 0.2) ratio gives exactly 120/120/60 counts; stratified split yields 240/30/30 with 4-element batches containing category tensors.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
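A minimal sketch of the per-category split described in this commit, for readers who want to check the 240/30/30 arithmetic. The function name `stratified_split`, its signature, and the 80/10/10 fractions are assumptions (the fractions are inferred from the reported counts); this is not the repo's `_stratified_split`.

```python
import random
from collections import defaultdict

ONSET, OFFSET, MID = 0, 1, 2  # category constants named in the commit message


def stratified_split(categories, fracs=(0.8, 0.1, 0.1), seed=0):
    """Split example indices into train/val/test separately within each category.

    Hypothetical helper: fracs and the rounding rule are assumptions.
    """
    rng = random.Random(seed)
    by_cat = defaultdict(list)
    for idx, cat in enumerate(categories):
        by_cat[cat].append(idx)

    train, val, test = [], [], []
    for cat in sorted(by_cat):
        idxs = by_cat[cat]
        rng.shuffle(idxs)
        n_val = int(round(fracs[1] * len(idxs)))
        n_test = int(round(fracs[2] * len(idxs)))
        n_train = len(idxs) - n_val - n_test
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]
    return train, val, test


# 300 segments at the (0.4, 0.4, 0.2) ratio quoted above.
cats = [ONSET] * 120 + [OFFSET] * 120 + [MID] * 60
train_idx, val_idx, test_idx = stratified_split(cats)
```

Because the split is done per category, every split keeps the same ONSET share as the full set, which is the property the commit relies on at low onset ratios.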
Verbatim port of train/rollout_refine.py from PR #4. This PR uses mrstft_loss, stft_mag, gaussian_envelope, env_loss, and gather_windows as building blocks for the new spectral-rollout training step. The rollout_refine driver function itself is kept too (post-hoc fine-tune is still useful as a diagnostic) but is NOT the primary training loop for this PR. Verified mrstft_loss + env_loss imports and run on random (4, 2000) waveforms. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ranch
train/spectral_rollout.py (NEW):
- teacher_forced_rollout: lifts the RK4 + soft-tanh kernel from train/rollout_refine.py into a standalone function. Drives encoded from the target via model.get_funcs (low-passed inside the model); state (x, x') autonomous starting from the data IC or, for examples whose ic_mask is True, from N(0, ic_noise_rms^2) -- the cold-start training mode for ONSET segments.
- spectral_rollout_step: combines MRSTFT magnitude loss on the integrated waveform, a variance-normalized teacher-forced d2-MSE anchor, and an optional envelope L1 term. Auto-drops STFT configs with n_fft > H.
- horizon_for_epoch: geom/linear/const curriculum so the rollout horizon ramps up across training.
train/train.py:
- loss_mode='mse_accel' (default, legacy) | 'spectral_rollout' kwarg, with the spectral path branching at the top of the inner loop -- skips model.forward entirely (spectral_rollout_step does its own get_funcs with a cloned dxdt), unpacks the 4-tuple batch from the edge-biased loader, builds ic_mask = (cats == ONSET), and logs per-component losses + the current horizon to TB. NaN total -> skip optimizer step + log counter, grad-clip 5.0.
- Val loop now tolerates 4-tuple batches; in spectral mode it is skipped entirely (autonomy-based selection in train/model_cv handles validation).
- Pre-existing bug fixed: total_loss was previously only defined inside the if-reg_weights block, which crashed unregularized runs.
Smoke-tested end-to-end on day 85: 2 epochs, H=256, 64 segments, 4-tuple batches with ONSET ic_mask. Pipeline runs, gradients finite, no NaNs. autonomy_score returns the expected -5.0 divergence baseline for an undertrained model.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
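The horizon curriculum and the "drop STFT configs with n_fft > H" rule can be sketched as below. This is illustrative only: the function names, signatures, default horizons, the choice that 'const' returns H_max, and the (n_fft, hop) tuple layout are all assumptions, not the code in train/spectral_rollout.py.

```python
def horizon_for_epoch_sketch(epoch, n_epochs, H_min=256, H_max=2048, schedule="geom"):
    """Rollout horizon (in samples) for a given epoch.

    Hypothetical signature. 'geom' interpolates in log space, 'linear' in
    sample space, and 'const' is assumed to hold the horizon at H_max.
    """
    if schedule == "const" or n_epochs <= 1:
        return H_max
    t = min(max(epoch / (n_epochs - 1), 0.0), 1.0)  # training progress in [0, 1]
    if schedule == "linear":
        H = H_min + t * (H_max - H_min)
    elif schedule == "geom":
        H = H_min * (H_max / H_min) ** t
    else:
        raise ValueError(f"unknown schedule: {schedule!r}")
    return int(round(H))


def usable_stft_configs(H, configs):
    """Keep only STFT configs whose window fits inside the rollout horizon.

    configs is assumed to be a list of (n_fft, hop) tuples.
    """
    return [(n_fft, hop) for (n_fft, hop) in configs if n_fft <= H]
```

With a short early horizon the largest STFT windows simply drop out of the loss and come back once H has grown past their n_fft.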
Adds a poly-only seed loop alongside the existing model_cv_lambdas. Fixes lambda (no kernel-weight CV in this PR) and varies only the random init, which is the dominant axis for autonomous-reconstruction quality. Each seed trains with loss_mode='spectral_rollout'; validation uses cold-start raw autonomy_score on held-out (silence lead-in + voc) windows.
Features:
- Resume-from-checkpoint per seed: a seed with a checkpoint past the target epoch is not retrained.
- Optional seed culling (cull_frac, cull_keep): train all seeds to cull_frac * n_epochs, rank by val cold-start autonomy, finish only the top cull_keep. Roughly halves seed-search cost.
- seed_cv.csv summary written to model_path.
- Tie-breaker: bounded_frac (so a collapsed-but-lucky rollout doesn't beat a truly bounded one at the same val_autonomy).
- Attaches _selected_seed / _selected_lambda / _selected_val_autonomy / _selected_test_autonomy / _selected_test_breakdown to the returned model for the caller's manifest.
Verified end-to-end on day 85: 2 seeds, cull_frac=0.5, cull_keep=1. Both seeds train, cull ranking runs, top-1 resumes from its cull checkpoint, autonomy_score returns the expected -5.0 (divergence sentinel) for the undertrained pair, manifest fields populate, seed_cv.csv lands on disk.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
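The cull ranking and its bounded_frac tie-breaker can be sketched with a tuple sort key. The dict layout, function names, and the way the cull epoch is rounded are assumptions for illustration; the real logic lives in model_seed_cv_spectral.

```python
def cull_seeds(results, cull_keep):
    """Return the seeds to finish training, best first.

    results: list of dicts with 'seed', 'val_autonomy', 'bounded_frac'
    (hypothetical layout). Ranking is by val_autonomy, with bounded_frac
    breaking ties, as described in the commit message.
    """
    ranked = sorted(
        results,
        key=lambda r: (r["val_autonomy"], r["bounded_frac"]),
        reverse=True,
    )
    return [r["seed"] for r in ranked[:cull_keep]]


def epoch_budget(n_seeds, n_epochs, cull_frac, cull_keep):
    """Total seed-epochs trained when culling at cull_frac * n_epochs."""
    cull_epoch = int(cull_frac * n_epochs)  # rounding rule is an assumption
    return n_seeds * cull_epoch + cull_keep * (n_epochs - cull_epoch)


results = [
    {"seed": 0, "val_autonomy": -5.0, "bounded_frac": 0.2},
    {"seed": 1, "val_autonomy": -5.0, "bounded_frac": 1.0},  # same score, bounded
    {"seed": 2, "val_autonomy": -6.0, "bounded_frac": 1.0},
]
survivors = cull_seeds(results, cull_keep=1)
```

Here seeds 0 and 1 tie on the divergence sentinel, so the bounded one survives. The saving from culling depends on cull_frac and cull_keep: `epoch_budget(4, 50, 0.4, 2)` gives 140 seed-epochs against 200 without culling.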
examples/train_poly_spectral_blk445.py: production entry-point for the spectral-rollout polynomial Ouroboros on a single (bird, syllable, day).
Pipeline:
- file-level holdout of WAV stems inside --data-dir (shuffle deterministic by --seed): last --test-frac for test, next --val-frac for val, rest train.
- training set via get_audio_training_edge_weighted (onset/offset/mid pools, configurable ratio).
- held-out val/test cold-start vocs (silence-pad lead-in + voc) via inline _coldstart_from_files helper (parallel to examples/_voc_windows.py but file-list based for the single-directory holdout).
- model_seed_cv_spectral with full CLI surface for the spectral loss (lam-spec/lam-tf/lam-env, MRSTFT configs, H curriculum, ic-noise-rms, grad-clip) + seed culling.
- deployed (rescaled) generation of the first test voc -> WAV.
- selected_model.json manifest with selected seed/lambda, val/test cold-start raw scores, deployed rescaled score, all loss + sampler hparams, sr.
Verified end-to-end on day 85 (2 seeds x 1 cull + 1 finish epoch, H=256, lr=1e-4): file-level split 500/62/62, sampler 26/26/12 (ONSET/OFFSET/MID), both seeds run, cull ranks them, top seed finishes, bounded_frac=1.0 (soft-tanh holds), manifest + WAV + seed_cv.csv all land on disk.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
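A sketch of the file-level holdout described in this commit, assuming 10% val and 10% test fractions with truncating counts (both inferred from the 500/62/62 split over the 624 day-85 files). `split_stems` and the stem names are hypothetical.

```python
import random


def split_stems(stems, val_frac=0.1, test_frac=0.1, seed=0):
    """Deterministic file-level split: last test_frac -> test, the slice
    before it -> val, everything else -> train.

    Hypothetical helper; truncating with int() is an assumption.
    """
    stems = sorted(stems)                # fix the order before shuffling
    random.Random(seed).shuffle(stems)   # same seed -> same split
    n_test = int(len(stems) * test_frac)
    n_val = int(len(stems) * val_frac)
    n_train = len(stems) - n_val - n_test
    train = stems[:n_train]
    val = stems[n_train:n_train + n_val]
    test = stems[n_train + n_val:]
    return train, val, test


stems = [f"voc_{i:04d}" for i in range(624)]  # placeholder stems
train_stems, val_stems, test_stems = split_stems(stems)
```

Splitting at the file level, rather than at the window level, keeps every window cut from a held-out WAV out of the training pools.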
Recipe + smoke-test invocation + loss math + acceptance bar for the new spectral-rollout training pipeline. Includes a comparison table vs the legacy run_lambda_pipeline so the differences (loss, sampler, IC, holdout, selection) are easy to scan, and a list of known follow-ups (multi-day holdout, causal Mamba, generic per-syllable autosegmenter) explicitly called out as out of scope. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
In spectral_rollout mode the val loop is skipped (`continue` at top of the val block) because cold-start autonomy_score handles validation in model_seed_cv_spectral. The periodic save_model was nested INSIDE the val block, so save_freq was a no-op in spectral mode: only the final checkpoint written by _train_to() landed on disk. Move the periodic checkpoint OUTSIDE the val block, gated on save_freq > 0. Both legacy MSE mode and spectral mode now write checkpoints at the configured cadence. (This does not affect the in-flight day85 run, which is already going without intermediate checkpoints. Future runs pick up the fix.) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Round 2 of the spectral-mode save-checkpoint fix. The previous fix moved save_model OUT of the val block, but the spectral-mode early-exit inside the val block was still a `continue`. Because the val block sits at the epoch-loop level, that `continue` jumps to the NEXT epoch -- skipping the (now sibling) save_model block too. So periodic checkpoints still never landed in spectral mode. Restructure the val block to gate its entry on loss_mode != spectral and 'val' in loaders (instead of entering and bailing with continue). The save_model block at the bottom of the epoch loop now always runs. (Observed live: a 4h spectral run produced ZERO checkpoints; the bug was masked by the smoke test, which used --cull-frac > 0, so save_model fired from _train_to() after each `target`-epoch milestone.) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
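The control-flow issue in these two fixes is easy to reproduce in isolation. The toy loop below is not the training loop from train/train.py; `run_epochs` and its arguments are hypothetical and only mirror the shape of the bug: a `continue` inside an epoch-level validation block also skips the checkpoint block that follows it.

```python
def run_epochs(n_epochs, spectral, use_continue, save_freq=1, has_val=True):
    """Return the list of epochs at which a checkpoint would be saved."""
    saved = []
    for epoch in range(n_epochs):
        # ... the training step would run here ...

        if use_continue:
            # Buggy shape: enter the val block, then bail out with `continue`.
            # `continue` jumps to the next epoch, so the save below never runs.
            if has_val:
                if spectral:
                    continue
                pass  # legacy validation would run here
        else:
            # Fixed shape: gate entry to the val block instead of bailing out.
            if has_val and not spectral:
                pass  # legacy validation would run here

        # Periodic checkpoint, a sibling of the val block.
        if save_freq > 0 and epoch % save_freq == 0:
            saved.append(epoch)
    return saved
```

With `use_continue=True` and `spectral=True` no epoch is ever saved, which matches the zero-checkpoint run observed live; the gated version saves every epoch in both modes.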
…ining
Two follow-up fixes for the spectral-rollout training step, both motivated by an end-to-end smoke run on day85 that exposed early-epoch instability:
1. Precompute Var(d2x) over the training set ONCE before the epoch loop (rollout_refine.py:110 convention), pass it to spectral_rollout_step as tf_var. Per-batch d2x.var() is unstable when the batch is dominated by silence (ONSET segments), and dividing the TF anchor by a near-zero normaliser blows it up. Cache the precomputed value with a 1e-6 floor.
2. Add spec_warmup_epochs (default 5): linearly ramp lam_spec from 0 to its target over the first N epochs. Random-init Mamba produces saturated rollouts whose MRSTFT against quiet / onset targets is enormous; without warmup the spectral term swamps the TF anchor and the model fails to enter a learnable basin.
Threading:
- train/train.py: tf_var precomputed in the spectral_rollout setup block, spec_warmup_epochs ramps lam_spec_t each step, logs Loss/total and Train/lam_spec_t to TensorBoard.
- train/spectral_rollout.py: spectral_rollout_step accepts tf_var (uses it instead of per-batch variance when provided).
- train/model_cv.py: model_seed_cv_spectral accepts spec_warmup_epochs and forwards it to train().
- examples/train_poly_spectral_blk445.py: --spec-warmup-epochs CLI flag, persisted in selected_model.json manifest.
- docs/spectral_rollout_blk445.md: short section explaining when to set spec_warmup_epochs to 0 (resuming a checkpoint past warmup).
Smoke run (1 seed, 2 epochs, 64 segs, n_layers=2, H=256) on day85: PIPELINE DONE, manifest + WAV written, test breakdown bounded_frac=1.0 -- the soft-tanh saturation + cold-start noise IC integrate cleanly even at random init. (autonomy still -11 at this microscopic budget; real numbers need the full 50-epoch / 4-seed run.)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
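Both fixes reduce to a few lines of arithmetic. The sketch below assumes the ramp is evaluated per epoch and that the variance is the population variance; the function names are hypothetical, and the training code works on tensors rather than Python lists.

```python
def ramped_weight(target, epoch, warmup_epochs):
    """Linearly ramp a loss weight from 0 to `target` over `warmup_epochs`.

    warmup_epochs <= 0 disables the ramp (e.g. when resuming past warmup).
    """
    if warmup_epochs <= 0:
        return target
    return target * min(1.0, epoch / warmup_epochs)


def floored_variance(values, floor=1e-6):
    """Variance of d2x over the whole training set, floored so that a
    silence-dominated set cannot produce a near-zero normaliser."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return max(var, floor)


# Computed once before the epoch loop, then reused for every batch.
tf_var = floored_variance([0.0, 0.0, 0.0, 0.0])   # all-silence worst case
lam_spec_schedule = [ramped_weight(1.0, e, 5) for e in range(7)]
```

The all-silence case shows why the floor matters: without it the teacher-forced anchor would be divided by zero.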
The save_model 5-keep policy evicts older checkpoints once a 6th is written. For a 50-epoch run at --save-freq 1 that means only checkpoints 46-50 survive, which loses the ability to revisit / resume from any earlier epoch (and the single-seed 50-epoch monitoring run we're about to kick off wants to keep all of them so the user can identify when the model actually converged). Threaded max_saved through train() (default 5, backward-compat), through model_seed_cv_spectral, and added --max-saved to the entry point (default 60 -- enough to retain every epoch of a 50-epoch run). Verified end-to-end with --save-freq 1 --max-saved 10: both checkpoint_0 and checkpoint_1 are retained on disk after training (vs. checkpoint_1 only with the prior default). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
scripts/monitor_spectral_diagnose.py: latest_checkpoint() sorted glob results lexicographically, so once the epoch counter went into double digits checkpoint_9.tar sorted AFTER checkpoint_15.tar and the monitor silently stopped emitting NEW_CKPT / PLATEAU events on every ckpt past epoch 9. Sort by parsed epoch number instead. scripts/plot_blk445_specgram.py: new helper that loads a checkpoint, picks the same held-out cold-start val voc autonomy_score evaluates (rng seed 1234, voc-idx CLI flag), runs integrate_poly_autonomous (or generate_autonomous with --rescale), and writes a 2x2 figure (target waveform + spectrogram on top, autonomous on bottom). Saves to <seed_dir>/specgram_ckpt<N>_voc<I>.png. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
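The sorting bug and its fix can be reproduced with plain strings. The `checkpoint_<N>.tar` pattern comes from the commit message; taking a list of paths instead of globbing a directory is an assumption made to keep the sketch self-contained, and the `_sketch` suffix marks it as not the repo's function.

```python
import re


def epoch_of(path):
    """Parse the epoch number out of a 'checkpoint_<N>.tar' filename."""
    m = re.search(r"checkpoint_(\d+)\.tar$", path)
    return int(m.group(1)) if m else -1


def latest_checkpoint_sketch(paths):
    """Pick the checkpoint with the highest parsed epoch number."""
    return max(paths, key=epoch_of) if paths else None


paths = [f"checkpoint_{i}.tar" for i in range(16)]
lexicographic_last = sorted(paths)[-1]        # string comparison: '9' > '1'
numeric_last = latest_checkpoint_sketch(paths)
```

String sorting compares character by character, so `checkpoint_9.tar` lands after `checkpoint_15.tar` and the monitor keeps seeing epoch 9 as the newest checkpoint.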
…mous The autonomous waveform now inherits the target's ylim so the reconstruction is read on the target's amplitude scale. For rescaled plots this makes the match easy to judge; for raw plots it visually clips a saturated rollout / flattens a collapsed one, which is itself informative. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
scripts/monitor_spectral_diagnose.py: --seed-dir, --state-path, --train-pattern CLI flags so a second monitor instance can watch a parallel run dir without copying the script. Defaults preserve the legacy single-run behavior. scripts/plot_loss_panels.py: --smooth defaults to 'auto', picking max(1, min(5, shortest_run_epochs // 3)) so a short overlay (e.g. env-run with 3 epochs vs live-run with 27) keeps multiple points instead of dissolving under valid-mode convolve. Integer override still works. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
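The 'auto' smoothing rule is quoted directly in the commit; the sketch below pairs it with a plain valid-mode moving average to show why a fixed window erases short runs. Function names are hypothetical and the plotting script presumably uses a NumPy convolve rather than this list version.

```python
def auto_smooth(run_lengths):
    """Window size from the formula in the commit message."""
    shortest_run_epochs = min(run_lengths)
    return max(1, min(5, shortest_run_epochs // 3))


def moving_average_valid(values, window):
    """Valid-mode moving average: output has len(values) - window + 1 points."""
    n_out = len(values) - window + 1
    return [sum(values[i:i + window]) / window for i in range(max(n_out, 0))]


window = auto_smooth([3, 27])   # e.g. a 3-epoch run overlaid on a 27-epoch run
short_run = [1.0, 2.0, 3.0]
```

With the 3-epoch run in the overlay the window drops to 1, so all three points survive; a fixed window of 3 would leave a single point, and a window of 5 none.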
Adds a faint dashed vertical line at the first epoch where Train/lam_spec_t reaches its max (detected from the TB stream). Drawn in the run's color on every panel so readers know that pre-boundary total-loss values are weighted- ramp artifacts of spec_warmup, not signal. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
spectral_rollout_step and teacher_forced_rollout each called model.get_funcs (all three Mamba encoders) on identical (x, dxdt, dt) every training step -- once for the teacher-forced anchor, once for the rollout. The drives are deterministic in their inputs, so the second pass was pure waste. Add a `drives` parameter to teacher_forced_rollout and pass the (omega, gamma, weights, z2) already encoded for the TF anchor. Backprop through the single shared forward sums the TF and spectral gradients exactly as the two separate forwards did, so this is gradient-equivalent -- it just halves the encoder forwards per spectral-rollout step. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Four correctness-preserving speedups to the per-step training path:
- RK4 rollout: hoist per-substep constants out of the 4x-called inner f() -- square omega once over the horizon and slice ga[:,k]/w[:,k] once per step instead of re-slicing/re-squaring on every RK4 substep; reuse the kernel's cached powers vector instead of allocating a fresh arange each rollout. Applied in both spectral_rollout.py and rollout_refine.py.
- Dataloader: store batches as float32 instead of float64 (halves H2D bytes) and add pin_memory=True + non_blocking=True host->device transfers so copies overlap compute.
- Logging: collapse 5 per-step .item() device->host syncs (one a duplicate of out["spec"]) into a single torch.stack(...).tolist().
- MRSTFT: lru_cache the STFT Hann window (depends only on n_fft, was rebuilt ~6x per step).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
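The window-caching idea in the last bullet can be shown without torch. This sketch builds a periodic Hann window in pure Python and memoizes it on n_fft; the real code presumably caches a tensor (where device and dtype would also need to be part of the cache key), so treat the function below as an assumption-laden stand-in.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def hann_window(n_fft):
    """Periodic Hann window of length n_fft, built once per distinct n_fft.

    Returned as a tuple so the cached value cannot be mutated by callers.
    """
    return tuple(
        0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_fft) for n in range(n_fft)
    )
```

Repeated calls with the same n_fft return the same object, so a multi-resolution loss that evaluates a handful of STFT configs per step pays the construction cost once per config rather than once per call.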
train/train.py: --env-warmup-epochs (default 0) linearly ramps lam_env from 0 to its target over the first N epochs, mirroring spec_warmup_epochs. Needed when running large lam_env (1e4+) so the random-init env gradient doesn't blow up params before the TF anchor has stabilized. Logs Train/lam_env_t to TB so the ramp is visible. train/model_cv.py: env_warmup_epochs threaded through model_seed_cv_spectral. examples/train_poly_spectral_blk445.py: --env-warmup-epochs CLI flag, persisted in selected_model.json manifest. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
env1e4 was falling through to matplotlib's default color cycle, which starts at tab:blue and collided with live. Add an explicit entry so the three concurrent runs are distinguishable. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The torch index pin used bare top-level `[sources]`/`[[index]]` tables, which uv does not recognize (the schema is `[tool.uv.sources]` / `[[tool.uv.index]]`). So the directive was silently ignored and the lock resolved torch 2.8.0 from PyPI -- the +cu128 build, whose binaries start at sm_70 and fail on Pascal cards (e.g. the GTX 1080 Ti, sm_61) with "no kernel image is available for execution on the device". Move the config under `[tool.uv.*]` and relock so torch resolves to 2.8.0+cu126 (ships sm_61 kernels) on Linux, CPU wheel elsewhere. Drop the unused torchvision source. Requires uv >= 0.4.23 for named indexes. Verified: a fresh `uv sync` yields torch 2.8.0+cu126 and runs a CUDA matmul on the 1080 Ti. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…n buckets
The autonomous RK4 rollout is a sequential Python loop over the horizon H on tiny
tensors, so it is launch-overhead-bound (~3.2 s/step eager at H=2000 on a contended
1080 Ti). Add a `rollout_backend` switch threaded through train() ->
spectral_rollout_step() -> teacher_forced_rollout():
- 'eager' : the existing Python loop (default; unchanged behavior).
- 'cudagraph' : torch.cuda.make_graphed_callables captures a fwd+bwd CUDA graph and
replays it (~3.4x measured at H=400). Works on Pascal+ (no Triton).
Falls back to eager if capture fails (e.g. OOM), cached so it does
not re-attempt every step.
- 'compile' : torch.compile(mode='reduce-overhead'); needs CUDA capability >= 7.0
for the Triton backend, else auto-falls back to 'cudagraph'.
The RK4 core is factored into `_rk4_core_factory(H, powers)` so all three backends
share one implementation; graphs/compiled fns are cached per (H, batch, dtype).
Graphs require static shapes (one capture per distinct H), so add a 'pow2' horizon
schedule: factor-of-2 buckets from H_min to H_max (final capped at H_max), spread
evenly across the same n_epochs ramp -- e.g. 512->1024->2000 over the run, just 3
distinct horizons. train() warns if a graph backend is paired with a non-bucketed
schedule.
Verified on the GTX 1080 Ti (cu126): eager matches the original loop bit-for-bit
(fwd + all grads, max|Δ|=0), cudagraph matches eager (fwd + grads, max|Δ|=0), and
'compile' correctly falls back to cudagraph on sm_61. Default stays 'eager', so
training behavior is unchanged unless opted in.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
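The 'pow2' schedule from this commit can be sketched as follows. The bucket rule (double from H_min, cap the final bucket at H_max) follows the commit text; the way epochs are mapped onto buckets and both function names are assumptions.

```python
def pow2_buckets(H_min, H_max):
    """Factor-of-2 horizons from H_min up to H_max, final bucket capped at H_max."""
    buckets = []
    H = H_min
    while H < H_max:
        buckets.append(H)
        H *= 2
    buckets.append(H_max)
    return buckets


def pow2_horizon(epoch, n_epochs, H_min=512, H_max=2048):
    """Spread the buckets evenly over the run (assumed mapping)."""
    buckets = pow2_buckets(H_min, H_max)
    idx = min(len(buckets) - 1, epoch * len(buckets) // max(n_epochs, 1))
    return buckets[idx]


distinct = sorted({pow2_horizon(e, 30) for e in range(30)})
```

Each distinct horizon costs one graph capture, so a 30-epoch run on this schedule needs three captures instead of one per epoch under a smoothly ramping schedule.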
… via CLI Bump the spectral-rollout H_max default 2000 -> 2048 so the pow2 horizon schedule yields clean factor-of-2 buckets [512, 1024, 2048] with no capped final bucket (2048 < segment length L, which is context-len 0.05s + 25ms silence pre/suffix). Updated in train.train, train.model_cv, the example CLI default, and the docstring. Thread `rollout_backend` from the example CLI through model_seed_cv_spectral to train(), and add 'pow2' to the --H-schedule choices so the bucketed schedule (needed by the graphed backends) is selectable. Default stays 'eager'; recorded in the run manifest. NOTE: a contended benchmark on the shared GTX 1080 Ti (5 concurrent seed jobs, GPU at 100% util) showed no cudagraph speedup -- CUDA-graph capture OOM'd and fell back to eager, and a saturated GPU hides the launch overhead graphs remove anyway. cudagraph is expected to help only on an underutilized/single-process GPU, so it stays opt-in. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Express the autonomous RK4 rollout as a scan (the jax.lax.scan idiom): a single-step
combine_fn `_rk4_step(carry=(xc,xp), x=(om2k,gak,w_k)) -> ((xc',xp'), xc')`, driven by
`_scan_rollout`. Two drivers:
- use_hop=True: torch._higher_order_ops.scan under torch.compile -- lowers the step
body ONCE instead of Dynamo-unrolling the H-step Python loop, which is what makes
the whole-loop 'compile'/'cudagraph' backends infeasible at training horizons
(unroll blowup / ~15 MB/step capture memory -> ~30 GB at H=2048). Needs sm>=70.
- use_hop=False: the identical eager fold (same fold semantics, no compile) -- runs
everywhere, including Pascal where torch.compile is unavailable.
Wired as rollout_backend='scan': sm>=70 compiles the scan HOP; otherwise it warns and
runs the eager fold. Added to the example CLI choices. Default stays 'eager'.
NOTE: experimental and NOT speed-validated -- the scan HOP requires a working
torch.compile, which this Pascal GTX 1080 Ti (sm_61) lacks, so it can only run as the
eager fold here. Correctness verified on-GPU: the eager-fold scan core is bit-identical
to the whole-loop eager core in forward AND gradients (max|Δ|=0), both at the core level
and through teacher_forced_rollout. Speed needs an sm>=70 card to evaluate; this is a
sequential scan (no time parallelism), so the win there is compile/fusion + no Python
loop overhead, not parallelism.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
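The scan idiom in this commit, reduced to an eager fold in pure Python. This is a simplified stand-in: the step integrates x'' = -om2*x - ga*x' with the drives held constant across each RK4 step, and it omits the polynomial kernel weights, the soft-tanh saturation, and batching. All names are hypothetical.

```python
import math
from functools import partial


def rk4_step(carry, drive, dt):
    """One RK4 step: (carry, drive) -> (new_carry, output).

    carry = (x, v); drive = (om2, ga), held fixed over the step.
    """
    x, v = carry
    om2, ga = drive

    def f(x_, v_):
        return v_, -om2 * x_ - ga * v_

    k1x, k1v = f(x, v)
    k2x, k2v = f(x + 0.5 * dt * k1x, v + 0.5 * dt * k1v)
    k3x, k3v = f(x + 0.5 * dt * k2x, v + 0.5 * dt * k2v)
    k4x, k4v = f(x + dt * k3x, v + dt * k3v)
    x_new = x + dt / 6.0 * (k1x + 2 * k2x + 2 * k3x + k4x)
    v_new = v + dt / 6.0 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return (x_new, v_new), x_new


def scan_rollout(step, init, drives):
    """Eager fold with scan semantics: thread the carry, collect the outputs."""
    carry, ys = init, []
    for d in drives:
        carry, y = step(carry, d)
        ys.append(y)
    return carry, ys


dt = 0.01
drives = [(1.0, 0.0)] * 628          # undamped unit oscillator, about one period
final, xs = scan_rollout(partial(rk4_step, dt=dt), (1.0, 0.0), drives)
```

Writing the rollout as a single-step function plus a fold is what lets a compiler lower the step body once; the fold itself stays sequential, as the commit notes.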
Add explicit color for the lam_env=1e5 run so it's distinguishable from the green env1e4 line. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Default init gives |gamma| ~ 0.4 with random per-seed sign (envelope tau ~0.1 ms), so ~half of seeds are purely dissipative -- any oscillation dies instantly and the model can only AM-modulate input/IC noise. osc_init makes the damping a small NEGATIVE constant (slow energy injection) and seeds an amplitude-dependent re-damping (+y^2*ydot) plus a hardening cubic (+y^3) so growth saturates into a bounded limit cycle. Control-head weights for the seeded terms are zeroed so they start as clean constants the encoders learn to modulate; only the structural prior is injected. Threaded through model_seed_cv_spectral and exposed as --osc-init (default off; baseline behavior unchanged). Smoke-tested: with lam_env=1e5 and pow2 horizon buckets the training step stays finite/bounded, and from a noise IC osc_init ignites a bounded oscillation (rms 4.2e-3 -> 5.7e-2) where the default init decays to zero. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
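The qualitative behaviour described here (negative linear damping, amplitude-dependent re-damping, and a hardening cubic giving a bounded limit cycle) can be seen in a toy van der Pol-Duffing oscillator. This is not the model's parameterization: the equation, coefficients, sign conventions, and function names below are assumptions chosen only to illustrate the mechanism.

```python
def simulate(mu, c, b, x0=0.01, dt=0.01, n_steps=6000):
    """RK4-integrate x'' = (mu - c*x**2)*x' - x - b*x**3 from a small IC.

    mu > 0 injects energy at small amplitude, the c*x**2*x' term re-damps
    large amplitudes, and b*x**3 is a hardening cubic restoring force.
    """
    def accel(x, v):
        return (mu - c * x * x) * v - x - b * x ** 3

    x, v, xs = x0, 0.0, []
    for _ in range(n_steps):
        k1x, k1v = v, accel(x, v)
        k2x, k2v = v + 0.5 * dt * k1v, accel(x + 0.5 * dt * k1x, v + 0.5 * dt * k1v)
        k3x, k3v = v + 0.5 * dt * k2v, accel(x + 0.5 * dt * k2x, v + 0.5 * dt * k2v)
        k4x, k4v = v + dt * k3v, accel(x + dt * k3x, v + dt * k3v)
        x += dt / 6.0 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v += dt / 6.0 * (k1v + 2 * k2v + 2 * k3v + k4v)
        xs.append(x)
    return xs


def peak(xs):
    return max(abs(x) for x in xs)


seeded = simulate(mu=0.5, c=0.5, b=0.1)        # energy injection + re-damping
dissipative = simulate(mu=-0.5, c=0.0, b=0.0)  # purely damped, for contrast
```

From the same small initial condition the seeded system grows and then saturates at a bounded amplitude, while the purely dissipative one decays toward zero, which mirrors the contrast the smoke test reports between osc_init and the default init.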
Add explicit color for the --osc-init lam_env=1e5 run. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add env_loss_log to train/rollout_refine.py: |log((env(a)+eps)/(env(g)+eps))| averaged over time and batch. Symmetric in (auto, target) scale, dB-natural, no trivial-silent-floor like the linear env_loss (which lets the optimizer satisfy the penalty by shrinking the rollout amplitude past target).
Threaded lam_env_log (weight) and env_log_eps (noise floor) through spectral_rollout_step, train(), model_seed_cv_spectral, and the entry script. Reuses env_warmup_epochs to ramp lam_env_log alongside lam_env. Logs Loss/env_log + Train/lam_env_log_t to TB when active.
Use case: the current --lam-env=1e5 osc-init run pulled the rollout from ~28x loud (osc-init default) past target into ~1430x quiet (decay basin) -- env_loss can't tell those apart in its tail. env_loss_log treats both symmetrically (|log K| same for K and 1/K) so the optimizer can't escape into the decay basin.
eps trade-off: smaller eps -> closer to amp_pen, but huge gradient on silence-prefix samples; larger eps -> bounded gradient near silence but loses sensitivity to extreme decay. Default 1e-4 matches normalized blk445's noise floor; effective |log| cap is ~log(env(g)/eps).
Verified end-to-end with a synthetic test:
K=1 -> env_loss=0, env_loss_log=0
K=28 -> env_loss=27, env_loss_log=3.31 (matches |log 28|=3.33)
K=1/1430 -> env_loss=0.999, env_loss_log=3.66 (eps-capped; |log 1430|=7.27)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
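A list-based sketch of the log-ratio envelope loss, using the formula given in the commit message. The comparison loss `env_loss_rel` is an assumed relative-L1 form chosen because it is consistent with the quoted synthetic numbers (27 at K=28, 0.999 at K=1/1430); it is not confirmed to be the repo's env_loss. Envelopes are passed in directly, so the Gaussian envelope extraction is out of scope here.

```python
import math


def env_loss_log(env_auto, env_target, eps=1e-4):
    """Mean |log((env(a) + eps) / (env(g) + eps))| over samples."""
    terms = [
        abs(math.log((a + eps) / (g + eps)))
        for a, g in zip(env_auto, env_target)
    ]
    return sum(terms) / len(terms)


def env_loss_rel(env_auto, env_target):
    """ASSUMED linear form for comparison: mean |a - g| / mean g."""
    n = len(env_target)
    num = sum(abs(a - g) for a, g in zip(env_auto, env_target)) / n
    den = sum(env_target) / n
    return num / den


target = [1.0] * 100
loud = [28.0 * g for g in target]      # rollout 28x too loud
quiet = [g / 28.0 for g in target]     # rollout 28x too quiet
silent = [0.0] * 100                   # fully collapsed rollout
```

The linear form scores the loud rollout at 27 but can never exceed 1 for a quiet one, which is the silent floor the commit describes. The log form scores loud and quiet almost equally, and a fully silent rollout hits the cap of about log(env(g)/eps).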
Add explicit colors for the two osc-init experiments comparing the new log-ratio envelope loss against no envelope penalty. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
- arneodo-parameterization): drive low-pass on (ω, γ, kernel weights) inside the polynomial Ouroboros, `autonomy_score` + poly autonomous integration, the int16-aware voc-window loaders, the blk445 staging script, and the MRSTFT loss machinery from `train/rollout_refine.py`. Arneodo and the k-rollout consistency loss are NOT pulled in -- this PR is poly-only and replaces the k-rollout machinery with the spectral rollout.
- `examples/train_poly_spectral_blk445.py` runs the full recipe on a single (bird, syllable, day): file-level holdout, edge-biased sampler, `model_seed_cv_spectral` (seed loop + cull + cold-start raw autonomy selection), deployed (rescaled) generation + manifest with the selection breakdown. Recipe doc at `docs/spectral_rollout_blk445.md`.
Branch is structured as 9 logically-independent commits (one per file group) -- see `git log main..spectral-loss-coldstart` for the sequence.
Test plan
- `bounded_frac=1.0`, `seed_cv.csv` + WAV + `selected_model.json` land on disk, no NaN crashes.
- `--n-seeds 4 --n-epochs 50 --cull-frac 0.4 --cull-keep 2` on day 85 (~6 h on one GPU). Acceptance bar: val `bounded_frac >= 0.9` by epoch 10, val `spec_corr >= 0.5` by end of training, `coldstart_raw_test_autonomy > 0`, and the deployed WAV is recognizably syllable-like with a clean onset. See `docs/spectral_rollout_blk445.md` for the full recipe.
- `--loss-mode mse_accel` (legacy) to confirm spectral mode beats it on cold-start raw autonomy and `amp_pen`.
🤖 Generated with Claude Code