
Spectral-rollout loss + cold-start training for poly Ouroboros#5

Open
jmxpearson wants to merge 31 commits into main from spectral-loss-coldstart


@jmxpearson
Member

Summary

  • Replace the legacy MSE-on-d²y/ds² training objective for the polynomial Ouroboros with a DDSP-style multi-resolution STFT magnitude loss (as introduced in Parallel WaveGAN, arxiv:1910.11480) on a teacher-forced RK4 reconstruction of the waveform, plus a variance-normalized acceleration MSE anchor and an optional envelope L1 amplitude pin.
  • Cold-start ignition training: on ONSET segments the RK4 initial condition is overridden with low-amplitude Gaussian noise, so the model has to learn to ignite from silence or near-silence (the situation the real-time synthesis target actually faces) rather than from the data IC.
  • Edge-biased segment sampler heavily oversamples syllable onsets / offsets (configurable ONSET/OFFSET/MID ratio, default 0.4 / 0.4 / 0.2) and carries per-segment category labels through to the training loop.
  • Cherry-picked from PR #4 ("Add Arneodo-2021 syrinx-ODE parameterization with autonomous integration", branch arneodo-parameterization): drive low-pass on (ω, γ, kernel weights) inside the polynomial Ouroboros, autonomy_score + poly autonomous integration, the int16-aware voc-window loaders, the blk445 staging script, and the MRSTFT loss machinery from train/rollout_refine.py. Arneodo and the k-rollout consistency loss are NOT pulled in; this PR is poly-only and replaces the k-rollout machinery with the spectral rollout.
  • New entry point examples/train_poly_spectral_blk445.py runs the full recipe on a single (bird, syllable, day): file-level holdout, edge-biased sampler, model_seed_cv_spectral (seed loop + cull + cold-start raw autonomy selection), deployed (rescaled) generation + manifest with the selection breakdown. Recipe doc at docs/spectral_rollout_blk445.md.
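A minimal plain-Python sketch of the cold-start IC override, assuming the ONSET/OFFSET/MID codes 0/1/2 and that both components of the state (x, x') are replaced by N(0, ic_noise_rms^2) noise. The helper name `cold_start_ics`, its per-example tuple layout, and the default `ic_noise_rms` are hypothetical; the real code works on batched tensors inside train/spectral_rollout.py.

```python
import random

ONSET, OFFSET, MID = 0, 1, 2  # category codes, as in data/load_data.py


def cold_start_ics(data_ics, categories, ic_noise_rms=1e-3, seed=0):
    """Hypothetical helper: per-example rollout initial conditions (x0, v0).

    ONSET examples (ic_mask = cats == ONSET) start from low-amplitude Gaussian
    noise instead of the data IC; all other examples keep their data IC.
    """
    rng = random.Random(seed)
    ics = []
    for (x0, v0), cat in zip(data_ics, categories):
        if cat == ONSET:
            ics.append((rng.gauss(0.0, ic_noise_rms), rng.gauss(0.0, ic_noise_rms)))
        else:
            ics.append((x0, v0))
    return ics
```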

The branch is structured as 9 logically independent commits (one per file group); see `git log main..spectral-loss-coldstart` for the sequence.

Test plan

  • Smoke test on day 85 (2 seeds × 1 cull + 1 finish epoch, H=256, lr=1e-4): file-level split 500/62/62, sampler 26/26/12 (ONSET/OFFSET/MID), cull → resume → final selection all work, bounded_frac=1.0, seed_cv.csv + WAV + selected_model.json land on disk, no NaN crashes.
  • Full run (deferred to a follow-up): --n-seeds 4 --n-epochs 50 --cull-frac 0.4 --cull-keep 2 on day 85 (~6 h on one GPU). Acceptance bar: val bounded_frac >= 0.9 by epoch 10, val spec_corr >= 0.5 by end of training, coldstart_raw_test_autonomy > 0, and the deployed WAV is recognizably syllable-like with a clean onset. See docs/spectral_rollout_blk445.md for the full recipe.
  • Baseline comparison: same data + capacity with --loss-mode mse_accel (legacy) to confirm spectral mode beats it on cold-start raw autonomy and amp_pen.

🤖 Generated with Claude Code

jmxpearson and others added 30 commits June 4, 2026 15:16
Hand-port from PR #4 (arneodo-parameterization). Adds drive_lowpass_ms
(default 0.0) and keep_const (default False) to the polynomial Ouroboros
only, with matching zero-phase Gaussian _lowpass / _lowpass_weights helpers
applied to omega, gamma, and the kernel weights inside forward and
get_funcs. Skips the Arneodo class entirely.

model/kernels.py: gate the (0,0) constant-term zero on fullPolyModule.keep_const.
train/train.py: persist parameterization / drive_lowpass_ms / keep_const in
the checkpoint and read them back in load_model.

Verified end-to-end (forward + get_funcs, both keep_const settings) on a tiny
n_layers=2 poly model with drive_lowpass_ms=1.0.
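The zero-phase property comes from convolving with a symmetric kernel centred on each sample, so smoothing adds no delay to the drives. A plain-Python sketch, assuming drive_lowpass_ms is the Gaussian's standard deviation in milliseconds and that edge samples are replicated for padding; both are assumptions, and the hypothetical `gaussian_lowpass` stands in for the tensor-based _lowpass helper.

```python
import math


def gaussian_lowpass(signal, drive_lowpass_ms, sr, truncate=4.0):
    """Hypothetical zero-phase Gaussian smoothing of a 1-D drive along time.

    drive_lowpass_ms is treated as the Gaussian's standard deviation; 0.0 disables it.
    """
    if drive_lowpass_ms <= 0.0:
        return list(signal)
    sigma = drive_lowpass_ms * 1e-3 * sr  # ms -> samples
    radius = max(1, int(truncate * sigma + 0.5))
    kernel = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(kernel)
    kernel = [k / total for k in kernel]  # unit DC gain
    n = len(signal)
    out = []
    for t in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(t + j - radius, 0), n - 1)  # replicate edge samples
            acc += w * signal[idx]
        out.append(acc)
    return out
```

Because the kernel is centred, an impulse stays at its original position and spreads symmetrically; a causal filter would instead shift it later in time.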

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Hand-port from PR #4 (arneodo-parameterization). Adds the closed-loop
polynomial Ouroboros integrator, the autonomy-based validation metric
(spec_corr - amp_pen - pitch_pen, plus bounded_frac), and the deployed
generate-with-amplitude-rescale helper.

Stripped the is_poly branch and integrate_model_autonomous reference --
this PR is poly-only.

Verified end-to-end on a random-init poly model over a 50 ms segment:
integration runs, autonomy_score returns finite breakdown, generate_autonomous
returns bounded waveform.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
scripts/stage_finch_blk445_syllC.py is a verbatim port from PR #4 that
symlinks the Mooney/Muscimol blk445 directory layout (segs/<day>/syllable_C
+ <day>/double_denoised) into ~/ouroboros_data/blk445_syllC/day{day}/ with
the {stem}.wav / {stem}.txt convention the segmented-audio loader expects.
Idempotent. Verified: 738/624/871 paired files staged for days 84/85/86
(empty annotations skipped, no missing wavs).
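An idempotent staging pass of this kind can be sketched with pathlib. This is a hypothetical simplification (flat input directories and a `stage_pairs` helper that is not part of the script): it links each non-empty {stem}.txt and its matching {stem}.wav into the destination and leaves existing links untouched on re-runs.

```python
from pathlib import Path


def stage_pairs(wav_dir, txt_dir, dest):
    """Hypothetical helper: symlink {stem}.wav/{stem}.txt pairs into dest.

    Safe to re-run: links that already exist are left alone.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    staged = 0
    for txt in sorted(Path(txt_dir).glob("*.txt")):
        wav = Path(wav_dir) / f"{txt.stem}.wav"
        if txt.stat().st_size == 0 or not wav.exists():
            continue  # skip empty annotations and annotations without audio
        for src in (wav, txt):
            link = dest / src.name
            if not (link.is_symlink() or link.exists()):
                link.symlink_to(src.resolve())
        staged += 1
    return staged
```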

examples/_voc_windows.py is a new module that hoists the int16-fixed
load_voc_windows / load_voc_windows_coldstart helpers out of the legacy
run_lambda_pipeline so the new spectral entry point can reuse them without
dragging in the rest of that pipeline. Verified against day 85 (sr=44100,
3 coldstart segs OK, 3 mid-voc segs OK).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
data/load_data.py:
- ONSET=0, OFFSET=1, MID=2 category constants.
- get_audio_training_edge_weighted: builds three pools per (wav, txt) pair
  (onset-aligned + silence_prefix_ms prefix; offset-aligned + silence_suffix_ms
  tail; non-overlapping mid-syllable windows inside [on+edge_ms, off-edge_ms]),
  then samples to max_segs with the requested category ratio. int16 normalize.
- get_segmented_audio_edge_weighted: convenience wrapper that pairs the
  audio/seg directories like get_segmented_audio.

data/data_utils.py:
- aud_neur_ds now optionally carries per-example category labels; __getitem__
  returns a 4-tuple (x, dxdt, d2x, category) when categories are set.
- _stratified_split + get_loaders_edge: per-category stratified train/val/test
  split so val/test are never starved of ONSET examples even at low onset ratio.

Verified on day 85 (sr=44100): 300 segs sampled with the (0.4, 0.4, 0.2) ratio
gives exactly 120/120/60 counts; stratified split yields 240/30/30 with
4-element batches containing category tensors.
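The quoted counts follow from simple arithmetic: 300 × (0.4, 0.4, 0.2) = 120/120/60, and an 80/10/10 split applied inside each category gives 96+96+48 = 240 train and 12+12+6 = 30 each for val and test. A plain-Python sketch, assuming 80/10/10 fractions and remainder-to-last-category rounding; `category_quotas` and `stratified_split` are hypothetical names.

```python
import random


def category_quotas(max_segs, ratio):
    """Split max_segs across (ONSET, OFFSET, MID); the remainder goes to the last category."""
    counts = [int(round(max_segs * r)) for r in ratio[:-1]]
    counts.append(max_segs - sum(counts))
    return counts


def stratified_split(categories, fracs=(0.8, 0.1, 0.1), seed=0):
    """Per-category train/val/test split, so val/test always contain every category."""
    rng = random.Random(seed)
    train, val, test = [], [], []
    by_cat = {}
    for i, c in enumerate(categories):
        by_cat.setdefault(c, []).append(i)
    for c in sorted(by_cat):
        idx = by_cat[c][:]
        rng.shuffle(idx)
        n_val = int(len(idx) * fracs[1])
        n_test = int(len(idx) * fracs[2])
        val.extend(idx[:n_val])
        test.extend(idx[n_val:n_val + n_test])
        train.extend(idx[n_val + n_test:])
    return train, val, test
```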

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Verbatim port of train/rollout_refine.py from PR #4. This PR uses
mrstft_loss, stft_mag, gaussian_envelope, env_loss, and gather_windows as
building blocks for the new spectral-rollout training step. The rollout_refine
driver function itself is kept too (post-hoc fine-tune is still useful as a
diagnostic) but is NOT the primary training loop for this PR.

Verified mrstft_loss + env_loss imports and run on random (4, 2000) waveforms.
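For reference, a NumPy sketch of a multi-resolution STFT loss in the Parallel WaveGAN style (spectral convergence plus log-magnitude L1, summed over resolutions). The exact terms, weights, and (n_fft, hop) configs in train/rollout_refine.py may differ; the helper names and configs below are placeholders, and the skip for n_fft longer than the signal mirrors the "auto-drops STFT configs with n_fft > H" behaviour described for spectral_rollout_step.

```python
import numpy as np


def stft_magnitude(x, n_fft, hop):
    """|STFT| of a 1-D signal: periodic Hann window, no padding, frames along axis 0."""
    win = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(n_fft) / n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))


def multires_stft_loss(pred, target, configs=((64, 16), (128, 32), (256, 64)), eps=1e-7):
    """Sum over resolutions of spectral convergence + mean |log magnitude| difference."""
    total = 0.0
    for n_fft, hop in configs:
        if n_fft > len(pred):
            continue  # drop resolutions longer than the rollout horizon
        P = stft_magnitude(pred, n_fft, hop)
        T = stft_magnitude(target, n_fft, hop)
        sc = np.linalg.norm(T - P) / (np.linalg.norm(T) + eps)
        log_l1 = np.mean(np.abs(np.log(T + eps) - np.log(P + eps)))
        total += sc + log_l1
    return float(total)
```

Comparing magnitudes only makes the loss insensitive to phase, which matters for an autonomous rollout that can drift in phase while still matching the spectrum.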

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ranch

train/spectral_rollout.py (NEW):
- teacher_forced_rollout: lifts the RK4 + soft-tanh kernel from
  train/rollout_refine.py into a standalone function. The drives are encoded
  from the target via model.get_funcs (low-passed inside the model), while the
  state (x, x') evolves autonomously, starting from the data IC or, for
  examples whose ic_mask is True, from N(0, ic_noise_rms^2) -- the cold-start
  training mode for ONSET segments.
- spectral_rollout_step: combines MRSTFT magnitude loss on the integrated
  waveform, a variance-normalized teacher-forced d2-MSE anchor, and an
  optional envelope L1 term. Auto-drops STFT configs with n_fft > H.
- horizon_for_epoch: geom/linear/const curriculum so the rollout horizon
  ramps up across training.

train/train.py:
- loss_mode='mse_accel' (default, legacy) | 'spectral_rollout' kwarg, with
  the spectral path branching at the top of the inner loop -- skips
  model.forward entirely (spectral_rollout_step does its own get_funcs with
  a cloned dxdt), unpacks the 4-tuple batch from the edge-biased loader,
  builds ic_mask = (cats == ONSET), and logs per-component losses + the
  current horizon to TB. NaN total -> skip optimizer step + log counter,
  grad-clip 5.0.
- Val loop now tolerates 4-tuple batches; in spectral mode it is skipped
  entirely (autonomy-based selection in train/model_cv handles validation).
- Pre-existing bug fixed: total_loss was previously only defined inside the
  if-reg_weights block, which crashed unregularized runs.

Smoke-tested end-to-end on day 85: 2 epochs, H=256, 64 segments, 4-tuple
batches with ONSET ic_mask. Pipeline runs, gradients finite, no NaNs.
autonomy_score returns the expected -5.0 divergence baseline for an
undertrained model.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds a poly-only seed loop alongside the existing model_cv_lambdas. Holds
lambda fixed (no kernel-weight CV in this PR) and varies only the random init,
which is the dominant axis for autonomous-reconstruction quality. Each seed
trains with loss_mode='spectral_rollout'; validation uses cold-start raw
autonomy_score on held-out (silence lead-in + voc) windows.

Features:
- Resume-from-checkpoint per seed: a seed with a checkpoint past the target
  epoch is not retrained.
- Optional seed culling (cull_frac, cull_keep): train all seeds to
  cull_frac * n_epochs, rank by val cold-start autonomy, finish only the
  top cull_keep. Roughly halves seed-search cost.
- seed_cv.csv summary written to model_path.
- Tie-breaker: bounded_frac (so a collapsed-but-lucky rollout doesn't beat
  a truly bounded one at the same val_autonomy).
- Attaches _selected_seed / _selected_lambda / _selected_val_autonomy /
  _selected_test_autonomy / _selected_test_breakdown to the returned model
  for the caller's manifest.
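The culling logic reduces to "train everyone briefly, rank, finish the best few". A hypothetical sketch with injected `train_to` / `evaluate` callables (not the real model_seed_cv_spectral signature) and a rank key that uses bounded_frac as the tie-breaker:

```python
def rank_key(score):
    # Higher cold-start val autonomy wins; bounded_frac breaks ties so a
    # collapsed-but-lucky rollout does not beat a truly bounded one.
    return (score["val_autonomy"], score["bounded_frac"])


def seed_search(seeds, train_to, evaluate, n_epochs, cull_frac=0.0, cull_keep=None):
    """Hypothetical seed loop with optional culling.

    train_to(seed, epoch): train (or resume) `seed` up to `epoch`.
    evaluate(seed): dict with 'val_autonomy' and 'bounded_frac'.
    """
    survivors = list(seeds)
    if cull_frac > 0 and cull_keep:
        cull_epoch = max(1, int(cull_frac * n_epochs))
        for s in seeds:
            train_to(s, cull_epoch)
        survivors.sort(key=lambda s: rank_key(evaluate(s)), reverse=True)
        survivors = survivors[:cull_keep]
    for s in survivors:
        train_to(s, n_epochs)  # resumes from the cull-epoch checkpoint
    return max(survivors, key=lambda s: rank_key(evaluate(s)))
```

With 4 seeds, cull_frac=0.4 and cull_keep=2 over 50 epochs, the cost is 4 × 20 + 2 × 30 = 140 seed-epochs instead of 200.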

Verified end-to-end on day 85: 2 seeds, cull_frac=0.5, cull_keep=1.
Both seeds train, cull ranking runs, top-1 resumes from its cull
checkpoint, autonomy_score returns the expected -5.0 (divergence sentinel)
for the undertrained pair, manifest fields populate, seed_cv.csv lands on
disk.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
examples/train_poly_spectral_blk445.py: production entry-point for the
spectral-rollout polynomial Ouroboros on a single (bird, syllable, day).

Pipeline:
- file-level holdout of WAV stems inside --data-dir (shuffle deterministic
  by --seed): last --test-frac for test, next --val-frac for val, rest train.
- training set via get_audio_training_edge_weighted (onset/offset/mid pools,
  configurable ratio).
- held-out val/test cold-start vocs (silence-pad lead-in + voc) via inline
  _coldstart_from_files helper (parallel to examples/_voc_windows.py but
  file-list based for the single-directory holdout).
- model_seed_cv_spectral with full CLI surface for the spectral loss
  (lam-spec/lam-tf/lam-env, MRSTFT configs, H curriculum, ic-noise-rms,
  grad-clip) + seed culling.
- deployed (rescaled) generation of the first test voc -> WAV.
- selected_model.json manifest with selected seed/lambda, val/test cold-start
  raw scores, deployed rescaled score, all loss + sampler hparams, sr.
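A sketch of the file-level holdout. The helper name and the 0.1/0.1 defaults are assumptions (they reproduce the 500/62/62 split reported for day 85's 624 staged files); sorting before the seeded shuffle keeps the split independent of directory listing order.

```python
import random


def file_holdout(stems, test_frac=0.1, val_frac=0.1, seed=0):
    """Hypothetical deterministic split: last test_frac -> test, next val_frac -> val, rest -> train."""
    stems = sorted(stems)  # the shuffle then depends only on the seed
    random.Random(seed).shuffle(stems)
    n = len(stems)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = stems[n - n_test:]
    val = stems[n - n_test - n_val:n - n_test]
    train = stems[:n - n_test - n_val]
    return train, val, test
```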

Verified end-to-end on day 85 (2 seeds x 1 cull + 1 finish epoch, H=256,
lr=1e-4): file-level split 500/62/62, sampler 26/26/12 (ONSET/OFFSET/MID),
both seeds run, cull ranks them, top seed finishes, bounded_frac=1.0 (soft-
tanh holds), manifest + WAV + seed_cv.csv all land on disk.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Recipe + smoke-test invocation + loss math + acceptance bar for the new
spectral-rollout training pipeline. Includes a comparison table vs the
legacy run_lambda_pipeline so the differences (loss, sampler, IC, holdout,
selection) are easy to scan, and a list of known follow-ups (multi-day
holdout, causal Mamba, generic per-syllable autosegmenter) explicitly
called out as out of scope.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
In spectral_rollout mode the val loop is skipped (`continue` at top of the
val block) because cold-start autonomy_score handles validation in
model_seed_cv_spectral. The periodic save_model was nested INSIDE the val
block, so save_freq was a no-op in spectral mode: only the final checkpoint
written by _train_to() landed on disk.

Move the periodic checkpoint OUTSIDE the val block, gated on save_freq > 0.
Both legacy MSE mode and spectral mode now write checkpoints at the
configured cadence.

(This does not affect the in-flight day85 run, which is already going
without intermediate checkpoints. Future runs pick up the fix.)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Round 2 of the spectral-mode save-checkpoint fix. The previous fix moved
save_model OUT of the val block, but the spectral-mode early-exit inside
the val block was still a `continue`. Because the val block sits at the
epoch-loop level, that `continue` jumps to the NEXT epoch -- skipping
the (now sibling) save_model block too. So periodic checkpoints still
never landed in spectral mode.

Restructure the val block to gate its entry on loss_mode != spectral and
'val' in loaders (instead of entering and bailing with continue). The
save_model block at the bottom of the epoch loop now always runs.

(Observed live: a 4h spectral run produced ZERO checkpoints; the bug
was masked by the smoke test, which used --cull-frac > 0, so save_model
fired from _train_to() after each `target`-epoch milestone.)
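The control-flow difference is easiest to see in a stripped-down loop. This skeleton is hypothetical (it only records events, and the `(epoch + 1) % save_freq` cadence is an assumption); it shows the fixed shape, where the val block is gated on entry so the checkpoint step at the bottom of each epoch is always reached.

```python
def run_epochs(n_epochs, loss_mode, loaders, save_freq, log):
    """Hypothetical skeleton of the fixed epoch loop."""
    for epoch in range(n_epochs):
        log.append(("train", epoch))
        # Gate entry instead of entering and bailing out with `continue`:
        # a `continue` here would also skip the save block below.
        if loss_mode != "spectral_rollout" and "val" in loaders:
            log.append(("val", epoch))
        if save_freq > 0 and (epoch + 1) % save_freq == 0:
            log.append(("save", epoch))
    return log
```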

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ining

Two follow-up fixes for the spectral-rollout training step, both motivated
by an end-to-end smoke run on day85 that exposed early-epoch instability:

1. Precompute Var(d2x) over the training set ONCE before the epoch loop
   (rollout_refine.py:110 convention), pass it to spectral_rollout_step as
   tf_var. Per-batch d2x.var() is unstable when the batch is dominated by
   silence (ONSET segments), and dividing the TF anchor by a near-zero
   normaliser blows it up. Cache the precomputed value with a 1e-6 floor.

2. Add spec_warmup_epochs (default 5): linearly ramp lam_spec from 0 to its
   target over the first N epochs. Random-init Mamba produces saturated
   rollouts whose MRSTFT against quiet / onset targets is enormous; without
   warmup the spectral term swamps the TF anchor and the model fails to
   enter a learnable basin.
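Both fixes reduce to a few lines of arithmetic. A plain-Python sketch under stated assumptions: `dataset_variance`, `warmup_weight`, and `tf_anchor` are hypothetical names, the warm-up is driven by a fractional epoch counter so it can advance every step, and the anchor is taken to be the MSE divided by the precomputed variance.

```python
def dataset_variance(d2x_segments, floor=1e-6):
    """Var(d2x) over the whole training set, computed once and floored."""
    vals = [v for seg in d2x_segments for v in seg]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return max(var, floor)  # guards against an all-silent training set


def warmup_weight(target, progress_epochs, warmup_epochs):
    """Linear ramp 0 -> target over the first warmup_epochs (fractional epochs allowed)."""
    if warmup_epochs <= 0:
        return target
    return target * min(1.0, progress_epochs / warmup_epochs)


def tf_anchor(pred_d2x, true_d2x, tf_var):
    """Teacher-forced d2 MSE normalised by the fixed dataset variance."""
    mse = sum((p - t) ** 2 for p, t in zip(pred_d2x, true_d2x)) / len(true_d2x)
    return mse / tf_var
```

Because tf_var is fixed, a batch made mostly of silent ONSET prefixes no longer inflates the anchor by dividing by its own near-zero variance.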

Threading:
- train/train.py: tf_var precomputed in the spectral_rollout setup block,
  spec_warmup_epochs ramps lam_spec_t each step, logs Loss/total and
  Train/lam_spec_t to TensorBoard.
- train/spectral_rollout.py: spectral_rollout_step accepts tf_var (uses
  it instead of per-batch variance when provided).
- train/model_cv.py: model_seed_cv_spectral accepts spec_warmup_epochs and
  forwards it to train().
- examples/train_poly_spectral_blk445.py: --spec-warmup-epochs CLI flag,
  persisted in selected_model.json manifest.
- docs/spectral_rollout_blk445.md: short section explaining when to set
  spec_warmup_epochs to 0 (resuming a checkpoint past warmup).

Smoke run (1 seed, 2 epochs, 64 segs, n_layers=2, H=256) on day85:
  PIPELINE DONE, manifest + WAV written, test breakdown bounded_frac=1.0
  -- the soft-tanh saturation + cold-start noise IC integrate cleanly
  even at random init. (autonomy still -11 at this microscopic budget;
  real numbers need the full 50-epoch / 4-seed run.)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The save_model 5-keep policy evicts older checkpoints once a 6th is written.
For a 50-epoch run at --save-freq 1 that means only checkpoints 46-50 survive,
which loses the ability to revisit / resume from any earlier epoch (and the
single-seed 50-epoch monitoring run we're about to kick off wants to keep all
of them so the user can identify when the model actually converged).

Threaded max_saved through train() (default 5, backward-compat), through
model_seed_cv_spectral, and added --max-saved to the entry point (default 60
-- enough to retain every epoch of a 50-epoch run).
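The retention policy is a bounded FIFO. A sketch with a hypothetical `CheckpointKeeper` class that only tracks file names (the real save_model deletes files on disk): with the default of 5 and 50 saves only the last five survive, while max_saved=60 keeps all 50.

```python
from collections import deque


class CheckpointKeeper:
    """Hypothetical tracker: keep at most max_saved checkpoints, evicting the oldest first."""

    def __init__(self, max_saved=5):
        self.max_saved = max_saved
        self.kept = deque()
        self.evicted = []

    def save(self, epoch):
        self.kept.append(f"checkpoint_{epoch}.tar")
        while len(self.kept) > self.max_saved:
            self.evicted.append(self.kept.popleft())  # real code would delete this file
```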

Verified end-to-end with --save-freq 1 --max-saved 10: both checkpoint_0 and
checkpoint_1 are retained on disk after training (vs. checkpoint_1 only with
the prior default).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
scripts/monitor_spectral_diagnose.py: latest_checkpoint() sorted glob results
lexicographically, so once the epoch counter went into double digits
checkpoint_9.tar sorted AFTER checkpoint_15.tar and the monitor silently
stopped emitting NEW_CKPT / PLATEAU events on every ckpt past epoch 9. Sort
by parsed epoch number instead.
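A minimal sketch of the fix, using a hypothetical `latest_by_epoch` helper rather than the script's actual function: parse the integer epoch out of each filename and take the maximum of that, instead of the lexicographic maximum.

```python
import re

_CKPT_RE = re.compile(r"checkpoint_(\d+)\.tar$")


def latest_by_epoch(paths):
    """Pick the checkpoint with the largest parsed epoch, not the lexicographic max."""
    numbered = []
    for p in paths:
        m = _CKPT_RE.search(str(p))
        if m:
            numbered.append((int(m.group(1)), str(p)))
    return max(numbered)[1] if numbered else None
```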

scripts/plot_blk445_specgram.py: new helper that loads a checkpoint, picks the
same held-out cold-start val voc autonomy_score evaluates (rng seed 1234,
voc-idx CLI flag), runs integrate_poly_autonomous (or generate_autonomous
with --rescale), and writes a 2x2 figure (target waveform + spectrogram on
top, autonomous on bottom). Saves to <seed_dir>/specgram_ckpt<N>_voc<I>.png.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…mous

The autonomous waveform now inherits the target's ylim so the reconstruction
is read on the target's amplitude scale. For rescaled plots this makes the
match easy to judge; for raw plots it visually clips a saturated rollout /
flattens a collapsed one, which is itself informative.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
scripts/monitor_spectral_diagnose.py: --seed-dir, --state-path, --train-pattern
CLI flags so a second monitor instance can watch a parallel run dir without
copying the script. Defaults preserve the legacy single-run behavior.

scripts/plot_loss_panels.py: --smooth defaults to 'auto', picking
max(1, min(5, shortest_run_epochs // 3)) so a short overlay (e.g. env-run with
3 epochs vs live-run with 27) keeps multiple points instead of dissolving
under valid-mode convolve. Integer override still works.
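With valid-mode convolution a window of w drops w - 1 points, so a fixed window of 5 turns a 3-epoch run into nothing. A plain-Python sketch of the 'auto' rule quoted above; `auto_smooth` and `smooth_valid` are hypothetical helpers standing in for the script's NumPy code.

```python
def auto_smooth(run_lengths):
    """'auto' window: max(1, min(5, shortest_run_epochs // 3))."""
    return max(1, min(5, min(run_lengths) // 3))


def smooth_valid(values, window):
    """Moving average with 'valid' semantics: len(values) - window + 1 outputs."""
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]
```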

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds a faint dashed vertical line at the first epoch where Train/lam_spec_t
reaches its max (detected from the TB stream). Drawn in the run's color on
every panel so readers know that pre-boundary total-loss values are weighted-
ramp artifacts of spec_warmup, not signal.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
spectral_rollout_step and teacher_forced_rollout each called model.get_funcs
(all three Mamba encoders) on identical (x, dxdt, dt) every training step --
once for the teacher-forced anchor, once for the rollout. The drives are
deterministic in their inputs, so the second pass was pure waste.

Add a `drives` parameter to teacher_forced_rollout and pass the
(omega, gamma, weights, z2) already encoded for the TF anchor. Backprop through
the single shared forward sums the TF and spectral gradients exactly as the two
separate forwards did, so this is gradient-equivalent -- it just halves the
encoder forwards per spectral-rollout step.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Four correctness-preserving speedups to the per-step training path:

- RK4 rollout: hoist per-substep constants out of the 4x-called inner f() --
  square omega once over the horizon and slice ga[:,k]/w[:,k] once per step
  instead of re-slicing/re-squaring on every RK4 substep; reuse the kernel's
  cached powers vector instead of allocating a fresh arange each rollout.
  Applied in both spectral_rollout.py and rollout_refine.py.
- Dataloader: store batches as float32 instead of float64 (halves H2D bytes)
  and add pin_memory=True + non_blocking=True host->device transfers so copies
  overlap compute.
- Logging: collapse 5 per-step .item() device->host syncs (one a duplicate of
  out["spec"]) into a single torch.stack(...).tolist().
- MRSTFT: lru_cache the STFT Hann window (depends only on n_fft, was rebuilt
  ~6x per step).
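The window cache is a textbook use of functools.lru_cache, since the window depends only on its arguments. A plain-Python sketch with a hypothetical `hann_window` helper; a tensor version would presumably also need device and dtype in the cache key, and returning a tuple here keeps callers from mutating the shared cached value.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def hann_window(n_fft):
    """Periodic Hann window, built once per n_fft and then served from the cache."""
    return tuple(0.5 - 0.5 * math.cos(2.0 * math.pi * i / n_fft) for i in range(n_fft))
```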

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
train/train.py: --env-warmup-epochs (default 0) linearly ramps lam_env from
0 to its target over the first N epochs, mirroring spec_warmup_epochs.
Needed when running large lam_env (1e4+) so the random-init env gradient
doesn't blow up params before the TF anchor has stabilized. Logs
Train/lam_env_t to TB so the ramp is visible.

train/model_cv.py: env_warmup_epochs threaded through model_seed_cv_spectral.

examples/train_poly_spectral_blk445.py: --env-warmup-epochs CLI flag,
persisted in selected_model.json manifest.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
env1e4 was falling through to matplotlib's default color cycle, which starts
at tab:blue and collided with live. Add an explicit entry so the three
concurrent runs are distinguishable.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The torch index pin used bare top-level `[sources]`/`[[index]]` tables, which uv
does not recognize (the schema is `[tool.uv.sources]` / `[[tool.uv.index]]`). So
the directive was silently ignored and the lock resolved torch 2.8.0 from PyPI --
the +cu128 build, whose binaries start at sm_70 and fail on Pascal cards (e.g. the
GTX 1080 Ti, sm_61) with "no kernel image is available for execution on the device".

Move the config under `[tool.uv.*]` and relock so torch resolves to 2.8.0+cu126
(ships sm_61 kernels) on Linux, CPU wheel elsewhere. Drop the unused torchvision
source. Requires uv >= 0.4.23 for named indexes. Verified: a fresh `uv sync` yields
torch 2.8.0+cu126 and runs a CUDA matmul on the 1080 Ti.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…n buckets

The autonomous RK4 rollout is a sequential Python loop over the horizon H on tiny
tensors, so it is launch-overhead-bound (~3.2 s/step eager at H=2000 on a contended
1080 Ti). Add a `rollout_backend` switch threaded through train() ->
spectral_rollout_step() -> teacher_forced_rollout():

  - 'eager'     : the existing Python loop (default; unchanged behavior).
  - 'cudagraph' : torch.cuda.make_graphed_callables captures a fwd+bwd CUDA graph and
                  replays it (~3.4x measured at H=400). Works on Pascal+ (no Triton).
                  Falls back to eager if capture fails (e.g. OOM), cached so it does
                  not re-attempt every step.
  - 'compile'   : torch.compile(mode='reduce-overhead'); needs CUDA capability >= 7.0
                  for the Triton backend, else auto-falls back to 'cudagraph'.

The RK4 core is factored into `_rk4_core_factory(H, powers)` so all three backends
share one implementation; graphs/compiled fns are cached per (H, batch, dtype).

Graphs require static shapes (one capture per distinct H), so add a 'pow2' horizon
schedule: factor-of-2 buckets from H_min to H_max (final capped at H_max), spread
evenly across the same n_epochs ramp -- e.g. 512->1024->2000 over the run, just 3
distinct horizons. train() warns if a graph backend is paired with a non-bucketed
schedule.
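A sketch of the bucketing rule, assuming the buckets get equal shares of the epoch budget (the exact split used by train() may differ, and `pow2_buckets` / `pow2_horizon` are hypothetical names). With H_min=512 and H_max=2000 it yields the 512 -> 1024 -> 2000 sequence quoted above, i.e. only three static shapes to capture.

```python
def pow2_buckets(H_min, H_max):
    """Factor-of-2 horizons from H_min, with the final bucket capped at H_max."""
    buckets, H = [], H_min
    while H < H_max:
        buckets.append(H)
        H *= 2
    buckets.append(H_max)
    return buckets


def pow2_horizon(epoch, n_epochs, H_min, H_max):
    """Hypothetical mapping: split n_epochs evenly across the buckets."""
    buckets = pow2_buckets(H_min, H_max)
    idx = min(len(buckets) - 1, epoch * len(buckets) // max(n_epochs, 1))
    return buckets[idx]
```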

Verified on the GTX 1080 Ti (cu126): eager matches the original loop bit-for-bit
(fwd + all grads, max|Δ|=0), cudagraph matches eager (fwd + grads, max|Δ|=0), and
'compile' correctly falls back to cudagraph on sm_61. Default stays 'eager', so
training behavior is unchanged unless opted in.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… via CLI

Bump the spectral-rollout H_max default 2000 -> 2048 so the pow2 horizon schedule
yields clean factor-of-2 buckets [512, 1024, 2048] with no capped final bucket
(2048 < segment length L, which is context-len 0.05s + 25ms silence pre/suffix).
Updated in train.train, train.model_cv, the example CLI default, and the docstring.

Thread `rollout_backend` from the example CLI through model_seed_cv_spectral to
train(), and add 'pow2' to the --H-schedule choices so the bucketed schedule (needed
by the graphed backends) is selectable. Default stays 'eager'; recorded in the run
manifest.

NOTE: a contended benchmark on the shared GTX 1080 Ti (5 concurrent seed jobs, GPU at
100% util) showed no cudagraph speedup -- CUDA-graph capture OOM'd and fell back to
eager, and a saturated GPU hides the launch overhead graphs remove anyway. cudagraph
is expected to help only on an underutilized/single-process GPU, so it stays opt-in.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Express the autonomous RK4 rollout as a scan (the jax.lax.scan idiom): a single-step
combine_fn `_rk4_step(carry=(xc,xp), x=(om2k,gak,w_k)) -> ((xc',xp'), xc')`, driven by
`_scan_rollout`. Two drivers:

  - use_hop=True: torch._higher_order_ops.scan under torch.compile -- lowers the step
    body ONCE instead of Dynamo-unrolling the H-step Python loop, which is what makes
    the whole-loop 'compile'/'cudagraph' backends infeasible at training horizons
    (unroll blowup / ~15 MB/step capture memory -> ~30 GB at H=2048). Needs sm>=70.
  - use_hop=False: the identical eager fold (same fold semantics, no compile) -- runs
    everywhere, including Pascal where torch.compile is unavailable.
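The scan formulation separates a pure per-step function from the driver that folds it over the horizon. A plain-Python sketch of the eager fold, with a hypothetical right-hand side (a damped linear oscillator plus an additive drive term) standing in for the polynomial kernel and soft-tanh, which are not reproduced here; only the carry/input/output structure mirrors `_rk4_step`.

```python
def rk4_step(carry, inputs, dt):
    """One scan step: carry=(x, v), inputs=(om2_k, ga_k, w_k) -> (new_carry, output)."""
    x, v = carry
    om2, ga, w = inputs

    def f(x, v):  # hypothetical RHS: x'' = -om2*x - ga*x' + w
        return v, -om2 * x - ga * v + w

    k1x, k1v = f(x, v)
    k2x, k2v = f(x + 0.5 * dt * k1x, v + 0.5 * dt * k1v)
    k3x, k3v = f(x + 0.5 * dt * k2x, v + 0.5 * dt * k2v)
    k4x, k4v = f(x + dt * k3x, v + dt * k3v)
    x_new = x + dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
    v_new = v + dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return (x_new, v_new), x_new


def scan(step, init, xs):
    """Eager left fold with jax.lax.scan semantics: returns (final_carry, outputs)."""
    carry, ys = init, []
    for x in xs:
        carry, y = step(carry, x)
        ys.append(y)
    return carry, ys
```

Usage: `scan(lambda c, u: rk4_step(c, u, 0.01), (1.0, 0.0), [(1.0, 0.0, 0.0)] * 628)` integrates x'' = -x from x(0)=1 and tracks cos(t). The fold is still sequential; a compiler can only fuse the step body, not parallelise across time.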

Wired as rollout_backend='scan': sm>=70 compiles the scan HOP; otherwise it warns and
runs the eager fold. Added to the example CLI choices. Default stays 'eager'.

NOTE: experimental and NOT speed-validated -- the scan HOP requires a working
torch.compile, which this Pascal GTX 1080 Ti (sm_61) lacks, so it can only run as the
eager fold here. Correctness verified on-GPU: the eager-fold scan core is bit-identical
to the whole-loop eager core in forward AND gradients (max|Δ|=0), both at the core level
and through teacher_forced_rollout. Speed needs an sm>=70 card to evaluate; this is a
sequential scan (no time parallelism), so the win there is compile/fusion + no Python
loop overhead, not parallelism.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add explicit color for the lam_env=1e5 run so it's distinguishable from
the green env1e4 line.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Default init gives |gamma| ~ 0.4 with random per-seed sign (envelope tau
~0.1 ms), so ~half of seeds are purely dissipative -- any oscillation dies
instantly and the model can only AM-modulate input/IC noise. osc_init makes
the damping a small NEGATIVE constant (slow energy injection) and seeds an
amplitude-dependent re-damping (+y^2*ydot) plus a hardening cubic (+y^3) so
growth saturates into a bounded limit cycle. Control-head weights for the
seeded terms are zeroed so they start as clean constants the encoders learn
to modulate; only the structural prior is injected.

Threaded through model_seed_cv_spectral and exposed as --osc-init (default
off; baseline behavior unchanged). Smoke-tested: with lam_env=1e5 and pow2
horizon buckets the training step stays finite/bounded, and from a noise IC
osc_init ignites a bounded oscillation (rms 4.2e-3 -> 5.7e-2) where the
default init decays to zero.
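The structural prior described above is essentially a Van der Pol oscillator with an added Duffing-style cubic. A plain-Python RK4 sketch with hypothetical coefficients (ω = 1, damping γ = -0.1, re-damping 0.1·y²·ẏ, hardening 0.1·y³; the signs are this sketch's convention): from a 1e-3 initial displacement the negative damping grows the oscillation until the y²·ẏ term balances it near amplitude 2 (averaging gives A² = 4|γ|/c for re-damping coefficient c), whereas a positive γ, as in the dissipative default-init seeds, decays to nothing.

```python
def osc_rhs(y, v, omega=1.0, gamma=-0.1, redamp=0.1, hard=0.1):
    """Hypothetical RHS: y'' = -omega^2 y - gamma y' - redamp y^2 y' - hard y^3."""
    return v, -omega ** 2 * y - gamma * v - redamp * y * y * v - hard * y ** 3


def integrate(y0, v0, dt, n_steps, **kw):
    """Fixed-step RK4; returns the displacement trajectory."""
    ys = []
    y, v = y0, v0
    for _ in range(n_steps):
        k1 = osc_rhs(y, v, **kw)
        k2 = osc_rhs(y + 0.5 * dt * k1[0], v + 0.5 * dt * k1[1], **kw)
        k3 = osc_rhs(y + 0.5 * dt * k2[0], v + 0.5 * dt * k2[1], **kw)
        k4 = osc_rhs(y + dt * k3[0], v + dt * k3[1], **kw)
        y += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        v += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        ys.append(y)
    return ys
```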

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add explicit color for the --osc-init lam_env=1e5 run.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add env_loss_log to train/rollout_refine.py: |log((env(a)+eps)/(env(g)+eps))|
averaged over time and batch. Symmetric in (auto, target) scale, dB-natural,
no trivial-silent-floor like the linear env_loss (which lets the optimizer
satisfy the penalty by shrinking the rollout amplitude past target).

Threaded lam_env_log (weight) and env_log_eps (noise floor) through
spectral_rollout_step, train(), model_seed_cv_spectral, and the entry
script. Reuses env_warmup_epochs to ramp lam_env_log alongside lam_env.
Logs Loss/env_log + Train/lam_env_log_t to TB when active.

Use case: the current --lam-env=1e5 osc-init run pulled the rollout from
~28x loud (osc-init default) past target into ~1430x quiet (decay basin) --
env_loss can't tell those apart in its tail. env_loss_log treats both
symmetrically (|log K| same for K and 1/K) so the optimizer can't escape
into the decay basin.

eps trade-off: smaller eps -> closer to amp_pen, but huge gradient on
silence-prefix samples; larger eps -> bounded gradient near silence but
loses sensitivity to extreme decay. Default 1e-4 matches normalized blk445's
noise floor; effective |log| cap is ~log(env(g)/eps).

Verified end-to-end with a synthetic test:
  K=1     -> env_loss=0,     env_loss_log=0
  K=28    -> env_loss=27,    env_loss_log=3.31  (matches |log 28|=3.33)
  K=1/1430-> env_loss=0.999, env_loss_log=3.66  (eps-capped; |log 1430|=7.27)
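A plain-Python sketch of the log-ratio envelope penalty on precomputed envelopes, next to a target-normalised linear L1 for contrast. Both helper names are hypothetical and the linear form is an assumption (it gives |K - 1| for a constant envelope scaled by K). With eps = 0 the log penalty is exactly symmetric in K and 1/K; with eps > 0 a fully silent rollout is capped at log((g + eps)/eps).

```python
import math


def log_envelope_loss(env_a, env_g, eps=1e-4):
    """Mean |log((env(a)+eps)/(env(g)+eps))| over time for one example."""
    return sum(abs(math.log((a + eps) / (g + eps))) for a, g in zip(env_a, env_g)) / len(env_g)


def linear_envelope_loss(env_a, env_g):
    """Assumed linear form: L1 envelope error normalised by the target envelope mass."""
    return sum(abs(a - g) for a, g in zip(env_a, env_g)) / sum(env_g)
```

The linear penalty saturates below 1 however quiet the rollout gets, so "too quiet" is always cheaper than "too loud"; the log penalty charges both directions by the same |log K|, up to the eps cap.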

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add explicit colors for the two osc-init experiments comparing the new
log-ratio envelope loss against no envelope penalty.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>