Add per-sample spectral whitening option to the energy-score loss#1303
Draft
mcgibbon wants to merge 2 commits into
… loss EnergyScoreLoss gains a spectral_whitening='per_sample' mode that reweights the per-(l,m) energy score by the inverse per-degree RMS amplitude of each target sample (computed over valid orders m<=l, detached, magnitude-preserving so energy_score_weight keeps its meaning). This flattens each target sample's spectrum so high-l (small-scale) modes are no longer starved by the red spectrum, raising their gradient SNR. Default 'none' is a no-op (backward compatible). Threaded through EnsembleLoss via energy_score_whitening / energy_score_whitening_eps_frac kwargs. Tests cover the no-op, white-target invariance, small-scale boost, magnitude preservation, and config wiring.
mcgibbon
commented
Jun 22, 2026
    def __init__(
        self,
        sht: Callable[[torch.Tensor], torch.Tensor],
        spectral_whitening: str = "none",
Contributor
Author
Make this a Literal[(options)] instead of str
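A minimal sketch of what this could look like, assuming the only options are the two modes named in this PR (`"none"` and `"per_sample"`). The alias name `SpectralWhitening` and the helper `check_spectral_whitening` are hypothetical, not names from the codebase.

```python
from typing import Literal, get_args

# Hypothetical alias; the two values are the modes named in this PR.
SpectralWhitening = Literal["none", "per_sample"]


def check_spectral_whitening(value: str) -> SpectralWhitening:
    """Runtime check, since Literal is only enforced by static type checkers."""
    options = get_args(SpectralWhitening)
    if value not in options:
        raise ValueError(
            f"spectral_whitening must be one of {options}, got {value!r}"
        )
    return value  # type: ignore[return-value]
```

The runtime check is optional. It would mainly help when the value comes from a config file rather than from typed Python code.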
    finite_difference_crps_weight: float = 0.0,
    finite_difference_crps_levels: int = 1,
    almost_fair_crps_alpha: float = 1.0,
    energy_score_whitening: str = "none",
Contributor
Author
Same type comment as above: make this a Literal[(options)] instead of str.
Small-scale (high-`l`) spectral power converges very slowly in SFNO training because the loss lives in a red (grid) spectrum: high-`l` modes carry tiny amplitude, so they contribute little to the energy score and their dedicated `dhconv` filter weights see proportionally small, low-SNR gradients. A per-`l` gradient probe on a trained checkpoint confirmed an ~8× monotone decline of high-`l` gradient magnitude (and the energy score, though computed per `(l,m)` mode, uses uniform `mode_weights`, so it does not correct this).

This PR adds an opt-in `spectral_whitening='per_sample'` mode to `EnergyScoreLoss` that reweights the per-`(l,m)` energy score by the inverse per-degree RMS amplitude of each target sample. This flattens each target's angular power spectrum so high-`l` modes are no longer starved, raising their gradient SNR. The factor is computed from the detached target coefficients (no new gradient path) and is magnitude-preserving (rescaled per `(sample, channel)` so the overall energy-score magnitude, and the meaning of `energy_score_weight`, is unchanged; only the per-scale balance shifts). A white-spectrum target yields a uniform factor (no-op). Default `'none'` is bit-for-bit backward compatible.

An A/B validation run (a perturbation of a `fg16 sr0.125` residual SFNO baseline, identical except whitening enabled) showed the targeted effect with neutral skill: `smallest_scale_norm_bias` reaches the baseline's final (epoch-120) value by epoch ~11 and finishes ~2× lower (0.031 vs 0.061). Inference-time spectral bias also improves substantially (e.g. `10year` h500 1.23 → 0.40, TMP850 0.13 → −0.006).

Changes:

- `fme.core.loss.EnergyScoreLoss`: add `spectral_whitening` / `whitening_eps_frac` args and `_spectral_whitening_factor` (per-`(sample, channel, l)`, detached, magnitude-preserving, floored at `whitening_eps_frac` of the per-sample mean degree amplitude); applied as a multiplicative reweight after `mode_weights`.
- `fme.core.loss.EnsembleLoss`: thread `energy_score_whitening` / `energy_score_whitening_eps_frac` kwargs through to `EnergyScoreLoss`.
- `fme.core.test_loss`: tests for the no-op default, white-target invariance, small-scale boost, magnitude preservation, and config wiring.

Tests added
If dependencies changed, "deps only" image rebuilt and "latest_deps_only_image.txt" file updated
Resolves # (delete if none)
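
For readers who want a concrete picture of the whitening factor described in the Changes list, here is a rough sketch under stated assumptions. It is not the code from this PR. The function name `spectral_whitening_factor`, the `[sample, channel, l, m]` coefficient layout, the default `eps_frac`, and the particular magnitude-preserving normalization (rescaling so the count-weighted sum of per-degree RMS amplitudes is unchanged) are all assumptions made for illustration. An all-zero target is not handled.

```python
import torch


def spectral_whitening_factor(
    target_coeffs: torch.Tensor, eps_frac: float = 1e-3
) -> torch.Tensor:
    """Illustrative per-(sample, channel, l) factor, shape [sample, channel, l, 1].

    target_coeffs: spectral coefficients with assumed layout
    [sample, channel, l, m]; only orders m <= l are treated as valid.
    """
    coeffs = target_coeffs.detach()  # no gradient path through the factor
    n_l, n_m = coeffs.shape[-2:]
    l_idx = torch.arange(n_l, device=coeffs.device).unsqueeze(-1)
    m_idx = torch.arange(n_m, device=coeffs.device).unsqueeze(0)

    power = coeffs.abs().square()  # real-valued, works for complex input
    valid = (m_idx <= l_idx).to(power.dtype)  # [l, m] mask of valid orders
    n_valid = valid.sum(dim=-1)  # number of valid orders per degree, [l]

    # Per-degree RMS amplitude over valid orders: [sample, channel, l]
    rms = torch.sqrt((power * valid).sum(dim=-1) / n_valid)

    # Floor at eps_frac of the per-sample mean degree amplitude
    floor = eps_frac * rms.mean(dim=-1, keepdim=True)
    inv = 1.0 / torch.maximum(rms, floor)

    # Assumed normalization: keep the count-weighted amplitude sum unchanged,
    # so a white (flat) spectrum gives a factor of exactly one.
    scale = (n_valid * rms).sum(dim=-1, keepdim=True) / (
        n_valid * inv * rms
    ).sum(dim=-1, keepdim=True)
    return (scale * inv).unsqueeze(-1)
```

In this sketch the result broadcasts over `m`, so it could be multiplied into a per-`(l,m)` score after `mode_weights`, matching the order of operations described above.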