Skip to content

Remove redundant per-variant normalization in CV notebooks#233

Merged
jaredgalloway merged 1 commit into
mainfrom
231-remove-redundant-cv-normalization
Apr 17, 2026
Merged

Remove redundant per-variant normalization in CV notebooks#233
jaredgalloway merged 1 commit into
mainfrom
231-remove-redundant-cv-normalization

Conversation

@jaredgalloway

Copy link
Copy Markdown
Member

Summary

  • Both CV notebooks divided total_loss_{training,validation} by a sample-count dict, but PR Normalize functional score loss per variant within each condition #230 already made functional_score_loss a per-variant .mean() — so the division was doubly-normalizing, silently diverging from the "mean Huber loss per variant" axis label.
  • Fix: drop the n_samples construction and the division in experiments/simulation/notebooks/cross_validation.ipynb and experiments/scv2-spike/notebooks/cross_validation.ipynb; alias mean_loss = loss so the CSV schema (and manuscript_figures.ipynb, which reads mean_loss) is preserved.
  • Delete experiments/scv2-spike/config/config_mean_loss.yaml and its Snakefile profile branch — the default config.yaml already absorbed the V0.4.0-anchored grid, so the profile was a duplicate.
  • Update CLAUDE.md's "Key design patterns" to warn that total_loss_* columns are per-variant averages.

Validation

Ran the test profile of both pipelines locally:

  • Simulation: 24 CV rows, mean_loss == loss row-wise, loss values 4.93–6.09 (per-variant Huber scale, not sum-scale).
  • Spike: 8 CV rows, mean_loss == loss row-wise, loss values 2.53–3.39 (per-variant scale).
  • Neither output CSV leaks an n_samples column.

pixi run lint, pixi run fmt-check, and the full pixi run test suite (182 passed) are clean on this branch.

Test plan

  • pixi run sim-test completes, CV CSV has mean_loss == loss
  • pixi run spike-test completes, CV CSV has mean_loss == loss
  • pixi run lint
  • pixi run fmt-check
  • pixi run test
  • CI rerun of all three Python versions

Closes #231

🤖 Generated with Claude Code

Since PR #230, functional_score_loss is already a per-variant .mean(),
so total_loss_{training,validation} on ModelCollection.fit_models are
already per-variant averages. Both CV notebooks were dividing by
sample count again, producing doubly-normalized values that no
longer matched the "mean Huber loss per variant" axis label.

- Drop the n_samples dict and the `loss / n_samples` division in both
  experiments/simulation/notebooks/cross_validation.ipynb and
  experiments/scv2-spike/notebooks/cross_validation.ipynb
- Keep mean_loss as an alias for loss in the emitted CSVs so
  downstream readers (manuscript_figures.ipynb) pick up the fix
  without schema changes
- Delete experiments/scv2-spike/config/config_mean_loss.yaml (its grid
  was already absorbed into the default config.yaml) and drop the
  corresponding profile branch from the spike Snakefile
- Update CLAUDE.md to warn that total_loss_* columns are per-variant

Closes #231

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jaredgalloway jaredgalloway merged commit cb929ec into main Apr 17, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remove redundant per-variant normalization in CV notebooks

1 participant