Remove redundant per-variant normalization in CV notebooks by jaredgalloway · Pull Request #233 · matsengrp/multidms

jaredgalloway · 2026-04-17T18:15:45Z

Summary

Both CV notebooks divided total_loss_{training,validation} by a sample-count dict, but PR Normalize functional score loss per variant within each condition #230 already made functional_score_loss a per-variant .mean() — so the division was doubly-normalizing, silently diverging from the "mean Huber loss per variant" axis label.
Fix: drop the n_samples construction and the division in experiments/simulation/notebooks/cross_validation.ipynb and experiments/scv2-spike/notebooks/cross_validation.ipynb; alias mean_loss = loss so the CSV schema (and manuscript_figures.ipynb, which reads mean_loss) is preserved.
Delete experiments/scv2-spike/config/config_mean_loss.yaml and its Snakefile profile branch — the default config.yaml already absorbed the V0.4.0-anchored grid, so the profile was a duplicate.
Update CLAUDE.md's "Key design patterns" to warn that total_loss_* columns are per-variant averages.

Validation

Ran the test profile of both pipelines locally:

Simulation: 24 CV rows, mean_loss == loss row-wise, loss values 4.93–6.09 (per-variant Huber scale, not sum-scale).
Spike: 8 CV rows, mean_loss == loss row-wise, loss values 2.53–3.39 (per-variant scale).
Neither output CSV leaks an n_samples column.

pixi run lint, pixi run fmt-check, and the full pixi run test suite (182 passed) are clean on this branch.

Test plan

pixi run sim-test completes, CV CSV has mean_loss == loss
pixi run spike-test completes, CV CSV has mean_loss == loss
pixi run lint
pixi run fmt-check
pixi run test
CI rerun of all three Python versions

Closes #231

🤖 Generated with Claude Code

Since PR #230, functional_score_loss is already a per-variant .mean(), so total_loss_{training,validation} on ModelCollection.fit_models are already per-variant averages. Both CV notebooks were dividing by sample count again, producing doubly-normalized values that no longer matched the "mean Huber loss per variant" axis label. - Drop the n_samples dict and the `loss / n_samples` division in both experiments/simulation/notebooks/cross_validation.ipynb and experiments/scv2-spike/notebooks/cross_validation.ipynb - Keep mean_loss as an alias for loss in the emitted CSVs so downstream readers (manuscript_figures.ipynb) pick up the fix without schema changes - Delete experiments/scv2-spike/config/config_mean_loss.yaml (its grid was already absorbed into the default config.yaml) and drop the corresponding profile branch from the spike Snakefile - Update CLAUDE.md to warn that total_loss_* columns are per-variant Closes #231 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

jaredgalloway merged commit cb929ec into main Apr 17, 2026
6 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove redundant per-variant normalization in CV notebooks#233

Remove redundant per-variant normalization in CV notebooks#233
jaredgalloway merged 1 commit into
mainfrom
231-remove-redundant-cv-normalization

jaredgalloway commented Apr 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jaredgalloway commented Apr 17, 2026

Summary

Validation

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant