Change beta0_ridge to standard ridge on β0 magnitude #238
Closed
jaredgalloway wants to merge 1 commit into
Replace `_beta_ridge_penalty` with a standard ridge on β0: sum β0**2 across all conditions (reference included). Previously the penalty was on the squared difference between each non-reference β0 and the reference, which was asymmetric and left the reference unanchored.

Extract `_beta_ridge_penalty` to a module-level function so it can be unit-tested directly. The closure inside `fit()` no longer duplicates it.

Add three new correctness tests:
- gradient check: grad = 2 * r * β0[d] for every d
- label-permutation invariance: penalty unchanged when conditions relabel
- zero-at-origin: penalty is 0 iff all β0 are 0, with explicit formula

Rewrite `test_beta0_ridge_penalty` to check magnitude shrinkage across all conditions (the old test checked differences, which was the old invariant, not the new one).

Fix the `tuple(fit_config["beta_clip_range"])` crash when the YAML value is `null`. This affects all three experiment helpers (scv2-spike, simulation, loss-normalization). Introduce a shared `_clip_range` helper that returns `None` untouched and otherwise tuple-wraps.

Update production spike config (config.yaml):
- tol: 1e-4 → 1e-5 (top-level, ge_kwargs, cal_kwargs)
- maxiter: 50 → 75 (top-level, ge_kwargs, cal_kwargs)
- l2reg: 0.0 → 1e-7
- beta0_ridge: 0.0 → 1e-3 (new semantics)
- beta_clip_range: [-10, 10] → null (clipping disabled)

Reset `beta0_ridge` in config_experimental.yaml and config_test.yaml to 0.0. The old 1e-4 values were calibrated for the difference-from-reference penalty and are not directly transferable.

Update CLAUDE.md with notes on the new beta0_ridge semantics and the correct YAML encoding for disabling beta_clip_range.

Closes #237

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
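For readers without the diff at hand, the difference between the two penalties described in this commit message can be sketched in plain Python. This is an illustration only, not the repository's code: the function names, the dict-of-floats representation of β0, and the `reference` argument are assumptions made for the example.

```python
# Illustrative sketch only. Names and data layout are hypothetical;
# the real `_beta_ridge_penalty` signature is not shown in this PR text.

def beta_ridge_penalty(beta0, ridge):
    """New semantics: ridge * sum of beta0**2 over ALL conditions."""
    return ridge * sum(b ** 2 for b in beta0.values())


def old_difference_penalty(beta0, ridge, reference):
    """Old semantics: squared difference of each non-reference beta0
    from the reference condition's beta0."""
    ref = beta0[reference]
    return ridge * sum(
        (b - ref) ** 2 for cond, b in beta0.items() if cond != reference
    )


def beta_ridge_gradient(beta0, ridge):
    """Analytic gradient of the new penalty: 2 * r * beta0[d] for every d."""
    return {cond: 2.0 * ridge * b for cond, b in beta0.items()}


ridge = 1e-3
beta0 = {"ref": 1.0, "a": 2.0, "b": -0.5}

# Same values, different condition labels.
relabeled = {"ref": 2.0, "a": 1.0, "b": -0.5}

# Every intercept shifted by the same constant.
shifted = {cond: b + 10.0 for cond, b in beta0.items()}

new_penalty = beta_ridge_penalty(beta0, ridge)
old_penalty = old_difference_penalty(beta0, ridge, "ref")
```

Under these assumptions, the old penalty is unchanged by a common shift of every β0 (the reference is unanchored) and changes when the labels are permuted, while the new penalty is label-invariant and grows with the shift.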
Closing without merging — failed experiment. Keeping the branch ( |
Summary

- Rewrite `_beta_ridge_penalty` as a standard ridge: sum `β0**2` across all conditions (reference included) times `beta0_ridge`. The old form penalized each non-reference β0's difference from the reference, which was asymmetric under relabeling and left the reference unanchored.
- Extract `_beta_ridge_penalty` to a module-level function so it's unit-testable directly (it was a closure inside `fit`).
- Add three correctness tests: the gradient is `2 * r * β0[d]` for every `d`; the penalty is invariant under condition relabeling; the penalty is 0 iff every β0 is 0, with the formula checked against the model-under-test.
- Rewrite `test_beta0_ridge_penalty`: the old assertion (differences shrink) was literally the old invariant, so it had to change.
- Fix the `tuple(fit_config["beta_clip_range"])` crash when YAML sets `beta_clip_range: null`. Shared fix across three experiment helpers (scv2-spike, simulation, loss-normalization): a `_clip_range` helper returns `None` untouched and tuple-wraps otherwise.
- Update the production spike config (`config.yaml`): `tol 1e-4 → 1e-5`, `maxiter 50 → 75` (top-level + `ge_kwargs` + `cal_kwargs`); `l2reg 0 → 1e-7`; `beta0_ridge 0 → 1e-3` (new semantics); `beta_clip_range [-10, 10] → null` (disables clipping, replaced by smooth regularization on the intercept).
- Reset `beta0_ridge` in `config_experimental.yaml` and `config_test.yaml` to `0.0`. The existing `1e-4` values were calibrated for the old difference-from-reference semantics and are not directly transferable.
- Update `CLAUDE.md` with the new `beta0_ridge` semantics and the correct YAML encoding for disabling `beta_clip_range` (single `null`, not `[null, null]`).

Test plan

- `pixi run lint`: clean
- `pixi run fmt-check`: clean
- `pixi run test` (`pytest --doctest-modules`): 185 passed, including 4 new correctness tests
- `config.yaml`: DAG constructs cleanly with `beta_clip_range: null`
- No `TypeError` from clip handling

Production run results
Executed `pixi run remote-pipeline -- spike prod host=orca04`. Pipeline completed cleanly; results preserved under `experiments/scv2-spike/results-prod-237-beta0-ridge-magnitude/` in the main clone (gitignored per repo convention, matching how `results-prod-232-*` and `results-prod-235-*` are handled).

Training loss at `fusionreg=0.0, rep_1` (first row of `cross_validation_loss.csv`), compared against the baselines `l2reg-1e-7` (closest baseline), `tight-tol`, and `times-seen-threshold`.

The new run lands squarely within the baseline range. It is a touch lower than the most directly comparable baseline (#232 `l2reg-1e-7`), consistent with the tighter `tol=1e-5` and longer `maxiter=75` producing slightly better convergence. All per-condition losses are finite, and CV completed across the full fusionreg grid.

Closes #237
🤖 Generated with Claude Code