
Fix IndexError in mut_param_dataset_correlation for 1-column groups #239

Open
jaredgalloway wants to merge 1 commit into main from fix/mut-param-correlation-1col


@jaredgalloway


Summary

Fixes a latent IndexError in ModelCollection.mut_param_dataset_correlation that surfaces when a (mut_param, x) group has only one replicate with a non-zero entry. This regime is reached under sparse shift solutions (e.g. continuation-strategy fits with strong fusion regularization).

The bug was at multidms/model_collection.py:1424:

"correlation": replicate_params_df.T.corr().iloc[0, 1] ** r,

When replicate_params_df.T has only one column (i.e. only one replicate survives), .corr() returns a 1×1 DataFrame, so iloc[0, 1] indexes past the matrix and raises:

IndexError: index 1 is out of bounds for axis 0 with size 1
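A minimal, self-contained reproduction of this failure mode is sketched below. It assumes replicates are the rows of replicate_params_df, so that .T turns each replicate into a column; the names rep1 and M1..M3 and the values are illustrative toy data, not taken from multidms.

```python
import pandas as pd


def reproduce_single_replicate_failure():
    """Return the corr() shape, whether iloc[0, 1] raised, and the message.

    Assumption: replicates are rows, so .T makes each replicate a column.
    """
    # Hypothetical (mut_param, x) group in which only one replicate survived.
    one_rep = pd.DataFrame(
        {"M1": [0.5], "M2": [-0.2], "M3": [1.1]}, index=["rep1"]
    )
    # DataFrame.corr() correlates columns, so one replicate gives a 1x1 matrix.
    corr = one_rep.T.corr()
    try:
        corr.iloc[0, 1]  # the indexing step from the original expression
    except IndexError as exc:
        return corr.shape, True, str(exc)
    return corr.shape, False, ""


shape, raised, message = reproduce_single_replicate_failure()
print(shape, raised)
print(message)
```

The exact wording of the IndexError message can vary across pandas versions; the shape of the correlation matrix is what matters.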

What changed

  • Extract the per-group correlation reduction into a small helper _pairwise_correlation(replicate_params_df, r) that returns NaN for 1-column groups instead of indexing past the matrix bounds.
  • Add unit tests for both branches: the 2-column happy path and the 1-column regression case.

How this was discovered

Running the spike pipeline with strategy: continuation, tol=1e-5, maxiter=100, beta0_ridge=1e-3 against the prod fusionreg grid produced sparse shift solutions where some (mut_param, fusionreg) cells had a single surviving replicate, tripping the latent bug. The independent-strategy run on the same hyperparameters did not hit it because its converged solutions are denser. Fit pickles from the failed run were preserved, so re-running only the evaluate rule reproduces the issue cheaply.

Test plan

  • pixi run fmt-check — clean
  • pixi run lint — clean
  • pixi run pytest tests/test_model_collection.py -k "pairwise_correlation or mut_param_dataset_correlation" — 5 passed
    • test_pairwise_correlation_two_replicates (new)
    • test_pairwise_correlation_one_replicate_returns_nan (new regression test)
    • test_mut_param_dataset_correlation (existing, still green)
    • test_mut_param_dataset_correlation_return_data_r1 (existing, still green)
    • test_mut_param_dataset_correlation_return_data_r2 (existing, still green)
  • CI: ruff + black + full pytest suite + docs build
  • End-to-end verification: re-run spike pipeline evaluate rule against the cached continuation fit_collection.pkl and confirm it now completes

🤖 Generated with Claude Code

`ModelCollection.mut_param_dataset_correlation` crashed with
`IndexError: index 1 is out of bounds for axis 0 with size 1` when a
`(mut_param, x)` group reduced to a single column, i.e. when only one
of the two replicates had a non-zero entry for that mutation. This
surfaces under sparse shift solutions (e.g. continuation-strategy fits
with strong fusion regularization).

Extract the per-group correlation reduction into a helper
`_pairwise_correlation` that returns NaN for 1-column groups instead
of indexing past the matrix bounds. Add unit tests covering both the
2-column happy path and the 1-column regression case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
