Add times_seen_threshold slider to dashboard and set default=1 in evaluate notebooks #236

Merged: jaredgalloway merged 1 commit into main from 235-times-seen-threshold on Apr 20, 2026
@jaredgalloway

Summary

  • Adds a times_seen_threshold slider (default 1, range 0–20) to experiments/dashboard.py. The slider appears in both the Param Correlation and Replicate Scatter tabs and is wired into mc.mut_param_dataset_correlation(times_seen_threshold=...) and mc.split_apply_combine_muts(..., times_seen_threshold=...).
  • Sets times_seen_threshold=1 at all three call sites in experiments/scv2-spike/notebooks/evaluate.ipynb (mut_param_dataset_correlation, combine_replicate_muts, split_apply_combine_muts).
  • Sets times_seen_threshold=1 at both call sites in experiments/simulation/notebooks/evaluate.ipynb (split_apply_combine_muts, mut_param_dataset_correlation).
  • Sets beta0_ridge: 0.0 in experiments/scv2-spike/config/config.yaml for the production run.

Motivation

Mutations whose reference-condition times_seen == 0 in one replicate land at exactly β = 0 (zero-initialized, with no data to drive the estimate). They produce prominent horizontal/vertical stripes in replicate-scatter panels and artificially deflate correlation estimates. Setting threshold=1 removes these mutations; it is the weakest filter that excludes zero-evidence mutations. Setting threshold=0 on the slider exactly reproduces pre-change behavior.
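The effect described above can be illustrated with a small standalone sketch. It is not the project's implementation: the toy records, the tuple layout, and the helper names `filter_by_times_seen` and `pearson` are all hypothetical, and the filter rule (keep a mutation only if its minimum times_seen across replicates is at least the threshold) is an assumption inferred from this description.

```python
import math

# Toy records (made up for illustration):
# (mutation, beta_rep1, beta_rep2, times_seen_rep1, times_seen_rep2)
toy_muts = [
    ("A1C", 0.8, 0.7, 5, 4),
    ("D2E", -1.2, -1.0, 3, 6),
    ("F3G", 0.3, 0.4, 2, 2),
    ("H4I", -0.5, -0.6, 7, 1),
    ("K5L", 0.0, 1.1, 0, 3),   # unseen in replicate 1, so beta stays at 0
    ("M6N", -0.9, 0.0, 4, 0),  # unseen in replicate 2, so beta stays at 0
]


def filter_by_times_seen(rows, threshold):
    """Keep rows whose minimum times_seen across replicates is >= threshold."""
    return [r for r in rows if min(r[3], r[4]) >= threshold]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def replicate_correlation(rows):
    """Correlation between replicate 1 and replicate 2 beta values."""
    return pearson([r[1] for r in rows], [r[2] for r in rows])


unfiltered = filter_by_times_seen(toy_muts, 0)  # threshold=0 keeps every row
filtered = filter_by_times_seen(toy_muts, 1)    # threshold=1 drops zero-evidence rows

r_unfiltered = replicate_correlation(unfiltered)
r_filtered = replicate_correlation(filtered)
```

In this toy data the two β = 0 points sit on the axes of the replicate scatter, and dropping them raises the replicate correlation. Real datasets will differ in magnitude.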

Backward compatibility

CSV outputs (library_replicate_correlation.csv, mutations_df.csv, collection_muts.csv) will have fewer rows after re-running the pipeline. The removed rows are exactly those with min(times_seen_*) == 0. Downstream consumers (manuscript_figures.ipynb) should be re-run; changed correlation values are expected and indicate that the filter is working.
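One way to sanity-check this claim after a re-run is to compare an old and a new CSV and confirm that every dropped row had zero evidence in at least one replicate. The sketch below uses inline toy CSV text rather than the real outputs. The `mutation` key column, the exact `times_seen_rep1`/`times_seen_rep2` column names, and the helper names are assumptions for illustration and may not match the pipeline's actual schema.

```python
import csv
import io


def load_rows(csv_text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def min_times_seen(row):
    """Minimum over all columns whose name starts with 'times_seen_'."""
    return min(int(v) for k, v in row.items() if k.startswith("times_seen_"))


def removed_rows(old_rows, new_rows, key="mutation"):
    """Rows present in the old output but missing from the new one."""
    kept = {r[key] for r in new_rows}
    return [r for r in old_rows if r[key] not in kept]


# Toy stand-ins for a pre-change and post-change output file (made up).
OLD_CSV = """mutation,times_seen_rep1,times_seen_rep2,beta
A1C,5,4,0.8
K5L,0,3,0.0
M6N,4,0,-0.9
"""
NEW_CSV = """mutation,times_seen_rep1,times_seen_rep2,beta
A1C,5,4,0.8
"""

old_rows = load_rows(OLD_CSV)
new_rows = load_rows(NEW_CSV)
dropped = removed_rows(old_rows, new_rows)

# Every dropped row should have min(times_seen_*) == 0,
# and no surviving row should have zero evidence.
only_zero_evidence_dropped = all(min_times_seen(r) == 0 for r in dropped)
no_zero_evidence_kept = all(min_times_seen(r) >= 1 for r in new_rows)
```

For real files, the same helpers could be fed the contents of the old and new outputs read from disk, with the key column adjusted to whatever identifies a mutation in that file.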

Test plan

  • ruff check . and black --check . pass (confirmed locally)
  • Launch dashboard against a spike fit_collection.pkl; verify slider renders in both tabs, threshold=0 reproduces old behavior, threshold=1 eliminates β=0 stripes
  • Re-run spike pipeline; verify CSV row counts drop and no errors

Closes #235

🤖 Generated with Claude Code

…#235)

- Add mo.ui.slider for times_seen_threshold (default 1, range 0–20) to
  dashboard.py; wire it into Param Correlation and Replicate Scatter tabs
  to filter mutations unseen in any condition before correlation is computed.
- Set times_seen_threshold=1 in scv2-spike evaluate.ipynb at all three
  call sites: mut_param_dataset_correlation, combine_replicate_muts, and
  split_apply_combine_muts.
- Set times_seen_threshold=1 in simulation evaluate.ipynb at both call
  sites: split_apply_combine_muts and mut_param_dataset_correlation.
- Set beta0_ridge=0.0 in scv2-spike config.yaml for production run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jaredgalloway merged commit 581e6f4 into main on Apr 20, 2026
6 checks passed
Linked issue: Add times_seen_threshold control to dashboard and set default of 1 in evaluate notebooks