Skip to content

Low-impact performance optimizations for all SuStaIn classes#84

Draft
gdevenyi wants to merge 15 commits into
ucl-pond:masterfrom
gdevenyi:performance-glm-5-1-low-impact
Draft

Low-impact performance optimizations for all SuStaIn classes#84
gdevenyi wants to merge 15 commits into
ucl-pond:masterfrom
gdevenyi:performance-glm-5-1-low-impact

Conversation

@gdevenyi

@gdevenyi gdevenyi commented Apr 21, 2026

Copy link
Copy Markdown

Performance analysis with GLM 5.1

All work here is validated to pass the test suite provided after each change.

Summary

Surgical, numerically-validated performance optimizations across all 5 SuStaIn subclasses. No algorithmic changes, no API changes, no cross-class interface changes. All changes validated against reference CSV files for all 5 classes.

Changes

Constant / precomputation caching

  • ZscoreSustain: Cache _log_factor and _log_coeff invariants in __init__ instead of recomputing per call.
  • ZScoreSustainMissingData: Cache ~np.isnan(data) NaN mask, replace np.tile with broadcasting, use scalar log_p_missing.

Algorithmic-strength reduction (same math, cheaper computation)

  • OrdinalSustain: Replace O(J×B) full product per stage with O(J) running log-space sum in _calculate_likelihood_stage. Correctly handles multi-event biomarkers via biomarker_active_score tracking.
  • ZscoreSustain: Eliminate (J,I,K+1) intermediate tensor in _calculate_likelihood_stage — loop over I biomarkers accumulating into (J,K+1).
  • MixedTypeSustain: Replace scipy.stats.norm.pdf(x) with np.exp(-0.5*x*x) / sqrt(2π) — avoids scipy dispatch overhead.
  • AbstractSustain: Replace p_perm_k * f_broadcast + np.sum with np.einsum('jks,s->jk', ...) — eliminates 2 temporary (M,K+1,N_S) allocations per call.

Memory allocation reduction

  • MCMC sequence moves (all 5 classes): Replace np.delete + np.concatenate with in-place array slice construction.
  • EM trace arrays (AbstractSustain._perform_em): Replace NaN-pre-allocated (100,N,N_S) arrays with list.append, convert to array at return. EM typically converges in 10–20 iterations, so ~80% of pre-allocated space was wasted.

Pandas 2.0 compat

  • Fix DataFrame.append removal in sim/simrun.py.

Broadcasting / numpy modernization (all classes)

  • Replace np.tile with broadcasting, np.argsort for sorting, np.logical_and/np.argmax with dtype awareness.

Validation

All changes pass numerical validation against reference outputs for all 5 classes:

MixtureSustain test passed!
ZscoreSustain test passed!
OrdinalSustain test passed!
ZscoreSustainMissingData test passed!
MixedTypeSustain test passed!

Commits (15)

Commit Description
1f3804a Broadcasting + vectorize in AbstractSustain
a97e9fb Broadcasting + np.argsort in ZscoreSustain
df9f6e3 Broadcasting + np.argsort in OrdinalSustain
87a3a45 Broadcasting + np.argsort in MixtureSustain
b673f01 Broadcasting + np.argsort in ZScoreSustainMissingData
524d384 Broadcasting + np.argsort in MixedTypeSustain
61b7770 Fix DataFrame.append removal (pandas 2.0)
708f093 Cache invariant constants in ZscoreSustain.init
69a53b1 Manual Gaussian in MixedTypeSustain
ded69ce O(J) running log-sum in OrdinalSustain
771b202 NaN mask caching + broadcasting in ZScoreSustainMissingData
1fff230 Eliminate (J,I,K+1) tensor in ZscoreSustain
5c8b10d einsum in AbstractSustain._calculate_likelihood
d4c5072 In-place array moves in MCMC (all 5 classes)
43b78df list.append for EM trace arrays

gdevenyi added 15 commits April 21, 2026 13:10
- Replace np.tile+transpose for f_opt with reshape+broadcasting
- Replace np.tile normalization with keepdims=True
- Replace Python sum() with np.sum()
- Replace np.nan*np.ones with np.full
- Vectorize subtype/stage argmax with np.argmax+np.take_along_axis
- Vectorize cluster assignment with np.argmax
- Replace O(M^2) CV index listcomp with boolean mask
- Replace np.where(x==max(x)) with np.argmax
- Vectorize MCMC permutation inversion with np.argsort
…reSustain

- Replace np.tile+transpose for f_opt with reshape+broadcasting
- Replace np.array([0]*N) inverse permutation with np.argsort
- Replace .astype(int)+==2 with np.logical_and
- Replace np.array(range(N)) with np.arange(N)
- Replace np.exp(ratio)<random() with log_ratio<np.log(random())
- Replace np.where(x==max(x)) with np.argmax
- Cache possible_biomarkers and use np arrays for constants
…ustain

- Replace np.tile+transpose for f_opt with reshape+broadcasting
- Replace np.array([0]*N) inverse permutation with np.argsort
- Replace .astype(int)+==2 with np.logical_and
- Replace np.array(range(N)) with np.arange(N)
- Replace Python sum() with np.sum()
- Replace np.exp(ratio)<random() with log_ratio<np.log(random())
- Replace np.where(x==max(x)) with np.argmax
- Use dtype=bool for IS_normal/IS_abnormal, eliminating .astype(bool)
…stain

- Replace np.tile+transpose for f_opt with reshape+broadcasting
- Replace np.zeros+index assignment with np.argsort for inverse permutation
- Replace np.where(x==np.max(x)) with np.argmax
- Replace np.exp(ratio)<random() with log_ratio<np.log(random())
- Replace .astype(int)+==2 with np.logical_and
…reSustainMissingData

- Replace np.tile+transpose for f_opt with reshape+broadcasting
- Replace np.array([0]*N) inverse permutation with np.argsort
- Replace .astype(int)+==2 with np.logical_and
- Replace np.array(range(N)) with np.arange(N)
- Replace np.exp(ratio)<random() with log_ratio<np.log(random())
- Replace np.where(x==max(x)) with np.argmax
…dTypeSustain

- Replace np.tile+transpose for f_opt with reshape+broadcasting
- Replace np.array([0]*N) inverse permutation with np.argsort
- Replace .astype(int)+==2 with np.logical_and
- Replace np.array(range(N)) with np.arange(N)
- Replace np.exp(ratio)<random() with log_ratio<np.log(random())
- Replace np.where(x==max(x)) with np.argmax
Replace deprecated df.append() with pd.concat() in save_time()
Cache _log_factor and _log_coeff as instance attributes instead of
recomputing them on every _calculate_likelihood_stage call. These
constants (log normalisation factor and log uniform coefficient) are
called O(N_MCMC × C) times.
…ustain

stats.norm.pdf incurs significant overhead from input validation and
generic dispatch. For unit-variance Gaussian evaluation, the closed-form
np.exp(-0.5*x*x)/sqrt(2*pi) is 5-10x faster on array input.
…rdinalSustain

_calculate_likelihood_stage previously recomputed the full product
of prob_score and prob_nl from scratch at each of K stages. Replace
with a running log-space sum that adds/subtracts the contribution
of the biomarker that transitions at each stage. Handles multi-event
biomarkers correctly by tracking the active score index per biomarker.
- Cache ~np.isnan(data) mask once before the stage loop instead of
  recomputing it every iteration
- Replace np.tile for stage_value/sigmat with broadcasting
- Use scalar log_p_missing instead of tiled (M,B) array
- Eliminates O(K) redundant NaN checks and large intermediate copies
Replace single broadcast creating a (J, I, K+1) 3D tensor with a
loop over I biomarkers accumulating into (J, K+1). Reduces peak
memory ~3x in the hottest path, enabling larger subject counts.
Fuse p_perm_k * f_broadcast then sum into single einsum operations,
eliminating two (M, N+1, N_S) temporary allocations per call.
In both _optimise_parameters and _perform_mcmc across all 5 Sustain
classes, move element from position A to position B using direct array
slicing instead of delete+concatenate. Eliminates O(N_MCMC) small array
allocations per MCMC run.
The EM loop typically converges in 10-20 iterations but pre-allocated
(100, N, N_S) arrays were fully serialized across multiprocessing
workers. Use list.append then convert to array at end.
Copilot AI review requested due to automatic review settings April 21, 2026 21:07
@gdevenyi

Copy link
Copy Markdown
Author

Performance evaluation of the test suite before/after changes.

Before:

date,method,time
2026-04-21,MixtureSustain,18.43230628967285
2026-04-21,ZscoreSustain,85.9639208316803
2026-04-21,OrdinalSustain,63.20519685745239
2026-04-21,ZscoreSustainMissingData,134.01159358024597
2026-04-21,MixedTypeSustain,123.14559602737427

After:

date,method,time
2026-04-21,MixtureSustain,16.9381103515625
2026-04-21,ZscoreSustain,54.60138821601868
2026-04-21,OrdinalSustain,39.61232709884644
2026-04-21,ZscoreSustainMissingData,112.20350003242493
2026-04-21,MixedTypeSustain,74.5290641784668

@gdevenyi gdevenyi marked this pull request as draft April 21, 2026 21:12

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR applies low-impact performance and memory optimizations across SuStaIn model implementations (Z-score, missing-data Z-score, ordinal, mixture/event-based, mixed-type) and updates a validation utility for pandas 2.x compatibility, while aiming to preserve numerical behavior.

Changes:

  • Reduce temporary allocations and improve broadcasting/shape handling in likelihood, EM, and MCMC routines across multiple SuStaIn subclasses.
  • Speed up key likelihood computations (e.g., log-space accumulation in OrdinalSustain; per-biomarker accumulation in ZscoreSustain; einsum fusion in AbstractSustain).
  • Update validation timing CSV writing to avoid deprecated DataFrame.append.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/validation.py Replaces deprecated DataFrame.append usage with pd.concat for pandas 2.x compatibility.
pySuStaIn/ZscoreSustain.py Caches log constants; avoids large intermediate tensors; reduces allocations in optimisation/MCMC.
pySuStaIn/ZScoreSustainMissingData.py Reworks missing-data likelihood to reduce tiling and repeated allocations; broadcasting-based updates.
pySuStaIn/OrdinalSustain.py Uses running log-sum to avoid repeated full products; updates broadcasting/normalization logic.
pySuStaIn/MixtureSustain.py Uses broadcasting instead of tiling; reduces allocations in sequence-move proposals and likelihood weighting.
pySuStaIn/MixedTypeSustain.py Replaces scipy.stats.norm.pdf with a manual standard-normal pdf; broadcasting/normalization updates.
pySuStaIn/AbstractSustain.py Avoids expensive index construction in CV; replaces preallocated EM traces with lists; uses einsum for weighted sums; vectorizes subtype/stage selection.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pySuStaIn/AbstractSustain.py
@gdevenyi

Copy link
Copy Markdown
Author

Larger-scale testing is ongoing right now with real-world data. Will return with performance reports.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants