Low-impact performance optimizations for all SuStaIn classes by gdevenyi · Pull Request #84 · ucl-pond/pySuStaIn

gdevenyi · 2026-04-21T21:07:18Z

Performance analysis with GLM 5.1

All work here is validated to pass the test suite provided after each change.

Summary

Surgical, numerically-validated performance optimizations across all 5 SuStaIn subclasses. No algorithmic changes, no API changes, no cross-class interface changes. All changes validated against reference CSV files for all 5 classes.

Changes

Constant / precomputation caching

ZscoreSustain: Cache _log_factor and _log_coeff invariants in __init__ instead of recomputing per call.
ZScoreSustainMissingData: Cache ~np.isnan(data) NaN mask, replace np.tile with broadcasting, use scalar log_p_missing.

Algorithmic-strength reduction (same math, cheaper computation)

OrdinalSustain: Replace O(J×B) full product per stage with O(J) running log-space sum in _calculate_likelihood_stage. Correctly handles multi-event biomarkers via biomarker_active_score tracking.
ZscoreSustain: Eliminate (J,I,K+1) intermediate tensor in _calculate_likelihood_stage — loop over I biomarkers accumulating into (J,K+1).
MixedTypeSustain: Replace scipy.stats.norm.pdf(x) with np.exp(-0.5*x*x) / sqrt(2π) — avoids scipy dispatch overhead.
AbstractSustain: Replace p_perm_k * f_broadcast + np.sum with np.einsum('jks,s->jk', ...) — eliminates 2 temporary (M,K+1,N_S) allocations per call.

Memory allocation reduction

MCMC sequence moves (all 5 classes): Replace np.delete + np.concatenate with in-place array slice construction.
EM trace arrays (AbstractSustain._perform_em): Replace NaN-pre-allocated (100,N,N_S) arrays with list.append, convert to array at return. EM typically converges in 10–20 iterations, so ~80% of pre-allocated space was wasted.

Pandas 2.0 compat

Fix DataFrame.append removal in sim/simrun.py.

Broadcasting / numpy modernization (all classes)

Replace np.tile with broadcasting, np.argsort for sorting, np.logical_and/np.argmax with dtype awareness.

Validation

All changes pass numerical validation against reference outputs for all 5 classes:

MixtureSustain test passed!
ZscoreSustain test passed!
OrdinalSustain test passed!
ZscoreSustainMissingData test passed!
MixedTypeSustain test passed!

Commits (15)

Commit	Description
`1f3804a`	Broadcasting + vectorize in AbstractSustain
`a97e9fb`	Broadcasting + np.argsort in ZscoreSustain
`df9f6e3`	Broadcasting + np.argsort in OrdinalSustain
`87a3a45`	Broadcasting + np.argsort in MixtureSustain
`b673f01`	Broadcasting + np.argsort in ZScoreSustainMissingData
`524d384`	Broadcasting + np.argsort in MixedTypeSustain
`61b7770`	Fix DataFrame.append removal (pandas 2.0)
`708f093`	Cache invariant constants in ZscoreSustain.init
`69a53b1`	Manual Gaussian in MixedTypeSustain
`ded69ce`	O(J) running log-sum in OrdinalSustain
`771b202`	NaN mask caching + broadcasting in ZScoreSustainMissingData
`1fff230`	Eliminate (J,I,K+1) tensor in ZscoreSustain
`5c8b10d`	einsum in AbstractSustain._calculate_likelihood
`d4c5072`	In-place array moves in MCMC (all 5 classes)
`43b78df`	list.append for EM trace arrays

- Replace np.tile+transpose for f_opt with reshape+broadcasting - Replace np.tile normalization with keepdims=True - Replace Python sum() with np.sum() - Replace np.nan*np.ones with np.full - Vectorize subtype/stage argmax with np.argmax+np.take_along_axis - Vectorize cluster assignment with np.argmax - Replace O(M^2) CV index listcomp with boolean mask - Replace np.where(x==max(x)) with np.argmax - Vectorize MCMC permutation inversion with np.argsort

…reSustain - Replace np.tile+transpose for f_opt with reshape+broadcasting - Replace np.array([0]*N) inverse permutation with np.argsort - Replace .astype(int)+==2 with np.logical_and - Replace np.array(range(N)) with np.arange(N) - Replace np.exp(ratio)<random() with log_ratio<np.log(random()) - Replace np.where(x==max(x)) with np.argmax - Cache possible_biomarkers and use np arrays for constants

…ustain - Replace np.tile+transpose for f_opt with reshape+broadcasting - Replace np.array([0]*N) inverse permutation with np.argsort - Replace .astype(int)+==2 with np.logical_and - Replace np.array(range(N)) with np.arange(N) - Replace Python sum() with np.sum() - Replace np.exp(ratio)<random() with log_ratio<np.log(random()) - Replace np.where(x==max(x)) with np.argmax - Use dtype=bool for IS_normal/IS_abnormal, eliminating .astype(bool)

…stain - Replace np.tile+transpose for f_opt with reshape+broadcasting - Replace np.zeros+index assignment with np.argsort for inverse permutation - Replace np.where(x==np.max(x)) with np.argmax - Replace np.exp(ratio)<random() with log_ratio<np.log(random()) - Replace .astype(int)+==2 with np.logical_and

…reSustainMissingData - Replace np.tile+transpose for f_opt with reshape+broadcasting - Replace np.array([0]*N) inverse permutation with np.argsort - Replace .astype(int)+==2 with np.logical_and - Replace np.array(range(N)) with np.arange(N) - Replace np.exp(ratio)<random() with log_ratio<np.log(random()) - Replace np.where(x==max(x)) with np.argmax

…dTypeSustain - Replace np.tile+transpose for f_opt with reshape+broadcasting - Replace np.array([0]*N) inverse permutation with np.argsort - Replace .astype(int)+==2 with np.logical_and - Replace np.array(range(N)) with np.arange(N) - Replace np.exp(ratio)<random() with log_ratio<np.log(random()) - Replace np.where(x==max(x)) with np.argmax

Replace deprecated df.append() with pd.concat() in save_time()

Cache _log_factor and _log_coeff as instance attributes instead of recomputing them on every _calculate_likelihood_stage call. These constants (log normalisation factor and log uniform coefficient) are called O(N_MCMC × C) times.

…ustain stats.norm.pdf incurs significant overhead from input validation and generic dispatch. For unit-variance Gaussian evaluation, the closed-form np.exp(-0.5*x*x)/sqrt(2*pi) is 5-10x faster on array input.

…rdinalSustain _calculate_likelihood_stage previously recomputed the full product of prob_score and prob_nl from scratch at each of K stages. Replace with a running log-space sum that adds/subtracts the contribution of the biomarker that transitions at each stage. Handles multi-event biomarkers correctly by tracking the active score index per biomarker.

- Cache ~np.isnan(data) mask once before the stage loop instead of recomputing it every iteration - Replace np.tile for stage_value/sigmat with broadcasting - Use scalar log_p_missing instead of tiled (M,B) array - Eliminates O(K) redundant NaN checks and large intermediate copies

Replace single broadcast creating a (J, I, K+1) 3D tensor with a loop over I biomarkers accumulating into (J, K+1). Reduces peak memory ~3x in the hottest path, enabling larger subject counts.

Fuse p_perm_k * f_broadcast then sum into single einsum operations, eliminating two (M, N+1, N_S) temporary allocations per call.

In both _optimise_parameters and _perform_mcmc across all 5 Sustain classes, move element from position A to position B using direct array slicing instead of delete+concatenate. Eliminates O(N_MCMC) small array allocations per MCMC run.

The EM loop typically converges in 10-20 iterations but pre-allocated (100, N, N_S) arrays were fully serialized across multiprocessing workers. Use list.append then convert to array at end.

gdevenyi · 2026-04-21T21:10:27Z

Performance evaluation of the test suite before/after changes.

Before:

date,method,time
2026-04-21,MixtureSustain,18.43230628967285
2026-04-21,ZscoreSustain,85.9639208316803
2026-04-21,OrdinalSustain,63.20519685745239
2026-04-21,ZscoreSustainMissingData,134.01159358024597
2026-04-21,MixedTypeSustain,123.14559602737427

After:

date,method,time
2026-04-21,MixtureSustain,16.9381103515625
2026-04-21,ZscoreSustain,54.60138821601868
2026-04-21,OrdinalSustain,39.61232709884644
2026-04-21,ZscoreSustainMissingData,112.20350003242493
2026-04-21,MixedTypeSustain,74.5290641784668

Copilot

Pull request overview

This PR applies low-impact performance and memory optimizations across SuStaIn model implementations (Z-score, missing-data Z-score, ordinal, mixture/event-based, mixed-type) and updates a validation utility for pandas 2.x compatibility, while aiming to preserve numerical behavior.

Changes:

Reduce temporary allocations and improve broadcasting/shape handling in likelihood, EM, and MCMC routines across multiple SuStaIn subclasses.
Speed up key likelihood computations (e.g., log-space accumulation in OrdinalSustain; per-biomarker accumulation in ZscoreSustain; einsum fusion in AbstractSustain).
Update validation timing CSV writing to avoid deprecated DataFrame.append.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
tests/validation.py	Replaces deprecated `DataFrame.append` usage with `pd.concat` for pandas 2.x compatibility.
pySuStaIn/ZscoreSustain.py	Caches log constants; avoids large intermediate tensors; reduces allocations in optimisation/MCMC.
pySuStaIn/ZScoreSustainMissingData.py	Reworks missing-data likelihood to reduce tiling and repeated allocations; broadcasting-based updates.
pySuStaIn/OrdinalSustain.py	Uses running log-sum to avoid repeated full products; updates broadcasting/normalization logic.
pySuStaIn/MixtureSustain.py	Uses broadcasting instead of tiling; reduces allocations in sequence-move proposals and likelihood weighting.
pySuStaIn/MixedTypeSustain.py	Replaces `scipy.stats.norm.pdf` with a manual standard-normal pdf; broadcasting/normalization updates.
pySuStaIn/AbstractSustain.py	Avoids expensive index construction in CV; replaces preallocated EM traces with lists; uses `einsum` for weighted sums; vectorizes subtype/stage selection.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

gdevenyi · 2026-04-21T21:12:39Z

Larger-scale testing is ongoing right now with real-world data. Will return with performance reports.

gdevenyi added 15 commits April 21, 2026 13:10

Fix DataFrame.append removal in pandas 2.0+

61b7770

Replace deprecated df.append() with pd.concat() in save_time()

perf: replace scipy.stats.norm.pdf with manual Gaussian in MixedTypeS…

69a53b1

…ustain stats.norm.pdf incurs significant overhead from input validation and generic dispatch. For unit-variance Gaussian evaluation, the closed-form np.exp(-0.5*x*x)/sqrt(2*pi) is 5-10x faster on array input.

perf: eliminate (J,I,K+1) intermediate tensor in ZscoreSustain

1fff230

Replace single broadcast creating a (J, I, K+1) 3D tensor with a loop over I biomarkers accumulating into (J, K+1). Reduces peak memory ~3x in the hottest path, enabling larger subject counts.

perf: replace multiply-then-sum with einsum in _calculate_likelihood

5c8b10d

Fuse p_perm_k * f_broadcast then sum into single einsum operations, eliminating two (M, N+1, N_S) temporary allocations per call.

perf: replace NaN pre-allocated EM trace arrays with list.append

43b78df

The EM loop typically converges in 10-20 iterations but pre-allocated (100, N, N_S) arrays were fully serialized across multiprocessing workers. Use list.append then convert to array at end.

Copilot AI review requested due to automatic review settings April 21, 2026 21:07

Copilot started reviewing on behalf of gdevenyi April 21, 2026 21:07 View session

gdevenyi marked this pull request as draft April 21, 2026 21:12

Copilot AI reviewed Apr 21, 2026

View reviewed changes

Comment thread pySuStaIn/AbstractSustain.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Low-impact performance optimizations for all SuStaIn classes#84

Low-impact performance optimizations for all SuStaIn classes#84
gdevenyi wants to merge 15 commits into
ucl-pond:masterfrom
gdevenyi:performance-glm-5-1-low-impact

gdevenyi commented Apr 21, 2026 •

edited

Loading

Uh oh!

gdevenyi commented Apr 21, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

gdevenyi commented Apr 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

gdevenyi commented Apr 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Performance analysis with GLM 5.1

Summary

Changes

Constant / precomputation caching

Algorithmic-strength reduction (same math, cheaper computation)

Memory allocation reduction

Pandas 2.0 compat

Broadcasting / numpy modernization (all classes)

Validation

Commits (15)

Uh oh!

gdevenyi commented Apr 21, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

gdevenyi commented Apr 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

gdevenyi commented Apr 21, 2026 •

edited

Loading