Low-impact performance optimizations for all SuStaIn classes#84
Draft
gdevenyi wants to merge 15 commits into
Draft
Low-impact performance optimizations for all SuStaIn classes#84gdevenyi wants to merge 15 commits into
gdevenyi wants to merge 15 commits into
Conversation
- Replace np.tile+transpose for f_opt with reshape+broadcasting - Replace np.tile normalization with keepdims=True - Replace Python sum() with np.sum() - Replace np.nan*np.ones with np.full - Vectorize subtype/stage argmax with np.argmax+np.take_along_axis - Vectorize cluster assignment with np.argmax - Replace O(M^2) CV index listcomp with boolean mask - Replace np.where(x==max(x)) with np.argmax - Vectorize MCMC permutation inversion with np.argsort
…reSustain - Replace np.tile+transpose for f_opt with reshape+broadcasting - Replace np.array([0]*N) inverse permutation with np.argsort - Replace .astype(int)+==2 with np.logical_and - Replace np.array(range(N)) with np.arange(N) - Replace np.exp(ratio)<random() with log_ratio<np.log(random()) - Replace np.where(x==max(x)) with np.argmax - Cache possible_biomarkers and use np arrays for constants
…ustain - Replace np.tile+transpose for f_opt with reshape+broadcasting - Replace np.array([0]*N) inverse permutation with np.argsort - Replace .astype(int)+==2 with np.logical_and - Replace np.array(range(N)) with np.arange(N) - Replace Python sum() with np.sum() - Replace np.exp(ratio)<random() with log_ratio<np.log(random()) - Replace np.where(x==max(x)) with np.argmax - Use dtype=bool for IS_normal/IS_abnormal, eliminating .astype(bool)
…stain - Replace np.tile+transpose for f_opt with reshape+broadcasting - Replace np.zeros+index assignment with np.argsort for inverse permutation - Replace np.where(x==np.max(x)) with np.argmax - Replace np.exp(ratio)<random() with log_ratio<np.log(random()) - Replace .astype(int)+==2 with np.logical_and
…reSustainMissingData - Replace np.tile+transpose for f_opt with reshape+broadcasting - Replace np.array([0]*N) inverse permutation with np.argsort - Replace .astype(int)+==2 with np.logical_and - Replace np.array(range(N)) with np.arange(N) - Replace np.exp(ratio)<random() with log_ratio<np.log(random()) - Replace np.where(x==max(x)) with np.argmax
…dTypeSustain - Replace np.tile+transpose for f_opt with reshape+broadcasting - Replace np.array([0]*N) inverse permutation with np.argsort - Replace .astype(int)+==2 with np.logical_and - Replace np.array(range(N)) with np.arange(N) - Replace np.exp(ratio)<random() with log_ratio<np.log(random()) - Replace np.where(x==max(x)) with np.argmax
Replace deprecated df.append() with pd.concat() in save_time()
Cache _log_factor and _log_coeff as instance attributes instead of recomputing them on every _calculate_likelihood_stage call. These constants (log normalisation factor and log uniform coefficient) are called O(N_MCMC × C) times.
…ustain stats.norm.pdf incurs significant overhead from input validation and generic dispatch. For unit-variance Gaussian evaluation, the closed-form np.exp(-0.5*x*x)/sqrt(2*pi) is 5-10x faster on array input.
…rdinalSustain _calculate_likelihood_stage previously recomputed the full product of prob_score and prob_nl from scratch at each of K stages. Replace with a running log-space sum that adds/subtracts the contribution of the biomarker that transitions at each stage. Handles multi-event biomarkers correctly by tracking the active score index per biomarker.
- Cache ~np.isnan(data) mask once before the stage loop instead of recomputing it every iteration - Replace np.tile for stage_value/sigmat with broadcasting - Use scalar log_p_missing instead of tiled (M,B) array - Eliminates O(K) redundant NaN checks and large intermediate copies
Replace single broadcast creating a (J, I, K+1) 3D tensor with a loop over I biomarkers accumulating into (J, K+1). Reduces peak memory ~3x in the hottest path, enabling larger subject counts.
Fuse p_perm_k * f_broadcast then sum into single einsum operations, eliminating two (M, N+1, N_S) temporary allocations per call.
In both _optimise_parameters and _perform_mcmc across all 5 Sustain classes, move element from position A to position B using direct array slicing instead of delete+concatenate. Eliminates O(N_MCMC) small array allocations per MCMC run.
The EM loop typically converges in 10-20 iterations but pre-allocated (100, N, N_S) arrays were fully serialized across multiprocessing workers. Use list.append then convert to array at end.
Author
|
Performance evaluation of the test suite before/after changes. Before: After: |
There was a problem hiding this comment.
Pull request overview
This PR applies low-impact performance and memory optimizations across SuStaIn model implementations (Z-score, missing-data Z-score, ordinal, mixture/event-based, mixed-type) and updates a validation utility for pandas 2.x compatibility, while aiming to preserve numerical behavior.
Changes:
- Reduce temporary allocations and improve broadcasting/shape handling in likelihood, EM, and MCMC routines across multiple SuStaIn subclasses.
- Speed up key likelihood computations (e.g., log-space accumulation in OrdinalSustain; per-biomarker accumulation in ZscoreSustain;
einsumfusion in AbstractSustain). - Update validation timing CSV writing to avoid deprecated
DataFrame.append.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| tests/validation.py | Replaces deprecated DataFrame.append usage with pd.concat for pandas 2.x compatibility. |
| pySuStaIn/ZscoreSustain.py | Caches log constants; avoids large intermediate tensors; reduces allocations in optimisation/MCMC. |
| pySuStaIn/ZScoreSustainMissingData.py | Reworks missing-data likelihood to reduce tiling and repeated allocations; broadcasting-based updates. |
| pySuStaIn/OrdinalSustain.py | Uses running log-sum to avoid repeated full products; updates broadcasting/normalization logic. |
| pySuStaIn/MixtureSustain.py | Uses broadcasting instead of tiling; reduces allocations in sequence-move proposals and likelihood weighting. |
| pySuStaIn/MixedTypeSustain.py | Replaces scipy.stats.norm.pdf with a manual standard-normal pdf; broadcasting/normalization updates. |
| pySuStaIn/AbstractSustain.py | Avoids expensive index construction in CV; replaces preallocated EM traces with lists; uses einsum for weighted sums; vectorizes subtype/stage selection. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Author
|
Larger-scale testing is ongoing right now with real-world data. Will return with performance reports. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Performance analysis with GLM 5.1
All work here is validated to pass the test suite provided after each change.
Summary
Surgical, numerically-validated performance optimizations across all 5 SuStaIn subclasses. No algorithmic changes, no API changes, no cross-class interface changes. All changes validated against reference CSV files for all 5 classes.
Changes
Constant / precomputation caching
_log_factorand_log_coeffinvariants in__init__instead of recomputing per call.~np.isnan(data)NaN mask, replacenp.tilewith broadcasting, use scalarlog_p_missing.Algorithmic-strength reduction (same math, cheaper computation)
_calculate_likelihood_stage. Correctly handles multi-event biomarkers viabiomarker_active_scoretracking.(J,I,K+1)intermediate tensor in_calculate_likelihood_stage— loop over I biomarkers accumulating into(J,K+1).scipy.stats.norm.pdf(x)withnp.exp(-0.5*x*x) / sqrt(2π)— avoids scipy dispatch overhead.p_perm_k * f_broadcast+np.sumwithnp.einsum('jks,s->jk', ...)— eliminates 2 temporary(M,K+1,N_S)allocations per call.Memory allocation reduction
np.delete+np.concatenatewith in-place array slice construction.AbstractSustain._perform_em): ReplaceNaN-pre-allocated(100,N,N_S)arrays withlist.append, convert to array at return. EM typically converges in 10–20 iterations, so ~80% of pre-allocated space was wasted.Pandas 2.0 compat
DataFrame.appendremoval insim/simrun.py.Broadcasting / numpy modernization (all classes)
np.tilewith broadcasting,np.argsortfor sorting,np.logical_and/np.argmaxwith dtype awareness.Validation
All changes pass numerical validation against reference outputs for all 5 classes:
Commits (15)
1f3804aa97e9fbdf9f6e387a3a45b673f01524d38461b7770708f09369a53b1ded69ce771b2021fff2305c8b10dd4c507243b78df