Added functionalities & Fixed bugs #7
Open
zZz-Tristan wants to merge 17 commits into
added boruta package
Updated readme to include boruta and sampling methods
updated cli.py to include boruta and sampling methods
…n.py added the Boruta function
Added ENN, SMOTEENN, and RandomUndersampling
Fixed bug regarding loss of classes when undersampling
Updated readme to include added flags for choosing index column, and flags for controlling boruta (perc, alpha, max_iter)
Added flag for choosing index column, and added flag for controlling boruta (perc, alpha, max_iter)
Added flag for choosing index column
Added flags for controlling Boruta (perc, alpha and max_iter). Added print function to show number of features MUVR selected. Added error message in case no features are selected by Boruta.
Add functions
Bug fix 22 05 2026
changed muvr_file to feature_file
Fixed bug regarding problems with datatypes when loading in data.
Bug fix 09 06 2026
…ineages
- Add GroupKFold fallback when StratifiedGroupKFold cannot be applied
- Handle lineages with insufficient class counts or groups
- Add validation for n_splits >= 2
- Check for missing required columns before processing
- Remove rows with missing values in required fields
- Normalize lineage, group, and outcome columns
- Add duplicate sample ID detection
- Catch sklearn splitting errors and assign affected lineages to train
- Add safeguards against empty train/test outputs
- Remove unused imports and legacy splitting code
Improve robustness of train/test splitting for small and imbalanced lineages
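For reviewers, the pre-split checks and the splitter fallback described in this commit could look roughly like the sketch below. It is not the code from this PR: the column names in `REQUIRED`, the function names, and the exact thresholds for falling back are assumptions made for illustration, and the strategy is returned as a plain string instead of calling sklearn.

```python
from collections import Counter, defaultdict

# Hypothetical column names; the real script may use different ones.
REQUIRED = ("sample_id", "lineage", "group", "outcome")


def clean_rows(rows, n_splits):
    """Validate input rows (list of dicts) before any splitting is attempted."""
    if n_splits < 2:
        raise ValueError("n_splits must be >= 2")
    present = set().union(*(r.keys() for r in rows)) if rows else set()
    missing = [c for c in REQUIRED if c not in present]
    if missing:
        raise KeyError(f"missing required columns: {missing}")
    cleaned = []
    for r in rows:
        values = {c: r.get(c) for c in REQUIRED}
        # Drop rows with a missing value in any required field.
        if any(v is None or str(v).strip() == "" for v in values.values()):
            continue
        # Normalize to stripped strings so labels compare consistently.
        cleaned.append({c: str(v).strip() for c, v in values.items()})
    counts = Counter(r["sample_id"] for r in cleaned)
    dupes = sorted(s for s, n in counts.items() if n > 1)
    if dupes:
        raise ValueError(f"duplicate sample IDs: {dupes}")
    return cleaned


def choose_strategy(lineage_rows, n_splits):
    """Pick a splitting strategy for one lineage (assumed thresholds)."""
    groups = {r["group"] for r in lineage_rows}
    if len(groups) < n_splits:
        # Too few groups for any group-aware k-fold: keep the lineage in train.
        return "train_only"
    groups_per_class = defaultdict(set)
    for r in lineage_rows:
        groups_per_class[r["outcome"]].add(r["group"])
    enough_per_class = all(len(g) >= n_splits for g in groups_per_class.values())
    if len(groups_per_class) >= 2 and enough_per_class:
        return "stratified_group_kfold"
    # Stratification is not feasible; fall back to plain group-aware folds.
    return "group_kfold"
```

In the actual script the returned label would map onto `StratifiedGroupKFold` or `GroupKFold` from `sklearn.model_selection`, with any splitting error caught and the affected lineage assigned to train.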
added Boruta, ENN, SMOTEENN and RandomUndersampling
Fixed bug regarding loss of classes when undersampling
Added flags to control Boruta (perc, alpha, max_iter)
Added print function to show number of features MUVR selected
Added error message in case no features are selected by Boruta
Added flag for choosing index column
Updated readme to include Boruta and sampling methods, as well as added flags
Changed muvr_file to feature_file
Fixed bug in 00_split_dataset with datatypes causing problems when loading in data
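As a rough illustration of the new flags, the sketch below wires an index-column option and the three Boruta controls into `argparse`, plus a guard for the "no features selected" case. The flag spellings (`--index-col`, `--perc`, `--alpha`, `--max-iter`), the function names, and the message text are assumptions and may not match `cli.py`; the defaults are chosen to mirror BorutaPy's own (`perc=100`, `alpha=0.05`, `max_iter=100`).

```python
import argparse


def build_parser():
    """Build a parser with hypothetical flag names for the options in this PR."""
    parser = argparse.ArgumentParser(description="Feature selection options (illustrative)")
    parser.add_argument("--index-col", default=None,
                        help="name of the column to use as the sample index")
    parser.add_argument("--perc", type=int, default=100,
                        help="Boruta: percentile of shadow importances used as threshold")
    parser.add_argument("--alpha", type=float, default=0.05,
                        help="Boruta: significance level for rejecting features")
    parser.add_argument("--max-iter", type=int, default=100,
                        help="Boruta: maximum number of iterations")
    return parser


def require_selected(selected, method="Boruta"):
    """Report how many features were kept, or stop with a clear message if none."""
    if not selected:
        raise SystemExit(
            f"{method} selected no features; consider relaxing perc, alpha or max_iter"
        )
    print(f"{method} selected {len(selected)} features")
    return selected
```

Calling `build_parser().parse_args(["--index-col", "sample", "--perc", "90"])` yields a namespace whose `perc`, `alpha` and `max_iter` values could be passed straight to the Boruta constructor.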