Added functionalities & Fixed bugs #7

Open
zZz-Tristan wants to merge 17 commits into jpaganini:main from zZz-Tristan:main

Conversation

@zZz-Tristan

Added Boruta, ENN, SMOTEENN, and RandomUndersampling
Fixed a bug that caused classes to be lost when undersampling
Added flags to control Boruta (perc, alpha, max_iter)
Added a print statement showing the number of features MUVR selected
Added an error message for the case where Boruta selects no features
Added a flag for choosing the index column
Updated the README to cover Boruta, the sampling methods, and the new flags
Renamed muvr_file to feature_file
Fixed a datatype bug in 00_split_dataset that caused problems when loading data
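
For readers unfamiliar with the class-loss problem mentioned above, here is a minimal standard-library sketch of random undersampling that cannot drop a class. It is not the code from this PR. The function name `undersample_keep_all_classes` and the `target_per_class` parameter are hypothetical, and the real change presumably sits on top of a sampling library rather than plain Python.

```python
import random
from collections import defaultdict


def undersample_keep_all_classes(labels, target_per_class, seed=0):
    """Return sorted indices of a random undersample that keeps every class.

    Classes with fewer than `target_per_class` samples are kept whole,
    so no class can disappear from the result.
    """
    if target_per_class < 1:
        raise ValueError("target_per_class must be >= 1")
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    kept = []
    for indices in by_class.values():
        # Never ask for more samples than the class actually has.
        n_keep = min(len(indices), target_per_class)
        kept.extend(rng.sample(indices, n_keep))
    return sorted(kept)


# Illustrative data: one large class, one medium class, one singleton class.
labels = ["a"] * 6 + ["b"] * 3 + ["c"]
kept = undersample_keep_all_classes(labels, target_per_class=2)
kept_labels = [labels[i] for i in kept]
```

Capping the per-class sample size at the class's own size is what guards against the failure mode: a singleton class such as `"c"` is retained whole.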

zZz-Tristan and others added 17 commits March 10, 2026 11:54
added boruta package
Updated readme to include boruta and sampling methods
updated cli.py to include boruta and sampling methods
Added ENN, SMOTEENN, and RandomUndersampling
Fixed bug regarding loss of classes when undersampling
Updated readme to include added flags for choosing index column, and flags for controlling boruta (perc, alpha, max_iter)
Added flag for choosing index column, and added flag for controlling boruta (perc, alpha, max_iter)
Added flag for choosing index column
Added flags for controlling Boruta (perc, alpha and max_iter)
Added print function to show number of features MUVR selected
Added error message in case no features are selected by Boruta.
changed muvr_file to feature_file
Fixed bug regarding problems with datatypes when loading in data.
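
As a rough illustration of how the flags named in these commits could be exposed, the sketch below uses `argparse`. The flag spellings (`--index-col`, `--boruta-perc`, `--boruta-alpha`, `--boruta-max-iter`) are hypothetical and may not match the actual `cli.py`. The defaults are an assumption based on what BorutaPy is believed to use for `perc`, `alpha`, and `max_iter`.

```python
import argparse


def build_parser():
    """Build a parser with hypothetical flag names for the options in this PR."""
    parser = argparse.ArgumentParser(
        description="Sketch of a feature-selection CLI (hypothetical flags)"
    )
    # Name of the column to use as the row index when loading the data.
    parser.add_argument("--index-col", default=None,
                        help="column to use as the index (assumed flag name)")
    # Boruta controls; defaults are assumed to mirror BorutaPy's own.
    parser.add_argument("--boruta-perc", type=int, default=100,
                        help="percentile of shadow importances used as threshold")
    parser.add_argument("--boruta-alpha", type=float, default=0.05,
                        help="significance level for feature rejection")
    parser.add_argument("--boruta-max-iter", type=int, default=100,
                        help="maximum number of Boruta iterations")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--boruta-perc", "90", "--index-col", "sample_id"])
```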
…ineages

- Add GroupKFold fallback when StratifiedGroupKFold cannot be applied
- Handle lineages with insufficient class counts or groups
- Add validation for n_splits >= 2
- Check for missing required columns before processing
- Remove rows with missing values in required fields
- Normalize lineage, group, and outcome columns
- Add duplicate sample ID detection
- Catch sklearn splitting errors and assign affected lineages to train
- Add safeguards against empty train/test outputs
- Remove unused imports and legacy splitting code
Improve robustness of train/test splitting for small and imbalanced lineages
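
The checks listed in this commit can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code in the PR. The column names in `REQUIRED`, both function names, and the exact thresholds in `choose_split_strategy` are hypothetical. The returned strings stand in for scikit-learn's `StratifiedGroupKFold`, its `GroupKFold` fallback, and the "assign the lineage to train" case.

```python
from collections import Counter

# Hypothetical column names; the real script may use different ones.
REQUIRED = ("sample_id", "lineage", "group", "outcome")


def clean_rows(rows, required=REQUIRED, id_column="sample_id"):
    """Validate and normalize rows given as dicts, keeping only required columns."""
    if not rows:
        raise ValueError("no rows to split")
    missing = [col for col in required if col not in rows[0]]
    if missing:
        raise ValueError(f"missing required columns: {missing}")

    cleaned, seen_ids = [], set()
    for row in rows:
        values = [row.get(col) for col in required]
        # Drop rows with a missing or blank value in any required field.
        if any(v is None or str(v).strip() == "" for v in values):
            continue
        normalized = {col: str(v).strip() for col, v in zip(required, values)}
        if normalized[id_column] in seen_ids:
            raise ValueError(f"duplicate sample ID: {normalized[id_column]}")
        seen_ids.add(normalized[id_column])
        cleaned.append(normalized)
    return cleaned


def choose_split_strategy(outcomes, groups, n_splits):
    """Pick a strategy for one lineage (thresholds are assumptions)."""
    if n_splits < 2:
        raise ValueError("n_splits must be >= 2")
    if len(outcomes) != len(groups):
        raise ValueError("outcomes and groups must have the same length")

    # Too few groups for any group-aware split: keep the lineage in train.
    if len(set(groups)) < n_splits:
        return "train_only"

    class_counts = Counter(outcomes)
    # Stratify only when every class can appear in every fold.
    if len(class_counts) >= 2 and min(class_counts.values()) >= n_splits:
        return "stratified_group"
    return "group"


rows = [
    {"sample_id": " s1 ", "lineage": "L1", "group": "g1", "outcome": "R"},
    {"sample_id": "s2", "lineage": "", "group": "g1", "outcome": "S"},
]
cleaned = clean_rows(rows)
six_groups = ["g1", "g2", "g3", "g4", "g5", "g6"]
balanced = choose_split_strategy(["R", "S"] * 3, six_groups, 3)
imbalanced = choose_split_strategy(["R"] * 5 + ["S"], six_groups, 3)
tiny = choose_split_strategy(["R", "S"], ["g1", "g1"], 2)
```

Deciding the strategy before calling a splitter keeps the fallback explicit. A real implementation would likely still wrap the splitter call in a `try`/`except`, as the commit describes, since the splitter can reject inputs for reasons this sketch does not check.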
2 participants