A Study of Data Pre-processing Techniques for Imbalanced Bioinformatics Data Classification
The src folder contains all source code.
The data folder contains all data used in this study. These data are also available on the website referenced in our paper.
Input_example.m performs the following steps:
Read the data from a database file
Split the data into training and testing sets
Train the classification model
Test the classification model and output the classification performance: Accuracy, Precision, Recall, F-measure (FM), and AUC
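The five reported measures can be illustrated with a small sketch. This is not the repository's MATLAB code: it is a standard-library Python example, and the function names (binary_metrics, auc), the choice of 1 as the positive (minority) label, and the sample values are all assumptions made for illustration. AUC is computed here by pairwise comparison of positive and negative scores, which may differ from how Input_example.m computes it.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, Precision, Recall and F-measure for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall.
    fm = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "fm": fm,
    }


def auc(y_true, scores, positive=1):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Returns NaN if a class is missing.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels, predictions, and classifier scores.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.1]
report = binary_metrics(y_true, y_pred)
report["auc"] = auc(y_true, scores)
```

On imbalanced data, accuracy alone can look high even when the minority class is mostly missed, which is why Precision, Recall, FM, and AUC are reported alongside it.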
When using the code, choose a classifier by its numeric code: 1 for SVM, 2 for C4.5 decision tree, 3 for KNN, 4 for LDA, and 5 for RF.
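The numeric codes above can be summarized as a lookup table. The sketch below is in Python rather than the repository's MATLAB, and the names CLASSIFIERS and classifier_name are hypothetical. It only shows the code-to-classifier mapping stated above and does not reproduce how the source code dispatches on the value.

```python
# Mapping of numeric codes to classifiers, as listed in this README.
CLASSIFIERS = {
    1: "SVM",
    2: "C4.5 decision tree",
    3: "KNN",
    4: "LDA",
    5: "RF",
}


def classifier_name(code):
    """Return the classifier name for a code, rejecting unknown codes."""
    if code not in CLASSIFIERS:
        valid = ", ".join(str(k) for k in sorted(CLASSIFIERS))
        raise ValueError(f"unknown classifier code {code!r}; expected one of {valid}")
    return CLASSIFIERS[code]
```

For example, classifier_name(3) returns "KNN", and any code outside 1 to 5 raises a ValueError.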
To apply these techniques to other imbalanced datasets, please format the data as required by the re-balancing process.
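As a rough picture of what re-balancing does to a dataset, the sketch below applies random oversampling, one common re-balancing technique, to a list of feature rows with one class label per row. It is a Python illustration, not the repository's MATLAB code. The row-plus-label layout, the function name random_oversample, and the seed parameter are assumptions, and the actual input format expected by the source code should be taken from Input_example.m and the data folder.

```python
import random


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all
    classes are as large as the largest one.

    rows   : list of feature vectors (assumed layout, one per sample)
    labels : list of class labels, same length as rows
    """
    rng = random.Random(seed)  # fixed seed for reproducible resampling
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    if not by_class:
        return [], []
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample with replacement to fill the gap to the majority size.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical data: four majority samples (0) and one minority sample (1).
rows = [[0.1], [0.2], [0.3], [0.4], [0.9]]
labels = [0, 0, 0, 0, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
```

Re-balancing of this kind is normally applied to the training split only, so that the test split keeps the original class distribution.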
If you have any questions, please let me know.