ECG Signal Quality Classification using Random Forest

Machine Learning project for automatic classification of ECG signal quality using a Random Forest classifier.

Overview

Electrocardiogram (ECG) monitoring systems are susceptible to signal quality degradation caused by patient movement, electrode disconnection, and hardware-related issues. Poor signal quality can negatively affect diagnosis and monitoring performance.

This project implements a Random Forest-based classification model capable of automatically identifying ECG signal quality conditions and categorizing recordings into four classes.


Classes

| Label | Class | Description |
|-------|-------|-------------|
| 1 | Normal | ECG signal with acceptable quality |
| 2 | Motion Artifact | Distortion caused by patient movement |
| 3 | Lead Disconnection | Electrode detachment or poor contact |
| 4 | Low Battery | Signal degradation caused by device power issues |

Processing Pipeline

```text
ECG Signal
    ↓
Data Cleaning
    ↓
Missing Value Handling
    ↓
SMOTE Balancing
    ↓
MinMax Normalization
    ↓
Random Forest Training
    ↓
Classification
```


Technologies

  • Python
  • NumPy
  • Pandas
  • Scikit-Learn
  • Imbalanced-Learn (SMOTE)
  • Matplotlib
  • Joblib
  • Google Colab

Machine Learning Workflow

Data Preparation

  • Invalid labels are removed.
  • Missing values are handled before training.
  • Features and labels are separated.
  • Training data are balanced using SMOTE.
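The balancing step can be illustrated with a simplified NumPy version of the idea behind SMOTE: each synthetic sample lies on the line segment between a minority-class sample and one of its k nearest minority-class neighbors. This is a sketch for intuition only, not the notebook's code; the helper name `smote_like_oversample` and its parameters are hypothetical, and the toy data are random.

```python
import numpy as np


def smote_like_oversample(X_min, n_new, k=5, seed=42):
    """Create n_new synthetic rows by interpolating between minority samples.

    Hypothetical helper that illustrates the core SMOTE step.
    It is not imbalanced-learn's API and needs at least two samples.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # cannot use more neighbors than there are other samples

    # Pairwise Euclidean distances within the minority class.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)  # sample to start from
    pick = rng.integers(0, k, size=n_new)  # which neighbor to move toward
    gap = rng.random((n_new, 1))           # interpolation factor in [0, 1)
    target = X_min[neighbors[base, pick]]
    return X_min[base] + gap * (target - X_min[base])


# Toy minority class with 6 samples and 3 features.
X_minority = np.random.default_rng(0).normal(size=(6, 3))
X_synthetic = smote_like_oversample(X_minority, n_new=10, k=3)
print(X_synthetic.shape)
```

With imbalanced-learn, the corresponding call is `SMOTE(random_state=42).fit_resample(X_train, y_train)`; the `random_state` value shown here is an assumption. Resampling only the training split keeps synthetic samples out of the evaluation data.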

Data Normalization

A MinMaxScaler is applied to normalize all features into a common range.
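With default settings, `MinMaxScaler` maps each feature to [0, 1] using x' = (x - min) / (max - min), where min and max are taken per feature. The sketch below uses synthetic data (an assumption) and fits the scaler on the training split only, so the test split is transformed with the training minimum and maximum.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-ins for the feature matrices (assumption: 3 features).
rng = np.random.default_rng(42)
X_train = rng.normal(loc=50.0, scale=10.0, size=(100, 3))
X_test = rng.normal(loc=50.0, scale=10.0, size=(20, 3))

scaler = MinMaxScaler()                         # default feature_range=(0, 1)
X_train_scaled = scaler.fit_transform(X_train)  # learn per-feature min and max
X_test_scaled = scaler.transform(X_test)        # reuse the training min and max

print(X_train_scaled.min(axis=0), X_train_scaled.max(axis=0))
```

Test values that fall outside the training range end up slightly below 0 or above 1, which is expected behavior for this scaler.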

Model Training

The classifier used is:

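```python
from sklearn.ensemble import RandomForestClassifier

# 100 trees; fixed seed for reproducible results
model = RandomForestClassifier(
    n_estimators=100,
    random_state=42,
)
```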

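The model consists of 100 decision trees. In scikit-learn, their predictions are combined by averaging the class probabilities predicted by each tree and selecting the class with the highest average, a soft form of majority voting.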
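How the trees are combined can be checked directly. scikit-learn's forest averages the class probabilities predicted by its individual trees and picks the class with the highest average. The sketch below verifies this on synthetic four-class data, which is an assumption standing in for the ECG features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic four-class data standing in for the ECG features (assumption).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=6,
    n_classes=4, random_state=42,
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

# Average the class probabilities predicted by each individual tree.
tree_probas = np.stack([tree.predict_proba(X) for tree in model.estimators_])
mean_proba = tree_probas.mean(axis=0)

# Compare the manual average with the forest's own output.
same_proba = np.allclose(mean_proba, model.predict_proba(X))
manual_pred = model.classes_[mean_proba.argmax(axis=1)]
agreement = np.mean(manual_pred == model.predict(X))
print(same_proba, agreement)
```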

Evaluation

Performance is evaluated using:

  • Classification Report
  • Confusion Matrix
  • 5-Fold Cross Validation
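The three evaluation tools listed above map onto standard scikit-learn calls. The following is a minimal sketch on synthetic data; the data, the 80/20 split, and the variable names are assumptions and do not reproduce the notebook's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic four-class data; labels shifted to 1..4 to mirror the class table.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=6,
    n_classes=4, random_state=42,
)
y = y + 1

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_val)

# Per-class precision, recall, and F1-score.
print(classification_report(y_val, y_pred))

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_val, y_pred, labels=[1, 2, 3, 4])
print(cm)

# 5-fold cross-validation accuracy on the training split.
scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
print(f"CV accuracy: {scores.mean():.4f} +/- {scores.std():.4f}")
```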

Results

Cross Validation

| Metric | Value |
|--------|-------|
| Cross-validation Accuracy | 93.83% |

Validation Performance

| Class | Precision | Recall | F1-Score |
|-------|-----------|--------|----------|
| Normal | 1.00 | 0.86 | 0.92 |
| Motion Artifact | 0.73 | 1.00 | 0.84 |
| Lead Disconnection | 1.00 | 0.89 | 0.94 |
| Low Battery | 0.95 | 1.00 | 0.97 |

Test Set Performance

| Class | Precision | Recall | F1-Score |
|-------|-----------|--------|----------|
| Normal | 0.88 | 0.79 | 0.83 |
| Motion Artifact | 0.58 | 0.87 | 0.69 |
| Lead Disconnection | 0.95 | 0.69 | 0.80 |
| Low Battery | 1.00 | 1.00 | 1.00 |
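The F1-score column is the harmonic mean of precision and recall, F1 = 2PR / (P + R). As a quick consistency check, the sketch below recomputes it from the rounded test-set values; the helper name `f1_score_from` is hypothetical. Because the inputs are already rounded to two decimals, the recomputed value can differ from the reported one by up to about 0.01.

```python
def f1_score_from(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the test-set table.
test_rows = {
    "Normal": (0.88, 0.79, 0.83),
    "Motion Artifact": (0.58, 0.87, 0.69),
    "Lead Disconnection": (0.95, 0.69, 0.80),
    "Low Battery": (1.00, 1.00, 1.00),
}

for name, (p, r, reported) in test_rows.items():
    f1 = f1_score_from(p, r)
    print(f"{name}: recomputed {f1:.3f}, reported {reported:.2f}")
```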

Confusion Matrix

[Figure: confusion matrix; see docs/images/confusion_matrix.png]

Key Findings

  • Low Battery events are classified perfectly on the independent test set (precision, recall, and F1-score of 1.00).
  • Precision is strong for the Normal (0.88) and Lead Disconnection (0.95) classes on the test set.
  • Most classification errors occur between Normal and Motion Artifact signals; Motion Artifact has the lowest test precision (0.58) despite a recall of 0.87.
  • 5-fold cross-validation accuracy is 93.83%, although test-set F1-scores are lower than the validation scores for every class except Low Battery.

Dataset

The original datasets used for model development are not included in this repository.

Sample datasets are provided in the data/ directory to demonstrate the expected format:

  • sample_training_data.csv
  • sample_testing_data.csv

Dataset format:

  • First column: class label
  • Remaining columns: ECG-derived features
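A file in this format can be loaded and split with pandas. The sketch below reads an in-memory CSV so that it runs on its own; the header row, the column names, the example values, and the median fill for missing values are all assumptions, since the sample files' exact contents and the notebook's missing-value strategy are not described here.

```python
import io

import pandas as pd

# In-memory stand-in for a file such as data/sample_training_data.csv.
# Assumptions: a header row is present and the column names are hypothetical.
csv_text = """label,f1,f2,f3
1,0.12,0.50,0.33
2,0.90,,0.10
3,0.05,0.07,0.02
9,0.40,0.41,0.42
4,0.70,0.65,0.80
"""
df = pd.read_csv(io.StringIO(csv_text))

# Keep only rows whose label is one of the four documented classes.
valid_labels = [1, 2, 3, 4]
df = df[df.iloc[:, 0].isin(valid_labels)]

# Separate the label (first column) from the features (remaining columns).
y = df.iloc[:, 0].astype(int)
X = df.iloc[:, 1:]

# Fill missing feature values with the column median (one possible strategy).
X = X.fillna(X.median())

print(X.shape, sorted(y.unique()))
```

To read a real file instead, pass its path to `pd.read_csv` in place of the `io.StringIO` object.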

Repository Structure

```text
ecg-classification-random-forest/
│
├── data/
│   ├── sample_training_data.csv
│   └── sample_testing_data.csv
│
├── docs/
│   └── images/
│       └── confusion_matrix.png
│
├── notebooks/
│   └── ModeloECG.ipynb
│
├── README.md
├── requirements.txt
├── LICENSE
└── .gitignore
```


Installation

Clone the repository:

```bash
git clone https://github.com/CIMUXTECH/ecg-classification-random-forest.git
cd ecg-classification-random-forest
```

Install dependencies:

```bash
pip install -r requirements.txt
```


Usage

Open the notebook:

```text
notebooks/ModeloECG.ipynb
```

Run all cells sequentially to:

  1. Load the ECG dataset.
  2. Perform preprocessing.
  3. Balance classes using SMOTE.
  4. Train the Random Forest model.
  5. Evaluate performance.
  6. Generate the confusion matrix.

Future Work

Potential improvements for future versions include:

  • Hyperparameter optimization using GridSearchCV.
  • Comparison against SVM classifiers.
  • Comparison against XGBoost.
  • ROC curve analysis for multiclass classification.
  • Real-time ECG signal quality monitoring.
  • Deployment as a standalone prediction application.
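As a starting point for the first item, a grid search over Random Forest hyperparameters could look like the sketch below. The search space is hypothetical (placeholder values, not tuned choices), and the data are synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic four-class data standing in for the ECG features (assumption).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=6,
    n_classes=4, random_state=42,
)

# Hypothetical search space; these values are placeholders.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 10],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

print(search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.4f}")
```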

Version

Current stable release:

```text
v1.0.0
```


Author

Quehen Rodriguez

GitHub:

https://github.com/CIMUXTECH


License

This project is licensed under the MIT License.