Machine Learning project for automatic classification of ECG signal quality using a Random Forest classifier.
Electrocardiogram (ECG) monitoring systems are susceptible to signal quality degradation caused by patient movement, electrode disconnection, and hardware-related issues. Poor signal quality can negatively affect diagnosis and monitoring performance.
This project implements a Random Forest-based classification model capable of automatically identifying ECG signal quality conditions and categorizing recordings into four classes.
| Label | Class | Description |
|---|---|---|
| 1 | Normal | ECG signal with acceptable quality |
| 2 | Motion Artifact | Distortion caused by patient movement |
| 3 | Lead Disconnection | Electrode detachment or poor contact |
| 4 | Low Battery | Signal degradation caused by device power issues |
```text
ECG Signal
    ↓
Data Cleaning
    ↓
Missing Value Handling
    ↓
SMOTE Balancing
    ↓
MinMax Normalization
    ↓
Random Forest Training
    ↓
Classification
```
- Python
- NumPy
- Pandas
- Scikit-Learn
- Imbalanced-Learn (SMOTE)
- Matplotlib
- Joblib
- Google Colab
- Invalid labels are removed.
- Missing values are handled before training.
- Features and labels are separated.
- Training data are balanced using SMOTE.
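The steps above can be sketched roughly as follows. This is an illustration under several assumptions, not the notebook's code: the median imputation, the column names, and the helper names `clean` and `smote_like_oversample` are all hypothetical, and the oversampler is a simplified NumPy rendering of the SMOTE idea (interpolating between a minority sample and one of its nearest same-class neighbours) rather than the Imbalanced-Learn implementation.

```python
import numpy as np
import pandas as pd

VALID_LABELS = {1, 2, 3, 4}  # the four classes from the label table


def clean(df):
    """Drop rows with invalid labels, fill missing features, split X and y."""
    label_col = df.columns[0]  # first column holds the class label
    df = df[df[label_col].isin(VALID_LABELS)].copy()
    feature_cols = df.columns[1:]
    # Assumption: median imputation; the source does not say which strategy is used.
    df[feature_cols] = df[feature_cols].fillna(df[feature_cols].median())
    X = df[feature_cols].to_numpy(dtype=float)
    y = df[label_col].to_numpy(dtype=int)
    return X, y


def smote_like_oversample(X, y, k=3, seed=42):
    """Simplified SMOTE-style balancing: grow every class to the majority size."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        X_cls = X[y == cls]
        n_new = target - count
        if n_new == 0 or len(X_cls) < 2:
            continue
        # Pairwise distances inside the class; a sample is not its own neighbour.
        dists = np.linalg.norm(X_cls[:, None, :] - X_cls[None, :, :], axis=2)
        np.fill_diagonal(dists, np.inf)
        k_eff = min(k, len(X_cls) - 1)
        neighbours = np.argsort(dists, axis=1)[:, :k_eff]
        base = rng.integers(0, len(X_cls), size=n_new)
        chosen = neighbours[base, rng.integers(0, k_eff, size=n_new)]
        gap = rng.random((n_new, 1))  # position along the segment base -> neighbour
        X_parts.append(X_cls[base] + gap * (X_cls[chosen] - X_cls[base]))
        y_parts.append(np.full(n_new, cls))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Tiny made-up dataset: one invalid label (9) and one missing feature value.
demo_rng = np.random.default_rng(0)
raw = pd.DataFrame({
    "label": [1] * 6 + [2] * 3 + [3] * 4 + [4] * 2 + [9],
    "f1": demo_rng.normal(size=16),
    "f2": demo_rng.normal(size=16),
})
raw.loc[2, "f1"] = np.nan

X, y = clean(raw)
X_bal, y_bal = smote_like_oversample(X, y)
print(np.unique(y_bal, return_counts=True))
```

Balancing is applied to the training data only, so that evaluation data keep their original class distribution.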
A `MinMaxScaler` is applied to rescale every feature to a common range (`[0, 1]` with the scaler's default settings).
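A minimal sketch of the scaling step, using made-up values. It assumes the scaler is fitted on the training features and then reused unchanged on test features, which is common practice; the notebook may organize this differently.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up feature matrices, for illustration only.
X_train = np.array([[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]])
X_test = np.array([[2.0, 35.0]])

scaler = MinMaxScaler()  # default feature_range is (0, 1)
X_train_scaled = scaler.fit_transform(X_train)  # learns per-feature min and max
X_test_scaled = scaler.transform(X_test)        # reuses the training min and max

print(X_train_scaled)
print(X_test_scaled)
```

Test values outside the training range map outside `[0, 1]`, as the second test feature does here.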
The classifier used is:
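```python
from sklearn.ensemble import RandomForestClassifier

# 100 trees; a fixed seed makes training reproducible
model = RandomForestClassifier(
    n_estimators=100,
    random_state=42
)
```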
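The model consists of 100 decision trees. scikit-learn combines them by averaging the class probabilities predicted by each tree and selecting the class with the highest mean probability, which in most cases matches a simple majority vote.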
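To make the combination step concrete, the sketch below trains the same classifier on synthetic four-class data (a stand-in, since the original datasets are not in the repository) and rebuilds the forest's prediction by hand. In scikit-learn the forest returns the class with the highest mean predicted probability across its trees, which usually coincides with a majority vote.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data with four classes, relabelled 1..4 as in the label table.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5,
    n_classes=4, n_clusters_per_class=1, random_state=42,
)
y = y + 1

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

# Average the per-tree class probabilities, then pick the most probable class.
tree_probas = np.stack([tree.predict_proba(X) for tree in model.estimators_])
mean_proba = tree_probas.mean(axis=0)
manual_pred = model.classes_[mean_proba.argmax(axis=1)]

print(np.array_equal(manual_pred, model.predict(X)))
```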
Performance is evaluated using:
- Classification Report
- Confusion Matrix
- 5-Fold Cross Validation
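The three evaluation tools listed above can be produced with scikit-learn along these lines. The data are synthetic and the train/test split, pipeline layout, and variable names are assumptions for illustration, so the numbers printed here are unrelated to the results reported in this README.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

LABELS = [1, 2, 3, 4]
CLASS_NAMES = ["Normal", "Motion Artifact", "Lead Disconnection", "Low Battery"]

# Synthetic stand-in data with four classes, relabelled 1..4.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5,
    n_classes=4, n_clusters_per_class=1, random_state=42,
)
y = y + 1
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Keeping the scaler inside the pipeline means it is refitted on each CV fold.
model = make_pipeline(
    MinMaxScaler(),
    RandomForestClassifier(n_estimators=100, random_state=42),
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

report = classification_report(y_test, y_pred, labels=LABELS, target_names=CLASS_NAMES)
cm = confusion_matrix(y_test, y_pred, labels=LABELS)  # rows: true, columns: predicted
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")

print(report)
print(cm)
print(f"5-fold CV accuracy: {cv_scores.mean():.4f}")
```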
| Metric | Value |
|---|---|
| Cross-validation Accuracy | 93.83% |
| Class | Precision | Recall | F1-Score |
|---|---|---|---|
| Normal | 1.00 | 0.86 | 0.92 |
| Motion Artifact | 0.73 | 1.00 | 0.84 |
| Lead Disconnection | 1.00 | 0.89 | 0.94 |
| Low Battery | 0.95 | 1.00 | 0.97 |
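As a quick consistency check, each F1-score is the harmonic mean of the precision and recall in the same row. The short sketch below recomputes it for the rows of the table above; the function name `f1_from` is only illustrative.

```python
def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) taken from the table above.
reported = {
    "Normal": (1.00, 0.86, 0.92),
    "Motion Artifact": (0.73, 1.00, 0.84),
    "Lead Disconnection": (1.00, 0.89, 0.94),
    "Low Battery": (0.95, 1.00, 0.97),
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: computed F1 = {f1_from(p, r):.2f}, reported F1 = {f1:.2f}")
```

Because the table shows rounded values, a recomputed F1 can differ from a reported one in the last digit.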
| Class | Precision | Recall | F1-Score |
|---|---|---|---|
| Normal | 0.88 | 0.79 | 0.83 |
| Motion Artifact | 0.58 | 0.87 | 0.69 |
| Lead Disconnection | 0.95 | 0.69 | 0.80 |
| Low Battery | 1.00 | 1.00 | 1.00 |
- Excellent performance detecting Low Battery events.
- Strong precision for Normal and Lead Disconnection classes.
- Most classification errors occur between Normal and Motion Artifact signals.
- 5-fold cross-validation accuracy is 93.83%; Motion Artifact remains the weakest class, with the lowest precision in both per-class tables (0.73 and 0.58).
- Low Battery signals achieved perfect classification on the independent test set.
The original datasets used for model development are not included in this repository.
Sample datasets are provided in the data/ directory to demonstrate the expected format:
- sample_training_data.csv
- sample_testing_data.csv
Dataset format:
- First column: class label
- Remaining columns: ECG-derived features
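A file in this layout can be split into labels and features with Pandas as sketched below. The inline CSV text, its header row, and the column names are hypothetical placeholders; the sample files in `data/` define the actual columns.

```python
import io

import pandas as pd

# Hypothetical file contents: a header row is assumed, followed by
# one label column and two made-up feature columns.
csv_text = """label,feature_1,feature_2
1,0.12,0.80
2,0.45,0.31
4,0.90,0.05
"""

df = pd.read_csv(io.StringIO(csv_text))
y = df.iloc[:, 0]   # first column: class label
X = df.iloc[:, 1:]  # remaining columns: ECG-derived features

print(X.shape, y.tolist())
```

For a file on disk, the same split applies after `pd.read_csv("data/sample_training_data.csv")`.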
```text
ecg-classification-random-forest/
│
├── data/
│   ├── sample_training_data.csv
│   └── sample_testing_data.csv
│
├── docs/
│   └── images/
│       └── confusion_matrix.png
│
├── notebooks/
│   └── ModeloECG.ipynb
│
├── README.md
├── requirements.txt
├── LICENSE
└── .gitignore
```
Clone the repository:
```bash
git clone https://github.com/CIMUXTECH/ecg-classification-random-forest.git
cd ecg-classification-random-forest
```
Install dependencies:
```bash
pip install -r requirements.txt
```
Open the notebook:
```text
notebooks/ModeloECG.ipynb
```
Run all cells sequentially to:
- Load the ECG dataset.
- Perform preprocessing.
- Balance classes using SMOTE.
- Train the Random Forest model.
- Evaluate performance.
- Generate the confusion matrix.
Potential improvements for future versions include:
- Hyperparameter optimization using GridSearchCV.
- Comparison against SVM classifiers.
- Comparison against XGBoost.
- ROC curve analysis for multiclass classification.
- Real-time ECG signal quality monitoring.
- Deployment as a standalone prediction application.
Current stable release:
```text
v1.0.0
```
Quehen Rodriguez
GitHub:
This project is licensed under the MIT License.
