An Explainable Machine Learning Framework for Multi-Risk Security Assessment and Pre-Deployment Protection of Docker Images in Cloud Deployment Pipelines
This project is a Machine Learning-based Docker image security assessment system designed to detect insecure container images before deployment in cloud environments.
The system combines supervised learning, anomaly detection, and explainable AI techniques to improve container security within CI/CD pipelines.
It demonstrates Docker image feature extraction, vulnerability prediction, anomaly detection, explainable AI integration, and automated security analysis using Python, FastAPI, and Machine Learning models.
- Docker image security assessment using Machine Learning
- Vulnerability prediction for Docker container images
- Supervised classification using XGBoost
- Anomaly detection using Isolation Forest
- Explainable AI integration using SHAP
- Automated Docker image monitoring and scanning
- REST API integration using FastAPI
- Interactive frontend dashboard for scan visualization
- Pre-deployment protection for CI/CD pipelines
- Python
- FastAPI
- XGBoost
- Isolation Forest
- SHAP
- Pandas, NumPy, Scikit-learn
- MySQL
- Docker
- Next.js
- XGBoost Classifier for secure/insecure image classification
- XGBoost Regressor for vulnerability prediction
- Isolation Forest for anomaly detection
- SHAP for explainable AI visualization
- Dataset size: 1,053 Docker image instances
- Includes secure and insecure Docker images
- Features include image size, layers, dependencies, package manager, and vulnerability indicators
- Data collected from Kaggle Docker security datasets and augmented synthetic samples
- RMSE: 100.91
- R² Score: 0.84
- Cross-validation R²: 0.78
- Accuracy: 98.7%
- Cross-validation Accuracy: 98.39%
- Weighted F1-Score: 0.99
- Detected 39 anomalous Docker images
- Used for identifying unusual container behavior and security risks
SHAP was integrated to explain model predictions and identify the most influential Docker image features affecting security classification and vulnerability prediction.
- Image Size
- Number of Layers
- Installed Packages
- Dependency Count
- Package Manager
- Base Image Type
- Vulnerability Severity Counts
- This research introduces an explainable Machine Learning framework for identifying insecure Docker images before cloud deployment.
- The framework improves proactive container security analysis by combining vulnerability prediction, anomaly detection, and explainable AI techniques.
- Docker images used for testing are not included in the repository.
- Dataset files and trained model files may be excluded due to size and privacy limitations.
- You can add your own Docker image datasets for testing and experimentation.