An Explainable Machine Learning Framework for Multi-Risk Security Assessment and Pre-Deployment Protection of Docker Images in Cloud Deployment Pipelines

This project is a Machine Learning-based Docker image security assessment system designed to detect insecure container images before deployment in cloud environments.
The system combines supervised learning, anomaly detection, and explainable AI techniques to improve container security within CI/CD pipelines.

The implementation covers feature extraction from Docker images, vulnerability prediction, anomaly detection, SHAP-based explanations, and automated security analysis, built with Python, FastAPI, and the Machine Learning models described below.

Features

Docker image security assessment using Machine Learning
Vulnerability prediction for Docker container images
Supervised classification using XGBoost
Anomaly detection using Isolation Forest
Explainable AI integration using SHAP
Automated Docker image monitoring and scanning
REST API integration using FastAPI
Interactive frontend dashboard for scan visualization
Pre-deployment protection for CI/CD pipelines

Tech Stack

Python
FastAPI
XGBoost
Isolation Forest
SHAP
Pandas, NumPy, Scikit-learn
MySQL
Docker
Next.js

Machine Learning Models

XGBoost Classifier for secure/insecure image classification
XGBoost Regressor for vulnerability prediction
Isolation Forest for anomaly detection
SHAP for explainable AI visualization
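
One way the outputs of the three models could feed a pre-deployment decision is sketched below. This is only an illustration under stated assumptions: the `ScanResult` fields, the `gate` function, and both thresholds are hypothetical and are not taken from the project's code.

```python
from dataclasses import dataclass


@dataclass
class ScanResult:
    """Assumed shape of the three model outputs for one image."""
    insecure_probability: float       # from the classifier
    predicted_vulnerabilities: float  # from the regressor
    is_anomalous: bool                # from the anomaly detector


def gate(result, prob_threshold=0.5, vuln_threshold=50.0):
    """Return (allow, reasons). Both thresholds are illustrative defaults."""
    reasons = []
    if result.insecure_probability >= prob_threshold:
        reasons.append("classified as insecure")
    if result.predicted_vulnerabilities >= vuln_threshold:
        reasons.append("predicted vulnerability count too high")
    if result.is_anomalous:
        reasons.append("flagged as anomalous")
    # The image is allowed only when no check raised a reason.
    return (not reasons, reasons)


allowed, why = gate(ScanResult(0.9, 3.0, True))
```

In a CI/CD job, a non-empty `reasons` list could be mapped to a non-zero exit code so that the pipeline stops before the image is pushed.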

Dataset Information

Dataset size: 1,053 Docker image instances
Includes secure and insecure Docker images
Features include image size, layers, dependencies, package manager, and vulnerability indicators
Data collected from Kaggle Docker security datasets and augmented with synthetic samples

Model Performance

Regression Model (XGBoost Regressor)

RMSE: 100.91
R² Score: 0.84
Cross-validation R²: 0.78
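
For readers unfamiliar with the two regression metrics above, the following sketch shows how RMSE and R² are computed from predictions. The numbers are made up for illustration and are not project data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the average squared residual."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up target values and predictions, for illustration only.
actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]

print(round(rmse(actual, predicted), 4))      # 2.5495
print(round(r2_score(actual, predicted), 3))  # 0.948
```

RMSE is expressed in the units of the regression target, so a value such as the one reported above is best read relative to the range of that target.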

Classification Model (XGBoost Classifier)

Accuracy: 98.7%
Cross-validation Accuracy: 98.39%
Weighted F1-Score: 0.99
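
The weighted F1-score averages the per-class F1 values, weighting each class by its number of true instances. A minimal pure-Python sketch follows. The labels are invented, and the encoding (1 = insecure, 0 = secure) is an assumption.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = count - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (count / total) * f1
    return score


# Made-up labels: 1 = insecure, 0 = secure.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

print(weighted_f1(y_true, y_pred))
```

When one class dominates the dataset, the weighted score is driven mostly by that class, so per-class precision and recall are worth reporting alongside it.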

Anomaly Detection (Isolation Forest)

Detected 39 anomalous Docker images
Used to flag images whose feature profiles deviate from the rest of the dataset and may indicate security risks
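
The sketch below shows how scikit-learn's `IsolationForest` can flag outlying rows in a feature matrix. The data is synthetic. The column meanings and the `contamination` value are assumptions chosen for the example, not the project's settings.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic stand-in data: columns are [size_mb, layers, dependency_count].
typical = rng.normal(loc=[200.0, 8.0, 40.0], scale=[30.0, 2.0, 8.0], size=(200, 3))
unusual = np.array([[2500.0, 60.0, 900.0], [5.0, 1.0, 700.0]])
X = np.vstack([typical, unusual])

# contamination is an assumed value, not the project's configuration.
model = IsolationForest(n_estimators=100, contamination=0.02, random_state=0)
labels = model.fit_predict(X)        # 1 = inlier, -1 = anomaly
scores = model.decision_function(X)  # lower means more anomalous

anomaly_indices = np.where(labels == -1)[0]
print(len(anomaly_indices), "rows flagged as anomalous")
```

Because `contamination` fixes the approximate share of training rows that are flagged, the number of reported anomalies depends on that setting as much as on the data.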

Explainable AI (SHAP)

SHAP was integrated to explain model predictions and identify the most influential Docker image features affecting security classification and vulnerability prediction.
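
To make the idea behind SHAP concrete, the sketch below computes exact Shapley values by brute force for a toy scoring function. It does not use the `shap` library, which provides far more efficient algorithms for tree models such as XGBoost. The function `toy_risk_score`, its coefficients, and the baseline values are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values by enumerating every feature coalition.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2**n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk_score(features):
    """Hypothetical linear score standing in for a trained model."""
    layers, dependencies, high_severity = features
    return 0.5 * layers + 0.1 * dependencies + 2.0 * high_severity


x = [10, 50, 3]        # image being explained
baseline = [5, 10, 0]  # assumed reference image
phi = shapley_values(toy_risk_score, x, baseline)
print([round(v, 6) for v in phi])
```

The attributions sum to the difference between the score for `x` and the score for `baseline`, which is the additivity property that SHAP explanations rely on. For a linear function like this one, each attribution reduces to the coefficient times the feature's deviation from the baseline.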

Dataset Features

Image Size
Number of Layers
Installed Packages
Dependency Count
Package Manager
Base Image Type
Vulnerability Severity Counts
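
As a rough illustration of how features like these could be turned into a numeric row for a model, here is a minimal sketch. The field names, the list of known package managers, and the "other" bucket are assumptions, since the actual dataset schema is not shown here.

```python
from dataclasses import dataclass

# Hypothetical vocabulary; the real dataset may use different values.
KNOWN_MANAGERS = ["apt", "apk", "yum"]


@dataclass
class ImageRecord:
    """One Docker image described by illustrative, assumed field names."""
    size_mb: float
    layers: int
    installed_packages: int
    dependency_count: int
    package_manager: str


def to_feature_vector(record):
    """Encode a record as four numeric fields plus a one-hot package manager."""
    manager = record.package_manager if record.package_manager in KNOWN_MANAGERS else "other"
    categories = KNOWN_MANAGERS + ["other"]
    one_hot = [1.0 if manager == c else 0.0 for c in categories]
    numeric = [
        float(record.size_mb),
        float(record.layers),
        float(record.installed_packages),
        float(record.dependency_count),
    ]
    return numeric + one_hot


example = ImageRecord(size_mb=142.5, layers=7, installed_packages=31,
                      dependency_count=12, package_manager="apk")
vector = to_feature_vector(example)
```

A categorical field such as the base image type could be one-hot encoded in the same way.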

Research Contribution

This research introduces an explainable Machine Learning framework for identifying insecure Docker images before cloud deployment.
The framework improves proactive container security analysis by combining vulnerability prediction, anomaly detection, and explainable AI techniques.

Note

Docker images used for testing are not included in the repository.
Dataset files and trained model files may be excluded due to size and privacy limitations.
You can add your own Docker image datasets for testing and experimentation.
