This project builds a robust machine learning workflow by integrating DevOps principles, Docker for containerization, and MLFlow for tracking and model management. The goal is to build, ship, and test this workflow rapidly, keeping each component modular, scalable, and adaptable to different ML projects. The workflow includes a MinIO server for artifact storage, a MySQL database for experiment metadata, an MLFlow tracking server, and an NGINX proxy for secure access.
- Project Overview
- Process Flow
- Architecture
- Components
- Data Source
- Getting Started
- Accessing the Services
In this project, we create a structured and efficient ML workflow adopting DevOps practices. By leveraging Docker for containerization and MLFlow for tracking experiments, this setup allows us to:
- Rapidly build, ship, and test models in a modular environment.
- Track and log ML experiments using MLFlow, storing metadata in MySQL and files in MinIO.
- Ensure security and ease of access using an NGINX proxy.
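As a rough illustration of the tracking step, the sketch below wraps the three MLflow calls a training script typically needs: `start_run`, `log_params`, and `log_metrics`. The function name `log_training_run` and its arguments are hypothetical and not taken from this repository's train.py. The tracker is passed in as an argument (normally the imported `mlflow` module) so the function can be tried without a running server.

```python
from typing import Any, Mapping


def log_training_run(tracker: Any,
                     params: Mapping[str, Any],
                     metrics: Mapping[str, float],
                     run_name: str = "baseline") -> None:
    """Record one experiment run through an MLflow-style tracker.

    `tracker` is expected to be the imported `mlflow` module. It is an
    argument (an assumption of this sketch) so a stand-in can be used in tests.
    """
    # start_run() is a context manager; the run is closed when the block exits.
    with tracker.start_run(run_name=run_name):
        tracker.log_params(dict(params))    # hyperparameters of the run
        tracker.log_metrics(dict(metrics))  # evaluation scores of the run
```

With the containers running, a script would typically call `mlflow.set_tracking_uri("http://127.0.0.1:5000")` once and then pass the `mlflow` module as `tracker`.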
This workflow enables end-to-end management of ML experiments, from data preprocessing and training to deployment. The following diagram shows the process flow:
The core of this workflow includes Docker containers orchestrated with docker-compose to run various services:
- MLFlow Tracking Server: For experiment logging and model versioning.
- MinIO Server: S3-compatible object storage for model artifacts.
- MySQL Database: Stores MLFlow metadata.
- NGINX: Manages access control with basic authentication.
The MLFlow server allows us to track and visualize ML experiments. Experiment metadata (parameters, metrics, and run information) is stored in a MySQL database, and model artifacts are saved in a MinIO bucket. Access MLFlow at http://127.0.0.1:5000.
MinIO is an S3-compatible object store used for model artifacts. It makes large files, such as datasets and serialized models, easy to manage and retrieve. Access MinIO at http://127.0.0.1:9000.
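A training script has to know where both services live. The sketch below collects the environment variables that the MLflow client and its S3 backend read: `MLFLOW_TRACKING_URI`, `MLFLOW_S3_ENDPOINT_URL`, `AWS_ACCESS_KEY_ID`, and `AWS_SECRET_ACCESS_KEY`. It assumes the client uploads artifacts directly to MinIO rather than through the tracking server. The helper names are hypothetical, and the credentials must come from your own MinIO configuration.

```python
import os
from typing import Dict, MutableMapping


def tracking_environment(access_key: str,
                         secret_key: str,
                         tracking_uri: str = "http://127.0.0.1:5000",
                         s3_endpoint: str = "http://127.0.0.1:9000") -> Dict[str, str]:
    """Build the environment variables for MLflow tracking with MinIO storage."""
    return {
        "MLFLOW_TRACKING_URI": tracking_uri,    # where runs are logged
        "MLFLOW_S3_ENDPOINT_URL": s3_endpoint,  # point the S3 client at MinIO
        "AWS_ACCESS_KEY_ID": access_key,        # MinIO access key
        "AWS_SECRET_ACCESS_KEY": secret_key,    # MinIO secret key
    }


def apply_environment(env: Dict[str, str],
                      target: MutableMapping[str, str] = os.environ) -> None:
    """Copy the variables into `target` (the process environment by default)."""
    target.update(env)
```

Calling `apply_environment(tracking_environment(key, secret))` before importing the training code keeps the endpoints in one place instead of scattering them through the script.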
NGINX acts as a reverse proxy for secure access to the MLFlow server and MinIO. Basic authentication is implemented to restrict access. Access NGINX at http://127.0.0.1.
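Because the proxy uses HTTP basic authentication, any client going through it must send an `Authorization` header. The sketch below shows how that header is formed, using only the Python standard library. The function names are hypothetical, and the username and password are whatever was configured for NGINX in your deployment.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an HTTP Basic `Authorization` header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def authenticated_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Prepare (but do not send) a request that carries the credentials."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request
```

The prepared request can be sent with `urllib.request.urlopen`. When logging through the proxy with the MLflow client, setting the `MLFLOW_TRACKING_USERNAME` and `MLFLOW_TRACKING_PASSWORD` environment variables should have the same effect without hand-built headers.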
For this project, use the credit card fraud detection dataset from Kaggle:
- Dataset URL: Credit Card Fraud Detection
- Download creditcard.csv and place it in the mlflow folder before starting the containers.
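Before starting the containers, it can help to confirm that the file is in place and has the expected layout. The sketch below assumes the usual layout of the Kaggle file (columns `Time`, `V1` through `V28`, `Amount`, and a `Class` label where 1 marks fraud) and counts the rows per class. The function name `class_counts` is hypothetical.

```python
import csv
from collections import Counter

# Assumed column layout of the Kaggle credit card fraud dataset.
EXPECTED_COLUMNS = ["Time"] + [f"V{i}" for i in range(1, 29)] + ["Amount", "Class"]


def class_counts(csv_path: str) -> Counter:
    """Validate the header and count rows per value of the `Class` column."""
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
        # Labels are read as strings, so the keys are "0" and "1".
        return Counter(row["Class"] for row in reader)
```

For example, `class_counts("mlflow/creditcard.csv")` returns a counter keyed by the label strings, which gives a quick view of how imbalanced the two classes are.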
To set up and run this project, follow these steps:
- Clone the repository:
git clone https://github.com/your-username/your-repo.git
cd your-repo
- Start the Docker containers:
docker-compose -f docker-compose.yml up -d
- Start training models:
docker exec -it [container ID] /bin/bash
python train.py
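Starting the containers in detached mode returns before the services inside them are ready, so training launched immediately afterwards may fail to connect. The sketch below polls a TCP port until it accepts connections or a timeout expires. The function name `wait_for_port` and its defaults are hypothetical. An open port only shows that something is listening, not that the application has finished starting.

```python
import socket
import time


def wait_for_port(host: str, port: int,
                  timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once `host:port` accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)  # back off before the next attempt
```

A pre-training check could call it for ports 5000 (MLFlow), 9000 (MinIO), and 80 (NGINX) on 127.0.0.1 and stop if any call returns False.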
- MLFlow Server: Accessible at http://127.0.0.1:5000
- MinIO Server: Accessible at http://127.0.0.1:9000
- NGINX Proxy: Accessible at http://127.0.0.1



