This repository contains an end-to-end MLOps pipeline designed to predict whether a customer will honor or cancel their hotel reservation. The system leverages cloud data storage, robust model tracking, automated CI/CD pipelines, and serverless container deployment.
- Data Layer: Raw reservation data is managed via automated ETL flows and stored securely in a Google Cloud Storage bucket.
- Experimentation: Version control handles small tracking files while heavy assets are tracked via Git. Models are monitored across iterations using an MLflow tracking server.
- Continuous Integration & Deployment: Commits to GitHub trigger automated Jenkins pipelines. Jenkins builds a Docker image via Docker-in-Docker (DinD), pushes it to Google Container Registry (GCR), and deploys it to Google Cloud Run.
├── src/ # Source code modules (Ingestion, Preprocessing, Training)
├── notebook/ # Jupyter Notebooks for EDA and prototype testing
├── templates/ # HTML files for the Flask UI
├── static/ # CSS and JavaScript assets
├── config/ # Configuration files (config.yaml, model_params.yaml)
├── artifacts/ # Local data splits and serialized model outputs
├── pipeline/ # Training and prediction orchestration scripts
├── utils/ # Common helper utilities
├── Dockerfile # Project container definition
├── requirements.txt # Python dependencies
└── setup.py # Project package installation settings
Isolate your development dependencies by initializing a clean virtual environment:
python -m venv venv
Activate the environment:
| OS | Command |
|---|---|
| Windows (PowerShell) | venv\Scripts\activate |
| Linux / macOS | source venv/bin/activate |
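To confirm that activation worked before installing anything, a quick check along these lines can help. It is a minimal sketch that assumes the environment was created with `python -m venv`; the function name `in_virtualenv` is made up for illustration and is not part of this repository.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    status = "active" if in_virtualenv() else "NOT active"
    print(f"Virtual environment is {status}: {sys.prefix}")
```

If the script reports that the environment is not active, re-run the activation command for your OS before running `pip install`.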
Install required libraries (including imbalanced-learn) and package the source directory in editable mode:
pip install -r requirements.txt
pip install -e .
To download files from Cloud Storage, set up authentication first:
- Go to the GCP Console and navigate to IAM & Admin → Service Accounts.
- Create a service account with the Storage Admin and Storage Object Viewer roles.
- Grant the service account's email access in the Permissions panel of your target Cloud Storage bucket.
If you hit permission errors while downloading JSON keys from the console, authenticate locally via the Google Cloud CLI instead:
gcloud auth application-default login
This stores Application Default Credentials in a local file. On Windows it is written under %APPDATA%\gcloud, for example:
C:\Users\vigna\AppData\Roaming\gcloud\application_default_credentials.json
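The exact location depends on the OS and user account, so it may be useful to compute it rather than hard-code it. The sketch below is an illustration, not code from this repository: `default_adc_path` is a hypothetical helper, and it assumes the standard gcloud locations (`%APPDATA%\gcloud` on Windows, `~/.config/gcloud` on Linux and macOS). It does not handle a custom gcloud configuration directory.

```python
import ntpath
import os
import posixpath


def default_adc_path(os_name=None, env=None):
    """Best-guess location of the Application Default Credentials file."""
    os_name = os.name if os_name is None else os_name
    env = os.environ if env is None else env

    # An explicitly configured key file takes precedence over the gcloud default.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        return explicit

    filename = "application_default_credentials.json"
    if os_name == "nt":
        # Windows: %APPDATA%\gcloud\application_default_credentials.json
        return ntpath.join(env.get("APPDATA", ""), "gcloud", filename)
    # Linux / macOS: ~/.config/gcloud/application_default_credentials.json
    return posixpath.join(env.get("HOME", "~"), ".config", "gcloud", filename)


if __name__ == "__main__":
    path = default_adc_path()
    print(f"Expected credentials file: {path}")
    print(f"Exists: {os.path.exists(os.path.expanduser(path))}")
```

The `os_name` and `env` parameters exist only so the function can be exercised for another platform without changing the real environment.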
- Run the ingestion module to extract the dataset from your GCP bucket and execute a structured train-test split.
- Add `data_preprocessing` parameters to `config/config.yaml`.
- Use preprocessing routines to balance the target class distribution using `imbalanced-learn`.
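The split-then-balance flow described in the steps above can be sketched without any third-party packages. This is only a stand-in under stated assumptions: the repository's actual ingestion and preprocessing code is not shown here, the `label` field and the synthetic rows are invented for the example, and plain random oversampling is used in place of whichever sampler the project takes from `imbalanced-learn` (that library provides, for example, `RandomOverSampler` and `SMOTE`).

```python
import random
from collections import Counter


def train_test_split_rows(rows, test_size=0.2, seed=42):
    """Shuffle rows reproducibly and split them into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


def oversample_minority(rows, label_key, seed=42):
    """Duplicate random rows of smaller classes until all classes are equal in size."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Top up this class with randomly repeated rows until it reaches the target size.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    # Synthetic stand-in data: label 1 (e.g. canceled) is the minority class.
    rows = [{"x": i, "label": 1 if i % 5 == 0 else 0} for i in range(100)]
    train, test = train_test_split_rows(rows)
    balanced_train = oversample_minority(train, "label")
    print("train/test sizes:", len(train), len(test))
    print("balanced class counts:", Counter(r["label"] for r in balanced_train))
```

Balancing is applied to the training rows only, so duplicated rows never leak into the test set and the evaluation still reflects the original class distribution.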
Configure model training hyperparameters inside config/model_params.yaml. To launch your experiment tracker and compare iterations, spin up the MLflow server:
mlflow ui
Dashboard URL: http://127.0.0.1:5000
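Before starting a training run, it can be handy to verify that the dashboard is actually reachable. The following is a small standard-library sketch, not part of the project: `tracking_server_is_up` is a hypothetical helper, and the default URL simply mirrors the dashboard address given above.

```python
import urllib.error
import urllib.request


def tracking_server_is_up(url="http://127.0.0.1:5000", timeout=2.0):
    """Return True if an HTTP server answers successfully at the given URL."""
    # Bypass any system proxy settings; the tracker runs on the local machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    if tracking_server_is_up():
        print("MLflow UI is reachable.")
    else:
        print("MLflow UI is not reachable; start it with `mlflow ui` first.")
```

This only checks that something answers over HTTP at that address; it does not confirm that the responder is MLflow.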
The deployment pipeline relies on a custom Docker-in-Docker (DinD) Jenkins image to assemble runtime environments.
cd custom_jenkins
docker build -t jenkins-dind .
Launch the local Jenkins server, exposing the web UI port (8080) and the agent port (50000):
docker run -d --name jenkins-dind -p 8080:8080 -p 50000:50000 jenkins-dind:latest
- Connect Jenkins to your GitHub repository webhook.
- Configure your pipeline stage to log in to Docker, build your Flask web app image, and push it to Google Container Registry (GCR).
- Pull the freshly built image from GCR and deploy it to Google Cloud Run for public serverless hosting.
⚠️ Important: Ensure that the Artifact Registry API and Cloud Resource Manager API are enabled within your GCP Project console prior to executing the build pipeline.
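As a rough illustration of the build, push, and deploy stages, the sketch below assembles the CLI commands a pipeline stage might run and prints them as a dry run. Everything specific in it is an assumption: the project ID, image name, service name, and region are placeholders, `build_deploy_commands` is a hypothetical helper, and the actual Jenkinsfile in this repository may use different steps or flags.

```python
import shlex


def build_deploy_commands(project_id, image_name, tag, service, region):
    """Return the CLI commands for registry auth, build, push, and deploy."""
    image_uri = f"gcr.io/{project_id}/{image_name}:{tag}"
    return [
        # Let Docker authenticate against Google registries via gcloud.
        ["gcloud", "auth", "configure-docker", "--quiet"],
        # Build the image from the Dockerfile in the current directory.
        ["docker", "build", "-t", image_uri, "."],
        # Push the tagged image to the registry.
        ["docker", "push", image_uri],
        # Deploy the pushed image as a publicly reachable Cloud Run service.
        ["gcloud", "run", "deploy", service,
         "--image", image_uri,
         "--platform", "managed",
         "--region", region,
         "--project", project_id,
         "--allow-unauthenticated"],
    ]


if __name__ == "__main__":
    # Placeholder values, not taken from this repository.
    commands = build_deploy_commands(
        "my-gcp-project", "hotel-reservation-app", "latest",
        "hotel-reservation-app", "us-central1",
    )
    for cmd in commands:
        print(shlex.join(cmd))
```

Printing the commands instead of executing them keeps the sketch safe to run anywhere; in a real pipeline each one would typically become a shell step.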
| Tool | Purpose |
|---|---|
| Python 3.8+ | Core runtime |
| Docker | Containerization & local Jenkins |
| Google Cloud SDK | GCP authentication & deployment |
| MLflow | Experiment tracking |
| Git | Large file / data versioning |
| Jenkins | CI/CD automation |