🌳 ML Decision Tree

A comprehensive collection of Jupyter notebooks demonstrating decision tree algorithms for both classification and regression tasks with real-world datasets.

📚 Overview

This repository contains practical implementations and applications of decision tree machine learning algorithms. It includes examples of decision tree classifiers, regressors, and real-world use cases with comprehensive data preprocessing, model training, and evaluation workflows.

📓 Notebooks

1. decision_tree.ipynb

Classification using Decision Trees on heart disease data.

Loading and exploring the Cleveland heart disease dataset
Data preprocessing and column naming
Missing data detection and handling
Decision Tree Classifier implementation
Cross-validation and model evaluation
Confusion matrix analysis
Tree visualization
Key Concepts: Classification, Cross-validation, Heart disease prediction
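The loading and missing-data steps above can be sketched as follows. This is a minimal illustration, not the notebook's exact code: the column names are an assumed labelling of the file's 14 header-less fields, the sample rows are illustrative stand-ins in the file's layout, and dropping incomplete rows is only one possible way to handle the "?" markers.

```python
import io

import pandas as pd

# Assumed names for the 14 comma-separated fields (the file has no header row).
COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num",
]

# Illustrative rows in the file's layout; "?" marks a missing value.
SAMPLE = """\
63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3,3.0,0.0,6.0,0
67.0,1.0,4.0,160.0,286.0,0.0,2.0,108.0,1.0,1.5,2.0,3.0,3.0,2
53.0,0.0,3.0,128.0,216.0,0.0,2.0,115.0,0.0,0.0,1.0,0.0,?,0
52.0,1.0,3.0,138.0,223.0,0.0,0.0,169.0,0.0,0.0,1.0,?,3.0,0
"""

# Swap io.StringIO(SAMPLE) for "processed.cleveland.data" to read the real file.
df = pd.read_csv(io.StringIO(SAMPLE), header=None, names=COLUMNS, na_values="?")

# Count missing values per column, then drop incomplete rows.
missing_per_column = df.isna().sum()
clean = df.dropna()

# Reduce the target to a binary label: 0 = no disease, 1 = any disease level.
y = (clean["num"] > 0).astype(int)
X = clean.drop(columns="num")

print(missing_per_column[missing_per_column > 0])
```

Passing `na_values="?"` lets pandas parse the affected columns as numeric with `NaN` entries instead of leaving them as strings.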

2. customer_churn.ipynb

Predicting customer churn using Decision Trees.

Loading customer churn dataset
Data exploration and quality assessment
Categorical data encoding with one-hot encoding
Train-test split for model validation
Decision Tree Classifier for churn prediction
Model performance evaluation
Key Concepts: Classification, Categorical encoding, Business analytics, Customer retention
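As a rough sketch of the encoding and splitting steps, assuming a toy DataFrame whose column names (`tenure_months`, `contract`, `churn`) are hypothetical and not taken from customer_churn_dataset.csv:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data; the real dataset's columns may differ.
df = pd.DataFrame({
    "tenure_months": [1, 24, 3, 36, 2, 48, 5, 60, 4, 30],
    "contract": ["monthly", "yearly"] * 5,
    "churn": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
})

# One-hot encode the categorical column into one indicator column per category.
encoded = pd.get_dummies(df, columns=["contract"])
X = encoded.drop(columns="churn")
y = encoded["churn"]

# Hold out 20% of the rows, keeping the class balance in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

print(list(X.columns))
print("held-out accuracy:", accuracy)
```

Encoding before the split is convenient for a fixed dataset like this one; if new categories can appear later, fitting an encoder on the training rows only is the safer choice.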

3. regression_tree.ipynb

Decision Tree Regression on synthetic data.

Generating quadratic dataset with noise
Decision Tree Regressor implementation
Hyperparameter tuning (max_depth)
Tree visualization and interpretation
Prediction on new data
Key Concepts: Regression, Decision tree depth, Model visualization
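The effect of max_depth on a noisy quadratic can be sketched like this. The sample size, noise level, and the two depths compared are assumptions chosen for illustration, not the notebook's settings.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic quadratic data with Gaussian noise (assumed parameters).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.5, size=200)

# A shallow tree gives a coarse step function; a deeper one follows the curve
# more closely but also starts to fit the noise.
shallow = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
deep = DecisionTreeRegressor(max_depth=6, random_state=0).fit(X, y)

# Predict on new inputs that were not part of the training data.
X_new = np.array([[-2.0], [0.0], [2.0]])
pred_shallow = shallow.predict(X_new)
pred_deep = deep.predict(X_new)

print("depth 2 leaves:", shallow.get_n_leaves(), "predictions:", pred_shallow)
print("depth 6 leaves:", deep.get_n_leaves(), "predictions:", pred_deep)
```

Each leaf predicts the mean target of the training points that fall into it, so the number of leaves bounds the number of distinct values the tree can output.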

4. diabetes.ipynb

Regression and threshold-based classification on a diabetes prediction dataset.

Loading diabetes dataset from CSV
Data exploration and info checking
Train-test split
Decision Tree Regressor with hyperparameter tuning (min_samples_leaf, max_depth)
Prediction and confidence thresholding
Mean Squared Error (MSE) calculation
Key Concepts: Regression, Hyperparameter optimization, Classification thresholding
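One way to combine a regressor with thresholding is sketched below. It uses synthetic stand-in data rather than diabetes.csv, and the hyperparameter values and the 0.5 cut-off are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for a dataset with numeric features and a 0/1 outcome.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.5, size=300) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

# Limiting depth and leaf size keeps each leaf's mean from resting on a few rows.
reg = DecisionTreeRegressor(max_depth=4, min_samples_leaf=10, random_state=1)
reg.fit(X_train, y_train)

# Each prediction is the mean of the 0/1 targets in a leaf, so it lies in [0, 1].
scores = reg.predict(X_test)
mse = mean_squared_error(y_test, scores)

# Turn the continuous scores into class labels with an assumed 0.5 threshold.
labels = (scores >= 0.5).astype(int)
accuracy = (labels == y_test).mean()

print("MSE:", mse)
print("accuracy at threshold 0.5:", accuracy)
```

Moving the threshold trades false positives against false negatives, which can matter more than raw accuracy in a screening setting.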

📊 Datasets

This repository includes the following datasets:

processed.cleveland.data - Heart disease dataset with 13 features and a disease target (coded 0-4 in the raw file, commonly reduced to a binary label)
customer_churn_dataset.csv - Customer churn data with demographic and service usage features
diabetes.csv - Diabetes prediction dataset with health metrics

🚀 Getting Started

Prerequisites

Python 3.5+
Jupyter Notebook or JupyterLab
NumPy
Pandas
Matplotlib
scikit-learn

Installation

# Clone the repository
git clone https://github.com/PR202111/ML_Decision_Tree.git
cd ML_Decision_Tree

# Install required packages
pip install numpy pandas matplotlib scikit-learn

Running the Notebooks

# Start Jupyter Lab
jupyter lab

# Or start Jupyter Notebook
jupyter notebook

Then navigate to the notebook you want to explore and open it.

📈 Learning Path

We recommend working through the notebooks in the following order, which differs from the numbering above:

Start Here → regression_tree.ipynb - Understand basic decision trees with synthetic data
Next → decision_tree.ipynb - Learn classification on real medical data
Expand Skills → customer_churn.ipynb - Apply to business analytics with categorical data
Advanced → diabetes.ipynb - Combine regression with classification thresholding

🎯 Key Concepts Covered

Decision Tree Basics: Tree structure, node splitting, and leaf nodes
Classification: Binary and multi-class classification problems
Regression: Continuous value prediction with decision trees
Data Preprocessing: Handling categorical variables, encoding strategies
Hyperparameter Tuning: max_depth, min_samples_leaf, and their impact
Model Evaluation: Cross-validation, confusion matrices, MSE
Tree Visualization: Interpreting decision tree structures
Real-World Applications: Medical diagnosis, customer churn, diabetes prediction
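To make node splitting concrete, here is a small sketch of Gini impurity, the default split criterion of scikit-learn's DecisionTreeClassifier. The helper names `gini` and `split_impurity` are hypothetical and written only for this illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(left, right):
    """Size-weighted average impurity of the two children of a split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


parent = [0, 0, 1, 1]
mixed = gini(parent)                        # evenly mixed node
perfect = split_impurity([0, 0], [1, 1])    # both children are pure
poor = split_impurity([0, 1], [0, 1])       # children as mixed as the parent

print(mixed, perfect, poor)  # 0.5 0.0 0.5
```

A tree learner evaluates candidate splits like these and keeps the one with the lowest weighted impurity, then repeats the process in each child.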

🔧 Technologies & Libraries

Pandas: Data manipulation and exploration
NumPy: Numerical computing
Matplotlib: Data visualization and plotting
scikit-learn: Machine learning algorithms and utilities
Jupyter: Interactive computing environment

📝 Model Architecture & Workflow

Typical Workflow:

Data Loading → Load CSV files into Pandas DataFrames
Exploration → Understand data structure, types, and missing values
Preprocessing → Handle categorical data, encode features
Splitting → Train-test split (typically 80-20)
Training → Fit DecisionTreeClassifier or DecisionTreeRegressor
Evaluation → Cross-validation, metrics calculation, visualization
Interpretation → Visualize and understand decision trees
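The workflow above might look roughly like this end to end. The sketch uses scikit-learn's `make_classification` as a stand-in for loading a CSV, and the feature names `f0` to `f4` are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Stand-in for the loading and preprocessing steps: a synthetic numeric dataset.
X, y = make_classification(
    n_samples=200, n_features=5, n_informative=3, n_redundant=0, random_state=0
)

# Splitting: 80% training, 20% test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Training and evaluation: 5-fold cross-validation on the training part,
# then a final fit and a single check on the held-out test part.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
cv_scores = cross_val_score(clf, X_train, y_train, cv=5)
clf.fit(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)

# Interpretation: print the learned rules as indented text.
rules = export_text(clf, feature_names=[f"f{i}" for i in range(5)])

print("mean CV accuracy:", cv_scores.mean())
print("test accuracy:", test_accuracy)
print(rules)
```

`export_text` gives a quick textual view of the tree; `sklearn.tree.plot_tree` draws the same structure with Matplotlib when a figure is preferred.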

💡 Learning Outcomes

After exploring these notebooks, you'll be able to:

Build decision tree models for classification and regression
Preprocess and encode categorical features
Tune hyperparameters for optimal model performance
Evaluate models using appropriate metrics
Visualize and interpret decision tree structures
Apply decision trees to real-world business problems
Handle medical and customer data effectively

📊 Performance Metrics Used

Classification: Accuracy, Confusion Matrix, Cross-validation Score
Regression: Mean Squared Error (MSE), R² Score
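A small worked example of these metrics, using hand-made labels and values chosen only so the arithmetic is easy to follow:

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    mean_squared_error,
    r2_score,
)

# Classification: rows of the matrix are true classes, columns are predictions.
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred)   # [[TN, FP], [FN, TP]] = [[1, 1], [1, 2]]
acc = accuracy_score(y_true, y_pred)    # 3 correct out of 5 = 0.6

# Regression: only the last estimate is off, by 2.
targets = [1.0, 2.0, 3.0, 4.0]
estimates = [1.0, 2.0, 3.0, 6.0]
mse = mean_squared_error(targets, estimates)  # (0 + 0 + 0 + 4) / 4 = 1.0
r2 = r2_score(targets, estimates)             # 1 - 4 / 5 = 0.2

print(cm)
print(acc, mse, r2)
```

The R² value compares the squared error (4) with the total variance around the mean of the targets (5), so a single large miss on a small sample lowers it sharply.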

🔑 Key Features of Decision Trees

✓ Interpretability - Easy to understand and explain to stakeholders
✓ No Feature Scaling - Works with raw features
✓ Handles Both Classification & Regression - Versatile algorithm
✓ Non-linear - Captures complex patterns
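✓ Categorical Variables - Tree algorithms can split on categories in principle, but scikit-learn's trees require numeric input, so categorical features are encoded first (for example with one-hot encoding, as in customer_churn.ipynb)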
✓ Feature Importance - Built-in feature ranking
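The built-in feature ranking can be illustrated with a synthetic dataset in which, by construction, only the first of three features determines the label:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Three random features; the label depends on the first one only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# Impurity-based importances, normalized so that they sum to 1.
importances = clf.feature_importances_

for index, value in enumerate(importances):
    print(f"feature {index}: {value:.3f}")
```

These importances measure how much each feature reduced impurity in this particular fitted tree, so they are best read as a description of the model and not as proof of a causal effect.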

📝 Notes

Each notebook is self-contained and can be run independently
Datasets are included in the repository for easy execution
Code includes comments explaining key concepts
Tree visualizations help understand decision-making processes
Hyperparameter tuning sections show impact on model performance

🤝 Contributing

Feel free to fork this repository, add improvements, or suggest enhancements. This is an educational project meant to help others learn decision tree algorithms.

📖 References

scikit-learn Decision Trees Documentation
Decision Trees - Wikipedia
Machine Learning textbooks and courses

📄 License

This project is open source and available under the MIT License.

Created by: PR202111
Last Updated: 2026
Status: Active ✨

Happy Learning! 🌳
