PR202111/ML_Decision_Tree

🌳 ML Decision Tree

A comprehensive collection of Jupyter notebooks demonstrating decision tree algorithms for both classification and regression tasks with real-world datasets.

📚 Overview

This repository contains practical implementations and applications of decision tree machine learning algorithms. It includes examples of decision tree classifiers, regressors, and real-world use cases with comprehensive data preprocessing, model training, and evaluation workflows.

📓 Notebooks

1. decision_tree.ipynb

Classification using Decision Trees on heart disease data.

  • Loading and exploring the Cleveland heart disease dataset
  • Data preprocessing and column naming
  • Missing data detection and handling
  • Decision Tree Classifier implementation
  • Cross-validation and model evaluation
  • Confusion matrix analysis
  • Tree visualization
  • Key Concepts: Classification, Cross-validation, Heart disease prediction
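The loading, missing-data handling, cross-validation and confusion-matrix steps above can be sketched as follows. This assumes the file is the standard UCI processed.cleveland.data (comma-separated, no header row, "?" for missing values). The column names follow the UCI attribute documentation and may differ from the notebook's own; load_cleveland and evaluate are hypothetical helpers, and max_depth=4 is an illustrative choice.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Column names from the UCI heart disease documentation (assumed; the notebook may differ).
COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num",
]


def load_cleveland(source):
    """Hypothetical helper: read the raw file, drop rows with missing values,
    and collapse the 0-4 diagnosis into 0 (no disease) / 1 (disease)."""
    df = pd.read_csv(source, header=None, names=COLUMNS, na_values="?")
    df = df.dropna()  # the raw file marks a few missing 'ca' and 'thal' values with '?'
    X = df.drop(columns="num")
    y = (df["num"] > 0).astype(int)
    return X, y


def evaluate(X, y, max_depth=4, seed=42):
    """Cross-validate on the training split, then return a test-set confusion matrix."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    clf = DecisionTreeClassifier(max_depth=max_depth, random_state=seed)
    cv_scores = cross_val_score(clf, X_train, y_train, cv=5)  # 5 accuracy scores
    clf.fit(X_train, y_train)
    cm = confusion_matrix(y_test, clf.predict(X_test))  # rows: true class, cols: predicted
    return cv_scores, cm
```

Usage would look like X, y = load_cleveland("processed.cleveland.data") followed by cv_scores, cm = evaluate(X, y). Dropping incomplete rows is the simplest strategy; imputation is an alternative when rows are scarce.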

2. customer_churn.ipynb

Predicting customer churn using Decision Trees.

  • Loading customer churn dataset
  • Data exploration and quality assessment
  • Categorical data encoding with one-hot encoding
  • Train-test split for model validation
  • Decision Tree Classifier for churn prediction
  • Model performance evaluation
  • Key Concepts: Classification, Categorical encoding, Business analytics, Customer retention
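A minimal sketch of the churn workflow (one-hot encoding, train-test split, tree, accuracy). The target column name "Churn", the 80/20 split and max_depth=5 are assumptions for illustration; adjust them to the headers in customer_churn_dataset.csv. train_churn_model is a hypothetical helper.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def train_churn_model(df, target="Churn", test_size=0.2, seed=42):
    """One-hot encode text columns, split, fit a tree, and return test accuracy.

    The default target name 'Churn' is an assumption about the CSV's header.
    """
    # get_dummies expands object/category columns into 0/1 indicator columns
    # and leaves numeric columns untouched.
    X = pd.get_dummies(df.drop(columns=target))
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    clf = DecisionTreeClassifier(max_depth=5, random_state=seed)
    clf.fit(X_train, y_train)
    return clf, X.columns, accuracy_score(y_test, clf.predict(X_test))
```

Because get_dummies runs on the whole frame before splitting, train and test share the same columns. For scoring new customers later, a OneHotEncoder(handle_unknown="ignore") fitted on the training data only is a more robust choice.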

3. regression_tree.ipynb

Decision Tree Regression on synthetic data.

  • Generating quadratic dataset with noise
  • Decision Tree Regressor implementation
  • Hyperparameter tuning (max_depth)
  • Tree visualization and interpretation
  • Prediction on new data
  • Key Concepts: Regression, Decision tree depth, Model visualization
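The regression-tree steps above can be sketched like this. The quadratic coefficients, noise level and sample size are illustrative assumptions, not the notebook's values. Comparing max_depth=2 with max_depth=3 shows how depth controls the number of constant pieces in the prediction.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Quadratic data with Gaussian noise (all constants are illustrative assumptions).
rng = np.random.default_rng(42)
X = rng.uniform(-0.5, 0.5, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.025, size=200)

# Each leaf predicts the mean target of the training samples that fall into it,
# so a depth-d tree yields at most 2**d distinct predicted values.
shallow = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X, y)
deep = DecisionTreeRegressor(max_depth=3, random_state=42).fit(X, y)

# Text rendering of the learned splits (sklearn.tree.plot_tree draws it with Matplotlib).
print(export_text(shallow, feature_names=["x1"]))

# Predictions on new inputs are piecewise constant.
X_new = np.array([[-0.4], [0.0], [0.4]])
print(shallow.predict(X_new))
```

The deeper tree follows the curve more closely but also starts fitting the noise, which is the trade-off the max_depth tuning explores.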

4. diabetes.ipynb

Classification and Regression on the diabetes prediction dataset.

  • Loading diabetes dataset from CSV
  • Data exploration and inspection with DataFrame.info()
  • Train-test split
  • Decision Tree Regressor with hyperparameter tuning (min_samples_leaf, max_depth)
  • Prediction and confidence thresholding
  • Mean Squared Error (MSE) calculation
  • Key Concepts: Regression, Hyperparameter optimization, Classification thresholding
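A sketch of regression-plus-thresholding on a 0/1 outcome, assuming a binary label column. The grid values, the 0.5 threshold and the helper name tune_and_threshold are hypothetical choices for illustration. Because a squared-error regression leaf stores the mean of its 0/1 targets, each prediction is the fraction of positive training samples in that leaf, which is what makes thresholding meaningful.

```python
from sklearn.metrics import accuracy_score, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def tune_and_threshold(X, y, depths=(2, 3, 4, 5), leaf_sizes=(1, 5, 10, 20),
                       threshold=0.5, seed=42):
    """Fit regressors on a 0/1 target, keep the (max_depth, min_samples_leaf) pair
    with the lowest validation MSE, then threshold its outputs into class labels."""
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    best = None
    for depth in depths:
        for leaf in leaf_sizes:
            model = DecisionTreeRegressor(
                max_depth=depth, min_samples_leaf=leaf, random_state=seed
            ).fit(X_train, y_train)
            mse = mean_squared_error(y_val, model.predict(X_val))
            if best is None or mse < best[0]:
                best = (mse, depth, leaf, model)
    mse, depth, leaf, model = best
    # Leaf values are fractions of positives in [0, 1]; cut them at the threshold.
    labels = (model.predict(X_val) >= threshold).astype(int)
    return {"mse": mse, "max_depth": depth, "min_samples_leaf": leaf,
            "accuracy": accuracy_score(y_val, labels)}
```

With a Pima-style diabetes.csv the call might be tune_and_threshold(df.drop(columns="Outcome"), df["Outcome"]), assuming the label column is named Outcome. Choosing hyperparameters on the same split that reports accuracy makes that number optimistic; a separate test split or cross-validation gives a fairer estimate.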

📊 Datasets

This repository includes the following datasets:

  • processed.cleveland.data - Heart disease dataset with 13 features and a diagnosis target (values 0-4 in the raw file, commonly collapsed to disease / no disease)
  • customer_churn_dataset.csv - Customer churn data with demographic and service usage features
  • diabetes.csv - Diabetes prediction dataset with health metrics

🚀 Getting Started

Prerequisites

  • Python 3.9 or newer (current scikit-learn releases have dropped support for older versions such as 3.5)
  • Jupyter Notebook or JupyterLab
  • NumPy
  • Pandas
  • Matplotlib
  • scikit-learn

Installation

# Clone the repository
git clone https://github.com/PR202111/ML_Decision_Tree.git
cd ML_Decision_Tree

# Install required packages
pip install numpy pandas matplotlib scikit-learn

Running the Notebooks

# Start Jupyter Lab
jupyter lab

# Or start Jupyter Notebook
jupyter notebook

Then navigate to the notebook you want to explore and open it.

📈 Learning Path

We recommend working through the notebooks in this order:

  1. Start Here: regression_tree.ipynb - Understand basic decision trees with synthetic data
  2. Next: decision_tree.ipynb - Learn classification on real medical data
  3. Expand Skills: customer_churn.ipynb - Apply to business analytics with categorical data
  4. Advanced: diabetes.ipynb - Combine regression with classification thresholding

🎯 Key Concepts Covered

  • Decision Tree Basics: Tree structure, node splitting, and leaf nodes
  • Classification: Binary and multi-class classification problems
  • Regression: Continuous value prediction with decision trees
  • Data Preprocessing: Handling categorical variables, encoding strategies
  • Hyperparameter Tuning: max_depth, min_samples_leaf, and their impact
  • Model Evaluation: Cross-validation, confusion matrices, MSE
  • Tree Visualization: Interpreting decision tree structures
  • Real-World Applications: Medical diagnosis, customer churn, diabetes prediction
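Hyperparameter tuning and cross-validation can be combined with scikit-learn's GridSearchCV, which scores every parameter combination with k-fold cross-validation. This is a minimal sketch: the bundled breast cancer dataset stands in for the repository's CSV files, and the grid values are illustrative assumptions rather than the notebooks' settings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Bundled scikit-learn dataset used as a stand-in (no download needed).
X, y = load_breast_cancer(return_X_y=True)

param_grid = {
    "max_depth": [2, 3, 4, 5, None],     # None: grow until leaves are pure or too small to split
    "min_samples_leaf": [1, 5, 10, 20],  # larger leaves give smoother, less overfit trees
}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0), param_grid, cv=5, scoring="accuracy"
)
search.fit(X, y)  # 20 combinations x 5 folds, then a refit on all data with the best pair
print(search.best_params_, round(search.best_score_, 3))
```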

🔧 Technologies & Libraries

  • Pandas: Data manipulation and exploration
  • NumPy: Numerical computing
  • Matplotlib: Data visualization and plotting
  • scikit-learn: Machine learning algorithms and utilities
  • Jupyter: Interactive computing environment

📝 Model Architecture & Workflow

Typical Workflow:

  1. Data Loading → Load CSV files into Pandas DataFrames
  2. Exploration → Understand data structure, types, and missing values
  3. Preprocessing → Handle categorical data, encode features
  4. Splitting → Train-test split (typically 80-20)
  5. Training → Fit DecisionTreeClassifier or DecisionTreeRegressor
  6. Evaluation → Cross-validation, metrics calculation, visualization
  7. Interpretation → Visualize and understand decision trees
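The seven steps above map onto a short pandas and scikit-learn routine. This is a sketch under assumptions: run_workflow is a hypothetical helper, the file and column names in the usage comment are placeholders, and dropping incomplete rows is only one of several ways to handle missing values.

```python
import pandas as pd
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text


def run_workflow(df, target, max_depth=3, seed=0):
    """Hypothetical helper mirroring steps 2-7 for a classification DataFrame."""
    # 2. Exploration: column types and missing-value counts.
    print(df.dtypes)
    print(df.isna().sum())
    # 3. Preprocessing: drop incomplete rows, one-hot encode text columns.
    df = df.dropna()
    X = pd.get_dummies(df.drop(columns=target))
    y = df[target]
    # 4. Splitting: 80/20 train-test split.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    # 5. Training setup.
    clf = DecisionTreeClassifier(max_depth=max_depth, random_state=seed)
    # 6. Evaluation: 5-fold CV on the training split, then held-out accuracy.
    cv_mean = cross_val_score(clf, X_train, y_train, cv=5).mean()
    clf.fit(X_train, y_train)
    test_acc = clf.score(X_test, y_test)
    # 7. Interpretation: the learned rules as indented text.
    rules = export_text(clf, feature_names=list(X.columns))
    return cv_mean, test_acc, rules


# 1. Data loading (placeholder file and target names):
# cv_mean, test_acc, rules = run_workflow(pd.read_csv("data.csv"), target="label")
```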

💡 Learning Outcomes

After exploring these notebooks, you'll be able to:

  • Build decision tree models for classification and regression
  • Preprocess and encode categorical features
  • Tune hyperparameters for optimal model performance
  • Evaluate models using appropriate metrics
  • Visualize and interpret decision tree structures
  • Apply decision trees to real-world business problems
  • Handle medical and customer data effectively

📊 Performance Metrics Used

  • Classification: Accuracy, Confusion Matrix, Cross-validation Score
  • Regression: Mean Squared Error (MSE), R² Score
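The metrics above are all available in sklearn.metrics. The toy labels below are made up so the results can be checked by hand; a cross-validation score is simply one of these metrics averaged over folds.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             mean_squared_error, r2_score)

# Classification: 3 of 4 predictions are correct.
y_true_cls = [0, 1, 1, 0]
y_pred_cls = [0, 1, 0, 0]
print(accuracy_score(y_true_cls, y_pred_cls))    # 0.75
print(confusion_matrix(y_true_cls, y_pred_cls))  # [[2 0]
                                                 #  [1 1]]  rows: true class, cols: predicted

# Regression: MSE = mean of squared errors; R² = 1 - SS_res / SS_tot.
y_true_reg = [1.0, 2.0, 3.0]
y_pred_reg = [1.0, 2.0, 4.0]
print(mean_squared_error(y_true_reg, y_pred_reg))  # 1/3 (one error of size 1 over 3 samples)
print(r2_score(y_true_reg, y_pred_reg))            # 0.5 (1 - 1/2, since SS_tot = 2)
```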

🔑 Key Features of Decision Trees

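  • Interpretability - Easy to understand and explain to stakeholders
  • No Feature Scaling - Splits depend only on the ordering of feature values, so raw features work as-is
  • Handles Both Classification & Regression - Versatile algorithm
  • Non-linear - Captures non-linear relationships and feature interactions
  • Categorical Variables - Splitting on categories is possible in principle, but scikit-learn's trees require numeric input, so categorical columns are one-hot encoded first (as in customer_churn.ipynb)
  • Feature Importance - Built-in feature ranking via feature_importances_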
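Two of these properties can be checked directly on synthetic data (an assumption for illustration, not one of the repository's datasets): impurity-based feature_importances_ identify the informative input, and multiplying the features by a positive constant leaves the learned splits, and therefore the predictions, unchanged.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Integer-valued features; the label depends only on column 0.
X = rng.integers(0, 100, size=(300, 3)).astype(float)
y = (X[:, 0] > 50).astype(int)

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
# Importances sum to 1 for a tree with at least one split; column 0 dominates here.
print(clf.feature_importances_)

# A positive rescaling preserves the ordering of values, so the same split is chosen
# (only the threshold is rescaled) and predictions on rescaled inputs match.
scale = 1000.0
clf_scaled = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X * scale, y)
print(np.array_equal(clf.predict(X), clf_scaled.predict(X * scale)))  # True
```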

📝 Notes

  • Each notebook is self-contained and can be run independently
  • Datasets are included in the repository for easy execution
  • Code includes comments explaining key concepts
  • Tree visualizations help understand decision-making processes
  • Hyperparameter tuning sections show impact on model performance

🤝 Contributing

Feel free to fork this repository, add improvements, or suggest enhancements. This is an educational project meant to help others learn decision tree algorithms.

📄 License

This project is open source and available under the MIT License.


Created by: PR202111
Last Updated: 2026
Status: Active ✨

Happy Learning! 🌳

About

Decision Trees Implementation with Questions on Regression and Classification
