A comprehensive collection of Jupyter notebooks demonstrating decision tree algorithms for both classification and regression tasks with real-world datasets.
This repository contains practical implementations and applications of decision tree machine learning algorithms. It includes examples of decision tree classifiers, regressors, and real-world use cases with comprehensive data preprocessing, model training, and evaluation workflows.
decision_tree.ipynb - Classification using Decision Trees on heart disease data.
- Loading and exploring the Cleveland heart disease dataset
- Data preprocessing and column naming
- Missing data detection and handling
- Decision Tree Classifier implementation
- Cross-validation and model evaluation
- Confusion matrix analysis
- Tree visualization
- Key Concepts: Classification, Cross-validation, Heart disease prediction
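The loading, missing-data and cross-validation steps above can be sketched as follows. This is a minimal sketch, not the notebook's code. It assumes the file has no header row and marks missing entries with "?". It also assumes the standard UCI attribute names are used as column labels. Collapsing num to a binary label and the 5-fold setting are assumptions too.

```python
import io

import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Standard UCI attribute names (an assumption; the notebook may use its own labels).
COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num",
]


def load_cleveland(source):
    """Read the header-less Cleveland file, treating '?' as missing."""
    df = pd.read_csv(source, header=None, names=COLUMNS, na_values="?")
    missing = df.isna().sum()            # missing-value count per column
    df = df.dropna().reset_index(drop=True)
    # 'num' holds 0-4; collapse it to 0 = no disease, 1 = disease present.
    df["target"] = (df["num"] > 0).astype(int)
    return df.drop(columns="num"), missing


def cv_accuracy(df, folds=5, seed=0):
    """Mean k-fold cross-validated accuracy of a decision tree classifier."""
    X = df.drop(columns="target")
    y = df["target"]
    clf = DecisionTreeClassifier(random_state=seed)
    return cross_val_score(clf, X, y, cv=folds).mean()


# Made-up rows in the file's comma-separated format, not real patients.
sample = io.StringIO(
    "63,1,1,145,233,1,2,150,0,2.3,3,0,6,0\n"
    "67,1,4,160,286,0,2,108,1,1.5,2,3,3,2\n"
    "55,1,4,132,353,0,0,132,1,1.2,2,?,7,3\n"
)
patients, missing = load_cleveland(sample)
print(missing[missing > 0])
```

With the real file, load_cleveland("processed.cleveland.data") followed by cv_accuracy on the result would report the cross-validated accuracy. Predictions from sklearn.model_selection.cross_val_predict can then feed a confusion matrix.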
customer_churn.ipynb - Predicting customer churn using Decision Trees.
- Loading customer churn dataset
- Data exploration and quality assessment
- Categorical data encoding with one-hot encoding
- Train-test split for model validation
- Decision Tree Classifier for churn prediction
- Model performance evaluation
- Key Concepts: Classification, Categorical encoding, Business analytics, Customer retention
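A minimal sketch of the encode, split and fit steps follows. It assumes a target column named Churn and uses pandas.get_dummies for one-hot encoding. The demo column names and max_depth=5 are hypothetical and are not taken from customer_churn_dataset.csv.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def train_churn_model(df, target="Churn", test_size=0.2, seed=42):
    """One-hot encode, split, fit, and score a churn classifier."""
    # get_dummies expands object/category columns into indicator columns
    # and leaves numeric columns unchanged.
    X = pd.get_dummies(df.drop(columns=target))
    y = df[target]
    # stratify keeps the churn rate similar in the train and test sets.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    model = DecisionTreeClassifier(max_depth=5, random_state=seed)
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, list(X.columns), accuracy


# Tiny demo frame with hypothetical columns.
demo = pd.DataFrame({
    "Contract": ["Monthly", "Annual"] * 20,
    "MonthlyCharges": [70.0, 30.0] * 20,
    "Churn": [1, 0] * 20,
})
model, feature_names, accuracy = train_churn_model(demo)
print(feature_names, accuracy)
```

If the real file contains an identifier column such as a customer ID, drop it before encoding. Otherwise get_dummies (for string IDs) creates one column per customer.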
regression_tree.ipynb - Decision Tree Regression on synthetic data.
- Generating quadratic dataset with noise
- Decision Tree Regressor implementation
- Hyperparameter tuning (max_depth)
- Tree visualization and interpretation
- Prediction on new data
- Key Concepts: Regression, Decision tree depth, Model visualization
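The synthetic regression workflow might look like the sketch below. The quadratic coefficients, noise level and depth grid are illustrative assumptions, not the notebook's values.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
# Quadratic signal y = 0.5 x^2 + x + 2 plus Gaussian noise (illustrative values).
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + 2 + rng.normal(0, 1, size=200)

# Training error for several depths.
train_mse = {}
for depth in (1, 2, 3, 5, 10):
    reg = DecisionTreeRegressor(max_depth=depth, random_state=42).fit(X, y)
    train_mse[depth] = mean_squared_error(y, reg.predict(X))
    print(f"max_depth={depth:2d}  train MSE={train_mse[depth]:.3f}")

# Predict new inputs with a moderate depth; X_new must be 2-D (n_samples, n_features).
reg = DecisionTreeRegressor(max_depth=3, random_state=42).fit(X, y)
X_new = np.array([[-2.0], [0.0], [2.5]])
y_new = reg.predict(X_new)
print(y_new)
```

Training MSE keeps falling as max_depth grows, so it cannot be used to choose the depth: the deepest trees are fitting the noise. Comparing depths on held-out data or with cross-validation shows where extra depth stops helping.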
diabetes.ipynb - Classification and Regression on the diabetes prediction dataset.
- Loading diabetes dataset from CSV
- Data exploration and structure checks with DataFrame.info()
- Train-test split
- Decision Tree Regressor with hyperparameter tuning (min_samples_leaf, max_depth)
- Prediction and confidence thresholding
- Mean Squared Error (MSE) calculation
- Key Concepts: Regression, Hyperparameter optimization, Classification thresholding
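The diabetes workflow can be sketched as below. It assumes a 0/1 target column named Outcome, as in the commonly used Pima Indians diabetes CSV; check this against diabetes.csv. Fitting a regressor to a 0/1 target makes each leaf predict the fraction of positive training samples in that leaf, and that fraction is what gets thresholded at 0.5. The parameter grid and the use of GridSearchCV are assumptions, not necessarily the notebook's approach.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor


def fit_and_threshold(df, target="Outcome", threshold=0.5, seed=0):
    X = df.drop(columns=target)
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    # Tune both hyperparameters with 5-fold CV; sklearn maximizes scores,
    # so MSE is passed in negated form.
    search = GridSearchCV(
        DecisionTreeRegressor(random_state=seed),
        param_grid={"max_depth": [2, 3, 4, 6], "min_samples_leaf": [1, 5, 10]},
        scoring="neg_mean_squared_error",
        cv=5,
    )
    search.fit(X_train, y_train)
    # Leaf means of a 0/1 target lie in [0, 1] and act as confidence scores.
    scores = search.best_estimator_.predict(X_test)
    labels = (scores >= threshold).astype(int)
    return {
        "best_params": search.best_params_,
        "mse": mean_squared_error(y_test, scores),
        "accuracy": accuracy_score(y_test, labels),
    }


# Demo on synthetic data; column names mimic a Pima-style file but are assumptions.
rng = np.random.default_rng(0)
glucose = rng.normal(120, 30, size=300)
demo = pd.DataFrame({
    "Glucose": glucose,
    "BMI": rng.normal(30, 5, size=300),
    "Outcome": (glucose > 130).astype(int),
})
result = fit_and_threshold(demo)
print(result)
```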
This repository includes the following datasets:
- processed.cleveland.data - Heart disease dataset with 13 features and a diagnosis target (0-4 in the raw file, typically collapsed to a binary disease / no-disease label)
- customer_churn_dataset.csv - Customer churn data with demographic and service usage features
- diabetes.csv - Diabetes prediction dataset with health metrics
- Python 3.5+
- Jupyter Notebook or JupyterLab
- NumPy
- Pandas
- Matplotlib
- scikit-learn
# Clone the repository
git clone https://github.com/PR202111/ML_Decision_Tree.git
cd ML_Decision_Tree
# Install required packages
pip install numpy pandas matplotlib scikit-learn
# Start Jupyter Lab
jupyter lab
# Or start Jupyter Notebook
jupyter notebook
Then navigate to the notebook you want to explore and open it.
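To confirm the installation worked before opening a notebook, a small check like the one below prints each package's version. It is a hypothetical helper and is not part of the repository. Note that scikit-learn is imported under the name sklearn.

```python
import importlib

# Import names of the Python packages listed in the prerequisites.
PACKAGES = ["numpy", "pandas", "matplotlib", "sklearn"]


def check_environment(packages=PACKAGES):
    """Map each package name to its version string, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
            versions[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            versions[name] = None  # not installed in this environment
    return versions


for pkg, ver in check_environment().items():
    print(f"{pkg:12s} {ver or 'MISSING'}")
```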
We recommend following these notebooks in order for optimal learning:
- Start Here → regression_tree.ipynb - Understand basic decision trees with synthetic data
- Next → decision_tree.ipynb - Learn classification on real medical data
- Expand Skills → customer_churn.ipynb - Apply decision trees to business analytics with categorical data
- Advanced → diabetes.ipynb - Combine regression with classification thresholding
- Decision Tree Basics: Tree structure, node splitting, and leaf nodes
- Classification: Binary and multi-class classification problems
- Regression: Continuous value prediction with decision trees
- Data Preprocessing: Handling categorical variables, encoding strategies
- Hyperparameter Tuning: max_depth, min_samples_leaf, and their impact
- Model Evaluation: Cross-validation, confusion matrices, MSE
- Tree Visualization: Interpreting decision tree structures
- Real-World Applications: Medical diagnosis, customer churn, diabetes prediction
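The following from-scratch sketch makes node splitting concrete. It computes Gini impurity and runs an exhaustive threshold search on a single numeric feature. Gini is the default criterion of scikit-learn's DecisionTreeClassifier, but this toy version is for illustration only. It omits the optimizations and extra rules of the real implementation.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(xs, ys):
    """Try midpoints between sorted distinct values of xs.

    Returns (threshold, weighted child impurity); threshold is None if no split
    beats the parent node's impurity.
    """
    values = sorted(set(xs))
    best = (None, gini(ys))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        # Weight each child's impurity by its share of the samples.
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best


print(gini([0, 0, 1, 1]))                          # 0.5
print(best_threshold([1, 2, 3, 4], [0, 0, 1, 1]))  # (2.5, 0.0)
```

A full tree applies this search recursively to each child node, across every feature, until a stopping rule such as max_depth or min_samples_leaf is reached. Nodes where splitting stops become leaf nodes.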
- Pandas: Data manipulation and exploration
- NumPy: Numerical computing
- Matplotlib: Data visualization and plotting
- scikit-learn: Machine learning algorithms and utilities
- Jupyter: Interactive computing environment
- Data Loading → Load CSV files into Pandas DataFrames
- Exploration → Understand data structure, types, and missing values
- Preprocessing → Handle categorical data, encode features
- Splitting → Train-test split (typically 80-20)
- Training → Fit DecisionTreeClassifier or DecisionTreeRegressor
- Evaluation → Cross-validation, metrics calculation, visualization
- Interpretation → Visualize and understand decision trees
After exploring these notebooks, you'll be able to:
- Build decision tree models for classification and regression
- Preprocess and encode categorical features
- Tune hyperparameters for optimal model performance
- Evaluate models using appropriate metrics
- Visualize and interpret decision tree structures
- Apply decision trees to real-world business problems
- Handle medical and customer data effectively
- Classification: Accuracy, Confusion Matrix, Cross-validation Score
- Regression: Mean Squared Error (MSE), R² Score
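The metrics above are computed below on tiny hand-made arrays so that each result can be checked by hand. The values are illustrative only.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    mean_squared_error,
    r2_score,
)

# Classification metrics on toy labels.
y_true_cls = [0, 1, 1, 0, 1]
y_pred_cls = [0, 1, 0, 0, 1]
acc = accuracy_score(y_true_cls, y_pred_cls)    # 4 of 5 correct -> 0.8
cm = confusion_matrix(y_true_cls, y_pred_cls)   # rows = true class, columns = predicted
print(acc)
print(cm)                                       # [[2 0]
                                                #  [1 2]]

# Regression metrics on toy values.
y_true_reg = [1.0, 2.0, 3.0]
y_pred_reg = [1.0, 2.0, 5.0]
mse = mean_squared_error(y_true_reg, y_pred_reg)  # (0 + 0 + 4) / 3
r2 = r2_score(y_true_reg, y_pred_reg)             # 1 - 4 / 2 = -1.0
print(mse, r2)
```

R² drops below zero when predictions are worse than always predicting the mean of the true values, as in this example.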
✓ Interpretability - Easy to understand and explain to stakeholders
✓ No Feature Scaling - Works with raw features
✓ Handles Both Classification & Regression - Versatile algorithm
✓ Non-linear - Captures complex patterns
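✓ Categorical Variables - Splitting on categories is conceptually natural for trees, though scikit-learn's decision trees expect numeric input, so categorical columns are one-hot encoded first (as in the customer churn notebook)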
✓ Feature Importance - Built-in feature ranking
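Two of the listed advantages can be checked directly. Rescaling the features does not change the tree's predictions, and feature_importances_ provides a ranking with no extra work. This sketch relies on two assumptions: the data comes from make_classification, and the features are rounded first. The rounding keeps distinct values well separated, so both trees see the same candidate splits.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=5, n_informative=3, random_state=0)
X = np.round(X, 3)  # distinct values now differ by at least 0.001

# Same settings on raw features and on a rescaled, shifted copy.
raw = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
X_scaled = X * 1000.0 + 5.0  # positive scaling plus a shift preserves value ordering
scaled = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_scaled, y)

# Splits depend only on the ordering of values, so predictions match.
same_predictions = np.array_equal(raw.predict(X), scaled.predict(X_scaled))
print("identical predictions:", same_predictions)

# Built-in impurity-based ranking; the importances sum to 1.
for idx in np.argsort(raw.feature_importances_)[::-1]:
    print(f"feature {idx}: {raw.feature_importances_[idx]:.3f}")
```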
- Each notebook is self-contained and can be run independently
- Datasets are included in the repository for easy execution
- Code includes comments explaining key concepts
- Tree visualizations help understand decision-making processes
- Hyperparameter tuning sections show impact on model performance
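A fitted tree can be inspected as text with scikit-learn's export_text or drawn with plot_tree. This sketch assumes two things: the bundled iris dataset serves as a stand-in, and the figure is written to tree.png. Inside Jupyter, drop the matplotlib.use("Agg") line so the plot renders inline.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script also runs headless
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text, plot_tree

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Text view: one line per split rule or leaf.
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)

# Graphical view: each box shows the split rule, Gini impurity,
# sample count and class distribution.
fig, ax = plt.subplots(figsize=(10, 6))
plot_tree(
    clf,
    feature_names=iris.feature_names,
    class_names=list(iris.target_names),
    filled=True,
    ax=ax,
)
fig.savefig("tree.png", dpi=100)
```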
Feel free to fork this repository, add improvements, or suggest enhancements. This is an educational project meant to help others learn decision tree algorithms.
- scikit-learn Decision Trees Documentation
- Decision Trees - Wikipedia
- Machine Learning textbooks and courses
This project is open source and available under the MIT License.
Created by: PR202111
Last Updated: 2026
Status: Active ✨
Happy Learning! 🌳