Tabular machine learning, end to end. Classification, regression, and the kind of feature work that decides whether a model is useful or just technically working.
This is the home for traditional ML — the gradient-boosted trees, regularized linear models, and feature-engineering pipelines that still win most real-world tabular problems. Each notebook follows a complete cycle: EDA → preprocessing → modeling → evaluation → reflection.
A highly imbalanced classification problem (fraud is ~0.17% of transactions). The notebook does not stop at 99% accuracy, which a model that never flags fraud would already exceed at this class ratio. It digs into precision, recall, PR-AUC, and the operational question of "what is this model actually good for in production?" It also covers resampling strategies, threshold tuning, and an honest discussion of the cost of false negatives.
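To make the accuracy trap concrete, here is a minimal sketch in plain Python rather than the notebook's actual code. The labels and scores are synthetic (17 frauds in 10,000 rows to mirror the ~0.17% rate), and `confusion_counts` and `summarize` are hypothetical helper names. In a real workflow, scikit-learn's metrics would likely replace the hand-rolled counting.

```python
import random


def confusion_counts(y_true, scores, threshold):
    """Count (tp, fp, fn, tn) when scores >= threshold are flagged as fraud."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, scores):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def summarize(y_true, scores, threshold):
    """Return accuracy, precision, and recall at one decision threshold."""
    tp, fp, fn, tn = confusion_counts(y_true, scores, threshold)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Synthetic data (assumption): 17 frauds in 10,000 transactions, about 0.17%.
rng = random.Random(0)
y_true = [1] * 17 + [0] * 9983
# Made-up model scores in [0, 1]: frauds tend to score higher, with some overlap.
scores = [
    rng.betavariate(5, 2) if label == 1 else rng.betavariate(1, 8)
    for label in y_true
]

# A threshold above every possible score means nothing is ever flagged.
baseline_accuracy, baseline_precision, baseline_recall = summarize(
    y_true, scores, threshold=1.1
)
print(f"never flag: accuracy={baseline_accuracy:.4f} recall={baseline_recall:.2f}")

# Sweeping the threshold trades precision against recall.
recalls = []
for threshold in (0.3, 0.5, 0.7):
    accuracy, precision, recall = summarize(y_true, scores, threshold)
    recalls.append(recall)
    print(
        f"threshold={threshold}: accuracy={accuracy:.4f} "
        f"precision={precision:.2f} recall={recall:.2f}"
    )
```

The "never flag" baseline scores 99.83% accuracy while catching no fraud at all, which is why the threshold sweep is judged on precision and recall instead.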
Kaggle's House Prices Advanced Regression dataset done thoroughly. Feature engineering, target transformation (log of SalePrice), encoding strategy, and an ensemble of regularized linear and gradient-boosted models. R² = 0.9337 with calibrated uncertainty.
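The log transform of the target can be illustrated with a small sketch. The prices below are made up, the "predictions" are simply the true log prices shifted by a constant, and `rmse` is a hypothetical helper, so this shows only the mechanics of the transform and not the notebook's model.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up sale prices (assumption); real SalePrice values are right-skewed too.
sale_prices = [120000.0, 185000.0, 250000.0, 755000.0]

# Train on the log of the target, then exponentiate predictions to get prices.
log_prices = [math.log(p) for p in sale_prices]
recovered = [math.exp(v) for v in log_prices]

# Stand-in for model output: every prediction is off by 0.05 in log space.
log_preds = [v + 0.05 for v in log_prices]
preds = [math.exp(v) for v in log_preds]

log_rmse = rmse(log_preds, log_prices)
abs_errors = [p - a for p, a in zip(preds, sale_prices)]

print(f"RMSE in log space: {log_rmse:.4f}")
for price, err in zip(sale_prices, abs_errors):
    print(f"price={price:>9.0f} absolute error={err:>8.0f} relative={err / price:.3f}")
```

A constant error in log space is a constant relative error in price, so the expensive houses no longer dominate the loss the way they would with RMSE on raw prices.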
A regression problem on flood-risk forecasting, written as a teaching notebook. Walks through every feature, every transformation, and every modeling decision so a junior data scientist can replicate the approach on a similar problem.
A classification problem on apple quality. Smaller dataset, so the notebook focuses on robust validation (proper cross-validation, not just train/test split) and avoiding the overfitting that often catches people off-guard on small tabular problems.
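As a rough illustration of what cross-validation adds over a single split, the sketch below builds stratified k-fold index splits in plain Python. Stratification is an assumption here (the notebook may use a different scheme), the labels are synthetic, and `stratified_kfold_indices` is a hypothetical helper. scikit-learn's `StratifiedKFold` offers the same idea ready-made.

```python
import random
from collections import defaultdict


def stratified_kfold_indices(labels, k, seed=0):
    """Return k (train_indices, val_indices) pairs with class ratios preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    # Deal each class's shuffled indices round-robin across the folds.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)

    splits = []
    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        splits.append((train, val))
    return splits


# Synthetic labels (assumption): 30 "good" and 70 "bad" apples.
labels = [1] * 30 + [0] * 70
splits = stratified_kfold_indices(labels, k=5)

for fold, (train, val) in enumerate(splits):
    positive_rate = sum(labels[i] for i in val) / len(val)
    print(f"fold {fold}: train={len(train)} val={len(val)} positive_rate={positive_rate:.2f}")
```

Every row is used for validation exactly once, so the spread of the five fold scores gives a sense of variance that a single train/test split hides on a small dataset.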
Predicting student success / dropout from academic and demographic features. The notebook treats this like a real-world deployment scenario: which features are actionable for an educator and which are just noise, and how to build a model that suggests interventions rather than just verdicts.
A clean regression baseline on used-car listings. Categorical encoding, outlier handling, and a comparison of linear and tree-based models. A good starting point if you're new to regression problems.
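The two preprocessing steps named above can be sketched in a few lines of standard-library Python. The listings are invented, the 1.5 × IQR rule is one common outlier convention and not necessarily the one the notebook uses, and `iqr_bounds` and `one_hot` are hypothetical helpers.

```python
import statistics


def iqr_bounds(values, factor=1.5):
    """Return (lower, upper) fences at factor * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr


def one_hot(values):
    """Return (sorted categories, one indicator row per input value)."""
    categories = sorted(set(values))
    rows = [[1 if value == c else 0 for c in categories] for value in values]
    return categories, rows


# Invented listings (assumption): the last price looks like a data-entry error.
prices = [4500, 5200, 6100, 6800, 7400, 8100, 9000, 250000]
fuel_types = ["petrol", "diesel", "petrol", "electric",
              "diesel", "petrol", "petrol", "diesel"]

lower, upper = iqr_bounds(prices)
keep = [lower <= p <= upper for p in prices]
kept_prices = [p for p, k in zip(prices, keep) if k]
kept_fuels = [f for f, k in zip(fuel_types, keep) if k]

categories, encoded = one_hot(kept_fuels)
print("price fences:", lower, upper)
print("kept prices:", kept_prices)
print("columns:", categories)
for fuel, row in zip(kept_fuels, encoded):
    print(fuel, row)
```

Linear models need an encoding like this to use categorical columns at all, while tree-based libraries such as CatBoost and LightGBM can handle categories natively, which is part of what makes the linear-versus-tree comparison interesting.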
A clean, simplified classification workflow that's deliberately stripped down to the essentials. Useful as a "first model" template — no exotic tricks, just the pipeline done correctly.
Python · scikit-learn · XGBoost · LightGBM · CatBoost · pandas · NumPy · Matplotlib · Seaborn
Each notebook is standalone with its dataset linked from Kaggle. To run locally:
git clone https://github.com/samanfatima7/machine-learning-classical.git
cd machine-learning-classical
pip install -r requirements.txt
jupyter notebook
Deep learning gets the headlines, but most production tabular problems are still won by gradient-boosted trees with thoughtful feature engineering. These notebooks are deliberately not flashy. They're the bread and butter, and they're the work that actually pays off when you're solving a real business problem with messy data.
Saman Fatima — Kaggle Grandmaster, data scientist from Pakistan. More work on Kaggle · LinkedIn.
⭐ if you found something useful, and reach out if you want to collaborate.