Predicts whether the home team wins an NBA game from a minimal set of box‑score stats.
Tech stack: Python · Pandas · Scikit‑learn · Google Colab
| Stage | Details |
|---|---|
| Data | 26 000+ games (2004–2020) from public box-score archives |
| Pre-processing | Full-row NaN purge · automatic binary Win label generation |
| Features | FG % (home & away) · rebounds (home) · assists (home) |
| Model | Random Forest, 100 trees, `max_depth=None` |
| Validation | Stratified 20 % hold-out · 79 % accuracy |
| Diagnostics | Confusion-matrix heat map to verify class balance |
| Extras | Match-up Simulator: plug in your stats ⇒ instant W/L call |
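The table condenses the whole pipeline. Below is a minimal sketch of those steps (NaN purge, Win label, four features, 100-tree forest, stratified 20 % hold-out). It assumes hypothetical column names (`PTS_home`, `PTS_away`, `FG_PCT_home`, `FG_PCT_away`, `REB_home`, `AST_home`) that may not match the headers in `nba_games.csv`. It reads the NaN purge as dropping any row with a missing value in the columns used, and it runs on synthetic data so it is self-contained.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical column names; the real nba_games.csv headers may differ.
FEATURES = ["FG_PCT_home", "FG_PCT_away", "REB_home", "AST_home"]
SCORES = ["PTS_home", "PTS_away"]


def clean_and_label(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows missing any used column, then add a binary Win label."""
    out = df.dropna(subset=FEATURES + SCORES).copy()
    # Win = 1 if the home team outscored the away team, else 0.
    out["Win"] = (out["PTS_home"] > out["PTS_away"]).astype(int)
    return out


def train_and_score(df: pd.DataFrame, seed: int = 42):
    """Fit a 100-tree random forest and return (model, hold-out accuracy)."""
    data = clean_and_label(df)
    X, y = data[FEATURES], data["Win"]
    # Stratified 20 % hold-out keeps the win/loss ratio the same in both splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=100, max_depth=None, random_state=seed)
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


# Synthetic stand-in for nba_games.csv so the sketch runs on its own.
rng = np.random.default_rng(0)
n = 400
demo = pd.DataFrame({
    "FG_PCT_home": rng.uniform(0.38, 0.55, n),
    "FG_PCT_away": rng.uniform(0.38, 0.55, n),
    "REB_home": rng.integers(35, 55, n).astype(float),
    "AST_home": rng.integers(15, 30, n).astype(float),
})
# Scores loosely driven by shooting so the label is learnable.
demo["PTS_home"] = (200 * demo["FG_PCT_home"] + rng.normal(0, 5, n)).round()
demo["PTS_away"] = (200 * demo["FG_PCT_away"] + rng.normal(0, 5, n)).round()
demo.loc[0, "REB_home"] = np.nan  # this row is dropped by clean_and_label

model, acc = train_and_score(demo)
print(f"hold-out accuracy on synthetic data: {acc:.2f}")
```

To use the cached file instead, pass `pd.read_csv("nba_games.csv")` to `train_and_score` after renaming columns to match `FEATURES`. The fixed `random_state` makes the split and the forest reproducible. Accuracy on the synthetic data says nothing about the 79 % reported for the real games.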
Click the badge or run:
!git clone https://github.com/dcheongsee/NBA-Predictor.git

Then open NBA_Predictor.ipynb in Colab and choose Run All. The notebook rebuilds the entire pipeline end-to-end in under two minutes on free-tier hardware.
To run locally instead, clone the repository (without the leading `!`), `cd` into it, then:
python -m venv .venv && source .venv/bin/activate
pip install pandas scikit-learn matplotlib notebook
jupyter notebook NBA_Predictor.ipynb
├── NBA_Predictor.ipynb   # Reproducible workflow
│                         # CLI + match-up simulator
├── nba_games.csv # cached CSV
└── README.md
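The README does not document the match-up simulator's interface. The following hypothetical `simulate_matchup` helper sketches the idea: wrap the four feature values in a one-row DataFrame that uses the training column names, then map the forest's prediction to W or L. The column names and the six-game training set are invented for illustration.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature names; they must match the columns the model was fitted on.
FEATURES = ["FG_PCT_home", "FG_PCT_away", "REB_home", "AST_home"]


def simulate_matchup(model, fg_pct_home, fg_pct_away, reb_home, ast_home) -> str:
    """Return 'W' if the model predicts a home win (label 1), else 'L'."""
    row = pd.DataFrame([[fg_pct_home, fg_pct_away, reb_home, ast_home]], columns=FEATURES)
    return "W" if model.predict(row)[0] == 1 else "L"


# Toy training data: the first three rows are home wins, the last three losses.
train = pd.DataFrame(
    [[0.52, 0.41, 48, 27], [0.50, 0.43, 46, 25], [0.55, 0.44, 50, 28],
     [0.40, 0.51, 38, 18], [0.42, 0.53, 40, 20], [0.39, 0.49, 37, 17]],
    columns=FEATURES,
)
wins = [1, 1, 1, 0, 0, 0]
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(train, wins)

print(simulate_matchup(model, 0.53, 0.42, 47, 26))  # → W
print(simulate_matchup(model, 0.40, 0.50, 39, 19))  # → L
```

Passing a DataFrame with the fitted column names, rather than a bare list, keeps the feature order explicit and avoids scikit-learn's feature-name mismatch warning.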
- Accuracy 79 %
- F1‑score 0.78
See NBA_Predictor.ipynb for full metrics.
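The headline accuracy, the F1-score and the confusion-matrix heat map can all be computed with `sklearn.metrics`. The sketch below uses ten made-up labels instead of the notebook's predictions. It also assumes the reported F1 is the binary F1 of the home-win class, since the README does not state which averaging was used.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score


def summarize(y_true, y_pred) -> dict:
    """Accuracy, binary F1 (home win = 1) and a 2x2 confusion matrix."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred, pos_label=1),
        # Rows are actual (loss, win); columns are predicted (loss, win).
        "confusion": confusion_matrix(y_true, y_pred, labels=[0, 1]),
    }


# Made-up labels: 1 = home win, 0 = home loss.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]

m = summarize(y_true, y_pred)
print(f"accuracy={m['accuracy']:.2f}  f1={m['f1']:.2f}")  # → accuracy=0.80  f1=0.80
print(m["confusion"].tolist())  # → [[4, 1], [1, 4]]
```

To draw the heat map itself, `ConfusionMatrixDisplay.from_predictions(y_true, y_pred)` in `sklearn.metrics` (scikit-learn 1.0 or later) renders the same matrix with matplotlib.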
A compact but end-to-end example of sports analytics. It runs from raw CSV ingestion through cleaning, feature engineering and modelling to live inference, all reproducible in a single notebook.
MIT-licensed. Keep the copyright header; otherwise, do whatever you like.