A CRISP-DM-driven supervised classification project on a 2025 Amazon Electronics sales dataset of more than 42,000 items. The goal is to predict best-seller status and to identify the product-level features that best predict it.
Final project for DS1312 — Data Mining, BS Data Science, University of Asia and the Pacific. Group: Riego + Camacho.
Full title: Predicting Amazon Electronics Best-Sellers: Identifying Key Features Through Machine Learning
notebook/      Main project notebook (end-to-end pipeline)
data/
  raw/         Original + cleaned Amazon Electronics datasets (~36-39 MB each)
  ABOUT THE DATASET.pdf
documents/     Final paper, project proposal, earlier iterations, sample PDF
- Business understanding — "what makes an Amazon Electronics product a best-seller?"
- Data understanding — profile 42K+ rows; identify key product features.
- Data preparation — cleaning, encoding, feature engineering.
- Modeling — supervised classification (multiple algorithms compared).
- Evaluation — metrics, feature-importance interpretation.
- Deployment / Communication — paper + presentation.
Python · pandas · scikit-learn · matplotlib · seaborn
Course final project, completed Jan–Apr 2026.
David Nathaniel P. Riego · BS Data Science, UA&P (Aug 2023 – Aug 2027 expected) · LinkedIn