This project builds a collaborative filtering recommender system using the Steam 200k dataset and PySpark MLlib ALS.
The dataset contains:
- member_id
- game
- behavior
- hoursOfPlay

Tools used:
- Python
- PySpark
- Databricks
- MLflow
- Matplotlib

Project steps:
- Data preprocessing
- Exploratory data analysis
- Model training using ALS
- Hyperparameter tuning
- Model evaluation using RMSE
- Generating recommendations
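
The steps listed above could be sketched in PySpark roughly as follows. This is a minimal sketch, not the notebook's actual code: the function names `param_grid` and `run_pipeline`, the grid values, the 80/20 split, the `"play"` filter, the header-less CSV layout, and the use of `log1p` as the log transform are all assumptions made for illustration.

```python
import itertools


def param_grid(ranks=(10, 20), reg_params=(0.05, 0.1), max_iters=(10,)):
    """Yield one dict of ALS settings per combination (illustrative values)."""
    for rank, reg, iters in itertools.product(ranks, reg_params, max_iters):
        yield {"rank": rank, "regParam": reg, "maxIter": iters}


def run_pipeline(spark, csv_path, top_n=5):
    """Hypothetical end-to-end sketch: preprocess, tune, evaluate, recommend."""
    # Imported lazily so param_grid() can be used without a Spark installation.
    from pyspark.ml.evaluation import RegressionEvaluator
    from pyspark.ml.feature import StringIndexer
    from pyspark.ml.recommendation import ALS
    from pyspark.sql import functions as F

    # Assumption: no header row; the first four columns follow the README order.
    raw = spark.read.csv(csv_path, header=False, inferSchema=True)
    plays = (
        raw.select(
            F.col("_c0").cast("int").alias("member_id"),
            F.col("_c1").alias("game"),
            F.col("_c2").alias("behavior"),
            F.col("_c3").cast("double").alias("hoursOfPlay"),
        )
        # Assumption: rows with behavior == "play" carry the hours signal.
        .filter(F.col("behavior") == "play")
        # Assumption: log1p is the log transform; the README does not say which.
        .withColumn("rating", F.log1p("hoursOfPlay"))
    )

    # ALS needs numeric item ids, so map each game title to an index.
    indexer = StringIndexer(inputCol="game", outputCol="game_idx")
    data = indexer.fit(plays).transform(plays)
    train, test = data.randomSplit([0.8, 0.2], seed=42)

    evaluator = RegressionEvaluator(
        metricName="rmse", labelCol="rating", predictionCol="prediction"
    )
    best = None
    for params in param_grid():
        als = ALS(
            userCol="member_id", itemCol="game_idx", ratingCol="rating",
            coldStartStrategy="drop",  # skip users/games unseen in training
            seed=42, **params
        )
        model = als.fit(train)
        rmse = evaluator.evaluate(model.transform(test))
        if best is None or rmse < best[0]:
            best = (rmse, params, model)

    best_rmse, best_params, best_model = best
    return best_rmse, best_params, best_model.recommendForAllUsers(top_n)
```

In a Databricks or Jupyter session this would be called as `run_pipeline(spark, "data/steam_200k.csv")`. For brevity the sketch selects hyperparameters on the test split; a separate validation split would give a less optimistic RMSE.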

The final tuned ALS model achieved an RMSE of 1.23054 on log-transformed play hours.
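
To make the metric concrete, here is a small plain-Python sketch of an RMSE computed in log space. It assumes the transform is `log1p` (natural log of 1 + hours), which this README does not confirm, and the helper name `rmse_log_hours` is hypothetical.

```python
import math


def rmse_log_hours(actual_hours, predicted_log_hours):
    """RMSE between log1p(actual hours) and predictions made in log space."""
    if not actual_hours or len(actual_hours) != len(predicted_log_hours):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [
        (math.log1p(hours) - pred) ** 2
        for hours, pred in zip(actual_hours, predicted_log_hours)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Under that assumption, an error of 1.23 in log space corresponds to a typical multiplicative error of about e^1.23 ≈ 3.4 on 1 + hours, so the figure is not directly comparable to an RMSE measured in raw hours.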

Project files:
- `steam_recommender_system.ipynb`: full project notebook
- `data/steam_200k.csv`: dataset
- `README.md`: project documentation

Open the notebook in Databricks or Jupyter and update the dataset path if needed.