A brief demand-prediction pipeline for Bengaluru traffic, built on CatBoost and a compact feature-engineering step.
Overview
- Goal: Predict demand for geohash/time combinations using engineered time features, target encodings, and a CatBoost regressor.
- Main script: traffic.py, which contains preprocessing, model validation, final training, and submission generation (submission.csv).
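As a rough illustration of the two feature families named above, the following standard-library sketch derives time-of-day features and a smoothed target encoding. The column names (geohash, day, timestamp), the "H:M" timestamp format, the running day counter, and the smoothing constant are all assumptions made for the example; the actual features in traffic.py may differ.

```python
import math
from collections import defaultdict


def time_features(day, timestamp):
    """Derive calendar and cyclic features from a day counter and an 'H:M' string."""
    hour, minute = (int(part) for part in timestamp.split(":"))
    minute_of_day = hour * 60 + minute
    angle = 2 * math.pi * minute_of_day / (24 * 60)
    return {
        "hour": hour,
        "minute_of_day": minute_of_day,
        "day_of_week": day % 7,      # assumes `day` is a running day counter
        "tod_sin": math.sin(angle),  # cyclic encoding keeps 23:45 close to 00:00
        "tod_cos": math.cos(angle),
    }


def fit_target_encoding(rows, key, target="demand", smoothing=10.0):
    """Return (mapping, global_mean) with a smoothed mean target per category."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row[key]] += row[target]
        counts[row[key]] += 1
    global_mean = sum(sums.values()) / sum(counts.values())
    # Rare categories are pulled toward the global mean by the smoothing term.
    mapping = {
        category: (sums[category] + smoothing * global_mean) / (counts[category] + smoothing)
        for category in sums
    }
    return mapping, global_mean


# Hypothetical rows, only to show the shapes involved.
train_rows = [
    {"geohash": "abc123", "day": 1, "timestamp": "8:15", "demand": 0.40},
    {"geohash": "abc123", "day": 2, "timestamp": "18:0", "demand": 0.60},
    {"geohash": "xyz789", "day": 1, "timestamp": "8:15", "demand": 0.10},
]

encoding, fallback = fit_target_encoding(train_rows, key="geohash", smoothing=1.0)
for row in train_rows:
    row.update(time_features(row["day"], row["timestamp"]))
    # Unseen categories fall back to the global mean.
    row["geohash_te"] = encoding.get(row["geohash"], fallback)
```

When validating, a target encoding like this is normally fitted on the training part of the split only, so that hold-out targets do not leak into the features.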
Requirements
- Python 3.8+ and the packages listed in requirements.txt.
Run (local)
- Place train.csv and test.csv in the project root (same directory as traffic.py).
- Create and activate a virtual environment, then install dependencies:
python -m venv venv
venv\Scripts\activate # Windows
source venv/bin/activate # macOS / Linux
pip install -r requirements.txt
- Run the pipeline:
python traffic.py
What to expect:
- The script validates a model on a hold-out split, trains a final CatBoost model on the full training set, and writes submission.csv containing Index and the predicted demand.
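The validate, refit, and write sequence described above can be sketched with the standard library alone. This is not the code in traffic.py: a mean predictor stands in for the CatBoost regressor, the hold-out set is assumed to be the last 20% of rows, and RMSE is an assumed metric. Only the Index and demand column names come from this README.

```python
import csv
import io
import math


def holdout_split(rows, val_fraction=0.2):
    """Keep the last `val_fraction` of rows (at least one) as the hold-out set."""
    n_val = max(1, round(len(rows) * val_fraction))
    return rows[:-n_val], rows[-n_val:]


def fit_mean_model(rows):
    """Stand-in for the CatBoost regressor: always predicts the mean demand."""
    mean = sum(row["demand"] for row in rows) / len(rows)
    return lambda row: mean


def rmse(model, rows):
    """Root mean squared error of `model` on `rows`."""
    return math.sqrt(sum((model(row) - row["demand"]) ** 2 for row in rows) / len(rows))


def write_submission(model, test_rows, handle):
    """Write one prediction per test row under the Index and demand headers."""
    writer = csv.writer(handle)
    writer.writerow(["Index", "demand"])
    for row in test_rows:
        writer.writerow([row["Index"], round(model(row), 6)])


# Hypothetical data, only to exercise the three steps.
train = [{"demand": d} for d in (0.2, 0.4, 0.6, 0.8, 1.0)]
test = [{"Index": 0}, {"Index": 1}]

fit_rows, val_rows = holdout_split(train, val_fraction=0.2)
val_score = rmse(fit_mean_model(fit_rows), val_rows)  # 1. validate on the hold-out split
final_model = fit_mean_model(train)                   # 2. refit on the full training set

buffer = io.StringIO()  # swap for open("submission.csv", "w", newline="") to write a file
write_submission(final_model, test, buffer)           # 3. write the submission
```

The hold-out score is only an estimate of how the final model will behave, since the final model is refitted on more data than the validated one.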
CLI usage:
python traffic.py --train train.csv --test test.csv --output submission.csv
- Defaults: --train defaults to ./train.csv, --test defaults to ./test.csv, and --output defaults to submission.csv.
- You can also tune the number of CatBoost iterations at runtime:
python traffic.py --iterations 3000 --val-iterations 1500
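For readers who want to see how flags like these are typically wired up, here is a minimal argparse sketch. The flag names and the three path defaults are taken from this README; the help strings are guesses, and the iteration defaults are left as None because this README does not state them. The parser in traffic.py may be structured differently.

```python
import argparse


def build_parser():
    """Build a parser mirroring the flags documented in this README."""
    parser = argparse.ArgumentParser(description="Traffic demand pipeline (sketch)")
    parser.add_argument("--train", default="./train.csv", help="path to the training CSV")
    parser.add_argument("--test", default="./test.csv", help="path to the test CSV")
    parser.add_argument("--output", default="submission.csv", help="where to write predictions")
    # Iteration defaults are not documented here; None means "use the script's own value".
    parser.add_argument("--iterations", type=int, default=None,
                        help="CatBoost iterations (assumed: final model)")
    parser.add_argument("--val-iterations", type=int, default=None,
                        help="CatBoost iterations (assumed: validation run)")
    return parser


# Equivalent to: python traffic.py --iterations 3000 --val-iterations 1500
args = build_parser().parse_args(["--iterations", "3000", "--val-iterations", "1500"])
```

Note that argparse exposes --val-iterations as args.val_iterations, with the hyphen converted to an underscore.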
Notes & Recommendations
- The original notebook was developed in Colab and used /content/train.csv and /content/test.csv. If running locally, change those paths to ./train.csv and ./test.csv (or to their full paths).
- traffic.py currently reads/writes CSVs directly and uses CatBoost categorical features. If you plan to productionize, consider parameterizing file paths and hyperparameters.
- Suggested reading for better understanding: Presentation
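One way to avoid editing paths by hand when moving between Colab and a local checkout is a small lookup helper. The sketch below is an assumption for illustration, not something traffic.py is known to contain; resolve_data_path and its search order are hypothetical.

```python
from pathlib import Path


def resolve_data_path(filename, search_dirs=("/content", ".")):
    """Return the first existing `filename` among `search_dirs`.

    The default order checks the Colab directory first, then the current
    directory. Raises FileNotFoundError if no candidate exists.
    """
    for directory in search_dirs:
        candidate = Path(directory) / filename
        if candidate.is_file():
            return candidate
    searched = ", ".join(str(d) for d in search_dirs)
    raise FileNotFoundError(f"{filename} not found in: {searched}")
```

A call such as resolve_data_path("train.csv") would then work unchanged in both environments, and passing a different search_dirs tuple covers full paths elsewhere on disk.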
Files
- traffic.py: main pipeline and model training.
- requirements.txt: Python dependencies.
Contact / Origin
- Origin: traffic.py was exported from a Colab notebook (author contact is available in the script header).