This repository contains the Data Storm 7.0 hackathon pipeline for Team Syndicate.
The original notebook work has been converted into runnable Python scripts under scripts/, and main.py ties them together into one end-to-end flow.
Large generated files are intentionally excluded from git. The repo keeps small showcase samples as *.sample.csv files, while the real pipeline outputs are created locally and ignored by .gitignore.
Create a new Conda environment or update an existing one, then install dependencies.
To create a fresh environment:
conda create -n data-storm-7 python=3.14 -y
conda activate data-storm-7
If you already have the environment, update it from the project file:
conda env update -f environment.yml --prune
If you want to create the environment directly from environment.yml, use:
conda env create -f environment.yml
If Kaggle is not available in your current channels, install it from conda-forge:
conda install -y -c conda-forge kaggle
Authenticate before downloading the competition files:
kaggle auth login
Run the downloader:
python dataset_downloader.py
This downloads the competition archive into downloads/ and stages the required Bronze CSV files into data/bronze/.
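The staging step can be pictured roughly as follows. This is a minimal sketch, not the actual contents of dataset_downloader.py: the function name `stage_bronze_files`, its parameters, and the assumption that the archive is a zip file are all hypothetical, and the list of required file names has to be supplied by the caller.

```python
import shutil
import zipfile
from pathlib import Path


def stage_bronze_files(archive_path, bronze_dir, required_names):
    """Copy only the required CSV files from a zip archive into bronze_dir.

    Hypothetical helper for illustration. Returns the staged paths and
    raises FileNotFoundError if a required file is not in the archive.
    """
    bronze_dir = Path(bronze_dir)
    bronze_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    with zipfile.ZipFile(archive_path) as archive:
        # Map bare file names to archive members so nested folders are handled.
        members = {
            Path(name).name: name
            for name in archive.namelist()
            if not name.endswith("/")
        }
        for required in required_names:
            if required not in members:
                raise FileNotFoundError(f"{required} not found in {archive_path}")
            target = bronze_dir / required
            # Stream the member to disk without keeping the archive's folder layout.
            with archive.open(members[required]) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            staged.append(target)
    return staged
```

Failing early on a missing file keeps a partial download from silently producing an incomplete Bronze layer.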
If the Bronze files are already staged, run the pipeline like this:
python main.py --skip-download
If you want the script to download everything first, run:
python main.py
What this does:
- builds the Silver tables from the notebook logic converted into scripts/silver_pipeline.py
- builds the Gold dataset from scripts/gold_pipeline.py
- writes the rebuilt Silver files into data/silver/
- writes the model-ready Gold file to data/gold/gold_final_v1.csv
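The flow above can be sketched as a small orchestrator. This is an illustration under assumptions, not the real main.py: only the --skip-download flag comes from this README, while `parse_args`, `plan_steps`, `run`, and the step names are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; --skip-download is the only flag assumed here."""
    parser = argparse.ArgumentParser(description="Run the pipeline end to end.")
    parser.add_argument(
        "--skip-download",
        action="store_true",
        help="Reuse Bronze files already staged in data/bronze/.",
    )
    return parser.parse_args(argv)


def plan_steps(skip_download):
    """Return the ordered step names for one run."""
    steps = [] if skip_download else ["download"]
    steps += ["silver", "gold"]
    return steps


def run(argv=None, step_functions=None):
    """Walk the planned steps in order.

    step_functions maps a step name to a callable (hypothetical hooks for the
    downloader and the Silver and Gold builders). Steps without a registered
    callable are only recorded, which makes the plan easy to inspect.
    """
    args = parse_args(argv)
    step_functions = step_functions or {}
    executed = []
    for name in plan_steps(args.skip_download):
        step = step_functions.get(name)
        if step is not None:
            step()
        executed.append(name)
    return executed
```

Calling `run(["--skip-download"])` plans only the Silver and Gold steps, while `run([])` puts the download step first.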
Use the Gold dataset as the model input. The final submission or prediction CSV should be written to outputs/, for example:
outputs/team_syndicate_predictions.csv
That keeps the data layers separate:
- data/gold/ for model-ready features
- outputs/ for final predictions and submission files
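To illustrate the separation, a prediction step might read the Gold file and write its results under outputs/ along these lines. This is a sketch only: `write_predictions`, the `id_column` argument, the `prediction` header, and the `predict_row` callable are hypothetical placeholders, since this README does not specify the Gold schema or the model.

```python
import csv
from pathlib import Path


def write_predictions(gold_path, output_path, id_column, predict_row):
    """Read Gold rows, apply predict_row to each, and write id/prediction pairs.

    predict_row is any callable that takes a row dict and returns a value;
    it stands in for a real model. Returns the number of rows written.
    """
    output_path = Path(output_path)
    # Create outputs/ (or whichever parent folder is used) if it does not exist.
    output_path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with open(gold_path, newline="") as src, open(output_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.writer(dst)
        writer.writerow([id_column, "prediction"])
        for row in reader:
            writer.writerow([row[id_column], predict_row(row)])
            count += 1
    return count
```

The function never writes into data/gold/, so the model-ready features stay untouched by the prediction step.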
The pipeline outputs are written here:
- data/silver/
- data/gold/gold_final_v1.csv
- outputs/
If you need the tracked showcase version of the Gold file, use data/gold/gold_final_v1.sample.csv.
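Code that should work both on a fresh clone and after a full pipeline run could choose between the two files as in the sketch below. The helper name `resolve_gold_path` is hypothetical; only the two file names come from this README.

```python
from pathlib import Path


def resolve_gold_path(gold_dir="data/gold"):
    """Prefer the locally built Gold file; fall back to the tracked sample."""
    gold_dir = Path(gold_dir)
    full = gold_dir / "gold_final_v1.csv"
    sample = gold_dir / "gold_final_v1.sample.csv"
    if full.exists():
        return full
    if sample.exists():
        return sample
    raise FileNotFoundError(f"No Gold file found in {gold_dir}")
```

The sample is meant as a showcase, so anything trained or evaluated on it should not be treated as a real result.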
To create a portable environment file for sharing or reproducing this setup, export the active Conda environment without build strings:
conda env export --no-builds > environment.yml
Using --no-builds helps make the exported environment.yml more portable across platforms and different Conda setups.