Data Storm 7.0 - Team Syndicate

This repository contains the Data Storm 7.0 hackathon pipeline for Team Syndicate.

The original notebook work has been converted into runnable Python scripts under scripts/, and main.py ties them together into one end-to-end flow.

Large generated files are intentionally excluded from git. The repo tracks only small showcase samples named *.sample.csv; the full pipeline outputs are created locally and ignored via .gitignore.

Quick Start

1. Set up the environment

Create a new Conda environment or update an existing one, then install dependencies.

To create a fresh environment containing only Python:

conda create -n data-storm-7 python=3.14 -y
conda activate data-storm-7

If you already have the environment, install or update its dependencies from the project file:

conda env update -f environment.yml --prune

Alternatively, create the environment directly from environment.yml in one step:

conda env create -f environment.yml

If the kaggle package is not available in your configured channels, install it from conda-forge:

conda install -y -c conda-forge kaggle

2. Log in to Kaggle

Authenticate before downloading the competition files:

kaggle auth login

3. Download the dataset

Run the downloader:

python dataset_downloader.py

This downloads the competition archive into downloads/ and stages the required Bronze CSV files into data/bronze/.
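
Before moving on, it can help to confirm that the Bronze files were actually staged. The following is a minimal sketch, not part of the repository: the helper name list_staged_bronze is hypothetical, and it only assumes that the staged files are CSVs sitting directly in data/bronze/.

```python
from pathlib import Path


def list_staged_bronze(bronze_dir: str = "data/bronze") -> list[str]:
    """Return the sorted names of CSV files found in the Bronze directory.

    Hypothetical helper: the real downloader may track its staged files
    differently. An empty list means nothing has been staged yet.
    """
    root = Path(bronze_dir)
    if not root.is_dir():
        return []
    # Only files directly inside the directory are considered.
    return sorted(p.name for p in root.glob("*.csv"))
```

Calling list_staged_bronze() from the repository root returns an empty list if the download step has not been run.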

4. Run the full pipeline

If the Bronze files are already staged, run the pipeline like this:

python main.py --skip-download

If you want the script to download everything first, run:

python main.py

What this does:

builds the Silver tables with scripts/silver_pipeline.py, which contains the converted notebook logic
builds the Gold dataset from scripts/gold_pipeline.py
writes the rebuilt Silver files into data/silver/
writes the model-ready Gold file to data/gold/gold_final_v1.csv
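
The flow above can be sketched roughly as follows. This is an illustration under assumptions, not the actual contents of main.py: only the --skip-download flag comes from this README, while the function names and step labels are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; --skip-download is the only flag described here."""
    parser = argparse.ArgumentParser(description="Run the end-to-end pipeline.")
    parser.add_argument(
        "--skip-download",
        action="store_true",
        help="Assume the Bronze files are already staged in data/bronze/.",
    )
    return parser.parse_args(argv)


def plan_steps(skip_download: bool) -> list[str]:
    """Return the ordered step labels (hypothetical names) for one run."""
    steps = [] if skip_download else ["download"]
    # Silver is rebuilt first, then Gold is derived from it.
    steps += ["silver", "gold"]
    return steps
```

For example, plan_steps(parse_args(["--skip-download"]).skip_download) yields only the Silver and Gold steps, while an empty argument list adds the download step in front.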

5. Train and predict

Use the Gold dataset as the model input. The final submission or prediction CSV should be written to outputs/, for example:

outputs/team_syndicate_predictions.csv

This keeps model inputs and final results separate:

data/gold/ for model-ready features
outputs/ for final predictions and submission files
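
One way to write a prediction file into outputs/ is sketched below using only the standard library. The column names id and prediction are placeholders chosen for illustration; the actual submission format is defined by the competition and is not specified in this README.

```python
import csv
from pathlib import Path


def write_predictions(
    rows,
    out_path="outputs/team_syndicate_predictions.csv",
    fieldnames=("id", "prediction"),  # placeholder column names (assumption)
):
    """Write an iterable of dict rows to a CSV file, creating outputs/ if needed."""
    path = Path(out_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(fieldnames))
        writer.writeheader()
        writer.writerows(rows)
    return path
```

Writing through a single helper like this keeps every prediction file under outputs/ and out of data/gold/.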

6. Find the outputs

The pipeline outputs are written here:

data/silver/
data/gold/gold_final_v1.csv
outputs/

If you need the tracked showcase version of the Gold file, use data/gold/gold_final_v1.sample.csv.
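
When code should work both on a fresh clone and after a full pipeline run, it can prefer the locally built Gold file and fall back to the tracked sample. The helper below is a hypothetical sketch; only the two file names come from this README.

```python
from pathlib import Path


def resolve_gold_path(gold_dir="data/gold"):
    """Return the full Gold file if it exists, otherwise the tracked sample."""
    root = Path(gold_dir)
    full = root / "gold_final_v1.csv"
    sample = root / "gold_final_v1.sample.csv"
    if full.exists():
        return full
    if sample.exists():
        return sample
    raise FileNotFoundError(f"No Gold file found in {root}")
```

Note that the sample is a showcase subset, so results computed from it are not comparable to results from the full Gold file.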

Environment export

To create a portable environment file for sharing or reproducing this setup, export the active Conda environment without build strings:

conda env export --no-builds > environment.yml

Omitting build strings with --no-builds makes the exported environment.yml more portable across platforms and Conda setups. Note that this command overwrites the existing environment.yml.
