Repository for the Applied Machine Learning course (WBAI065-05) at the University of Groningen.
We use uv for project management.
- Clone the project.
- Synchronise the project: `uv sync`
- Create a copy of `example.config.yaml` and rename it to `config.yaml`. Update the configuration, if desired.
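
The copy step can also be scripted. The following is a minimal sketch, not part of the repository: the helper name `ensure_config` is hypothetical, and it assumes both files live in the project root. It leaves an existing `config.yaml` untouched so local edits are not overwritten.

```python
import shutil
from pathlib import Path


def ensure_config(project_root: Path) -> Path:
    """Copy example.config.yaml to config.yaml unless config.yaml already exists.

    Hypothetical helper for illustration; assumes both files sit in project_root.
    """
    example = project_root / "example.config.yaml"
    target = project_root / "config.yaml"
    if not target.exists():
        # Only create the file on first run, so user edits are preserved.
        shutil.copyfile(example, target)
    return target
```

Calling `ensure_config(Path.cwd())` from the project root would create `config.yaml` on the first run and do nothing afterwards.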
The project can be run via a CLI, for convenient usage and testing.

`uv run -m src.data.download [--force]`

- `--force`: Forces a redownload of the data, in the event of missing or corrupted raw data. Defaults to `False`.

This requires a Kaggle API token to be set up on your device: https://www.kaggle.com/settings/api
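
Before running the download, it can help to check that credentials are discoverable. The sketch below is an assumption-laden illustration, not the project's code: it checks the conventional locations used by the Kaggle API client (the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables, or a `kaggle.json` file in `~/.kaggle` or in the directory named by `KAGGLE_CONFIG_DIR`). The function name is hypothetical, and newer token formats may use other locations, so treat the Kaggle documentation as authoritative.

```python
import os
from pathlib import Path


def kaggle_credentials_available(env=None, home=None) -> bool:
    """Return True if Kaggle credentials appear to be configured.

    Hypothetical helper; checks only the conventional locations described above.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    # Option 1: credentials supplied through environment variables.
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return True
    # Option 2: a kaggle.json file in the (possibly overridden) config directory.
    config_dir = Path(env.get("KAGGLE_CONFIG_DIR", home / ".kaggle"))
    return (config_dir / "kaggle.json").is_file()
```

If `kaggle_credentials_available()` returns `False`, create a token on the settings page linked above before running the download script.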
Dataset: https://www.kaggle.com/datasets/tolgadincer/labeled-chest-xray-images
- Extract the archive and place it in a directory named `DATA_DIR/raw/` (e.g. `data/raw/<the extracted folder>`).
- Run the download script for automated reorganisation.

`uv run -m src.features.preprocess_data [--pipeline] [--lgb-size]`

- `--pipeline`: Chooses which pipeline to run: `pytorch`, `lightgbm`, `all`. Running the `pytorch` pipeline is required in order to run the `lightgbm` pipeline. Defaults to `all`.
- `--lgb-size`: Determines the edge size for downsampling in LightGBM feature extraction. Defaults to 64.
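
The options above can be modelled with `argparse`. This is a minimal sketch under stated assumptions, not the code in `src.features.preprocess_data`: the function names `build_parser` and `stages_to_run` are hypothetical, and the stage ordering simply encodes the documented rule that `pytorch` must run before `lightgbm`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (hypothetical sketch)."""
    parser = argparse.ArgumentParser(prog="preprocess_data")
    parser.add_argument(
        "--pipeline",
        choices=["pytorch", "lightgbm", "all"],
        default="all",
        help="Which preprocessing pipeline to run.",
    )
    parser.add_argument(
        "--lgb-size",
        type=int,
        default=64,
        help="Edge size for downsampling in LightGBM feature extraction.",
    )
    return parser


def stages_to_run(pipeline: str) -> list[str]:
    """Expand the --pipeline choice into an ordered list of stages."""
    if pipeline == "all":
        # pytorch comes first because lightgbm depends on it having been run.
        return ["pytorch", "lightgbm"]
    return [pipeline]
```

With no arguments, `build_parser().parse_args([])` yields `pipeline="all"` and `lgb_size=64`, matching the documented defaults.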

`uv run -m src.training.train --model <model_name> [options]`

- `--model`: The model architecture to train: `cnn`, `resnet`, `lgbm`.
- `--epochs`: Number of training epochs. Defaults dynamically.
- `--batch-size`: Batch size for PyTorch models. Defaults to 32.
- `--lr`: Learning rate. Defaults dynamically.
- `--patience`: Epochs to wait for improvement before early stopping. Defaults to 3.
- `--num-leaves`: Number of leaves for LightGBM. Defaults to 31.
- `--max-depth`: Maximum tree depth for LightGBM. Defaults to -1.
- `--weight-decay`: Weight decay for PyTorch models. Defaults to 0.0.
- `--device`: Device for PyTorch models (`cuda`, `mps`, `cpu`). Defaults to auto-detection.
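
To illustrate what `--patience` controls, here is a minimal early-stopping sketch. It is not the project's implementation: the class name `EarlyStopping` is hypothetical, and it assumes the monitored quantity is a validation loss where lower is better, which the option description does not state.

```python
class EarlyStopping:
    """Stop training after `patience` consecutive epochs without improvement.

    Hypothetical sketch; assumes a validation loss where lower is better.
    """

    def __init__(self, patience: int = 3):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best:
            # Improvement: remember the new best value and reset the counter.
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, `step` would be called once per epoch with the validation loss, and the loop would break as soon as it returns `True`.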

`uv run -m src.training.cv --model <model_name> [options]`

- `--model`: The model to cross-validate: `cnn`, `resnet`, `lgbm`.
- `--splits`: Number of folds (k). Defaults to 5.
- `--epochs`: Number of training epochs. Defaults dynamically.
- `--batch-size`: Batch size for PyTorch models. Defaults to 32.
- `--lr`: Learning rate. Defaults dynamically.
- `--weight-decay`: Weight decay for PyTorch models. Defaults to 0.0.
- `--device`: Device for PyTorch models (`cuda`, `mps`, `cpu`). Defaults to auto-detection.
- `--grid-search`: Enable hyperparameter grid search cross-validation.
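
As an illustration of what `--splits` means, the sketch below partitions sample indices into k folds, each used once for validation. It is a plain, unshuffled, unstratified split written for clarity; the function name `k_fold_indices` is hypothetical, and the project's own splitting strategy may differ (for example, it may shuffle or stratify by class).

```python
def k_fold_indices(n_samples: int, n_splits: int = 5) -> list[tuple[list[int], list[int]]]:
    """Return (train_indices, val_indices) pairs for plain k-fold cross-validation.

    Hypothetical sketch: contiguous folds, no shuffling, no stratification.
    """
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, n_splits)
    folds = []
    start = 0
    for i in range(n_splits):
        size = base + (1 if i < extra else 0)
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, val))
        start += size
    return folds
```

Each sample appears in exactly one validation fold, so averaging the per-fold metric uses every sample for validation once.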

`uv run tensorboard --logdir logs/tensorboard`

`uv run pytest tests`