Code for performance benchmarking of Llama model finetuning
Systems tested:
- Della (Princeton HPC)
- Snellius (Dutch HPC)
- OSSC (Dutch secure HPC)

Repository contents:
- `cbs/` contains a README I sent along with the model upload.
- `della-scripts`, `snellius-scripts` and `ossc-scripts` contain scripts for running code on the respective systems. They also have READMEs describing details for running the scripts.
- `plots/` contains some code from Matt for comparing the runs across systems.
- `requirements/` contains files related to managing dependencies.
- `configs/` contains configuration files for torchtune.
- `datasets/` contains training datasets. `alpaca_data_cleaned.json` contains text that is fed to the model for updating the parameters.
  - The dataset is licensed under `datasets/LICENSE`, while the remaining code in this repository falls under `./LICENSE`.

| Batch Size 6 Comparison On: | Della | Snellius | OSSC |
|---|---|---|---|
| 1 A100 | 19422 | 17500 | 17700 |
| 2 A100s | 18247 | 16500 | 16500 |
| 4 A100s | 18019 | 16400 | 16400 |
| 1 H100 | 36668 | 31100 | 31000 |
| 2 H100s | 34228 | 28800 | 28600 |
| 4 H100s | 33650 | 28600 | 28500 |
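
To see how each system behaves as GPUs are added, the figures in the table can be normalised to the single-GPU value. The snippet below is a minimal sketch of that arithmetic; the numbers are copied from the table, while the names `RESULTS` and `relative_to_single` are made up for illustration and are not part of this repository.

```python
# Figures copied from the batch-size-6 table above, ordered as 1, 2 and 4 GPUs.
# Units are whatever the table reports; only ratios are computed here.
RESULTS = {
    "A100": {
        "Della": [19422, 18247, 18019],
        "Snellius": [17500, 16500, 16400],
        "OSSC": [17700, 16500, 16400],
    },
    "H100": {
        "Della": [36668, 34228, 33650],
        "Snellius": [31100, 28800, 28600],
        "OSSC": [31000, 28600, 28500],
    },
}
GPU_COUNTS = [1, 2, 4]


def relative_to_single(values):
    """Divide every entry by the single-GPU entry (the first one)."""
    base = values[0]
    return [v / base for v in values]


if __name__ == "__main__":
    for gpu, systems in RESULTS.items():
        for system, values in systems.items():
            ratios = relative_to_single(values)
            line = ", ".join(
                f"{n}x: {r:.3f}" for n, r in zip(GPU_COUNTS, ratios)
            )
            print(f"{gpu} {system}: {line}")
```

A value below 1.0 means the reported figure with that GPU count is lower than the single-GPU figure on the same system.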
The difference between Snellius and Della is down to memory clock speeds:
| | Della | Snellius |
|---|---|---|
| A100 | 1600 MHz | 1215 MHz |
| H100 | 2600 MHz | 1590 MHz |
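
As a rough sanity check on the clock-speed explanation, the Della/Snellius ratio of the single-GPU results can be set next to the ratio of the memory clocks. This is only a sketch using the numbers from the two tables; the dictionary and function names are illustrative and do not come from the repository.

```python
# Single-GPU figures from the batch-size-6 table.
SINGLE_GPU = {
    "A100": {"Della": 19422, "Snellius": 17500},
    "H100": {"Della": 36668, "Snellius": 31100},
}
# Memory clock speeds in MHz from the clock-speed table.
MEM_CLOCK_MHZ = {
    "A100": {"Della": 1600, "Snellius": 1215},
    "H100": {"Della": 2600, "Snellius": 1590},
}


def della_over_snellius(table, gpu):
    """Ratio of the Della value to the Snellius value for one GPU type."""
    return table[gpu]["Della"] / table[gpu]["Snellius"]


if __name__ == "__main__":
    for gpu in ("A100", "H100"):
        result_ratio = della_over_snellius(SINGLE_GPU, gpu)
        clock_ratio = della_over_snellius(MEM_CLOCK_MHZ, gpu)
        print(f"{gpu}: result ratio {result_ratio:.2f}, "
              f"memory clock ratio {clock_ratio:.2f}")
```

For these numbers the result ratio is smaller than the memory clock ratio for both GPU types (about 1.11 against 1.32 for the A100, and about 1.18 against 1.64 for the H100), so the clock difference does not carry over one-to-one.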

- Create an account on Weights & Biases.
- Download the foundation model and place it in the `models` directory. This project uses the Llama-3.2-1B-Instruct model.
- Install the required Python dependencies listed in `requirements/baseline.txt`.
- Adapt the SLURM job scripts to match your system configuration.
- Run the model once in an interactive session first, so that Weights & Biases can prompt for your login details.
- Run the SLURM jobs.
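
Before submitting jobs, it can help to check that the setup steps have been completed. The following is a hypothetical pre-flight script, not something shipped with this repository. It assumes it is run from the repository root, and it assumes that Weights & Biases credentials are available either through the `WANDB_API_KEY` environment variable or in `~/.netrc` after an interactive login.

```python
import os
from pathlib import Path


def preflight(repo_root="."):
    """Return a list of problems found; an empty list means all checks passed."""
    root = Path(repo_root)
    problems = []

    # The foundation model is expected in the models directory.
    models = root / "models"
    if not models.is_dir() or not any(models.iterdir()):
        problems.append(
            "models/ is missing or empty; download Llama-3.2-1B-Instruct first"
        )

    # The dependency list named in the setup steps.
    if not (root / "requirements" / "baseline.txt").is_file():
        problems.append("requirements/baseline.txt not found")

    # Assumption: credentials come from WANDB_API_KEY or from ~/.netrc.
    has_key = bool(os.environ.get("WANDB_API_KEY"))
    has_netrc = (Path.home() / ".netrc").is_file()
    if not (has_key or has_netrc):
        problems.append(
            "no Weights & Biases credentials found; log in interactively first"
        )

    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```

The check for `~/.netrc` only confirms that the file exists, not that it holds a valid Weights & Biases entry, so an interactive first run remains the reliable way to confirm the login works.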