A modular, open-source framework for building, training, and deploying machine-learning interatomic potentials — from quick experiments to production-scale distributed workflows.
Python 3.14 · free-threaded (GIL-free) build with true multithreading for data loading and preprocessing.
🔵 Getting started · 🟢 Core workflow · 🟡 Advanced features · 🟣 Infrastructure
GOAL (General Open Atomistic Laboratory) is a modular framework for training machine-learning interatomic potentials (MLIPs). Built on PyTorch Lightning 2.6+ and Hydra, it provides:
| | Feature | Details |
|---|---|---|
| 🔬 | Equivariant & invariant backbones | HyperSpec (E(3)-equivariant) and SchNet-like invariant GNN |
| 🧱 | Modular & monolithic models | Backbone→head pipeline (HyperSpec, InvariantGNN, DeepSet, HyperSet, LucidSet) or self-contained monolithic models for external architectures |
| 🎯 | Multiple task heads | Energy, forces, stress, dipole, direct forces, generic scalar, multi-head |
| 🧠 | Foundation model adapters | MACE and FairChem (UMA) pre-trained models |
| 📂 | Flexible data loading | XYZ, HDF5, LMDB, ASE trajectory; multi-file merge, directory-based loading, auto-splitting |
| ⚡ | Distributed training | DDP, FSDP, FSDP2 (ModelParallel), DeepSpeed ZeRO (Stages 1/2/3 + CPU offload) |
| 🧮 | Configurable loss | Per-property loss type (MSE, MAE, Huber, Smooth L1) + composite sub-losses |
| 🔧 | Strategy factory | Unified build_strategy(cfg) for all distributed strategies |
| 🚀 | Performance engineering | TF32, cuDNN benchmark, torch.compile, gradient accumulation, EMA/SWA |
| 📊 | Experiment management | Hydra config composition · W&B · TensorBoard · CSV · MLflow · Neptune · Aim · Comet |
| 🖥️ | SLURM-aware | Auto checkpoint resumption and completion sentinels |
| 🧪 | ASE integration | Use any trained model as an ASE Calculator for MD, geometry optimisation, phonons |
| 🐍 | Python 3.14 | Free-threaded (GIL-free) build with real multithreading for data loading |
| 🔬 | Mini Trainer | Standalone notebook-friendly training loop for rapid prototyping on extracted features |
| 🧰 | Custom training loops | Three levels of loop customisation: GOALModule hooks, Fabric-based multi-GPU, or pure PyTorch |
Package layout · The top-level namespace is `goal`. The ML training module lives at `goal.ml`:
from goal.ml.training.module import GOALModule
from goal.ml.data.datamodule import GOALDataModule
from goal.ml.utils.calculator import GOALCalculator
git clone https://github.com/Nourollah/GOAL.git
cd GOAL
pip install -e .
Optional extras:
pip install -e ".[mace]" # MACE adapter
pip install -e ".[fairchem]" # FairChem/UMA adapter
pip install -e ".[deepspeed]" # DeepSpeed ZeRO strategies
pip install -e ".[all]" # All optional dependencies
pip install -e ".[dev]" # pytest, ruff, mypy
💡 What is pixi?
Pixi is a fast, cross-platform package manager built on top of conda-forge. It manages both conda and pip dependencies in a single lockfile, giving you:
- Reproducible environments: a `pixi.lock` pins every package version (conda and pip)
- Named environments: switch between CPU, CUDA, dev, and adapter-specific setups instantly
- No `conda activate`: just `pixi run <task>` or `pixi shell`
- Fast solves: written in Rust; resolves environments in seconds
Install pixi (one-liner):
curl -fsSL https://pixi.sh/install.sh | bash
Or see the official installation guide for Homebrew, Windows, and other methods.
GOAL ships a pixi workspace configuration in pyproject.toml. After installing pixi:
pixi install # default (CPU)
pixi install -e cuda # CUDA 12+
pixi install -e dev # CPU + dev tools (pytest, ruff, mypy)
pixi install -e dev-cuda # CUDA + dev tools
pixi install -e fairchem # CUDA + FairChem adapter
pixi install -e cuda-deepspeed # CUDA + DeepSpeed
Note: Some optional dependencies have compatibility constraints:
- MACE adapter: pins `e3nn==0.4.4`, which conflicts with the core `e3nn>=0.5` requirement. Install via `pip install -e ".[mace]"` instead.
- Ray Tune / Optuna: no Python 3.14 wheels yet. Install via `pip install -e ".[tune]"` on Python ≤3.13.
📁 Full project tree
├── configs/ # Hydra configuration groups
│ ├── train.yaml # Training defaults composition
│ ├── eval.yaml # Evaluation defaults composition
│ ├── callbacks/ # Callback configs (checkpoint, EMA, SWA, …)
│ ├── data/ # Dataset configs (xyz, hdf5, lmdb, trajectory, benchmarks)
│ ├── hparams_search/ # Hyperparameter search (basic, ray_tune, wandb_sweep)
│ ├── logger/ # Logger configs (wandb, tensorboard, csv, …)
│ ├── model/ # Model configs (hyperspec, invariant_gnn)
│ ├── strategy/ # Strategy configs (ddp, fsdp, fsdp2, deepspeed_*)
│ ├── trainer/ # Trainer configs (gpu, ddp, fsdp, model_parallel, …)
│ ├── training/ # Training hyperparameters (optimizer, EMA, losses, …)
│ ├── paths/ # Path definitions
│ └── hydra/ # Hydra runtime settings
├── src/
│ └── goal/ # Top-level namespace package
│ └── ml/ # ML training module
│ ├── cli/ # Entry points: train, evaluate, finetune, tune
│ ├── data/ # DataModule, datasets (xyz, hdf5, lmdb, trajectory, concat)
│ ├── nn/ # Neural network components
│ │ ├── models/ # Backbones: HyperSpec, InvariantGNN, DeepSet, HyperSet, LucidSet; MonolithicExample
│ │ ├── heads/ # Task heads: energy, forces, stress, dipole, scalar, multi
│ │ ├── blocks/ # Building blocks: embedding, interaction, readout
│ │ └── primitives/ # Low-level ops: tensor products, radial basis, norms
│ ├── adapters/ # Foundation model wrappers: MACE, FairChem
│ ├── training/ # LightningModule, loss, EMA, tuning
│ │ ├── callbacks/ # Checkpoint, logging callbacks
│ │ └── strategies/ # Strategy factory: DDP, FSDP, FSDP2, DeepSpeed
│ ├── utils/ # ASE calculator, feature extraction, mini trainer
│ └── registry.py # Lazy component registry
├── examples/
│ └── datasets/ # Benchmark dataset loaders (MD17, ANI-1, QM9, SPICE)
├── notebooks/ # Tutorials & demos (getting started, feature extraction, mini trainer)
├── scripts/ # SLURM job scripts
├── tests/ # Test suite
├── data/ # Dataset storage
├── logs/ # Training outputs (checkpoints, metrics)
└── pyproject.toml # Package metadata + pixi workspace config
Train the default model (HyperSpec) on XYZ data:
goal-train data.root=/path/to/dataset
Or equivalently via module:
python -m goal.ml.cli.train data.root=/path/to/dataset
This loads configs/train.yaml, which composes: data=xyz, model=hyperspec, training=default, trainer=default.
📓 New to GOAL? Work through `notebooks/getting_started.ipynb`, a step-by-step tutorial covering model building (modular & monolithic), dataset loading, training, and using trained models as ASE calculators.
goal-train trainer=gpu data.root=/path/to/dataset
The gpu trainer config sets accelerator: gpu and devices: 1.
Distributed Data Parallel — replicates the full model on each GPU and synchronizes gradients. Use when the model fits in a single GPU's memory.
goal-train trainer=ddp data.root=/path/to/dataset
Override the number of GPUs:
goal-train trainer=ddp trainer.devices=8
Multi-node:
goal-train trainer=ddp trainer.devices=4 trainer.num_nodes=2
DDP key settings (in configs/trainer/ddp.yaml):
- `find_unused_parameters: false`: set `true` if you have frozen layers
- `static_graph: false`: set `true` for models with fixed computation graphs (faster)
- `gradient_as_bucket_view: true`: minor memory optimisation
- `sync_batchnorm: true`: synchronize batch norm statistics across GPUs
Fully Sharded Data Parallel — shards model parameters, gradients, and optimizer states across GPUs. Use when the model doesn't fit in a single GPU's memory.
goal-train trainer=fsdp data.root=/path/to/dataset
FSDP settings (in configs/trainer/fsdp.yaml):
- `auto_wrap_policy`: controls how modules are wrapped for sharding
- `activation_checkpointing_policy`: trade compute for memory by recomputing activations
- `cpu_offload: false`: offload parameters to CPU (slower, saves GPU memory)
- `precision: "bf16-mixed"`: recommended for FSDP on Ampere+ GPUs
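To make the DDP versus FSDP trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes fp32 weights and gradients plus two fp32 Adam moments per parameter (16 bytes in total), and it ignores activations and communication buffers. The function name `per_gpu_state_bytes` is hypothetical and not part of GOAL.

```python
def per_gpu_state_bytes(num_params: int, world_size: int, sharded: bool) -> int:
    """Rough bytes per GPU for weights, gradients, and Adam state in fp32."""
    # Assumption: 4 B weights + 4 B gradients + 8 B Adam moments per parameter.
    bytes_per_param = 4 + 4 + 8
    total = num_params * bytes_per_param
    # DDP replicates all of this state on every rank;
    # full sharding divides it evenly across ranks.
    return total // world_size if sharded else total


if __name__ == "__main__":
    n = 500_000_000  # hypothetical 500M-parameter model
    for label, sharded in (("DDP", False), ("FSDP", True)):
        gib = per_gpu_state_bytes(n, world_size=8, sharded=sharded) / 2**30
        print(f"{label}: {gib:.1f} GiB of model state per GPU")
```

The real footprint depends on precision settings, wrapping policy, and activation memory, so treat the numbers only as a first estimate when choosing between `trainer=ddp` and `trainer=fsdp`.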
ModelParallelStrategy (Lightning 2.4+) — supports FSDP2, tensor parallelism, torch.compile, and FP8. Recommended for very large models (500M+ parameters).
goal-train trainer=model_parallel data.root=/path/to/dataset
Or via the strategy factory:
goal-train +strategy=fsdp2 data.root=/path/to/dataset
DeepSpeed ZeRO enables training of very large models by partitioning optimizer states, gradients, and parameters across GPUs. Requires pip install -e ".[deepspeed]".
ZeRO Stage 1 (optimizer state partitioning only; lowest communication overhead):
goal-train +strategy=deepspeed_zero1 data.root=/path/to/dataset
ZeRO Stage 2 (optimizer state + gradient partitioning):
goal-train +strategy=deepspeed_zero2 data.root=/path/to/dataset
ZeRO Stage 3 (full parameter partitioning; maximum memory savings):
goal-train +strategy=deepspeed_zero3 data.root=/path/to/dataset
ZeRO Stage 3 + CPU offload (offload parameters to CPU, for extremely large models):
goal-train +strategy=deepspeed_zero3_offload data.root=/path/to/dataset
📋 DeepSpeed configuration options
# configs/strategy/deepspeed_zero3.yaml
name: deepspeed_zero3
stage: 3
allgather_bucket_size: 200_000_000
reduce_bucket_size: 200_000_000
logging_level: WARNING
GOAL provides a unified strategy factory (build_strategy()) that maps config to Lightning strategies. When cfg.strategy is present, it takes priority over the trainer's built-in strategy.
# Two ways to select a strategy:
# 1. Via trainer config group (backward compatible):
goal-train trainer=ddp
# 2. Via strategy config group (new, more options):
goal-train +strategy=fsdp2
goal-train +strategy=deepspeed_zero3_offload
| Strategy Config | Lightning Strategy | Use Case |
|---|---|---|
| ddp | DDPStrategy | Model fits on one GPU |
| fsdp | FSDPStrategy | Model too large for one GPU |
| fsdp2 | ModelParallelStrategy | Very large models, torch.compile |
| deepspeed_zero1 | DeepSpeedStrategy (stage 1) | Optimizer state partitioning |
| deepspeed_zero2 | DeepSpeedStrategy (stage 2) | + gradient partitioning |
| deepspeed_zero3 | DeepSpeedStrategy (stage 3) | Full parameter partitioning |
| deepspeed_zero3_offload | DeepSpeedStrategy (stage 3) | + CPU offload |
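The priority rule (a `strategy` config group wins over the trainer's own strategy) can be illustrated with a small stand-alone sketch. This is not GOAL's `build_strategy()` implementation: `resolve_strategy` and `STRATEGY_CLASSES` are hypothetical names, the config is modelled as a plain dict, and the function returns class names as strings instead of building Lightning objects.

```python
from typing import Any, Mapping

# Strategy config name -> Lightning strategy class name, as listed in the table.
STRATEGY_CLASSES = {
    "ddp": "DDPStrategy",
    "fsdp": "FSDPStrategy",
    "fsdp2": "ModelParallelStrategy",
    "deepspeed_zero1": "DeepSpeedStrategy",
    "deepspeed_zero2": "DeepSpeedStrategy",
    "deepspeed_zero3": "DeepSpeedStrategy",
    "deepspeed_zero3_offload": "DeepSpeedStrategy",
}


def resolve_strategy(cfg: Mapping[str, Any]) -> str:
    """Pick the strategy a run would use, preferring cfg['strategy']."""
    strategy_cfg = cfg.get("strategy")
    if strategy_cfg:
        name = strategy_cfg["name"]
        if name not in STRATEGY_CLASSES:
            raise ValueError(f"Unknown strategy config: {name!r}")
        return STRATEGY_CLASSES[name]
    # Fall back to whatever the trainer config group selected
    # ("auto" is the Lightning Trainer default).
    return str(cfg.get("trainer", {}).get("strategy", "auto"))
```

For example, a config holding both `trainer.strategy: ddp` and `strategy.name: fsdp2` resolves to `ModelParallelStrategy` in this sketch.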
goal-train trainer=mps data.root=/path/to/dataset
goal-train trainer=cpu data.root=/path/to/dataset
If training is interrupted (crash, preemption, timeout), GOAL automatically resumes from the latest checkpoint. auto_resume is enabled by default: on every launch the framework scans previous run directories for the most recent last.ckpt that matches the current dataset + model combination:
logs/train/runs/
2026-04-08_10-30-00_xyz_deepset/checkpoints/last.ckpt ← found & resumed
2026-04-07_09-00-00_xyz_deepset/checkpoints/last.ckpt ← older, skipped
2026-04-08_11-00-00_xyz_hyperspec/checkpoints/last.ckpt ← different model, ignored
Lightning restores the full training state (model weights, optimiser, scheduler, epoch counter, dataloader position) so training continues exactly where it left off.
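The scan described above can be sketched with the standard library alone. The sketch assumes run directories are named `<timestamp>_<dataset>_<model>`, as in the listing; `find_latest_checkpoint` is a hypothetical helper, not GOAL's actual resume code.

```python
from pathlib import Path
from typing import Optional


def find_latest_checkpoint(runs_dir: Path, dataset: str, model: str) -> Optional[Path]:
    """Return the newest last.ckpt whose run directory matches dataset + model."""
    suffix = f"_{dataset}_{model}"
    candidates = [
        run / "checkpoints" / "last.ckpt"
        for run in runs_dir.iterdir()
        if run.is_dir() and run.name.endswith(suffix)
    ]
    existing = [ckpt for ckpt in candidates if ckpt.is_file()]
    # Timestamps of the form YYYY-MM-DD_HH-MM-SS sort chronologically as strings,
    # so the lexicographically largest run name is the most recent run.
    return max(existing, key=lambda ckpt: ckpt.parent.parent.name, default=None)
```

A return value of `None` means no matching checkpoint exists and the run would start fresh.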
Override behaviour:
# Disable auto-resume (always start fresh)
goal-train auto_resume=false
# Resume from a specific checkpoint (takes priority over auto-resume)
goal-train ckpt_path=/path/to/specific/checkpoint.ckpt
Evaluate a trained checkpoint on the test split:
goal-eval ckpt_path=/path/to/checkpoint.ckpt data.root=/path/to/dataset
The evaluation entry point supports the same trainer configs for distributed evaluation:
goal-eval trainer=ddp ckpt_path=/path/to/checkpoint.ckpt data.root=/path/to/dataset
Any trained GOAL model can be used as an ASE Calculator for molecular dynamics, geometry optimisation, phonons, and more.
from goal.ml.utils.calculator import GOALCalculator
calc = GOALCalculator(checkpoint_path="logs/train/runs/.../last.ckpt")

from goal.ml.utils.calculator import GOALCalculator
calc = GOALCalculator(module=my_module, cutoff=5.0, device="cuda")

from ase.build import molecule
atoms = molecule("H2O")
atoms.calc = calc
energy = atoms.get_potential_energy() # eV
forces = atoms.get_forces() # eV/Å
stress = atoms.get_stress() # eV/ų (Voigt, 6-component)

from ase.optimize import BFGS
opt = BFGS(atoms)
opt.run(fmax=0.01)

from ase.md.langevin import Langevin
from ase.md.velocitydistribution import MaxwellBoltzmannDistribution
from ase import units
MaxwellBoltzmannDistribution(atoms, temperature_K=300)
dyn = Langevin(atoms, 1.0 * units.fs, temperature_K=300, friction=0.01)
dyn.run(1000)

| Parameter | Default | Description |
|---|---|---|
| checkpoint_path | - | Path to .ckpt file (mutually exclusive with module) |
| module | - | Pre-loaded GOALModule instance |
| cutoff | from config | Neighbour-list cutoff (Å). Auto-detected from checkpoint |
| device | "cpu" | "cpu", "cuda", "cuda:0", etc. |
| dtype | float64 | Precision for positions and cell |
| head | None | Multi-head tag for multi-task models |
Fine-tune a pre-trained foundation model on a downstream dataset:
goal-finetune model.backbone.name=mace-large model.backbone.pretrained=true data.root=/path/to/dataset
1. Pre-trained hub model:
goal-finetune model.backbone.name=mace-large model.backbone.pretrained=true model.backbone.variant=large
2. Local checkpoint:
goal-finetune model.backbone.name=mace-large model.backbone.local_checkpoint=/path/to/model.pt
3. Fresh backbone (train from scratch):
goal-finetune model.backbone.name=mace-large
goal-finetune training.freeze_backbone=true model.backbone.name=mace-large model.backbone.pretrained=true
Use the backbone finetuning callback:
goal-finetune callbacks=backbone_finetuning model.backbone.name=mace-large model.backbone.pretrained=true
This freezes the backbone initially, then unfreezes at epoch 10 with a reduced learning rate (10% of head LR).
| Adapter | Registry Names | Source |
|---|---|---|
| MACE | mace-large, mace-medium, mace-small | mace-torch |
| FairChem/UMA | uma-small | fairchem-core |
| Format | Config | File Types | Description |
|---|---|---|---|
| ExtXYZ | data=xyz | .xyz, .extxyz | ASE-readable extended XYZ files |
| HDF5 | data=hdf5 | .h5, .hdf5 | Pre-processed atomic graphs with random access |
| LMDB | data=lmdb | data.mdb | FairChem/OCP-compatible format |
| Trajectory | data=trajectory | .traj | ASE trajectory files from MD simulations |
GOAL supports four data loading modes, automatically detected from the config:
Point to a single directory. GOAL first looks for named split files (train.xyz, val.xyz, test.xyz). If those don't exist, it loads everything and splits by ratio.
goal-train data.root=/path/to/dataset
# configs/data/xyz.yaml
data:
  dataset_type: xyz
  root: ${paths.data_dir}
  split_ratio: [0.8, 0.1, 0.1]
  split_seed: 42
Specify separate file lists for train, validation, and test. Each split can load from multiple files.
goal-train data.train_paths='[/data/A/train.xyz,/data/B/train.xyz]' \
  data.val_paths='[/data/A/val.xyz]' \
  data.test_paths='[/data/A/test.xyz]'
Or in a config file:
data:
  dataset_type: xyz
  train_paths:
    - /data/dataset_A/train.xyz
    - /data/dataset_B/train.xyz
  val_paths:
    - /data/dataset_A/val.xyz
  test_paths:
    - /data/dataset_A/test.xyz
Provide a list of roots. All datasets are loaded, merged into one, then split by ratio.
data:
  dataset_type: xyz
  root:
    - /data/dataset_A
    - /data/dataset_B
    - /data/dataset_C
  merge_strategy: random
  split_ratio: [0.8, 0.1, 0.1]
  split_seed: 42
Point to directories containing data files. All matching files (.xyz, .extxyz, .h5, .hdf5, .lmdb, .traj, .db) inside each directory are automatically discovered and loaded.
goal-train data.train_dir=/data/train/ \
  data.val_dir=/data/val/ \
  data.test_dir=/data/test/
data:
  dataset_type: xyz
  train_dir: /data/splits/train/
  val_dir: /data/splits/val/
  test_dir: /data/splits/test/ # optional
Tip: Mode 4 is ideal when you have pre-organized split directories. Files are loaded in sorted order for reproducibility.
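Directory-based discovery can be pictured with a few lines of `pathlib`. The sketch below is an illustration only: `discover_data_files` and `DATA_SUFFIXES` are hypothetical names, and it assumes a non-recursive scan that matches on the file suffixes listed above and returns paths in sorted order.

```python
from pathlib import Path

# Suffixes listed for directory-based loading.
DATA_SUFFIXES = {".xyz", ".extxyz", ".h5", ".hdf5", ".lmdb", ".traj", ".db"}


def discover_data_files(directory: Path) -> list[Path]:
    """Return matching entries directly inside `directory`, in sorted order."""
    matches = [
        path
        for path in directory.iterdir()
        if path.suffix.lower() in DATA_SUFFIXES
    ]
    # Sorting makes the load order independent of filesystem enumeration order.
    return sorted(matches)
```

Sorting before loading is what keeps a sequential merge reproducible across machines.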
When loading multiple files (Mode 2 or Mode 3), datasets are merged using one of two strategies:
| Strategy | Behaviour |
|---|---|
| sequential | Concatenate datasets in order (default) |
| random | Shuffle all indices after concatenation (seed-controlled) |
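The two merge strategies, together with the ratio-based split used by the loading modes above, can be sketched in plain Python. The helper names `merge_indices` and `split_by_ratio` are hypothetical, and the sketch assumes the split shuffles with its own seed before slicing; GOAL's actual implementation may differ.

```python
import random
from typing import Sequence


def merge_indices(sizes: Sequence[int], strategy: str = "sequential", seed: int = 42):
    """Return (dataset_id, local_index) pairs for the merged dataset."""
    merged = [(d, i) for d, n in enumerate(sizes) for i in range(n)]
    if strategy == "random":
        # Seed-controlled shuffle of all indices after concatenation.
        random.Random(seed).shuffle(merged)
    elif strategy != "sequential":
        raise ValueError(f"Unknown merge strategy: {strategy!r}")
    return merged


def split_by_ratio(items: Sequence, ratio: Sequence[float], seed: int = 42):
    """Shuffle with `seed`, then cut into len(ratio) consecutive parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    parts, start = [], 0
    for fraction in ratio[:-1]:
        end = start + round(len(shuffled) * fraction)
        parts.append(shuffled[start:end])
        start = end
    parts.append(shuffled[start:])  # last part takes the remainder
    return parts
```

With a three-element ratio the parts are train, val, and test; a two-element ratio yields train and val only.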
goal-train data.merge_strategy=random data.split_seed=123
When splits aren't provided explicitly, GOAL splits the dataset numerically:
data:
  split_ratio: [0.8, 0.1, 0.1] # train / val / test
  split_seed: 42 # reproducible splits
A two-element ratio creates train/val only (no test split):
data:
  split_ratio: [0.9, 0.1] # train / val only
All data configs support these performance options:
data:
  batch_size: 32
  num_workers: 4 # parallel data loading workers
  pin_memory: true # pin tensors in CPU memory for faster GPU transfer
  persistent_workers: true # keep workers alive between epochs
  prefetch_factor: 2 # batches prefetched per worker
Ready-to-use benchmark datasets for training and evaluating MLIPs. Completely optional: the core framework works without them.
| Dataset | Structures | Elements | Properties | Size | Config |
|---|---|---|---|---|---|
| MD17 | ~10k/mol | H, C, N, O | energy, forces | ~100 MB | data=md17_aspirin |
| rMD17 | ~10k/mol | H, C, N, O | energy, forces | ~100 MB | data=rmd17_aspirin |
| ANI-1 | ~20M | H, C, N, O | energy, forces | ~30 GB | data=ani1 |
| ANI-1x | ~5M | H, C, N, O | energy, forces | ~7 GB | data=ani1x |
| QM9 | 134k | H, C, N, O, F | 19 properties | ~1 GB | data=qm9 |
| SPICE | ~1.1M | 10 elements | energy, forces | ~15 GB | — |
# Train on MD17 aspirin
goal-train data=md17_aspirin
# Train on ANI-1x, subsample 50k for quick experiment
goal-train data=ani1x data.max_structures=50000
# Train on QM9 predicting HOMO-LUMO gap
goal-train data=qm9 data.target=gap
# Override cutoff
goal-train data=md17_aspirin data.cutoff=6.0
Install optional dependencies for SPICE (HDF5):
pip install -e ".[examples]"
See examples/datasets/README.md for full documentation, citations, and unit conversion details.
GOAL supports two model paradigms:
| Paradigm | How it works | Config | Best for |
|---|---|---|---|
| Modular | Backbone → NodeFeatures → Head → property dict | model.backbone + model.head | Mixing backbones and heads freely |
| Monolithic | Model → property dict directly | model.backbone + model.head: null | External self-contained architectures |
Modular models separate the backbone (feature extraction) from the head (property prediction). Any backbone can be paired with any compatible head. All built-in backbones (HyperSpec, InvariantGNN, DeepSet, HyperSet, LucidSet) are modular.
Monolithic models handle everything internally — embedding, interaction, readout, and property prediction — in a single forward() call. They return a dictionary of predicted properties directly (the same format heads produce). Set head: null in the config to use a monolithic model. This capability exists for users who want to bring their own self-contained architecture and use GOAL's training infrastructure without adopting the backbone→head split.
The MonolithicModel protocol in goal.ml.nn.models.base defines the contract: forward(graph) → dict[str, Tensor] and an output_keys property declaring which keys consumers can expect.
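As an illustration of what such a contract looks like, here is a minimal structural-typing sketch. It is not the definition from `goal.ml.nn.models.base`: `MonolithicModelSketch`, `ConstantEnergy`, and `check_outputs` are hypothetical names, and `Any` stands in for `AtomicGraph` and `Tensor` so that the example runs without PyTorch.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class MonolithicModelSketch(Protocol):
    """Stand-in for the contract: forward(graph) -> dict, plus output_keys."""

    @property
    def output_keys(self) -> list[str]: ...

    def forward(self, graph: Any) -> dict[str, Any]: ...


class ConstantEnergy:
    """Toy model that predicts zero energy for every graph."""

    @property
    def output_keys(self) -> list[str]:
        return ["energy"]

    def forward(self, graph: Any) -> dict[str, Any]:
        return {"energy": 0.0}


def check_outputs(model: MonolithicModelSketch, graph: Any) -> dict[str, Any]:
    """Run the model and verify that every declared key is present."""
    outputs = model.forward(graph)
    missing = set(model.output_keys) - set(outputs)
    if missing:
        raise KeyError(f"Model did not return declared keys: {sorted(missing)}")
    return outputs
```

Because the protocol is structural, a model only needs the right attributes; it does not have to inherit from any GOAL base class in this sketch.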
E(3)-equivariant graph neural network using spherical harmonics and tensor products.
goal-train model=hyperspec
Key parameters:
- `hidden_channels: 128`: feature dimension
- `num_interactions: 3`: message passing layers
- `lmax: 2`: maximum spherical harmonics order
- `cutoff: 5.0`: interaction radius (Å)
- `num_radial_basis: 8`: radial basis functions
Output irreps: 128x0e+128x1o+128x2e (scalars + vectors + rank-2 tensors)
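To see what an irreps string implies for tensor shapes, recall that each irrep of order l has 2l + 1 components, so a term `mul x l p` contributes mul × (2l + 1) features. The parser below is a small stand-alone sketch (the name `irreps_dim` is hypothetical, and it only accepts terms with an explicit multiplicity); in practice a library such as e3nn would handle this.

```python
import re


def irreps_dim(irreps: str) -> int:
    """Total feature dimension of an irreps string such as '128x0e+128x1o'."""
    total = 0
    for term in irreps.split("+"):
        match = re.fullmatch(r"\s*(\d+)x(\d+)([eo])\s*", term)
        if match is None:
            raise ValueError(f"Cannot parse irreps term: {term!r}")
        multiplicity, order = int(match.group(1)), int(match.group(2))
        # An irrep of order l has 2l + 1 components; parity does not change the size.
        total += multiplicity * (2 * order + 1)
    return total


print(irreps_dim("128x0e+128x1o+128x2e"))  # 128 * (1 + 3 + 5) = 1152
print(irreps_dim("128x0e"))                # 128
```

The HyperSpec output above therefore carries 1152 features per node, against 128 for a scalar-only backbone.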
SchNet-like invariant backbone using only scalar features. Faster than equivariant models; use for baselines or when equivariance isn't needed.
goal-train model=invariant_gnn
Output irreps: 128x0e (scalars only)
Edge-based invariant backbone inspired by the SCAI project. Embeds atoms, expands edge distances with Bessel radial basis, projects source/target atoms and distances into a shared feature space, applies an edge interaction MLP, and scatter-aggregates to per-node invariant features.
goal-train model=deepset
Key parameters:
- `embedding_dim: 128`: atomic embedding dimension
- `hidden_channels: 128`: feature dimension
- `num_filters: 128`: projected feature space size
- `num_radial_basis: 20`: Bessel radial basis functions
- `transform_depth: 2`: layers in projection MLPs
- `cutoff: 5.0`: interaction radius (Å)
Output irreps: 128x0e (scalars only)
Not implemented. The original SCAI HyperSet was intended to route edge features through atom-type–specific expert MLPs, but the implementation never diverged from DeepSet. Instantiation raises `NotImplementedError`. Use `deepset` instead.
# Will raise NotImplementedError at instantiation
goal-train model=hyperset
Not implemented. The pairwise distance-binned mixture-of-experts approach requires an external atom-references dictionary and creates O(Z² × bins) expert modules, which does not scale within GOAL's paradigm. Instantiation raises `NotImplementedError`. Use `deepset` instead.
# Will raise NotImplementedError at instantiation
goal-train model=lucidset
Monolithic models bypass the backbone→head split. They take an AtomicGraph and return a property dictionary directly. This capability is provided for external users who want to bring their own self-contained architecture and use GOAL's training loop.
A minimal demonstration model that embeds atoms, applies a small MLP readout to obtain per-atom energy contributions, sums to total energy, and derives forces via autograd.
goal-train model=monolithic_example
Note: This is intentionally simplistic. For real tasks, use a modular backbone + head combination.
Create a model that satisfies the MonolithicModel protocol:
import torch
import torch.nn as nn
from goal.ml.data.graph import AtomicGraph
from goal.ml.registry import MODEL_REGISTRY

@MODEL_REGISTRY.register("my_monolithic")
class MyModel(nn.Module):
    def __init__(self, cutoff: float = 5.0) -> None:
        super().__init__()
        # Use any GOAL modules: AtomicNumberEmbedding, BesselBasis, etc.
        ...

    @property
    def output_keys(self) -> list[str]:
        return ["energy", "forces"]

    def forward(self, graph: AtomicGraph) -> dict[str, torch.Tensor]:
        # Compute everything internally, return property dict
        # (energy and forces stand for the tensors your architecture computes)
        return {"energy": energy, "forces": forces}
Then in the config, set head: null:
model:
  backbone:
    name: my_monolithic
    cutoff: 5.0
  head: null
Task-specific output heads registered via the head registry. Used with modular backbones; monolithic models set head: null and skip the head entirely.
| Head | Description | Config key |
|---|---|---|
| energy_forces | Energy prediction + force via autograd | head.name: energy_forces |
| energy | Energy prediction only | head.name: energy |
| direct_forces | Direct force prediction (no autograd) | head.name: direct_forces |
| stress | Stress tensor prediction | head.name: stress |
| dipole | Dipole moment prediction | head.name: dipole |
| scalar | Generic scalar property (any name) | head.name: scalar |
| multi | Compose multiple heads | head.name: multi |
Override the head:
goal-train model.head.name=stress model.head.compute_stress=true
The scalar head predicts any per-structure scalar property. The property_name parameter sets the output key (and must match the target key on the graph):
head:
  name: scalar
  irreps_in: "128x0e"
  hidden_dim: 64
  property_name: band_gap # ← becomes the key in predictions dict
  reduction: mean # "mean" (intensive) or "sum" (extensive)
The multi head composes several sub-heads that share the same backbone features. Each sub-head independently produces its output keys, which are merged into a single dictionary:
model:
  backbone:
    name: invariant_gnn
    hidden_channels: 128
    # ...
  head:
    name: multi
    heads:
      - name: energy_forces
        irreps_in: "128x0e"
        hidden_dim: 64
      - name: scalar
        irreps_in: "128x0e"
        hidden_dim: 64
        property_name: homo
        reduction: mean
      - name: scalar
        irreps_in: "128x0e"
        hidden_dim: 64
        property_name: lumo
        reduction: mean
Each sub-head has its own readout MLP, so they learn separate representations for each property. The corresponding losses reference the property names:
training:
  losses:
    - name: energy
      weight: 4.0
    - name: forces
      weight: 100.0
    - name: scalar_property
      property_name: homo
      weight: 1.0
    - name: scalar_property
      property_name: lumo
      weight: 1.0
See configs/model/invariant_gnn_qm9.yaml for a complete QM9 multi-property example.
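Two ideas from this section lend themselves to a short sketch: the `mean` versus `sum` reduction of per-atom values into one value per structure, and the merging of sub-head outputs into a single dictionary. The helpers `reduce_per_structure` and `merge_head_outputs` are hypothetical and use plain lists instead of tensors; raising on duplicate keys is an assumption of this sketch, not documented GOAL behaviour.

```python
from collections import defaultdict
from typing import Mapping, Sequence


def reduce_per_structure(atom_values: Sequence[float], batch_idx: Sequence[int],
                         reduction: str = "mean") -> list[float]:
    """Pool per-atom values into one value per structure."""
    if reduction not in ("mean", "sum"):
        raise ValueError(f"Unknown reduction: {reduction!r}")
    groups: dict[int, list[float]] = defaultdict(list)
    for value, graph_id in zip(atom_values, batch_idx):
        groups[graph_id].append(value)
    pooled = []
    for graph_id in sorted(groups):
        total = sum(groups[graph_id])
        # "mean" suits intensive properties, "sum" suits extensive ones.
        pooled.append(total / len(groups[graph_id]) if reduction == "mean" else total)
    return pooled


def merge_head_outputs(outputs: Sequence[Mapping[str, object]]) -> dict[str, object]:
    """Merge sub-head dictionaries, refusing silently overwritten keys."""
    merged: dict[str, object] = {}
    for out in outputs:
        overlap = merged.keys() & out.keys()
        if overlap:
            raise KeyError(f"Duplicate output keys: {sorted(overlap)}")
        merged.update(out)
    return merged
```

In the multi-head config above, distinct `property_name` values (`homo`, `lumo`) are what keep the merged keys unique.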
Each property loss supports a configurable loss function via the fn parameter:
| Key | Function | Notes |
|---|---|---|
| mse | Mean Squared Error | Default; good for smooth regression |
| mae / l1 | Mean Absolute Error | Robust to outliers |
| rmse | Root Mean Squared Error | Penalises large errors more than MAE |
| huber | Huber Loss | Combines MSE + MAE (delta = 1.0) |
| smooth_l1 | Smooth L1 | Like Huber with beta = 1.0 |
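The relationship between the last two rows is easiest to see on a single error value. The sketch below follows the standard element-wise definitions (the same ones PyTorch documents for `HuberLoss` and `SmoothL1Loss`); the function names are local to this example.

```python
def huber(error: float, delta: float = 1.0) -> float:
    """Quadratic for |error| <= delta, linear beyond it."""
    a = abs(error)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)


def smooth_l1(error: float, beta: float = 1.0) -> float:
    """Quadratic (scaled by 1/beta) for |error| < beta, linear beyond it."""
    a = abs(error)
    return 0.5 * a * a / beta if a < beta else a - 0.5 * beta


# With delta = beta = 1.0 the two losses coincide.
print(huber(0.5), smooth_l1(0.5))  # 0.125 0.125
print(huber(2.0), smooth_l1(2.0))  # 1.5 1.5
```

For other settings the two differ by a constant factor: Huber with `delta` equals `delta` times Smooth L1 with `beta = delta`.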
Configure per-property in configs/training/default.yaml:
losses:
  - name: energy
    weight: 4.0
    fn: mse # ← loss function
  - name: forces
    weight: 1.0
    fn: huber # ← robust to noisy forces
  - name: stress
    weight: 0.01
    fn: mae
Use multiple loss functions simultaneously for the same property, each with its own weight and separate logging panel:
losses:
  - name: energy
    weight: 4.0
    fn: mse
  - name: forces
    fn: # ← list of sub-losses
      - name: mse
        weight: 4.0
      - name: rmse
        weight: 8.0
This produces five logged metrics in W&B / TensorBoard:
| Logged metric | Description |
|---|---|
| train/energy | Energy MSE × 4.0 |
| train/forces_mse | Forces MSE × 4.0 |
| train/forces_rmse | Forces RMSE × 8.0 |
| train/forces | Sum of forces sub-losses |
| train/total | Grand total |
Each sub-loss gets its own chart in W&B automatically.
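How the five metric names arise from the config can be traced with a small pure-Python sketch. It is not GOAL's `CompositeLoss`: `composite_loss` and `LOSS_FNS` are hypothetical, values are plain lists of floats, and the config is the YAML above written as Python dictionaries.

```python
import math
from typing import Mapping, Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def rmse(pred: Sequence[float], target: Sequence[float]) -> float:
    return math.sqrt(mse(pred, target))


LOSS_FNS = {"mse": mse, "rmse": rmse}


def composite_loss(preds: Mapping, targets: Mapping, config: Sequence[Mapping]) -> dict:
    """Return the logged metrics for weighted losses with optional sub-losses."""
    logged, total = {}, 0.0
    for entry in config:
        name, fn = entry["name"], entry["fn"]
        if isinstance(fn, str):
            # Single loss function with one weight.
            value = entry["weight"] * LOSS_FNS[fn](preds[name], targets[name])
        else:
            # List of sub-losses: log each one, then their sum.
            value = 0.0
            for sub in fn:
                part = sub["weight"] * LOSS_FNS[sub["name"]](preds[name], targets[name])
                logged[f"train/{name}_{sub['name']}"] = part
                value += part
        logged[f"train/{name}"] = value
        total += value
    logged["train/total"] = total
    return logged


config = [
    {"name": "energy", "weight": 4.0, "fn": "mse"},
    {"name": "forces", "fn": [{"name": "mse", "weight": 4.0},
                              {"name": "rmse", "weight": 8.0}]},
]
metrics = composite_loss({"energy": [1.0], "forces": [1.0, 1.0]},
                         {"energy": [0.0], "forces": [0.0, 0.0]}, config)
print(sorted(metrics))
```

Each sub-loss contributes its own key, the per-property key holds their weighted sum, and `train/total` adds up all properties.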
Use any callable via a dotted import path:
losses:
  - name: forces
    fn:
      - name: mse
        weight: 4.0
      - name: torchmetrics.functional.mean_squared_error
        weight: 2.0
Install torchmetrics first: pip install -e ".[torchmetrics]"
Override from the CLI:
# Switch forces loss to MAE
goal-train 'training.losses=[{name: energy, weight: 4.0, fn: mse}, {name: forces, weight: 1.0, fn: mae}]'
Tip: Use `huber` or `mae` for forces when your dataset has noisy DFT reference forces; they are more robust to outliers than MSE.
Adapters wrap pre-trained foundation models (MACE, FairChem/UMA) as GOAL backbones. They translate between the foundation model's interface and GOAL's backbone protocol.
# Fine-tune MACE-large
goal-finetune model.backbone.name=mace-large model.backbone.pretrained=true
# Fine-tune UMA-small
goal-finetune model.backbone.name=uma-small model.backbone.pretrained=true
Install adapter dependencies:
pip install -e ".[mace]" # for MACE adapters
pip install -e ".[fairchem]" # for FairChem/UMA adapters
Extract intermediate node features from any backbone for downstream analysis, transfer learning, or custom heads.
Attach forward hooks to interaction blocks. This works with any model whose layers are an nn.ModuleList:
from goal.ml.utils.extraction import HookBasedExtractor
with HookBasedExtractor(model, blocks_attr="interactions", output_index=0) as ext:
    output = model(batch)
    features = ext.captured # {"layer_0": Tensor, "layer_1": Tensor, ...}

| Wrapper | Description |
|---|---|
| LayerBackbone | Returns features from a single interaction layer |
| MultiScaleBackbone | Concatenates features from multiple layers |
| FrozenBackbone | Freezes all backbone parameters for feature extraction |
from goal.ml.utils.extraction import extract_scalars, extract_irrep_channels, pool_nodes
scalars = extract_scalars(node_feats, irreps) # l=0 channels only
channels = extract_irrep_channels(node_feats, irreps) # dict by irrep type
graph_feats = pool_nodes(node_feats, batch_idx) # per-graph pooling
Registered as Hydra targets for zero-code feature extraction:
backbone:
  _target_: goal.ml.utils.extraction._build_mace_large_final # last layer
  # or: goal.ml.utils.extraction._build_mace_large_multiscale # all layers
  # or: goal.ml.utils.extraction._build_mace_large_frozen # frozen weights
A standalone, lightweight training loop for rapid prototyping in Jupyter notebooks. It is completely decoupled from the Lightning / Hydra pipeline and operates on raw PyTorch primitives.
Typical workflow:
- Freeze a foundation model (MACE, FairChem, etc.) and extract representations
- Cache extracted features as a `TensorDataset`
- Train a downstream head with `MiniTrainer`, iterating fast without re-running the backbone
import torch
from goal.ml.utils.mini_trainer import MiniTrainer
trainer = MiniTrainer(
    model=my_head,
    loss_fn=torch.nn.MSELoss(),
    optimizer=torch.optim.Adam(my_head.parameters(), lr=1e-3),
    device="auto",
)
history = trainer.fit(train_loader, val_loader=val_loader, epochs=50)
history.plot() # loss curves in the notebook

| Feature | Description |
|---|---|
| Early stopping | Stop when validation loss plateaus (early_stopping_patience) |
| Best checkpoint | In-memory best model state, restore with trainer.load_best() |
| LR scheduling | Any PyTorch scheduler (ReduceLROnPlateau, cosine, etc.) |
| Gradient clipping | Max-norm clipping via grad_clip parameter |
| Progress bars | tqdm.auto progress bars per epoch |
| History | TrainingHistory with .plot(), .best_val_loss, .best_epoch |
| Prediction | trainer.predict(loader) returns (preds, targets) tensors |
| Custom step | Plug in step_fn for AtomicGraph batches or arbitrary logic |
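The early-stopping and best-checkpoint rows describe a common pattern that fits in a few lines. The class below is a generic sketch of that pattern, not MiniTrainer's internals: `EarlyStopper` is a hypothetical name, and `state` can be any picklable object such as a `state_dict`.

```python
import copy
from typing import Any


class EarlyStopper:
    """Track the best validation loss and signal when patience runs out."""

    def __init__(self, patience: int) -> None:
        self.patience = patience
        self.best_val_loss = float("inf")
        self.best_epoch = -1
        self.best_state: Any = None
        self._bad_epochs = 0

    def update(self, epoch: int, val_loss: float, state: Any) -> bool:
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_val_loss:
            self.best_val_loss = val_loss
            self.best_epoch = epoch
            # Keep an in-memory copy so later updates cannot mutate it.
            self.best_state = copy.deepcopy(state)
            self._bad_epochs = 0
        else:
            self._bad_epochs += 1
        return self._bad_epochs >= self.patience
```

Calling `update` once per epoch and restoring `best_state` after the loop mirrors the `early_stopping_patience` and `load_best()` behaviour listed in the table.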
For training on graph data with CompositeLoss, use the built-in graph_step:
from goal.ml.utils.mini_trainer import MiniTrainer, graph_step
trainer = MiniTrainer(
    model=my_backbone_plus_head,
    loss_fn=composite_loss,
    optimizer=optimizer,
    step_fn=graph_step, # handles AtomicGraph batches
)
history = trainer.fit(graph_train_loader, graph_val_loader, epochs=50)
See notebooks/mini_trainer_demo.ipynb for a complete walkthrough, from feature extraction to model evaluation with parity plots.
GOAL provides three levels of training loop customisation, from least to most control:
| Level | Tool | Multi-GPU | Loop Control | Best For |
|---|---|---|---|---|
| 1 | GOALModule hooks + callbacks | ✅ | Partial: override hooks | Standard workflows with minor tweaks |
| 2 | FabricTrainer | ✅ | Full: write your own for loop | Custom optimisation, multi-optimiser, GAN-style |
| 3 | MiniTrainer | ❌ | Full: pure PyTorch | Quick notebook prototyping on extracted features |
The standard Lightning path. Subclass GOALModule and override any hook:
from goal.ml.training.module import GOALModule

class MyModule(GOALModule):
    """Custom training step with auxiliary loss."""

    def training_step(self, batch, batch_idx):
        predictions = self(batch)
        losses = self.loss(predictions, batch)
        # --- Your custom logic here ---
        aux_loss = self.compute_auxiliary_loss(predictions, batch)
        losses["total"] = losses["total"] + 0.1 * aux_loss
        # --------------------------------
        self.log_dict(
            {f"train/{k}": v for k, v in losses.items()},
            batch_size=batch.num_graphs, sync_dist=True,
        )
        return losses["total"]
Register it in Hydra and use the standard goal-train CLI as usual.
What you can override:
| Hook | When it runs |
|---|---|
| training_step(batch, batch_idx) | Each training batch |
| validation_step(batch, batch_idx) | Each validation batch |
| configure_optimizers() | Optimizer + scheduler setup |
| configure_model() | Pre-training model transforms (compile, FSDP wrap) |
| on_before_optimizer_step(optimizer) | Before each optimizer step (gradient clipping) |
| on_train_batch_end(outputs, batch, batch_idx) | After each training step (EMA update) |
You can also inject logic via Lightning callbacks without subclassing:
import torch
from lightning import Callback

class GradientMonitorCallback(Callback):
    def on_before_optimizer_step(self, trainer, pl_module, optimizer):
        # An infinite max_norm leaves gradients unchanged and only returns their total norm
        grad_norm = torch.nn.utils.clip_grad_norm_(pl_module.parameters(), float("inf"))
        pl_module.log("grad_norm", grad_norm)
Use this level when Lightning hooks are not enough and you need full control over the for loop and distributed training. Built on Lightning Fabric.
from goal.ml.utils.fabric_trainer import FabricTrainer, graph_fabric_step
ft = FabricTrainer(
    model=my_model,
    loss_fn=composite_loss,
    optimizer=optimizer,
    train_loader=train_loader,
    val_loader=val_loader,
    # --- Distributed config (same options as Lightning Trainer) ---
    accelerator="gpu",
    strategy="ddp", # or "fsdp", "deepspeed", etc.
    devices=4,
    precision="bf16-mixed",
    # --- Loop options ---
    step_fn=graph_fabric_step,
    grad_clip=10.0,
    grad_accumulation_steps=4,
)
history = ft.fit(epochs=100, early_stopping_patience=20)
Or write the loop from scratch using the setup_fabric() helper:
from goal.ml.utils.fabric_trainer import setup_fabric
fabric = setup_fabric(strategy="ddp", devices=4, precision="bf16-mixed")
model, optimizer = fabric.setup(model, optimizer)
train_loader = fabric.setup_dataloaders(train_loader)
for epoch in range(100):
    model.train()
    for batch in train_loader:
        optimizer.zero_grad()
        predictions = model(batch)
        losses = loss_fn(predictions, batch)
        fabric.backward(losses["total"])
        # Your custom logic (anything goes):
        if epoch > 50:
            fabric.clip_gradients(model, optimizer, max_norm=1.0)
        optimizer.step()
    # Validation, logging, checkpointing: all under your control
    fabric.save("checkpoint.pt", {"model": model, "optimizer": optimizer})
FabricTrainer features:
| Feature | Description |
|---|---|
| Multi-GPU / multi-node | DDP, FSDP, DeepSpeed — same strategies as Lightning |
| Mixed precision | bf16, fp16, fp64 |
| Gradient accumulation | Efficient sync-skipping via fabric.no_backward_sync() |
| Gradient clipping | fabric.clip_gradients() |
| Checkpointing | save_checkpoint() / load_checkpoint() — handles sharded saves |
| Early stopping | Built-in patience counter |
| History | Reuses TrainingHistory from MiniTrainer (.plot(), .best_val_loss) |
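
The gradient accumulation row can be made concrete with a small sketch of which micro-batches end in an optimizer step. It assumes, hypothetically, that a step is also taken on the last batch of an epoch when the final window is incomplete; FabricTrainer's actual behaviour may differ. The helper names `accumulation_schedule` and `effective_batch_size` are invented for this example.

```python
def accumulation_schedule(num_batches, accumulation_steps):
    """Yield (batch_index, do_optimizer_step) pairs for one epoch."""
    for i in range(num_batches):
        last_in_window = (i + 1) % accumulation_steps == 0
        last_in_epoch = i == num_batches - 1  # assumption: flush a partial window
        yield i, last_in_window or last_in_epoch


def effective_batch_size(batch_size, accumulation_steps, num_devices=1):
    """Samples contributing to one optimizer step across all devices."""
    return batch_size * accumulation_steps * num_devices


schedule = list(accumulation_schedule(num_batches=10, accumulation_steps=4))
step_batches = [i for i, do_step in schedule if do_step]
print(step_batches)                     # batches that trigger optimizer.step()
print(effective_batch_size(32, 4, 4))   # 32 per device, 4 accumulation steps, 4 GPUs
```

Only the batches in `step_batches` need gradients synchronised across devices, which is why the other micro-batches can skip the sync.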
MiniTrainer runs on a single device with no Lightning dependency at all, which makes it ideal for notebook prototyping on pre-extracted features. See the Mini Trainer section above.
Need multi-GPU?
├── No → MiniTrainer (Level 3)
└── Yes
    ├── Standard loop is fine, just need custom loss/hook? → GOALModule (Level 1)
    └── Need full loop control? → FabricTrainer (Level 2)
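
The same decision tree, written as a function for readers who prefer code. The function name `choose_training_level` is hypothetical and not part of GOAL; it only restates the branches above.

```python
def choose_training_level(multi_gpu: bool, full_loop_control: bool = False) -> str:
    """Mirror the decision tree: pick the lightest tool that fits the need."""
    if not multi_gpu:
        return "MiniTrainer (Level 3)"
    if full_loop_control:
        return "FabricTrainer (Level 2)"
    return "GOALModule (Level 1)"


print(choose_training_level(multi_gpu=False))
print(choose_training_level(multi_gpu=True))
print(choose_training_level(multi_gpu=True, full_loop_control=True))
```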
On Ampere+ GPUs (A100, H100, RTX 30xx/40xx), TF32 tensor cores provide ~3× speedup for float32 operations with negligible precision loss:
# configs/training/default.yaml
training:
  performance:
    float32_matmul_precision: high  # "highest" = full fp32, "high" = TF32, "medium" = bf16

cuDNN benchmark mode auto-tunes convolution algorithms for fixed input sizes:
training:
  performance:
    cudnn_benchmark: true
    cudnn_deterministic: false  # set true only for debugging

Compile the backbone with torch.compile for faster training (PyTorch 2.0+):
goal-train training.compile_model=true

Configure compilation mode:
training:
  compile_model: true
  compile:
    mode: default      # 'default', 'reduce-overhead', 'max-autotune'
    fullgraph: false   # true = compile the entire graph (faster, stricter)
    dynamic: null      # null, true, or false (dynamic shape support)

Select the numeric precision with trainer.precision:

goal-train trainer.precision=bf16-mixed   # bfloat16 (Ampere+, recommended)
goal-train trainer.precision=16-mixed     # float16
goal-train trainer.precision=64-true      # double precision

Gradient accumulation simulates larger batch sizes without increasing GPU memory:
goal-train trainer.accumulate_grad_batches=4   # effective batch = batch_size × 4

Or use the dynamic scheduler callback:

goal-train callbacks=grad_accumulation

An exponential moving average (EMA) maintains a shadow copy of weights for more stable evaluation:
training:
  ema:
    enabled: true
    decay: 0.999

Stochastic weight averaging (SWA) is an alternative to EMA that averages weights during the last portion of training:

goal-train callbacks=swa

Before the first training epoch, Lightning runs a short validation sanity check to catch data loading, metric computation, or model errors early. This is enabled by default:
# configs/trainer/default.yaml
num_sanity_val_steps: 2 # run 2 val batches before training
# 0 = skip, -1 = full validation set

Override from the command line:
# Skip sanity check (faster startup)
goal-train trainer.num_sanity_val_steps=0
# Full validation run before training (thorough check)
goal-train trainer.num_sanity_val_steps=-1

GOAL provides three levels of hyperparameter optimisation, all fully config-driven.
Built-in learning rate and batch size auto-discovery. Zero extra dependencies.
goal-tune hparams_search=basic

# configs/hparams_search/basic.yaml
hparams_search:
  method: tuner
  tuner:
    lr_find: true            # find optimal learning rate
    scale_batch_size: true   # find max batch size that fits in memory

Ray Tune runs a full hyperparameter search with ASHA early stopping, Optuna Bayesian optimisation, or Population-Based Training. It requires optional dependencies.
pip install -e ".[tune]"   # installs ray[tune] + optuna
goal-tune hparams_search=ray_tune

Example Ray Tune config:
# configs/hparams_search/ray_tune.yaml
hparams_search:
  method: ray
  num_samples: 20
  max_epochs: 100
  metric: val/total
  mode: min
  scheduler: asha
  search_algorithm: optuna
  search_space:
    training.optimizer.lr:
      type: loguniform
      lower: 1.0e-5
      upper: 1.0e-2
    training.optimizer.weight_decay:
      type: loguniform
      lower: 1.0e-8
      upper: 1.0e-3
    training.ema.decay:
      type: uniform
      lower: 0.99
      upper: 0.9999

| Scheduler | Description |
|---|---|
| asha | Asynchronous Successive Halving: prunes bad trials early (recommended) |
| pbt | Population-Based Training: mutates hyperparameters during training |
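
The pruning idea behind the asha scheduler can be illustrated with a synchronous successive-halving sketch. This is a deliberate simplification: ASHA promotes trials asynchronously and Ray Tune's scheduler has more options, so treat the function below (whose name, arguments and toy loss curve are all assumptions for this example) as an illustration of the principle only.

```python
def successive_halving(trials, loss_at, min_budget=1, max_budget=9, reduction_factor=3):
    """Keep the best 1/reduction_factor of the trials at each rung.

    trials:  list of trial identifiers
    loss_at: callable (trial, budget) -> validation loss after `budget` epochs
    Returns a list of (budget, surviving_trials) pairs, one per rung.
    """
    history = []
    survivors = list(trials)
    budget = min_budget
    while budget <= max_budget and survivors:
        ranked = sorted(survivors, key=lambda t: loss_at(t, budget))
        keep = max(1, len(ranked) // reduction_factor)
        survivors = ranked[:keep]
        history.append((budget, survivors))
        budget *= reduction_factor  # survivors get a larger training budget
    return history


# Hypothetical loss curves: trial i converges towards a floor of 0.1 * i.
def toy_loss(trial, budget):
    return 0.1 * trial + 1.0 / budget


rungs = successive_halving(trials=list(range(1, 10)), loss_at=toy_loss)
for budget, survivors in rungs:
    print(budget, survivors)
```

Most of the compute goes to the few trials that survive the early rungs, which is what makes this family of schedulers cheaper than training every configuration to completion.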
| Search algorithm | Description |
|---|---|
| null | Random search (no extra deps) |
| optuna | Bayesian optimisation via Optuna |
| hyperopt | Tree-structured Parzen Estimators |
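
With random search, each trial draws its hyperparameters independently from the search space. The sketch below shows what loguniform and uniform entries mean in that setting. It uses only the standard library and is not how Ray Tune samples internally; the helper names and the dictionary layout are assumptions that mirror the YAML keys type, lower and upper.

```python
import math
import random


def sample_loguniform(lower, upper, rng):
    """Draw a value whose logarithm is uniform on [log(lower), log(upper)]."""
    return math.exp(rng.uniform(math.log(lower), math.log(upper)))


def sample_uniform(lower, upper, rng):
    return rng.uniform(lower, upper)


SAMPLERS = {"loguniform": sample_loguniform, "uniform": sample_uniform}

# Same shape as the search_space entries in the Ray Tune config.
search_space = {
    "training.optimizer.lr": {"type": "loguniform", "lower": 1.0e-5, "upper": 1.0e-2},
    "training.ema.decay": {"type": "uniform", "lower": 0.99, "upper": 0.9999},
}


def sample_config(space, rng):
    """Draw one trial configuration from the search space."""
    return {
        name: SAMPLERS[spec["type"]](spec["lower"], spec["upper"], rng)
        for name, spec in space.items()
    }


rng = random.Random(0)  # seeded for reproducibility
trial = sample_config(search_space, rng)
print(trial)
```

Log-uniform sampling spends equal effort on each order of magnitude, which suits learning rates and weight decay better than a plain uniform range.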
Cloud-managed hyperparameter search via Weights & Biases. Supports Bayesian, grid, and random search with Hyperband early termination. Requires W&B (already a core dependency).
goal-tune hparams_search=wandb_sweep

Example W&B Sweep config:

# configs/hparams_search/wandb_sweep.yaml
hparams_search:
  method: wandb
  project: goal
  sweep_method: bayes   # 'bayes', 'grid', 'random'
  metric: val/total
  mode: min
  count: 20
  early_terminate:
    type: hyperband
    min_iter: 10
    eta: 3
  parameters:
    training.optimizer.lr:
      distribution: log_uniform_values
      min: 1.0e-5
      max: 1.0e-2
    training.optimizer.weight_decay:
      distribution: log_uniform_values
      min: 1.0e-8
      max: 1.0e-3

Resume an existing sweep:

goal-tune hparams_search=wandb_sweep hparams_search.sweep_id=<SWEEP_ID>

| Sweep method | Description |
|---|---|
| bayes | Bayesian optimisation (Gaussian process), recommended |
| grid | Exhaustive grid search |
| random | Random search |
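
The early_terminate settings in the sweep config determine the iterations at which runs are compared. Assuming W&B's documented Hyperband rule (brackets at min_iter, min_iter × eta, min_iter × eta², and so on), the checkpoints for min_iter: 10 and eta: 3 can be computed as below. The `max_iter` cap is a hypothetical argument added only to bound the loop.

```python
def hyperband_brackets(min_iter, eta, max_iter):
    """Iterations at which runs are compared: min_iter * eta**k, up to max_iter."""
    brackets = []
    iteration = min_iter
    while iteration <= max_iter:
        brackets.append(iteration)
        iteration *= eta
    return brackets


brackets = hyperband_brackets(min_iter=10, eta=3, max_iter=1000)
print(brackets)  # [10, 30, 90, 270, 810]
```

A larger eta spaces the checkpoints further apart, so fewer comparisons are made but more runs are stopped at each one.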
The default callback group (callbacks=default) includes:
- ModelCheckpoint: save top-k checkpoints by validation loss, plus last.ckpt
- EarlyStopping: stop training after 100 epochs with no improvement
- RichModelSummary: rich-formatted model summary
- RichProgressBar: rich-formatted training progress
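
The patience logic behind EarlyStopping can be sketched in a few lines. This is a generic illustration rather than Lightning's implementation; the class name `PatienceCounter` and the toy loss values are assumptions made for the example.

```python
class PatienceCounter:
    """Stop when the monitored loss has not improved for `patience` epochs."""

    def __init__(self, patience, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def update(self, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0  # improvement resets the counter
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = PatienceCounter(patience=3)
losses = [1.0, 0.8, 0.9, 0.85, 0.81, 0.7]  # hypothetical validation losses
stop_epoch = next((epoch for epoch, loss in enumerate(losses) if stopper.update(loss)), None)
print(stop_epoch, stopper.best)
```

In this toy run the loop stops before reaching the final, better value of 0.7, which shows the trade-off of a small patience. The default of 100 epochs is far more forgiving.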
| Callback | Config | Description |
|---|---|---|
| Stochastic Weight Averaging | callbacks=swa | Average weights during late training |
| Backbone Finetuning | callbacks=backbone_finetuning | Gradual unfreezing for fine-tuning |
| Gradient Accumulation Scheduler | callbacks=grad_accumulation | Dynamic accumulation steps |
Override callback parameters:
goal-train callbacks.model_checkpoint.save_top_k=5
goal-train callbacks.early_stopping.patience=200

GOAL supports all Lightning loggers. Enable via the logger config group:
goal-train logger=wandb
goal-train logger=tensorboard
goal-train logger=csv

| Logger | Config | Notes |
|---|---|---|
| Weights & Biases | logger=wandb | Project: goal, requires wandb login |
| TensorBoard | logger=tensorboard | Saves to output_dir/tensorboard/ |
| CSV | logger=csv | Simple CSV file logging |
| MLflow | logger=mlflow | MLflow tracking server |
| Neptune | logger=neptune | Requires NEPTUNE_API_TOKEN |
| Aim | logger=aim | Local .aim repo, open with aim up |
| Comet | logger=comet | Comet.ml experiment tracking |
Use multiple loggers:
goal-train logger=wandb,csv

Every run is automatically named with a timestamp + dataset + model pattern:
{date}_{time}_{dataset_type}_{model_backbone}
For example: 2026-04-09_14-30-45_xyz_hyperspec
This naming is applied consistently to:
- Output directories (logs/train/runs/...)
- Logger run names (W&B, TensorBoard, MLflow, etc.)
- Hydra sweep directories
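
The naming pattern can be reproduced with the standard library, which is handy when locating a run directory from a script. The helper `make_run_name` is hypothetical; GOAL builds the name through its own configuration, so this only mirrors the documented format.

```python
from datetime import datetime


def make_run_name(dataset_type, model_backbone, now=None):
    """Format {date}_{time}_{dataset_type}_{model_backbone}."""
    now = now or datetime.now()
    return f"{now:%Y-%m-%d}_{now:%H-%M-%S}_{dataset_type}_{model_backbone}"


name = make_run_name("xyz", "hyperspec", now=datetime(2026, 4, 9, 14, 30, 45))
print(name)  # 2026-04-09_14-30-45_xyz_hyperspec
```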
Override the name from the CLI:
goal-train run_name=my_custom_experiment

GOAL uses Hydra for composable configuration. Every aspect of training is controlled by YAML config files that can be overridden from the command line.
| Group | Path | Options |
|---|---|---|
| Data | configs/data/ | xyz, hdf5, lmdb, trajectory, md17_aspirin, md17_ethanol, rmd17_aspirin, ani1, ani1x, qm9 |
| Model | configs/model/ | hyperspec, invariant_gnn, deepset, hyperset, lucidset, monolithic_example |
| Trainer | configs/trainer/ | default, gpu, ddp, fsdp, model_parallel, cpu, mps, ddp_sim |
| Training | configs/training/ | default |
| Strategy | configs/strategy/ | ddp, fsdp, fsdp2, deepspeed_zero1, deepspeed_zero2, deepspeed_zero3 |
| Callbacks | configs/callbacks/ | default, none, swa, backbone_finetuning, grad_accumulation |
| Logger | configs/logger/ | wandb, tensorboard, csv, mlflow, neptune, aim, comet |
| Hparams Search | configs/hparams_search/ | basic, ray_tune, wandb_sweep |
# Change model and data format
goal-train model=invariant_gnn data=hdf5
# Override nested parameters
goal-train training.optimizer.lr=0.0005 training.ema.decay=0.9999
# Change loss weights
goal-train training.losses.0.weight=1.0 training.losses.1.weight=50.0
# Multi-run sweep
goal-train -m training.optimizer.lr=0.001,0.0005,0.0001
# Disable callbacks
goal-train callbacks=none

Each run creates a timestamped output directory:
logs/train/runs/2026-04-09_14-30-45_xyz_hyperspec/
├── checkpoints/
│   ├── epoch_001.ckpt
│   └── last.ckpt
├── train.log
└── .hydra/
    ├── config.yaml      # resolved config
    ├── hydra.yaml
    └── overrides.yaml   # command-line overrides
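
The entries recorded in overrides.yaml are dotted paths into the resolved config. The sketch below shows how one such override addresses a nested key. It is a simplified stand-in for what Hydra and OmegaConf do: it keeps values as strings and ignores list indices and config-group selection, and the function name `apply_override` is hypothetical.

```python
def apply_override(config, override):
    """Apply one 'a.b.c=value' override to a nested dict in place."""
    dotted_key, raw_value = override.split("=", 1)
    *parents, leaf = dotted_key.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})  # create intermediate levels as needed
    node[leaf] = raw_value  # values are kept as strings in this sketch
    return config


config = {"training": {"optimizer": {"lr": "0.001"}, "ema": {"decay": "0.999"}}}
for item in ["training.optimizer.lr=0.0005", "training.ema.decay=0.9999"]:
    apply_override(config, item)
print(config)
```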
| Command | Description |
|---|---|
| goal-train | Train a model |
| goal-eval | Evaluate a checkpoint on test data |
| goal-finetune | Fine-tune a pre-trained model |
| goal-tune | Hyperparameter search (LR finder, Ray Tune, W&B Sweeps) |
All commands accept Hydra overrides:
goal-train trainer=ddp data=hdf5 model=invariant_gnn logger=wandb seed=42

Module-based invocation (equivalent):
python -m goal.ml.cli.train trainer=ddp data=hdf5
python -m goal.ml.cli.evaluate ckpt_path=/path/to/ckpt
python -m goal.ml.cli.finetune model.backbone.pretrained=true
python -m goal.ml.cli.tune hparams_search=basic

If using pixi as your environment manager, these tasks are available:
| Task | Command | Description |
|---|---|---|
| pixi run train | python -m goal.ml.cli.train | Train a model |
| pixi run eval | python -m goal.ml.cli.evaluate | Evaluate a checkpoint |
| pixi run finetune | python -m goal.ml.cli.finetune | Fine-tune a model |
| pixi run test | pytest -k 'not slow' | Run fast tests |
| pixi run test-full | pytest | Run all tests |
| pixi run lint | ruff check src/ tests/ | Lint code |
| pixi run format | ruff format src/ tests/ | Format code |
| pixi run typecheck | mypy src/goal/ml/ | Type check |
| pixi run clean | - | Remove build artifacts |
| pixi run clean-logs | rm -rf logs/** | Remove training logs |
Pass Hydra overrides through pixi:
pixi run train trainer=ddp data.root=/path/to/data

Use the cuda-deepspeed environment for DeepSpeed training:

pixi run -e cuda-deepspeed train strategy=deepspeed_zero2

| Package | Version |
|---|---|
| Python | 3.14.4 |
| PyTorch | 2.10.0 |
| Lightning | 2.6.1 |
| e3nn | 0.6.0 |
| PyG (torch-geometric) | 2.7.0 |
| Hydra | 1.3.2 |
| ASE | 3.28.0 |
| W&B | 0.25.1 |
| Rich | 13.9.4 |
This project is licensed under the MIT License.