⚛️ GOAL

Your atoms. Your rules. Your laboratory.

A modular, open-source framework for building, training, and deploying machine-learning interatomic potentials — from quick experiments to production-scale distributed workflows.

Python 3.14 · GIL-free interpreter with true multithreading for data loading and preprocessing.

🗺️ Navigation

🔵 Getting started · 🟢 Core workflow · 🟡 Advanced features · 🟣 Infrastructure

🔵 Start Here	🟢 Train & Evaluate	🟡 Go Deeper	🟣 Under the Hood
Overview	Training	Models & Heads	Configuration System
Installation	Evaluation	Loss Functions	Logging
Project Structure	ASE Calculator	Foundation Model Adapters	Callbacks
Quick Start	Fine-Tuning	Feature Extraction	CLI Reference
Tutorial Notebook	Data Loading	Performance Engineering	Pixi Tasks
	Benchmark Datasets	Hyperparameter Tuning
		Mini Trainer
		Customising the Training Loop

🔵 Overview

GOAL (General Open Atomistic Laboratory) is a modular framework for training machine-learning interatomic potentials (MLIPs). Built on PyTorch Lightning 2.6+ and Hydra, it provides:

	Feature	Details
🔬	Equivariant & invariant backbones	HyperSpec (E(3)-equivariant) and SchNet-like invariant GNN
🧱	Modular & monolithic models	Backbone→head pipeline (HyperSpec, InvariantGNN, DeepSet, HyperSet, LucidSet) or self-contained monolithic models for external architectures
🎯	Multiple task heads	Energy, forces, stress, dipole, direct forces, generic scalar, multi-head
🧠	Foundation model adapters	MACE and FairChem (UMA) pre-trained models
📂	Flexible data loading	XYZ, HDF5, LMDB, ASE trajectory; multi-file merge, directory-based loading, auto-splitting
⚡	Distributed training	DDP, FSDP, FSDP2 (ModelParallel), DeepSpeed ZeRO (Stages 1/2/3 + CPU offload)
🧮	Configurable loss	Per-property loss type (MSE, MAE, Huber, Smooth L1) + composite sub-losses
🔧	Strategy factory	Unified `build_strategy(cfg)` for all distributed strategies
🚀	Performance engineering	TF32, cuDNN benchmark, `torch.compile`, gradient accumulation, EMA/SWA
📊	Experiment management	Hydra config composition · W&B · TensorBoard · CSV · MLflow · Neptune · Aim · Comet
🖥️	SLURM-aware	Auto checkpoint resumption and completion sentinels
🧪	ASE integration	Use any trained model as an ASE Calculator for MD, geometry optimisation, phonons
🐍	Python 3.14	GIL-free interpreter with real multithreading for data loading
🔬	Mini Trainer	Standalone notebook-friendly training loop for rapid prototyping on extracted features
🧰	Custom training loops	Three levels of loop customisation: GOALModule hooks, Fabric-based multi-GPU, or pure PyTorch

Package layout · The top-level namespace is goal. The ML training module lives at goal.ml:
from goal.ml.training.module import GOALModule
from goal.ml.data.datamodule import GOALDataModule
from goal.ml.utils.calculator import GOALCalculator

🔵 Installation

With pip

git clone https://github.com/Nourollah/GOAL.git
cd GOAL
pip install -e .

Optional extras:

pip install -e ".[mace]"       # MACE adapter
pip install -e ".[fairchem]"   # FairChem/UMA adapter
pip install -e ".[deepspeed]"  # DeepSpeed ZeRO strategies
pip install -e ".[all]"        # All optional dependencies
pip install -e ".[dev]"        # pytest, ruff, mypy

With pixi (recommended)

💡 What is pixi?

Pixi is a fast, cross-platform package manager built on top of conda-forge. It manages both conda and pip dependencies in a single lockfile, giving you:

Reproducible environments — a pixi.lock pins every package version (conda and pip)
Named environments — switch between CPU, CUDA, dev, and adapter-specific setups instantly
No conda activate — just pixi run <task> or pixi shell
Fast solves — written in Rust; resolves environments in seconds

Install pixi (one-liner):

curl -fsSL https://pixi.sh/install.sh | bash

Or see the official installation guide for Homebrew, Windows, and other methods.

GOAL ships a pixi workspace configuration in pyproject.toml. After installing pixi:

pixi install                    # default (CPU)
pixi install -e cuda            # CUDA 12+
pixi install -e dev             # CPU + dev tools (pytest, ruff, mypy)
pixi install -e dev-cuda        # CUDA + dev tools
pixi install -e fairchem        # CUDA + FairChem adapter
pixi install -e cuda-deepspeed  # CUDA + DeepSpeed

Note: Some optional dependencies have compatibility constraints:

MACE adapter — pins e3nn==0.4.4 which conflicts with the core e3nn>=0.5 requirement. Install via pip install -e ".[mace]" instead.

Ray Tune / Optuna — no Python 3.14 wheels yet. Install via pip install -e ".[tune]" on Python ≤3.13.

🔵 Project Structure

📁 Click to expand full project tree

├── configs/                        # Hydra configuration groups
│   ├── train.yaml                  #   Training defaults composition
│   ├── eval.yaml                   #   Evaluation defaults composition
│   ├── callbacks/                  #   Callback configs (checkpoint, EMA, SWA, …)
│   ├── data/                       #   Dataset configs (xyz, hdf5, lmdb, trajectory, benchmarks)
│   ├── hparams_search/             #   Hyperparameter search (basic, ray_tune, wandb_sweep)
│   ├── logger/                     #   Logger configs (wandb, tensorboard, csv, …)
│   ├── model/                      #   Model configs (hyperspec, invariant_gnn)
│   ├── strategy/                   #   Strategy configs (ddp, fsdp, fsdp2, deepspeed_*)
│   ├── trainer/                    #   Trainer configs (gpu, ddp, fsdp, model_parallel, …)
│   ├── training/                   #   Training hyperparameters (optimizer, EMA, losses, …)
│   ├── paths/                      #   Path definitions
│   └── hydra/                      #   Hydra runtime settings
├── src/
│   └── goal/                       # Top-level namespace package
│       └── ml/                     #   ML training module
│           ├── cli/                #     Entry points: train, evaluate, finetune, tune
│           ├── data/               #     DataModule, datasets (xyz, hdf5, lmdb, trajectory, concat)
│           ├── nn/                 #     Neural network components
│           │   ├── models/         #       Backbones: HyperSpec, InvariantGNN, DeepSet, HyperSet, LucidSet; MonolithicExample
│           │   ├── heads/          #       Task heads: energy, forces, stress, dipole, scalar, multi
│           │   ├── blocks/         #       Building blocks: embedding, interaction, readout
│           │   └── primitives/     #       Low-level ops: tensor products, radial basis, norms
│           ├── adapters/           #     Foundation model wrappers: MACE, FairChem
│           ├── training/           #     LightningModule, loss, EMA, tuning
│           │   ├── callbacks/      #       Checkpoint, logging callbacks
│           │   └── strategies/     #       Strategy factory: DDP, FSDP, FSDP2, DeepSpeed
│           ├── utils/              #     ASE calculator, feature extraction, mini trainer
│           └── registry.py         #     Lazy component registry
├── examples/
│   └── datasets/                   # Benchmark dataset loaders (MD17, ANI-1, QM9, SPICE)
├── notebooks/                      # Tutorials & demos (getting started, feature extraction, mini trainer)
├── scripts/                        # SLURM job scripts
├── tests/                          # Test suite
├── data/                           # Dataset storage
├── logs/                           # Training outputs (checkpoints, metrics)
└── pyproject.toml                  # Package metadata + pixi workspace config

🔵 Quick Start

Train the default model (HyperSpec) on XYZ data:

goal-train data.root=/path/to/dataset

Or equivalently via module:

python -m goal.ml.cli.train data.root=/path/to/dataset

This loads configs/train.yaml which composes: data=xyz, model=hyperspec, training=default, trainer=default.

📓 New to GOAL? Work through notebooks/getting_started.ipynb — a step-by-step tutorial covering model building (modular & monolithic), dataset loading, training, and using trained models as ASE calculators.

🟢 Training

Single GPU

goal-train trainer=gpu data.root=/path/to/dataset

The gpu trainer config sets accelerator: gpu and devices: 1.

Multi-GPU: DDP

Distributed Data Parallel — replicates the full model on each GPU and synchronizes gradients. Use when the model fits in a single GPU's memory.

goal-train trainer=ddp data.root=/path/to/dataset

Override the number of GPUs:

goal-train trainer=ddp trainer.devices=8

Multi-node:

goal-train trainer=ddp trainer.devices=4 trainer.num_nodes=2

DDP key settings (in configs/trainer/ddp.yaml):

find_unused_parameters: false — set true if you have frozen layers
static_graph: false — set true for models with fixed computation graphs (faster)
gradient_as_bucket_view: true — minor memory optimisation
sync_batchnorm: true — synchronize batch norm statistics across GPUs

Multi-GPU: FSDP

Fully Sharded Data Parallel — shards model parameters, gradients, and optimizer states across GPUs. Use when the model doesn't fit in a single GPU's memory.

goal-train trainer=fsdp data.root=/path/to/dataset

FSDP settings (in configs/trainer/fsdp.yaml):

auto_wrap_policy — controls how modules are wrapped for sharding
activation_checkpointing_policy — trade compute for memory by recomputing activations
cpu_offload: false — offload parameters to CPU (slower, saves GPU memory)
precision: "bf16-mixed" — recommended for FSDP on Ampere+ GPUs

Multi-GPU: FSDP2 / ModelParallel

ModelParallelStrategy (Lightning 2.4+) — supports FSDP2, tensor parallelism, torch.compile, and FP8. Recommended for very large models (500M+ parameters).

goal-train trainer=model_parallel data.root=/path/to/dataset

Or via the strategy factory:

goal-train +strategy=fsdp2 data.root=/path/to/dataset

Multi-GPU: DeepSpeed

DeepSpeed ZeRO enables training of very large models by partitioning optimizer states, gradients, and parameters across GPUs. Requires pip install -e ".[deepspeed]".

ZeRO Stage 1 — optimizer state partitioning only (lowest communication overhead):

goal-train +strategy=deepspeed_zero1 data.root=/path/to/dataset

ZeRO Stage 2 — optimizer state + gradient partitioning:

goal-train +strategy=deepspeed_zero2 data.root=/path/to/dataset

ZeRO Stage 3 — full parameter partitioning (maximum memory savings):

goal-train +strategy=deepspeed_zero3 data.root=/path/to/dataset

ZeRO Stage 3 + CPU offload — offload parameters to CPU (for extremely large models):

goal-train +strategy=deepspeed_zero3_offload data.root=/path/to/dataset

📋 DeepSpeed configuration options

# configs/strategy/deepspeed_zero3.yaml
name: deepspeed_zero3
stage: 3
allgather_bucket_size: 200_000_000
reduce_bucket_size: 200_000_000
logging_level: WARNING

Strategy Factory

GOAL provides a unified strategy factory (build_strategy()) that maps config to Lightning strategies. When cfg.strategy is present, it takes priority over the trainer's built-in strategy.

# Two ways to select a strategy:
# 1. Via trainer config group (backward compatible):
goal-train trainer=ddp

# 2. Via strategy config group (new, more options):
goal-train +strategy=fsdp2
goal-train +strategy=deepspeed_zero3_offload

Strategy Config	Lightning Strategy	Use Case
`ddp`	`DDPStrategy`	Model fits on one GPU
`fsdp`	`FSDPStrategy`	Model too large for one GPU
`fsdp2`	`ModelParallelStrategy`	Very large models, torch.compile
`deepspeed_zero1`	`DeepSpeedStrategy` (stage 1)	Optimizer state partitioning
`deepspeed_zero2`	`DeepSpeedStrategy` (stage 2)	+ gradient partitioning
`deepspeed_zero3`	`DeepSpeedStrategy` (stage 3)	Full parameter partitioning
`deepspeed_zero3_offload`	`DeepSpeedStrategy` (stage 3)	+ CPU offload

Apple Silicon (MPS)

goal-train trainer=mps data.root=/path/to/dataset

CPU

goal-train trainer=cpu data.root=/path/to/dataset

Resuming Training

If training is interrupted (crash, preemption, timeout), GOAL automatically resumes from the latest checkpoint. auto_resume is enabled by default — on every launch the framework scans previous run directories for the most recent last.ckpt that matches the current dataset + model combination:

logs/train/runs/
  2026-04-08_10-30-00_xyz_deepset/checkpoints/last.ckpt  ← found & resumed
  2026-04-07_09-00-00_xyz_deepset/checkpoints/last.ckpt  ← older, skipped
  2026-04-08_11-00-00_xyz_hyperspec/checkpoints/last.ckpt ← different model, ignored

Lightning restores the full training state (model weights, optimiser, scheduler, epoch counter, dataloader position) so training continues exactly where it left off.

Override behaviour:

# Disable auto-resume (always start fresh)
goal-train auto_resume=false

# Resume from a specific checkpoint (takes priority over auto-resume)
goal-train ckpt_path=/path/to/specific/checkpoint.ckpt

🟢 Evaluation

Evaluate a trained checkpoint on the test split:

goal-eval ckpt_path=/path/to/checkpoint.ckpt data.root=/path/to/dataset

The evaluation entry point supports the same trainer configs for distributed evaluation:

goal-eval trainer=ddp ckpt_path=/path/to/checkpoint.ckpt data.root=/path/to/dataset

🟢 ASE Calculator

Any trained GOAL model can be used as an ASE Calculator for molecular dynamics, geometry optimisation, phonons, and more.

From a Checkpoint

from goal.ml.utils.calculator import GOALCalculator

calc = GOALCalculator(checkpoint_path="logs/train/runs/.../last.ckpt")

From a Pre-loaded Module

from goal.ml.utils.calculator import GOALCalculator

calc = GOALCalculator(module=my_module, cutoff=5.0, device="cuda")

Single-Point Calculation

from ase.build import molecule

atoms = molecule("H2O")
atoms.calc = calc

energy = atoms.get_potential_energy()   # eV
forces = atoms.get_forces()             # eV/Å
stress = atoms.get_stress()             # eV/ų (Voigt, 6-component)

Geometry Optimisation

from ase.optimize import BFGS

opt = BFGS(atoms)
opt.run(fmax=0.01)

Molecular Dynamics

from ase.md.langevin import Langevin
from ase.md.velocitydistribution import MaxwellBoltzmannDistribution
from ase import units

MaxwellBoltzmannDistribution(atoms, temperature_K=300)
dyn = Langevin(atoms, 1.0 * units.fs, temperature_K=300, friction=0.01)
dyn.run(1000)

Parameter	Default	Description
`checkpoint_path`	—	Path to `.ckpt` file (mutually exclusive with `module`)
`module`	—	Pre-loaded `GOALModule` instance
`cutoff`	from config	Neighbour-list cutoff (Å). Auto-detected from checkpoint
`device`	`"cpu"`	`"cpu"`, `"cuda"`, `"cuda:0"`, etc.
`dtype`	`float64`	Precision for positions and cell
`head`	`None`	Multi-head tag for multi-task models

🟢 Fine-Tuning

Fine-tune a pre-trained foundation model on a downstream dataset:

goal-finetune model.backbone.name=mace-large model.backbone.pretrained=true data.root=/path/to/dataset

Backbone Loading Modes

1. Pre-trained hub model:

goal-finetune model.backbone.name=mace-large model.backbone.pretrained=true model.backbone.variant=large

2. Local checkpoint:

goal-finetune model.backbone.name=mace-large model.backbone.local_checkpoint=/path/to/model.pt

3. Fresh backbone (train from scratch):

goal-finetune model.backbone.name=mace-large

Freeze Backbone (Linear Probing)

goal-finetune training.freeze_backbone=true model.backbone.name=mace-large model.backbone.pretrained=true

Gradual Unfreezing

Use the backbone finetuning callback:

goal-finetune callbacks=backbone_finetuning model.backbone.name=mace-large model.backbone.pretrained=true

This freezes the backbone initially, then unfreezes at epoch 10 with a reduced learning rate (10% of head LR).

Available Adapters

Adapter	Registry Names	Source
MACE	`mace-large`, `mace-medium`, `mace-small`	`mace-torch`
FairChem/UMA	`uma-small`	`fairchem-core`

🟢 Data Loading

Supported Formats

Format	Config	File Types	Description
ExtXYZ	`data=xyz`	`.xyz`, `.extxyz`	ASE-readable extended XYZ files
HDF5	`data=hdf5`	`.h5`, `.hdf5`	Pre-processed atomic graphs with random access
LMDB	`data=lmdb`	`data.mdb`	FairChem/OCP-compatible format
Trajectory	`data=trajectory`	`.traj`	ASE trajectory files from MD simulations

Loading Modes

GOAL supports four data loading modes, automatically detected from the config:

Mode 1 — Single source, auto-split (default)

Point to a single directory. GOAL first looks for named split files (train.xyz, val.xyz, test.xyz). If those don't exist, it loads everything and splits by ratio.

goal-train data.root=/path/to/dataset

# configs/data/xyz.yaml
data:
  dataset_type: xyz
  root: ${paths.data_dir}
  split_ratio: [0.8, 0.1, 0.1]
  split_seed: 42

Mode 2 — Per-split paths

Specify separate file lists for train, validation, and test. Each split can load from multiple files.

goal-train data.train_paths='[/data/A/train.xyz,/data/B/train.xyz]' \
          data.val_paths='[/data/A/val.xyz]' \
          data.test_paths='[/data/A/test.xyz]'

Or in a config file:

data:
  dataset_type: xyz
  train_paths:
    - /data/dataset_A/train.xyz
    - /data/dataset_B/train.xyz
  val_paths:
    - /data/dataset_A/val.xyz
  test_paths:
    - /data/dataset_A/test.xyz

Mode 3 — Merged multi-source, auto-split

Provide a list of roots. All datasets are loaded, merged into one, then split by ratio.

data:
  dataset_type: xyz
  root:
    - /data/dataset_A
    - /data/dataset_B
    - /data/dataset_C
  merge_strategy: random
  split_ratio: [0.8, 0.1, 0.1]
  split_seed: 42

Mode 4 — Directory-based per-split

Point to directories containing data files. All matching files (.xyz, .extxyz, .h5, .hdf5, .lmdb, .traj, .db) inside each directory are automatically discovered and loaded.

goal-train data.train_dir=/data/train/ \
          data.val_dir=/data/val/ \
          data.test_dir=/data/test/

data:
  dataset_type: xyz
  train_dir: /data/splits/train/
  val_dir: /data/splits/val/
  test_dir: /data/splits/test/     # optional

Tip: Mode 4 is ideal when you have pre-organized split directories. Files are loaded in sorted order for reproducibility.

Merge Strategies

When loading multiple files (Mode 2 or Mode 3), datasets are merged using one of two strategies:

Strategy	Behaviour
`sequential`	Concatenate datasets in order (default)
`random`	Shuffle all indices after concatenation (seed-controlled)

goal-train data.merge_strategy=random data.split_seed=123

Auto-Splitting

When splits aren't provided explicitly, GOAL splits the dataset numerically:

data:
  split_ratio: [0.8, 0.1, 0.1]   # train / val / test
  split_seed: 42                   # reproducible splits

A two-element ratio creates train/val only (no test split):

data:
  split_ratio: [0.9, 0.1]         # train / val only

DataLoader Options

All data configs support these performance options:

data:
  batch_size: 32
  num_workers: 4              # parallel data loading workers
  pin_memory: true            # pin tensors in CPU memory for faster GPU transfer
  persistent_workers: true    # keep workers alive between epochs
  prefetch_factor: 2          # batches prefetched per worker

🟢 Benchmark Datasets

Ready-to-use benchmark datasets for training and evaluating MLIPs. Completely optional — the core framework works without them.

Dataset	Structures	Elements	Properties	Size	Config
MD17	~10k/mol	H, C, N, O	energy, forces	~100 MB	`data=md17_aspirin`
rMD17	~10k/mol	H, C, N, O	energy, forces	~100 MB	`data=rmd17_aspirin`
ANI-1	~20M	H, C, N, O	energy, forces	~30 GB	`data=ani1`
ANI-1x	~5M	H, C, N, O	energy, forces	~7 GB	`data=ani1x`
QM9	134k	H, C, N, O, F	19 properties	~1 GB	`data=qm9`
SPICE	~1.1M	10 elements	energy, forces	~15 GB	—

# Train on MD17 aspirin
goal-train data=md17_aspirin

# Train on ANI-1x, subsample 50k for quick experiment
goal-train data=ani1x data.max_structures=50000

# Train on QM9 predicting HOMO-LUMO gap
goal-train data=qm9 data.target=gap

# Override cutoff
goal-train data=md17_aspirin data.cutoff=6.0

Install optional dependencies for SPICE (HDF5):

pip install -e ".[examples]"

See examples/datasets/README.md for full documentation, citations, and unit conversion details.

🟡 Models

GOAL supports two model paradigms:

Paradigm	How it works	Config	Best for
Modular	Backbone → NodeFeatures → Head → property dict	`model.backbone` + `model.head`	Mixing backbones and heads freely
Monolithic	Model → property dict directly	`model.backbone` + `model.head: null`	External self-contained architectures

Modular models separate the backbone (feature extraction) from the head (property prediction). Any backbone can be paired with any compatible head. All built-in backbones (HyperSpec, InvariantGNN, DeepSet, HyperSet, LucidSet) are modular.

Monolithic models handle everything internally — embedding, interaction, readout, and property prediction — in a single forward() call. They return a dictionary of predicted properties directly (the same format heads produce). Set head: null in the config to use a monolithic model. This capability exists for users who want to bring their own self-contained architecture and use GOAL's training infrastructure without adopting the backbone→head split.

The MonolithicModel protocol in goal.ml.nn.models.base defines the contract: forward(graph) → dict[str, Tensor] and an output_keys property declaring which keys consumers can expect.

Modular Backbones

HyperSpec (equivariant)

E(3)-equivariant graph neural network using spherical harmonics and tensor products.

goal-train model=hyperspec

Key parameters:

hidden_channels: 128 — feature dimension
num_interactions: 3 — message passing layers
lmax: 2 — maximum spherical harmonics order
cutoff: 5.0 — interaction radius (Å)
num_radial_basis: 8 — radial basis functions

Output irreps: 128x0e+128x1o+128x2e (scalars + vectors + rank-2 tensors)

Invariant GNN

SchNet-like invariant backbone using only scalar features. Faster than equivariant models; use for baselines or when equivariance isn't needed.

goal-train model=invariant_gnn

Output irreps: 128x0e (scalars only)

DeepSet

Edge-based invariant backbone inspired by the SCAI project. Embeds atoms, expands edge distances with Bessel radial basis, projects source/target atoms and distances into a shared feature space, applies an edge interaction MLP, and scatter-aggregates to per-node invariant features.

goal-train model=deepset

Key parameters:

embedding_dim: 128 — atomic embedding dimension
hidden_channels: 128 — feature dimension
num_filters: 128 — projected feature space size
num_radial_basis: 20 — Bessel radial basis functions
transform_depth: 2 — layers in projection MLPs
cutoff: 5.0 — interaction radius (Å)

Output irreps: 128x0e (scalars only)

HyperSet ⚠️

Not implemented. The original SCAI HyperSet was intended to route edge features through atom-type–specific expert MLPs, but the implementation never diverged from DeepSet. Instantiation raises NotImplementedError. Use deepset instead.

# Will raise NotImplementedError at instantiation
goal-train model=hyperset

LucidSet ⚠️

Not implemented. The pairwise distance-binned mixture-of-experts approach requires an external atom-references dictionary and creates O(Z² × bins) expert modules, which does not scale within GOAL's paradigm. Instantiation raises NotImplementedError. Use deepset instead.

# Will raise NotImplementedError at instantiation
goal-train model=lucidset

Monolithic Models

Monolithic models bypass the backbone→head split. They take an AtomicGraph and return a property dictionary directly. This capability is provided for external users who want to bring their own self-contained architecture and use GOAL's training loop.

Monolithic Example

A minimal demonstration model that embeds atoms, applies a small MLP readout to obtain per-atom energy contributions, sums to total energy, and derives forces via autograd.

goal-train model=monolithic_example

Note: This is intentionally simplistic. For real tasks, use a modular backbone + head combination.

Implementing a New Monolithic Model

Create a model that satisfies the MonolithicModel protocol:

import torch
import torch.nn as nn
from goal.ml.data.graph import AtomicGraph
from goal.ml.registry import MODEL_REGISTRY

@MODEL_REGISTRY.register("my_monolithic")
class MyModel(nn.Module):
    def __init__(self, cutoff: float = 5.0) -> None:
        super().__init__()
        # Use any GOAL modules: AtomicNumberEmbedding, BesselBasis, etc.
        ...

    @property
    def output_keys(self) -> list[str]:
        return ["energy", "forces"]

    def forward(self, graph: AtomicGraph) -> dict[str, torch.Tensor]:
        # Compute everything internally, return property dict
        return {"energy": energy, "forces": forces}

Then in the config, set head: null:

model:
  backbone:
    name: my_monolithic
    cutoff: 5.0
  head: null

Heads

Task-specific output heads registered via the head registry. Used with modular backbones — monolithic models set head: null and skip the head entirely.

Head	Description	Config key
`energy_forces`	Energy prediction + force via autograd	`head.name: energy_forces`
`energy`	Energy prediction only	`head.name: energy`
`direct_forces`	Direct force prediction (no autograd)	`head.name: direct_forces`
`stress`	Stress tensor prediction	`head.name: stress`
`dipole`	Dipole moment prediction	`head.name: dipole`
`scalar`	Generic scalar property (any name)	`head.name: scalar`
`multi`	Compose multiple heads	`head.name: multi`

Override the head:

goal-train model.head.name=stress model.head.compute_stress=true

Generic Scalar Head

The scalar head predicts any per-structure scalar property. The property_name parameter sets the output key (and must match the target key on the graph):

head:
  name: scalar
  irreps_in: "128x0e"
  hidden_dim: 64
  property_name: band_gap   # ← becomes the key in predictions dict
  reduction: mean            # "mean" (intensive) or "sum" (extensive)

Multi-Head — Multiple Properties at Once

The multi head composes several sub-heads that share the same backbone features. Each sub-head independently produces its output keys, which are merged into a single dictionary:

model:
  backbone:
    name: invariant_gnn
    hidden_channels: 128
    # ...

  head:
    name: multi
    heads:
      - name: energy_forces
        irreps_in: "128x0e"
        hidden_dim: 64
      - name: scalar
        irreps_in: "128x0e"
        hidden_dim: 64
        property_name: homo
        reduction: mean
      - name: scalar
        irreps_in: "128x0e"
        hidden_dim: 64
        property_name: lumo
        reduction: mean

Each sub-head has its own readout MLP, so they learn separate representations for each property. The corresponding losses reference the property names:

training:
  losses:
    - name: energy
      weight: 4.0
    - name: forces
      weight: 100.0
    - name: scalar_property
      property_name: homo
      weight: 1.0
    - name: scalar_property
      property_name: lumo
      weight: 1.0

See configs/model/invariant_gnn_qm9.yaml for a complete QM9 multi-property example.

🟡 Loss Functions

Each property loss supports a configurable loss function via the fn parameter:

Key	Function	Notes
`mse`	Mean Squared Error	Default — good for smooth regression
`mae` / `l1`	Mean Absolute Error	Robust to outliers
`rmse`	Root Mean Squared Error	Penalises large errors more than MAE
`huber`	Huber Loss	Combines MSE + MAE (delta = 1.0)
`smooth_l1`	Smooth L1	Like Huber with beta = 1.0

Configure per-property in configs/training/default.yaml:

losses:
  - name: energy
    weight: 4.0
    fn: mse          # ← loss function
  - name: forces
    weight: 1.0
    fn: huber         # ← robust to noisy forces
  - name: stress
    weight: 0.01
    fn: mae

Composite Loss per Property

Use multiple loss functions simultaneously for the same property, each with its own weight and separate logging panel:

losses:
  - name: energy
    weight: 4.0
    fn: mse
  - name: forces
    fn:                    # ← list of sub-losses
      - name: mse
        weight: 4.0
      - name: rmse
        weight: 8.0

This produces five logged metrics in W&B / TensorBoard:

Logged metric	Description
`train/energy`	Energy MSE × 4.0
`train/forces_mse`	Forces MSE × 4.0
`train/forces_rmse`	Forces RMSE × 8.0
`train/forces`	Sum of forces sub-losses
`train/total`	Grand total

Each sub-loss gets its own chart in W&B automatically.

Custom / torchmetrics Loss Functions

Use any callable via a dotted import path:

losses:
  - name: forces
    fn:
      - name: mse
        weight: 4.0
      - name: torchmetrics.functional.mean_squared_error
        weight: 2.0

Install torchmetrics first: pip install -e ".[torchmetrics]"

Override from the CLI:

# Switch forces loss to MAE
goal-train 'training.losses=[{name: energy, weight: 4.0, fn: mse}, {name: forces, weight: 1.0, fn: mae}]'

Tip: Use huber or mae for forces when your dataset has noisy DFT reference forces — they're more robust to outliers than MSE.

🟡 Foundation Model Adapters

Adapters wrap pre-trained foundation models (MACE, FairChem/UMA) as GOAL backbones. They translate between the foundation model's interface and GOAL's backbone protocol.

# Fine-tune MACE-large
goal-finetune model.backbone.name=mace-large model.backbone.pretrained=true

# Fine-tune UMA-small
goal-finetune model.backbone.name=uma-small model.backbone.pretrained=true

Install adapter dependencies:

pip install -e ".[mace]"       # for MACE adapters
pip install -e ".[fairchem]"   # for FairChem/UMA adapters

🟡 Feature Extraction

Extract intermediate node features from any backbone for downstream analysis, transfer learning, or custom heads.

HookBasedExtractor

Attach forward hooks to interaction blocks — works with any model whose layers are a nn.ModuleList:

from goal.ml.utils.extraction import HookBasedExtractor

with HookBasedExtractor(model, blocks_attr="interactions", output_index=0) as ext:
    output = model(batch)
    features = ext.captured  # {"layer_0": Tensor, "layer_1": Tensor, ...}

Composable Backbone Wrappers

Wrapper	Description
`LayerBackbone`	Returns features from a single interaction layer
`MultiScaleBackbone`	Concatenates features from multiple layers
`FrozenBackbone`	Freezes all backbone parameters for feature extraction

Irrep Helpers

from goal.ml.utils.extraction import extract_scalars, extract_irrep_channels, pool_nodes

scalars = extract_scalars(node_feats, irreps)           # l=0 channels only
channels = extract_irrep_channels(node_feats, irreps)   # dict by irrep type
graph_feats = pool_nodes(node_feats, batch_idx)          # per-graph pooling

Pre-built Extractors

Registered as Hydra targets for zero-code feature extraction:

backbone:
  _target_: goal.ml.utils.extraction._build_mace_large_final       # last layer
  # or: goal.ml.utils.extraction._build_mace_large_multiscale      # all layers
  # or: goal.ml.utils.extraction._build_mace_large_frozen           # frozen weights

🟡 Mini Trainer

A standalone, lightweight training loop for rapid prototyping in Jupyter notebooks. Completely decoupled from the Lightning / Hydra pipeline — operates on raw PyTorch primitives.

Typical workflow:

Freeze a foundation model (MACE, FairChem, etc.) and extract representations
Cache extracted features as a TensorDataset
Train a downstream head with MiniTrainer — iterate fast without re-running the backbone

Basic Usage

from goal.ml.utils.mini_trainer import MiniTrainer

trainer = MiniTrainer(
    model=my_head,
    loss_fn=torch.nn.MSELoss(),
    optimizer=torch.optim.Adam(my_head.parameters(), lr=1e-3),
    device="auto",
)
history = trainer.fit(train_loader, val_loader=val_loader, epochs=50)
history.plot()  # loss curves in the notebook

Features

Feature	Description
Early stopping	Stop when validation loss plateaus (`early_stopping_patience`)
Best checkpoint	In-memory best model state, restore with `trainer.load_best()`
LR scheduling	Any PyTorch scheduler (ReduceLROnPlateau, cosine, etc.)
Gradient clipping	Max-norm clipping via `grad_clip` parameter
Progress bars	`tqdm.auto` progress bars per epoch
History	`TrainingHistory` with `.plot()`, `.best_val_loss`, `.best_epoch`
Prediction	`trainer.predict(loader)` returns `(preds, targets)` tensors
Custom step	Plug in `step_fn` for `AtomicGraph` batches or arbitrary logic

With AtomicGraph Batches

For training on graph data with CompositeLoss, use the built-in graph_step:

from goal.ml.utils.mini_trainer import MiniTrainer, graph_step

trainer = MiniTrainer(
    model=my_backbone_plus_head,
    loss_fn=composite_loss,
    optimizer=optimizer,
    step_fn=graph_step,  # handles AtomicGraph batches
)
history = trainer.fit(graph_train_loader, graph_val_loader, epochs=50)

Notebook Demo

See notebooks/mini_trainer_demo.ipynb for a complete walkthrough — from feature extraction to model evaluation with parity plots.

🟡 Customising the Training Loop

GOAL provides three levels of training loop customisation, from least to most control:

Level	Tool	Multi-GPU	Loop Control	Best For
1	GOALModule hooks + callbacks	✅	Partial — override hooks	Standard workflows with minor tweaks
2	FabricTrainer	✅	Full — write your own `for` loop	Custom optimisation, multi-optimiser, GAN-style
3	MiniTrainer	❌	Full — pure PyTorch	Quick notebook prototyping on extracted features

Level 1: Override GOALModule Hooks

The standard Lightning path. Subclass GOALModule and override any hook:

from goal.ml.training.module import GOALModule

class MyModule(GOALModule):
    """Custom training step with auxiliary loss."""

    def training_step(self, batch, batch_idx):
        predictions = self(batch)
        losses = self.loss(predictions, batch)

        # --- Your custom logic here ---
        aux_loss = self.compute_auxiliary_loss(predictions, batch)
        losses["total"] = losses["total"] + 0.1 * aux_loss
        # --------------------------------

        self.log_dict(
            {f"train/{k}": v for k, v in losses.items()},
            batch_size=batch.num_graphs, sync_dist=True,
        )
        return losses["total"]

Register it in Hydra and use the standard goal-train CLI as usual.

What you can override:

Hook	When it runs
`training_step(batch, batch_idx)`	Each training batch
`validation_step(batch, batch_idx)`	Each validation batch
`configure_optimizers()`	Optimizer + scheduler setup
`configure_model()`	Pre-training model transforms (compile, FSDP wrap)
`on_before_optimizer_step(optimizer)`	Before each optimizer step (gradient clipping)
`on_train_batch_end(outputs, batch, batch_idx)`	After each training step (EMA update)

You can also inject logic via Lightning callbacks without subclassing:

from lightning import Callback

class GradientMonitorCallback(Callback):
    def on_before_optimizer_step(self, trainer, pl_module, optimizer):
        grad_norm = torch.nn.utils.clip_grad_norm_(pl_module.parameters(), float("inf"))
        pl_module.log("grad_norm", grad_norm)

Level 2: FabricTrainer (Full Loop Control + Multi-GPU)

When Lightning hooks are not enough — you need full control over the for loop and distributed training. Built on Lightning Fabric.

from goal.ml.utils.fabric_trainer import FabricTrainer, graph_fabric_step

ft = FabricTrainer(
    model=my_model,
    loss_fn=composite_loss,
    optimizer=optimizer,
    train_loader=train_loader,
    val_loader=val_loader,
    # --- Distributed config (same options as Lightning Trainer) ---
    accelerator="gpu",
    strategy="ddp",       # or "fsdp", "deepspeed", etc.
    devices=4,
    precision="bf16-mixed",
    # --- Loop options ---
    step_fn=graph_fabric_step,
    grad_clip=10.0,
    grad_accumulation_steps=4,
)
history = ft.fit(epochs=100, early_stopping_patience=20)

Or write the loop from scratch using the setup_fabric() helper:

from goal.ml.utils.fabric_trainer import setup_fabric

fabric = setup_fabric(strategy="ddp", devices=4, precision="bf16-mixed")

model, optimizer = fabric.setup(model, optimizer)
train_loader = fabric.setup_dataloaders(train_loader)

for epoch in range(100):
    model.train()
    for batch in train_loader:
        optimizer.zero_grad()
        predictions = model(batch)
        losses = loss_fn(predictions, batch)
        fabric.backward(losses["total"])

        # Your custom logic — anything goes:
        if epoch > 50:
            fabric.clip_gradients(model, optimizer, max_norm=1.0)

        optimizer.step()

    # Validation, logging, checkpointing — all under your control
    fabric.save("checkpoint.pt", {"model": model, "optimizer": optimizer})

FabricTrainer features:

Feature	Description
Multi-GPU / multi-node	DDP, FSDP, DeepSpeed — same strategies as Lightning
Mixed precision	bf16, fp16, fp64
Gradient accumulation	Efficient sync-skipping via `fabric.no_backward_sync()`
Gradient clipping	`fabric.clip_gradients()`
Checkpointing	`save_checkpoint()` / `load_checkpoint()` — handles sharded saves
Early stopping	Built-in patience counter
History	Reuses `TrainingHistory` from MiniTrainer (`.plot()`, `.best_val_loss`)

Level 3: MiniTrainer (Pure PyTorch)

Single-device, no Lightning dependency at all. Ideal for notebook prototyping on pre-extracted features. See the Mini Trainer section above.

Choosing the Right Level

Need multi-GPU?
  ├── No  → MiniTrainer (Level 3)
  └── Yes
        ├── Standard loop is fine, just need custom loss/hook? → GOALModule (Level 1)
        └── Need full loop control? → FabricTrainer (Level 2)

🟡 Performance Engineering

TF32 Matmul Precision

On Ampere+ GPUs (A100, H100, RTX 30xx/40xx), TF32 tensor cores provide ~3× speedup for float32 operations with negligible precision loss:

# configs/training/default.yaml
training:
  performance:
    float32_matmul_precision: high  # "highest" = fp32, "high" = TF32+fp32, "medium" = TF32

cuDNN Benchmark

Auto-tunes convolution algorithms for fixed input sizes:

training:
  performance:
    cudnn_benchmark: true
    cudnn_deterministic: false  # set true only for debugging

torch.compile

Compile the backbone with torch.compile for faster training (PyTorch 2.0+):

goal-train training.compile_model=true

Configure compilation mode:

training:
  compile_model: true
  compile:
    mode: default           # 'default', 'reduce-overhead', 'max-autotune'
    fullgraph: false        # true = compile the entire graph (faster, stricter)
    dynamic: null           # null, true, false — dynamic shape support

Mixed Precision

goal-train trainer.precision=bf16-mixed    # bfloat16 (Ampere+, recommended)
goal-train trainer.precision=16-mixed       # float16
goal-train trainer.precision=64-true        # double precision

Gradient Accumulation

Simulate larger batch sizes without increasing GPU memory:

goal-train trainer.accumulate_grad_batches=4   # effective batch = batch_size × 4

Or use the dynamic scheduler callback:

goal-train callbacks=grad_accumulation

Exponential Moving Average (EMA)

Maintains a shadow copy of weights for more stable evaluation:

training:
  ema:
    enabled: true
    decay: 0.999

Stochastic Weight Averaging (SWA)

Alternative to EMA — averages weights during the last portion of training:

goal-train callbacks=swa

Sanity Validation Check

Before the first training epoch, Lightning runs a short validation sanity check to catch data loading, metric computation, or model errors early. This is enabled by default:

# configs/trainer/default.yaml
num_sanity_val_steps: 2   # run 2 val batches before training
                          # 0 = skip, -1 = full validation set

Override from the command line:

# Skip sanity check (faster startup)
goal-train trainer.num_sanity_val_steps=0

# Full validation run before training (thorough check)
goal-train trainer.num_sanity_val_steps=-1

🟡 Hyperparameter Tuning

GOAL provides three levels of hyperparameter optimisation, all fully config-driven.

Basic: Lightning Tuner

Built-in learning rate and batch size auto-discovery. Zero extra dependencies.

goal-tune hparams_search=basic

# configs/hparams_search/basic.yaml
hparams_search:
  method: tuner
  tuner:
    lr_find: true             # find optimal learning rate
    scale_batch_size: true    # find max batch size that fits in memory

Advanced: Ray Tune

Full hyperparameter search with ASHA early stopping, Optuna Bayesian optimisation, or Population-Based Training. Requires optional dependencies.

pip install -e ".[tune]"   # installs ray[tune] + optuna

goal-tune hparams_search=ray_tune

Example Ray Tune config

# configs/hparams_search/ray_tune.yaml
hparams_search:
  method: ray
  num_samples: 20
  max_epochs: 100
  metric: val/total
  mode: min
  scheduler: asha
  search_algorithm: optuna

  search_space:
    training.optimizer.lr:
      type: loguniform
      lower: 1.0e-5
      upper: 1.0e-2
    training.optimizer.weight_decay:
      type: loguniform
      lower: 1.0e-8
      upper: 1.0e-3
    training.ema.decay:
      type: uniform
      lower: 0.99
      upper: 0.9999

Scheduler	Description
`asha`	Asynchronous Successive Halving — prunes bad trials early (recommended)
`pbt`	Population-Based Training — mutates hyperparams during training

Search algorithm	Description
`null`	Random search (no extra deps)
`optuna`	Bayesian optimisation via Optuna
`hyperopt`	Tree-structured Parzen Estimators

Advanced: W&B Sweeps

Cloud-managed hyperparameter search via Weights & Biases. Supports Bayesian, grid, and random search with Hyperband early termination. Requires W&B (already a core dependency).

goal-tune hparams_search=wandb_sweep

Example W&B Sweep config

# configs/hparams_search/wandb_sweep.yaml
hparams_search:
  method: wandb
  project: goal
  sweep_method: bayes          # 'bayes', 'grid', 'random'
  metric: val/total
  mode: min
  count: 20

  early_terminate:
    type: hyperband
    min_iter: 10
    eta: 3

  parameters:
    training.optimizer.lr:
      distribution: log_uniform_values
      min: 1.0e-5
      max: 1.0e-2
    training.optimizer.weight_decay:
      distribution: log_uniform_values
      min: 1.0e-8
      max: 1.0e-3

Resume an existing sweep:

goal-tune hparams_search=wandb_sweep hparams_search.sweep_id=<SWEEP_ID>

Sweep method	Description
`bayes`	Bayesian optimisation (Gaussian process) — recommended
`grid`	Exhaustive grid search
`random`	Random search

🟣 Callbacks

Default Callbacks

The default callback group (callbacks=default) includes:

ModelCheckpoint — save top-k checkpoints by validation loss, plus last.ckpt
EarlyStopping — stop training after 100 epochs with no improvement
RichModelSummary — rich-formatted model summary
RichProgressBar — rich-formatted training progress

Additional Callbacks

Callback	Config	Description
Stochastic Weight Averaging	`callbacks=swa`	Average weights during late training
Backbone Finetuning	`callbacks=backbone_finetuning`	Gradual unfreezing for fine-tuning
Gradient Accumulation Scheduler	`callbacks=grad_accumulation`	Dynamic accumulation steps

Override callback parameters:

goal-train callbacks.model_checkpoint.save_top_k=5
goal-train callbacks.early_stopping.patience=200

🟣 Logging

GOAL supports all Lightning loggers. Enable via the logger config group:

goal-train logger=wandb
goal-train logger=tensorboard
goal-train logger=csv

Logger	Config	Notes
Weights & Biases	`logger=wandb`	Project: `goal`, requires `wandb` login
TensorBoard	`logger=tensorboard`	Saves to `output_dir/tensorboard/`
CSV	`logger=csv`	Simple CSV file logging
MLflow	`logger=mlflow`	MLflow tracking server
Neptune	`logger=neptune`	Requires `NEPTUNE_API_TOKEN`
Aim	`logger=aim`	Local `.aim` repo, open with `aim up`
Comet	`logger=comet`	Comet.ml experiment tracking

Use multiple loggers:

goal-train logger=wandb,csv

Run Naming Convention

Every run is automatically named with a timestamp + dataset + model pattern:

{date}_{time}_{dataset_type}_{model_backbone}

For example: 2026-04-09_14-30-45_xyz_hyperspec

This naming is applied consistently to:

Output directories (logs/train/runs/...)
Logger run names (W&B, TensorBoard, MLflow, etc.)
Hydra sweep directories

Override the name from the CLI:

goal-train run_name=my_custom_experiment

🟣 Configuration System

GOAL uses Hydra for composable configuration. Every aspect of training is controlled by YAML config files that can be overridden from the command line.

Config Groups

Group	Path	Options
Data	`configs/data/`	`xyz`, `hdf5`, `lmdb`, `trajectory`, `md17_aspirin`, `md17_ethanol`, `rmd17_aspirin`, `ani1`, `ani1x`, `qm9`
Model	`configs/model/`	`hyperspec`, `invariant_gnn`, `deepset`, `hyperset`⚠️, `lucidset`⚠️, `monolithic_example`
Trainer	`configs/trainer/`	`default`, `gpu`, `ddp`, `fsdp`, `model_parallel`, `cpu`, `mps`, `ddp_sim`
Training	`configs/training/`	`default`
Strategy	`configs/strategy/`	`ddp`, `fsdp`, `fsdp2`, `deepspeed_zero1`, `deepspeed_zero2`, `deepspeed_zero3`
Callbacks	`configs/callbacks/`	`default`, `none`, `swa`, `backbone_finetuning`, `grad_accumulation`
Logger	`configs/logger/`	`wandb`, `tensorboard`, `csv`, `mlflow`, `neptune`, `aim`, `comet`
Hparams Search	`configs/hparams_search/`	`basic`, `ray_tune`, `wandb_sweep`

Override Examples

# Change model and data format
goal-train model=invariant_gnn data=hdf5

# Override nested parameters
goal-train training.optimizer.lr=0.0005 training.ema.decay=0.9999

# Change loss weights
goal-train training.losses.0.weight=1.0 training.losses.1.weight=50.0

# Multi-run sweep
goal-train -m training.optimizer.lr=0.001,0.0005,0.0001

# Disable callbacks
goal-train callbacks=none

Output Directory

Each run creates a timestamped output directory:

logs/train/runs/2026-04-09_14-30-45_xyz_hyperspec/
├── checkpoints/
│   ├── epoch_001.ckpt
│   └── last.ckpt
├── train.log
└── .hydra/
    ├── config.yaml          # resolved config
    ├── hydra.yaml
    └── overrides.yaml       # command-line overrides

🟣 CLI Reference

Command	Description
`goal-train`	Train a model
`goal-eval`	Evaluate a checkpoint on test data
`goal-finetune`	Fine-tune a pre-trained model
`goal-tune`	Hyperparameter search (LR finder, Ray Tune, W&B Sweeps)

All commands accept Hydra overrides:

goal-train trainer=ddp data=hdf5 model=invariant_gnn logger=wandb seed=42

Module-based invocation (equivalent):

python -m goal.ml.cli.train trainer=ddp data=hdf5
python -m goal.ml.cli.evaluate ckpt_path=/path/to/ckpt
python -m goal.ml.cli.finetune model.backbone.pretrained=true
python -m goal.ml.cli.tune hparams_search=basic

🟣 Pixi Tasks

If using pixi as your environment manager, these tasks are available:

Task	Command	Description
`pixi run train`	`python -m goal.ml.cli.train`	Train a model
`pixi run eval`	`python -m goal.ml.cli.evaluate`	Evaluate a checkpoint
`pixi run finetune`	`python -m goal.ml.cli.finetune`	Fine-tune a model
`pixi run test`	`pytest -k 'not slow'`	Run fast tests
`pixi run test-full`	`pytest`	Run all tests
`pixi run lint`	`ruff check src/ tests/`	Lint code
`pixi run format`	`ruff format src/ tests/`	Format code
`pixi run typecheck`	`mypy src/goal/ml/`	Type check
`pixi run clean`	—	Remove build artifacts
`pixi run clean-logs`	`rm -rf logs/**`	Remove training logs

Pass Hydra overrides through pixi:

pixi run train trainer=ddp data.root=/path/to/data

Use the cuda-deepspeed environment for DeepSpeed training:

pixi run -e cuda-deepspeed train strategy=deepspeed_zero2

📦 Tested Versions

Package	Version
Python	3.14.4
PyTorch	2.10.0
Lightning	2.6.1
e3nn	0.6.0
PyG (torch-geometric)	2.7.0
Hydra	1.3.2
ASE	3.28.0
W&B	0.25.1
Rich	13.9.4

License

This project is licensed under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
.github		.github
configs		configs
data		data
examples/datasets		examples/datasets
notebooks		notebooks
scripts		scripts
src		src
tests		tests
.env.example		.env.example
.gitattributes		.gitattributes
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
.project-root		.project-root
README.md		README.md
pixi.lock		pixi.lock
pyproject.toml		pyproject.toml

Folders and files

Latest commit

History

Repository files navigation