ALTO is a Python library for low-precision model training and optimization, built on top of the TorchTitan fork vendored under `3rdparty/torchtitan`. It ships Triton-backed low-precision kernels (MXFP4, block-scaled FP8, and related utilities) and a configurable stack of modifiers, such as low-precision training (LPT), wired into TorchTitan through a model-converter pipeline.
Training-oriented kernels and schemes include:
- Blockwise FP8 — linear, grouped GEMM, and FlashAttention.
- MXFP4 — linear, grouped GEMM, and FlashAttention.
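To make the MXFP4 scheme concrete, here is a minimal pure-Python sketch of block-scaled FP4 quantization in the style of the OCP Microscaling (MX) formats: 32 values share one power-of-two scale, and each value is stored as an FP4 (E2M1) number. It is a reference illustration only. The helper names are hypothetical, and ALTO's Triton kernels may pick scales and rounding modes differently.

```python
import math

# Magnitudes representable by FP4 E2M1 (the sign is stored separately).
FP4_E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_EMAX = 2          # exponent of the largest E2M1 binade (4.0 == 2**2)
MX_BLOCK_SIZE = 32    # number of elements sharing one scale


def quantize_block_mxfp4(block):
    """Return (scale, fp4_values) for one block (hypothetical helper)."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 1.0, [0.0] * len(block)
    # Shared power-of-two scale: align the block maximum with FP4's top binade.
    scale = 2.0 ** (math.floor(math.log2(amax)) - FP4_EMAX)
    quantized = []
    for v in block:
        # Saturate at the largest FP4 magnitude (6.0).
        mag = min(abs(v) / scale, FP4_E2M1_MAGNITUDES[-1])
        # Nearest representable magnitude; ties go to the smaller one here.
        nearest = min(FP4_E2M1_MAGNITUDES, key=lambda m: abs(m - mag))
        quantized.append(math.copysign(nearest, v))
    return scale, quantized


def dequantize_block(scale, quantized):
    """Map FP4 values back to real values using the shared scale."""
    return [scale * q for q in quantized]


block = [0.1 * i for i in range(-16, 16)]   # 32 values in [-1.6, 1.5]
scale, q = quantize_block_mxfp4(block)
restored = dequantize_block(scale, q)
max_err = max(abs(a - b) for a, b in zip(block, restored))
print(f"scale={scale}, max abs error={max_err:.3f}")
```

Because the scale is a power of two and FP4 has only eight magnitudes, the rounding error per element is coarse; the techniques listed next aim to reduce its effect on training.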
Techniques used to narrow the gap versus BF16 include:
- 2D block quantization
- Randomized Hadamard Transform (RHT)
- Stochastic Rounding (SR)
- Differential Gradient Estimation (DGE)
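Two of these techniques are easy to illustrate in isolation. The sketch below, in plain Python, shows a randomized Hadamard transform (random sign flips followed by an orthonormal Hadamard transform, which spreads outliers across a block) and stochastic rounding (which is unbiased in expectation). The function names are hypothetical and the code is not taken from ALTO's kernels; it only demonstrates the underlying ideas.

```python
import math
import random


def hadamard_transform(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    y = list(x)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    norm = 1.0 / math.sqrt(n)
    return [v * norm for v in y]


def randomized_hadamard(x, signs):
    """RHT: flip signs with a random +/-1 vector, then apply Hadamard."""
    return hadamard_transform([s * v for s, v in zip(signs, x)])


def inverse_randomized_hadamard(y, signs):
    """The orthonormal Hadamard matrix is its own inverse; then undo the signs."""
    return [s * v for s, v in zip(signs, hadamard_transform(y))]


def stochastic_round(value, step, rng):
    """Round to a multiple of `step`, rounding up with probability equal to
    the fractional position, so the result is unbiased in expectation."""
    scaled = value / step
    lower = math.floor(scaled)
    round_up = rng.random() < (scaled - lower)
    return (lower + (1 if round_up else 0)) * step


rng = random.Random(0)
n = 16
signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]

# A single outlier is spread evenly over all coordinates by the transform.
x = [8.0] + [0.0] * (n - 1)
y = randomized_hadamard(x, signs)
x_back = inverse_randomized_hadamard(y, signs)

# Averaging many stochastic roundings of 0.3 approaches 0.3,
# whereas round-to-nearest would always give 0.0.
samples = [stochastic_round(0.3, 1.0, rng) for _ in range(20000)]
sr_mean = sum(samples) / len(samples)
print(max(abs(v) for v in y), sr_mean)
```

Flattening outliers before quantization means a block's shared scale is no longer dominated by a single large value, and unbiased rounding keeps the quantization error from accumulating in one direction across gradient steps.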
Recipes can combine multiple stages built from the modifiers under `alto/modifiers/`, for example the Low-Precision Training (LPT) modifier.
Recipe YAML follows the same general shape as llm-compressor; use the configs under alto/models/*/configs/ as concrete templates.
| Model | Integration notes |
|---|---|
| Llama 3 | Extended state dict adapter for Hugging Face Safetensors with observer/modifier state; patcher that keeps query/key projections in the Transformers layout for RoPE; config registry hooks for TorchTitan. |
| DeepSeek V3 | Config registry hooks for TorchTitan. |
| GPT-OSS | Config registry hooks for TorchTitan. |
- Python 3.9+
- PyTorch 2.9+ (see `pyproject.toml` for the full dependency set: `torchao`, `safetensors`, `compressed_tensors`, `lm_eval`, etc.)
- GPU: training paths expect ROCm/CUDA-capable hardware; see the TorchTitan documentation for parallel layout details.
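A quick way to confirm the Python and PyTorch minimums before installing is a small standard-library check like the one below. This is a convenience sketch, not part of ALTO; the helper names are hypothetical, and it only inspects the installed `torch` distribution's version string.

```python
import re
import sys
from importlib import metadata


def version_tuple(text):
    """Parse the leading numeric part of a version: '2.9.0+rocm6.4' -> (2, 9, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def check_requirements(min_python=(3, 9), min_torch=(2, 9)):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required")
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        problems.append("PyTorch is not installed")
    else:
        if version_tuple(torch_version) < min_torch:
            problems.append(
                f"PyTorch {torch_version} found, "
                f"{min_torch[0]}.{min_torch[1]}+ required"
            )
    return problems


for problem in check_requirements():
    print("requirement not met:", problem)
```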
Clone the repository with submodules so the vendored TorchTitan tree is present:

    git clone --recurse-submodules https://github.com/AMD-AGI/ALTO.git
    cd ALTO

If you already cloned without submodules:

    git submodule update --init --recursive

Install the TorchTitan tree shipped under `3rdparty/torchtitan`, then install ALTO in editable mode:

    pip install --no-build-isolation -e 3rdparty/torchtitan
    pip install -e .

- Author or copy a recipe YAML (see existing files under `alto/models/<model>/configs/`).
- Register a TorchTitan config that attaches ALTO’s converter to `model_converters`.

Example (Llama 3 registry pattern):

    from torchtitan.protocols.model_converter import ModelConvertersContainer
    from alto.components.converter import ModelOptConverter

    # `config` is assumed to be the trainer config under construction
    # (see config_registry.py).
    config.model_converters = ModelConvertersContainer.Config(
        converters=[
            ModelOptConverter.Config(recipe="./alto/models/llama3/configs/recipe.yaml"),
        ],
    )

See `alto/models/llama3/config_registry.py` for full trainer configs.
From the repository root, the shared launcher wraps torchrun and python -m alto.train:

    NGPU=8 MODULE=llama3 CONFIG=your_config_name ./examples/run.sh

Environment variables (see `examples/run.sh`):
| Variable | Role |
|---|---|
| `NGPU` | Processes per node (default 8). |
| `MODULE` | TorchTitan module name (`llama3`, `gpt_oss`, …). |
| `CONFIG` | Registered config function name. |
| `TRAIN_FILE` | Python module for training entrypoint (default `alto.train`). |
| `COMM_MODE` | Optional: `fake_backend` or `local_tensor` for config checks / single-GPU debugging. |

- Recipe: `alto/models/gpt_oss/configs/lpt_recipe.yaml` uses `LowPrecisionTrainingModifier` with `scheme: "mxfp4"`.
- Config: `gpt_oss_20b_lpt` in the GPT-OSS registry.
- Run: `NGPU=8 MODULE=gpt_oss CONFIG=gpt_oss_20b_lpt ./examples/run.sh`

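The recipe for this run selects layers through `targets` and `ignore` lists, which appear in the recipe fragment that follows. The sketch below shows one plausible way such lists could be resolved, assuming the llm-compressor-style convention in which an entry matches a module name or class name exactly and a `re:` prefix marks a regular expression over the module name. The helper functions and module names are hypothetical; check ALTO's modifier code for the actual matching rules.

```python
import re


def matches(pattern, module_name, class_name):
    """Match one recipe entry against a module (assumed convention)."""
    if pattern.startswith("re:"):
        return re.match(pattern[3:], module_name) is not None
    return pattern in (module_name, class_name)


def select_modules(modules, targets, ignore):
    """Return names of modules hit by `targets` and not by `ignore`.

    `modules` maps module name -> class name.
    """
    selected = []
    for name, cls in modules.items():
        if any(matches(p, name, cls) for p in ignore):
            continue
        if any(matches(p, name, cls) for p in targets):
            selected.append(name)
    return selected


# Hypothetical module names, loosely shaped like a mixture-of-experts model.
modules = {
    "layers.0.attention.wq": "Linear",
    "layers.0.moe.router.gate": "Linear",
    "layers.0.moe.experts": "GptOssGroupedExperts",
    "norm": "RMSNorm",
    "output": "Linear",
}
selected = select_modules(
    modules,
    targets=["Linear", "GptOssGroupedExperts"],
    ignore=["output", r"re:.*\.router\.gate"],
)
print(selected)
```

Under these assumptions the output head and the router gate stay in high precision, while the attention projection and the grouped experts are converted.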
Illustrative recipe fragment:

    training_stage:
      lpt_modifiers:
        LowPrecisionTrainingModifier:
          scheme: "mxfp4"
          targets: ["Linear", "GptOssGroupedExperts"]
          ignore: ["output", "re:.*\\.router\\.gate"]
          use_2dblock_x: false
          use_2dblock_w: true
          use_hadamard: true
          use_sr_grad: true
          use_dge: false

TorchTitan typically saves checkpoints in PyTorch DCP format; you can convert to Hugging Face Safetensors and run lm-eval tasks with the bundled export utility:

    python ./alto/utils/exportation/export.py \
        llama3 llama3_1b_opt \
        --tasks wikitext

- Homepage: github.com/AMD-AGI/ALTO
- TorchTitan submodule: github.com/AMD-AGI/torchtitan-amd (`3rdparty/torchtitan`)

For questions, issues, or contributions, please reach out to the maintainers:
- Guanchen Li — @guanchenl · GuanChen.Li@amd.com
- Han Wang — @hann-wang · Han.Wang@amd.com
- Yue Sun — @ysa2215 · Yue.Sun2@amd.com
- Zhitao Wang — @zhitwang17 · Zhitao.Wang@amd.com
See CODEOWNERS for the full ownership list.