ALTO: Advanced Low-precision Training and Optimization

ALTO is a Python library for low-precision model training and optimization, built on top of a fork of TorchTitan. It ships Triton-backed low-precision kernels (MXFP4, block-scaled FP8, and related utilities) and a configurable stack of modifiers, such as low-precision training (LPT), that is wired into TorchTitan through a model-converter pipeline.

Features

Low-precision training (LPT)

Training-oriented kernels and schemes include:

Blockwise FP8 — linear, grouped GEMM, and FlashAttention.
MXFP4 — linear, grouped GEMM, and FlashAttention.
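To make the MXFP4 format concrete, here is a minimal pure-Python sketch of fake-quantizing one block of values. It is not ALTO's Triton kernel. It assumes the usual microscaling layout (FP4 E2M1 elements sharing one power-of-two scale per block, typically 32 elements), and the function name `mxfp4_quantize_block` is hypothetical.

```python
import math

# Non-negative magnitudes representable by an FP4 E2M1 element.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2  # exponent of the largest E2M1 value (6 = 1.5 * 2**2)


def mxfp4_quantize_block(block):
    """Fake-quantize one block; returns (shared_scale, dequantized_values).

    Hypothetical helper for illustration, not part of ALTO's API.
    """
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 1.0, [0.0] * len(block)
    # Shared power-of-two scale: align the block maximum with the top
    # binade of the element grid.
    scale = 2.0 ** (math.floor(math.log2(amax)) - E2M1_EMAX)
    dequantized = []
    for v in block:
        # Saturate anything that still exceeds the largest grid value.
        magnitude = min(abs(v) / scale, E2M1_GRID[-1])
        # Round to the nearest grid point (ties go to the smaller magnitude
        # here; a real kernel may break ties differently).
        q = min(E2M1_GRID, key=lambda g: abs(g - magnitude))
        dequantized.append(math.copysign(q, v) * scale)
    return scale, dequantized


scale, values = mxfp4_quantize_block([6.0, 3.0, 0.4, -1.0])
```

Because the scale is shared, small values in a block with a large maximum lose the most precision, which is the gap the techniques listed next try to narrow.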

Techniques used to narrow the gap versus BF16 include:

2D block quantization
Randomized Hadamard Transform (RHT)
Stochastic Rounding (SR)
Differential Gradient Estimation (DGE)
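Two of the techniques above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not ALTO's implementation: `stochastic_round` and `randomized_hadamard` are hypothetical names, the rounding grid is a uniform step rather than an FP4/FP8 grid, and the transform works on a single power-of-two-length vector.

```python
import math
import random


def stochastic_round(x, step, rng):
    """Round x to a multiple of step; round up with probability equal to
    the fractional position of x between its two grid neighbours."""
    lower = math.floor(x / step)
    frac = x / step - lower
    return (lower + (1 if rng.random() < frac else 0)) * step


def randomized_hadamard(vec, signs):
    """Random sign flips followed by an orthonormal fast Walsh-Hadamard transform."""
    n = len(vec)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    out = [v * s for v, s in zip(vec, signs)]
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = a + b, a - b  # butterfly step
        h *= 2
    norm = math.sqrt(n)
    return [v / norm for v in out]


rng = random.Random(0)
signs = [rng.choice((-1.0, 1.0)) for _ in range(8)]

# A single outlier is spread evenly over all coordinates (each |y_i| is 8/sqrt(8)).
x = [8.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
y = randomized_hadamard(x, signs)

# Stochastic rounding is unbiased: the sample mean approaches the input value.
samples = [stochastic_round(0.3, 1.0, rng) for _ in range(10000)]
mean = sum(samples) / len(samples)
```

The transform flattens outliers before quantization so that fewer values saturate or collapse to zero, while stochastic rounding keeps the expected value of rounded gradients equal to the unrounded value.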

Modifiers

Recipes can combine multiple stages built from the modifiers under alto/modifiers/, e.g. the Low-Precision Training (LPT) modifier.

Recipe YAML follows the same general shape as llm-compressor; use the configs under alto/models/*/configs/ as concrete templates.

Supported models

Model	Integration notes
Llama 3	Extended state dict adapter for Hugging Face Safetensors with observer/modifier state; patcher that keeps query/key projections in the Transformers layout for RoPE; config registry hooks for TorchTitan.
DeepSeek V3	Config registry hooks for TorchTitan.
GPT-OSS	Config registry hooks for TorchTitan.

Requirements

Python 3.9+
PyTorch 2.9+ (see pyproject.toml for the full dependency set: torchao, safetensors, compressed_tensors, lm_eval, etc.)
GPU — training paths expect ROCm/CUDA-capable hardware; see TorchTitan documentation for parallel layout details.

Installation

Clone the repository with submodules so the vendored TorchTitan tree is present:

git clone --recurse-submodules https://github.com/AMD-AGI/ALTO.git
cd ALTO

If you already cloned without submodules:

git submodule update --init --recursive

Install the TorchTitan tree shipped under 3rdparty/torchtitan, then install ALTO in editable mode:

pip install --no-build-isolation -e 3rdparty/torchtitan
pip install -e .

Usage

Wire a recipe into TorchTitan

Author or copy a recipe YAML (see existing files under alto/models/<model>/configs/).
Register a TorchTitan config that attaches ALTO’s converter to model_converters.

Example (Llama 3 registry pattern):

from torchtitan.protocols.model_converter import ModelConvertersContainer
from alto.components.converter import ModelOptConverter

config.model_converters = ModelConvertersContainer.Config(
    converters=[
        ModelOptConverter.Config(recipe="./alto/models/llama3/configs/recipe.yaml"),
    ],
)

See alto/models/llama3/config_registry.py for full trainer configs.

Launch training

From the repository root, the shared launcher wraps torchrun and python -m alto.train:

NGPU=8 MODULE=llama3 CONFIG=your_config_name ./examples/run.sh

Environment variables (see examples/run.sh):

Variable	Role
`NGPU`	Processes per node (default `8`).
`MODULE`	TorchTitan module name (`llama3`, `gpt_oss`, …).
`CONFIG`	Registered config function name.
`TRAIN_FILE`	Python module for training entrypoint (default `alto.train`).
`COMM_MODE`	Optional: `fake_backend` or `local_tensor` for config checks / single-GPU debugging.
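When launching from a script rather than a shell, the variables in the table can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of ALTO. It only builds the environment described above and hands it to examples/run.sh, and it assumes the script reads exactly these variable names.

```python
import os
import subprocess

# Values documented for COMM_MODE in the table above.
VALID_COMM_MODES = {"fake_backend", "local_tensor"}


def build_launch_env(module, config, ngpu=8, train_file="alto.train",
                     comm_mode=None, base_env=None):
    """Return an environment dict for examples/run.sh (hypothetical helper)."""
    env = dict(os.environ if base_env is None else base_env)
    env.update({
        "NGPU": str(ngpu),         # processes per node
        "MODULE": module,          # TorchTitan module name, e.g. "llama3"
        "CONFIG": config,          # registered config function name
        "TRAIN_FILE": train_file,  # training entrypoint module
    })
    if comm_mode is not None:
        if comm_mode not in VALID_COMM_MODES:
            raise ValueError(f"unknown COMM_MODE: {comm_mode!r}")
        env["COMM_MODE"] = comm_mode
    return env


def launch(env, script="./examples/run.sh"):
    """Run the shared launcher from the repository root with the given environment."""
    return subprocess.run([script], env=env, check=True)
```

For example, `launch(build_launch_env("llama3", "your_config_name", comm_mode="fake_backend"))` would correspond to prefixing the shell command with `COMM_MODE=fake_backend`.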

Examples

GPT-OSS 20B — MXFP4 Training

Recipe: alto/models/gpt_oss/configs/lpt_recipe.yaml, which uses LowPrecisionTrainingModifier with scheme: "mxfp4".
Config: gpt_oss_20b_lpt in the GPT-OSS registry.

Run:

NGPU=8 MODULE=gpt_oss CONFIG=gpt_oss_20b_lpt ./examples/run.sh

Illustrative recipe fragment:

training_stage:
  lpt_modifiers:
    LowPrecisionTrainingModifier:
      scheme: "mxfp4"
      targets: ["Linear", "GptOssGroupedExperts"]
      ignore: ["output", "re:.*\\.router\\.gate"]
      use_2dblock_x: false
      use_2dblock_w: true
      use_hadamard: true
      use_sr_grad: true
      use_dge: false
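The targets and ignore lists in the fragment decide which modules are converted. The sketch below shows one plausible reading of those fields, assuming the llm-compressor-style convention that a plain entry matches a module name or class name exactly and an entry prefixed with `re:` is a regular expression applied to the module name. The helper names and the module names in the example are hypothetical, and ALTO's actual matching logic may differ.

```python
import re


def _matches(pattern, module_name, class_name):
    """True if one recipe entry selects the module (assumed semantics)."""
    if pattern.startswith("re:"):
        # Regex entries are matched against the fully qualified module name.
        return re.match(pattern[3:], module_name) is not None
    # Plain entries match either the module name or its class name exactly.
    return pattern == module_name or pattern == class_name


def is_selected(module_name, class_name, targets, ignore):
    """A module is converted if it matches a target and no ignore entry."""
    if any(_matches(p, module_name, class_name) for p in ignore):
        return False
    return any(_matches(p, module_name, class_name) for p in targets)


# Values from the recipe fragment above, after YAML unescaping.
targets = ["Linear", "GptOssGroupedExperts"]
ignore = ["output", r"re:.*\.router\.gate"]

# Hypothetical module names, for illustration only.
modules = [
    ("layers.0.attention.wq", "Linear"),
    ("layers.0.moe.experts", "GptOssGroupedExperts"),
    ("layers.0.moe.router.gate", "Linear"),
    ("output", "Linear"),
    ("norm", "RMSNorm"),
]
selected = [name for name, cls in modules if is_selected(name, cls, targets, ignore)]
```

Under these assumptions the router gate and the output projection stay in their original precision, while the remaining linear layers and the grouped experts are converted.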

Export and evaluation

TorchTitan typically saves checkpoints in PyTorch DCP format; you can convert to Hugging Face Safetensors and run lm-eval tasks with the bundled export utility:

python ./alto/utils/exportation/export.py \
  llama3 llama3_1b_opt \
  --tasks wikitext

Project links

Homepage: github.com/AMD-AGI/ALTO
TorchTitan submodule: github.com/AMD-AGI/torchtitan-amd (3rdparty/torchtitan)

Contact

For questions, issues, or contributions, please reach out to the maintainers:

Guanchen Li — @guanchenl · GuanChen.Li@amd.com
Han Wang — @hann-wang · Han.Wang@amd.com
Yue Sun — @ysa2215 · Yue.Sun2@amd.com
Zhitao Wang — @zhitwang17 · Zhitao.Wang@amd.com

See CODEOWNERS for the full ownership list.
