AMD-AGI/ALTO

ALTO: Advanced Low-precision Training and Optimization

ALTO is a Python library for low-precision model training and optimization, built on a fork of TorchTitan. It ships Triton-backed low-precision kernels (MXFP4, block-scaled FP8, and related utilities) and a configurable stack of modifiers, such as low-precision training (LPT), wired into TorchTitan through a model-converter pipeline.

Features

Low-precision training (LPT)

Training-oriented kernels and schemes include:

  • Blockwise FP8 — linear, grouped GEMM, and FlashAttention.
  • MXFP4 — linear, grouped GEMM, and FlashAttention.
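
As a rough illustration of what a block-scaled format such as MXFP4 stores (this is not ALTO's Triton kernel), the sketch below follows the OCP Microscaling convention: 32 elements share one power-of-two scale, and each element is an FP4 E2M1 value. The function names are hypothetical, and ties are resolved toward the smaller magnitude for simplicity.

```python
import math

# Magnitudes representable by one FP4 E2M1 element (sign is stored separately).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2      # exponent of the largest E2M1 value: 6 = 1.5 * 2**2
BLOCK_SIZE = 32    # elements that share one scale in the MX formats


def quantize_block(values):
    """Quantize one block to (shared_exponent, list of E2M1 values)."""
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return 0, [0.0] * len(values)
    # Power-of-two scale chosen so the block maximum lands near the top of the grid.
    shared_exp = math.floor(math.log2(amax)) - E2M1_EMAX
    scale = 2.0 ** shared_exp
    quantized = []
    for v in values:
        # Clamp to the largest representable magnitude, then snap to the grid.
        mag = min(abs(v) / scale, E2M1_GRID[-1])
        nearest = min(E2M1_GRID, key=lambda g: abs(g - mag))
        quantized.append(math.copysign(nearest, v))
    return shared_exp, quantized


def dequantize_block(shared_exp, quantized):
    """Recover approximate real values from a quantized block."""
    scale = 2.0 ** shared_exp
    return [q * scale for q in quantized]
```

A tensor would be split into consecutive blocks of BLOCK_SIZE elements along the reduction dimension, with quantize_block applied to each one.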

Techniques used to narrow the gap versus BF16 include:

  • 2D block quantization
  • Randomized Hadamard Transform (RHT)
  • Stochastic Rounding (SR)
  • Differential Gradient Estimation (DGE)
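
For intuition, here is a minimal pure-Python sketch of two of the techniques listed above: a randomized Hadamard transform, which spreads outliers across a vector before quantization, and stochastic rounding, which is unbiased in expectation. It is written for illustration under simple assumptions (power-of-two length, an integer rounding grid) and is not taken from ALTO's kernels. All function names are hypothetical.

```python
import math
import random


def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    x = list(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    norm = 1.0 / math.sqrt(n)
    return [v * norm for v in x]


def random_signs(n, rng):
    """Random +1/-1 diagonal used to randomize the transform."""
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def rht(x, signs):
    """y = H (s * x): flip signs, then apply the Hadamard transform."""
    return fwht([s * v for s, v in zip(signs, x)])


def inverse_rht(y, signs):
    """x = s * (H y): H is symmetric and orthonormal, so it is its own inverse."""
    return [s * v for s, v in zip(signs, fwht(y))]


def stochastic_round(v, rng):
    """Round to a neighbouring integer with probability given by the fraction."""
    lo = math.floor(v)
    return lo + (1 if rng.random() < v - lo else 0)
```

With a single large outlier as input, every output of rht has the same magnitude, which is what makes the transformed vector easier to quantize. Averaged over many draws, stochastic_round(0.3, rng) tends toward 0.3, whereas round-to-nearest would always return 0.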

Modifiers

Recipes can combine multiple stages built from the modifiers under alto/modifiers/, e.g. the Low-Precision Training (LPT) modifier.

Recipe YAML follows the same general shape as llm-compressor; use the configs under alto/models/*/configs/ as concrete templates.

Supported models

Model        Integration notes
Llama 3      Extended state dict adapter for Hugging Face Safetensors with observer/modifier state;
             patcher keeps query/key projections in the Transformers layout for RoPE;
             config registry hooks for TorchTitan.
DeepSeek V3  Config registry hooks for TorchTitan.
GPT-OSS      Config registry hooks for TorchTitan.

Requirements

  • Python 3.9+
  • PyTorch 2.9+ (see pyproject.toml for the full dependency set: torchao, safetensors, compressed_tensors, lm_eval, etc.)
  • GPU — training paths expect ROCm/CUDA-capable hardware; see TorchTitan documentation for parallel layout details.
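
The version floors above can be checked before installing with a small script. The sketch below uses only the standard library; the helper names are hypothetical, and it reads the installed torch version from package metadata rather than importing torch.

```python
import re
import sys
from importlib import metadata


def version_tuple(version):
    """Return (major, minor) from a version string such as '2.9.0+rocm6.4'."""
    parts = []
    for piece in version.split("+")[0].split(".")[:2]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def check_requirements():
    """Report whether the interpreter and the installed torch meet the README floors."""
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        torch_version = None
    return {
        "python_ok": sys.version_info >= (3, 9),
        "torch_version": torch_version,
        "torch_ok": torch_version is not None
        and version_tuple(torch_version) >= (2, 9),
    }
```

Calling check_requirements() returns a small report dict; a torch_version of None means PyTorch is not installed in the current environment.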

Installation

Clone the repository with submodules so the vendored TorchTitan tree is present:

git clone --recurse-submodules https://github.com/AMD-AGI/ALTO.git
cd ALTO

If you already cloned without submodules:

git submodule update --init --recursive

Install the TorchTitan tree shipped under 3rdparty/torchtitan, then install ALTO in editable mode:

pip install --no-build-isolation -e 3rdparty/torchtitan
pip install -e .

Usage

Wire a recipe into TorchTitan

  1. Author or copy a recipe YAML (see existing files under alto/models/<model>/configs/).
  2. Register a TorchTitan config that attaches ALTO’s converter to model_converters.

Example (Llama 3 registry pattern):

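from torchtitan.protocols.model_converter import ModelConvertersContainer
from alto.components.converter import ModelOptConverter

# `config` is the trainer config being assembled inside a registry function.
# Attaching ALTO's converter applies the recipe when the model is built.
config.model_converters = ModelConvertersContainer.Config(
    converters=[
        ModelOptConverter.Config(recipe="./alto/models/llama3/configs/recipe.yaml"),
    ],
)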

See alto/models/llama3/config_registry.py for full trainer configs.

Launch training

From the repository root, the shared launcher wraps torchrun and python -m alto.train:

NGPU=8 MODULE=llama3 CONFIG=your_config_name ./examples/run.sh

Environment variables (see examples/run.sh):

Variable    Role
NGPU        Processes per node (default 8).
MODULE      TorchTitan module name (llama3, gpt_oss, …).
CONFIG      Registered config function name.
TRAIN_FILE  Python module for training entrypoint (default alto.train).
COMM_MODE   Optional: fake_backend or local_tensor for config checks / single-GPU debugging.
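
When launching from Python rather than a shell (for example in a sweep script), the same variables can be assembled programmatically. The sketch below is one possible wrapper. The helper names are hypothetical, and it only sets the variables listed in the table above.

```python
import os
import subprocess

VALID_COMM_MODES = ("fake_backend", "local_tensor")


def build_launch_env(module, config, ngpu=8, train_file=None,
                     comm_mode=None, base_env=None):
    """Return an environment dict for examples/run.sh."""
    env = dict(os.environ if base_env is None else base_env)
    env.update({"NGPU": str(ngpu), "MODULE": module, "CONFIG": config})
    if train_file is not None:
        env["TRAIN_FILE"] = train_file
    if comm_mode is not None:
        if comm_mode not in VALID_COMM_MODES:
            raise ValueError(f"COMM_MODE must be one of {VALID_COMM_MODES}")
        env["COMM_MODE"] = comm_mode
    return env


def launch(env, script="./examples/run.sh"):
    """Run the shared launcher from the repository root with the given environment."""
    return subprocess.run([script], env=env, check=True)
```

For a quick configuration check, launch(build_launch_env("llama3", "your_config_name", ngpu=1, comm_mode="fake_backend")) would mirror the shell command shown earlier with the debugging mode enabled.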

Examples

GPT-OSS 20B — MXFP4 Training

Illustrative recipe fragment:

training_stage:
  lpt_modifiers:
    LowPrecisionTrainingModifier:
      scheme: "mxfp4"
      targets: ["Linear", "GptOssGroupedExperts"]
      ignore: ["output", "re:.*\\.router\\.gate"]
      use_2dblock_x: false
      use_2dblock_w: true
      use_hadamard: true
      use_sr_grad: true
      use_dge: false
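
To make the targets and ignore fields concrete, the sketch below shows how such patterns could select modules, assuming llm-compressor-style semantics: a plain string matches a module name or class name exactly, and a "re:" prefix marks a regular expression matched against the module name. This is an assumption about the matching rules, not ALTO's actual implementation. The helper names and the module names in the usage note are hypothetical.

```python
import re


def _matches(pattern, module_name, class_name):
    """Check one recipe pattern against a module's name and class name."""
    if pattern.startswith("re:"):
        return re.match(pattern[3:], module_name) is not None
    return pattern in (module_name, class_name)


def select_modules(modules, targets, ignore):
    """Return names of modules hit by `targets` and not excluded by `ignore`.

    `modules` is an iterable of (module_name, class_name) pairs.
    """
    selected = []
    for name, cls in modules:
        if any(_matches(p, name, cls) for p in ignore):
            continue  # ignore takes precedence over targets
        if any(_matches(p, name, cls) for p in targets):
            selected.append(name)
    return selected
```

With the fragment above, a Linear module named layers.0.moe.router.gate and the final output projection would stay in high precision, while other Linear modules and GptOssGroupedExperts modules would be converted.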

Export and evaluation

TorchTitan typically saves checkpoints in PyTorch Distributed Checkpoint (DCP) format. The bundled export utility converts them to Hugging Face Safetensors and runs lm-eval tasks:

python ./alto/utils/exportation/export.py \
  llama3 llama3_1b_opt \
  --tasks wikitext

Contact

For questions, issues, or contributions, please reach out to the maintainers. See CODEOWNERS for the full ownership list.
