ideogram: use _scaled_mm for fp8 matmul, sorry non-Ada-or-newer owners #2740

Merged: bghira merged 6 commits into main from feature/ideogram-scaled-mm on Jun 7, 2026

Conversation

@bghira (Owner) commented Jun 5, 2026

This pull request adds an FP8 matrix multiplication path based on torch._scaled_mm to the ideogram quantized loading helpers, for use on compatible CUDA devices. The most significant change is a custom autograd function that accelerates FP8 linear layers when the hardware and software requirements are met.

FP8 Scaled Matmul Support:

  • Added _scaled_mm_supported, which checks whether the current environment and tensor can use the FP8 scaled matmul path; an environment variable overrides the check for manual control.
  • Introduced the _Fp8LinearScaledMm custom autograd function, which runs the forward and backward passes through torch._scaled_mm for FP8 inputs, including dynamic input scaling and gradient handling.
  • Updated the forward method of the quantized linear class to take the scaled FP8 path when it is supported and to fall back to the previous dequantized path otherwise.
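
To make the support check concrete, here is a minimal sketch of how such a gate could look. It is not the code from this PR: the function names, the `FP8_SCALED_MM` variable name, and the rule that the override can only disable the fast path are all assumptions made for illustration. The (8, 9) threshold reflects the "Ada or newer" requirement in the title, since Ada GPUs report compute capability 8.9.

```python
import os

# Hypothetical variable name; the PR only says that an override exists.
OVERRIDE_ENV = "FP8_SCALED_MM"
# Ada Lovelace reports compute capability 8.9; Hopper reports 9.0.
MIN_CAPABILITY = (8, 9)


def scaled_mm_allowed(is_cuda, capability, has_scaled_mm, env=None):
    """Pure decision logic, kept free of torch so it is easy to test."""
    env = os.environ if env is None else env
    override = env.get(OVERRIDE_ENV, "").strip().lower()
    if override in ("0", "false", "off"):
        return False  # assumed semantics: the user opted out
    return bool(is_cuda and has_scaled_mm and tuple(capability) >= MIN_CAPABILITY)


def scaled_mm_supported(tensor):
    """Collect the facts that scaled_mm_allowed needs from torch."""
    import torch  # imported lazily so the logic above has no torch dependency

    if not tensor.is_cuda:
        return scaled_mm_allowed(False, (0, 0), False)
    capability = torch.cuda.get_device_capability(tensor.device)
    return scaled_mm_allowed(True, capability, hasattr(torch, "_scaled_mm"))
```

A forward method could then branch on `scaled_mm_supported(x)` and use the dequantized path whenever it returns False, which matches the fallback behaviour described above.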

General Improvements:

  • Added import of the os module to support environment variable checks.
  • Defined FP8_INPUT_DTYPE for clarity and consistency in FP8 input handling.
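
The arithmetic behind dynamic input scaling can be sketched in plain Python. This is an illustration under stated assumptions, not the PR's implementation: it assumes per-tensor scaling and that FP8_INPUT_DTYPE is the E4M3 format (torch.float8_e4m3fn), whose largest finite value is 448. The helper names are hypothetical.

```python
# Largest finite value of FP8 E4M3 (torch.float8_e4m3fn).
# Assumption: the PR's FP8_INPUT_DTYPE is this format.
FP8_E4M3_MAX = 448.0


def dynamic_scale(values, fp8_max=FP8_E4M3_MAX, eps=1e-12):
    """Per-tensor scale that maps max(|values|) onto fp8_max."""
    amax = max((abs(v) for v in values), default=0.0)
    # eps guards against division by zero for an all-zero input.
    return max(amax, eps) / fp8_max


def scale_to_fp8_range(values, scale, fp8_max=FP8_E4M3_MAX):
    """Divide by the scale and clamp; real code would then cast to the FP8 dtype."""
    return [min(max(v / scale, -fp8_max), fp8_max) for v in values]


activations = [1.0, -2.0, 0.5]
scale = dynamic_scale(activations)
scaled = scale_to_fp8_range(activations, scale)
# Multiplying by the scale again recovers the inputs, up to float rounding here
# and up to FP8 rounding in a real kernel.
restored = [v * scale for v in scaled]
```

The scale is recomputed from each input at call time, which is what makes it "dynamic". In a scaled matmul the same factor is handed to the kernel so that the output is rescaled back to the original range.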

This comment was marked as resolved.

bghira merged commit faf9177 into main on Jun 7, 2026
2 checks passed

2 participants