[Codegen][CPU] Pick inner-tiled unroll factors from a register budget. #2

Open
bjacob wants to merge 1 commit into inner-tiled-materialize-encoding from inner-tiled-unroll-factors



@bjacob bjacob commented Apr 24, 2026

Replace the hard-coded intrinsics_m = intrinsics_n = 1 from the previous commit with a cost model. For the intrinsic and orientation already chosen by chooseIntrinsic, the model picks the largest power-of-two unroll factors such that the three tiles (ACC + LHS + RHS) still fit in the target's architectural vector register file. Ties are broken by arithmetic intensity, (effM*effN)/(effM+effN), so approximately square tiles win.
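The selection described above can be sketched as a small exhaustive search. This is an illustration only, not the code in this PR: `IntrinsicTile`, `Unrolling`, `threeTileBytes` and `chooseUnrollingSketch` are hypothetical names, "largest" is assumed to mean the largest product of the two unroll factors, and tile sizes are counted in bytes rather than in whole registers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of one intrinsic tile (illustrative, not the PR's types).
struct IntrinsicTile {
  int64_t m, n, k;       // intrinsic tile shape
  int64_t lhsElemBytes;  // bytes per LHS element
  int64_t rhsElemBytes;  // bytes per RHS element
  int64_t accElemBytes;  // bytes per accumulator element
};

struct Unrolling {
  int64_t unrollM = 1;
  int64_t unrollN = 1;
};

// Bytes needed to hold the ACC, LHS and RHS tiles at the given unroll factors.
int64_t threeTileBytes(const IntrinsicTile &t, int64_t um, int64_t un) {
  int64_t effM = t.m * um;
  int64_t effN = t.n * un;
  return effM * effN * t.accElemBytes + effM * t.k * t.lhsElemBytes +
         t.k * effN * t.rhsElemBytes;
}

// Assumed policy: maximize um * un over power-of-two factors that fit in the
// budget, and break ties with arithmetic intensity (effM*effN)/(effM+effN).
// Falls back to 1x1 if nothing fits.
Unrolling chooseUnrollingSketch(const IntrinsicTile &t, int64_t budgetBytes,
                                int64_t capM, int64_t capN) {
  assert(capM >= 1 && capN >= 1);
  Unrolling best;
  int64_t bestProduct = 0;
  double bestIntensity = 0.0;
  for (int64_t um = 1; um <= capM; um *= 2) {
    for (int64_t un = 1; un <= capN; un *= 2) {
      // Tile bytes grow monotonically with un, so stop at the first misfit.
      if (threeTileBytes(t, um, un) > budgetBytes)
        break;
      int64_t product = um * un;
      double effM = static_cast<double>(t.m * um);
      double effN = static_cast<double>(t.n * un);
      double intensity = effM * effN / (effM + effN);
      if (product > bestProduct ||
          (product == bestProduct && intensity > bestIntensity)) {
        best.unrollM = um;
        best.unrollN = un;
        bestProduct = product;
        bestIntensity = intensity;
      }
    }
  }
  return best;
}
```

As a usage example under assumed inputs: with a 1x16x1 f32 tile (4-byte elements) and a 2048-byte budget, four candidates share the largest fitting product of 16, and the intensity tie-break selects unrollM = 16, unrollN = 1, which makes the effective tile 16x16.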

Introduces:

  • getRegisterSpaceBytes(intrinsic), returning the capacity in bytes of the three-tile budget; scalable ISAs (SVE/SVE2) are treated as having their minimum vector length of 128 bits.
  • po2UnrollCap to round the static-matmul-extent cover count down to a power of two (and fall back to the budget for dynamic dims).
  • chooseUnrolling as Phase 2 of chooseCpuInnerTiledMmaForEncoding.
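
To make the two helper ideas in the list above concrete, here is a minimal sketch. The function names, signatures and the `kDynamicExtent` sentinel are hypothetical and do not come from this PR. The sketch assumes that the "cover count" is the number of intrinsic tiles needed to cover a static extent (ceiling division), and that the register budget is the register count times the register width.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sentinel for a dynamic dimension (illustrative only).
constexpr int64_t kDynamicExtent = -1;

// Largest power of two that is <= x, for x >= 1.
int64_t floorPowerOfTwo(int64_t x) {
  assert(x >= 1);
  int64_t p = 1;
  while (p <= x / 2)
    p *= 2;
  return p;
}

// Upper bound on an unroll factor along one matmul dimension.
// Static extent: intrinsic tiles needed to cover it, rounded down to a power
// of two. Dynamic extent: fall back to the cap implied by the register budget.
int64_t po2UnrollCapSketch(int64_t extent, int64_t intrinsicSize,
                           int64_t budgetCap) {
  assert(intrinsicSize >= 1 && budgetCap >= 1);
  if (extent == kDynamicExtent)
    return budgetCap;
  assert(extent >= 1);
  int64_t cover = (extent + intrinsicSize - 1) / intrinsicSize;  // ceil division
  return floorPowerOfTwo(cover);
}

// Vector register file capacity in bytes, given register count and width.
int64_t registerFileBytes(int64_t numVectorRegs, int64_t bitsPerReg) {
  return numVectorRegs * bitsPerReg / 8;
}
```

For example, AVX-512 has 32 registers of 512 bits, so `registerFileBytes(32, 512)` gives 2048 bytes. SVE has 32 vector registers, and at its minimum vector length of 128 bits the same helper gives 512 bytes. A static extent of 100 with an intrinsic size of 16 has a cover count of 7, which rounds down to a cap of 4.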

Updates materialize_encoding_x86_64.mlir to reflect the new unroll choices (e.g. intrinsics_m = 16 on dynamic AVX-512 f32 matmul).

Made-with: Cursor

Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>