
Enable separate-xclbin dispatch for FusedMLIROperator for debugging #117

Draft
andrej wants to merge 4 commits into amd:devel from andrej:fused-dispatch-modes

@andrej (Collaborator) commented May 12, 2026

While single-dispatch operators improve performance, they make it hard to debug when something goes wrong. This PR adds several dispatch modes to FusedMLIROperator that dispatch layer by layer, so that the output of each layer can be inspected and compared against a reference for troubleshooting.

andrej added 4 commits May 12, 2026 15:10
(fusion.py portion of upstream commit daf9162; operator-specific test
changes omitted since those operators are not on devel)
Each MLIROperator subclass used by llama decode now exposes a
reference() instance method callable with the operator's input
tensors (shaped per get_arg_spec()), returning the output tensor.

Covered: ElementwiseAdd, ElementwiseMul, SiLU, RMSNorm (weighted),
GEMV (optionally batched), GEMM (b_col_maj/c_col_maj), Softmax,
Transpose, Repeat, RoPE (method_type=0).
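To illustrate the reference() contract (one instance method that takes the operator's inputs in argument-spec order and returns its output), here is a dependency-free sketch for three of the covered operators. It is not the code from this PR: it uses plain Python lists of floats instead of torch.bfloat16 tensors, ignores the shapes from get_arg_spec(), and the class names (ElementwiseAddRef, SiLURef, RMSNormRef) and the RMSNorm eps default are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def _sigmoid(v: float) -> float:
    # Numerically stable logistic: avoids overflow in exp() for large |v|.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


class ElementwiseAddRef:
    """Hypothetical stand-in for ElementwiseAdd; only reference() is modeled."""

    def reference(self, a: Vector, b: Vector) -> Vector:
        return [x + y for x, y in zip(a, b)]


class SiLURef:
    """Hypothetical stand-in for SiLU: silu(x) = x * sigmoid(x)."""

    def reference(self, x: Vector) -> Vector:
        return [v * _sigmoid(v) for v in x]


class RMSNormRef:
    """Hypothetical stand-in for weighted RMSNorm; the eps default is assumed."""

    def __init__(self, eps: float = 1e-5) -> None:
        self.eps = eps

    def reference(self, x: Vector, weight: Vector) -> Vector:
        # Divide by the root-mean-square of x, then scale elementwise by weight.
        rms = math.sqrt(sum(v * v for v in x) / len(x) + self.eps)
        return [v / rms * w for v, w in zip(x, weight)]
```

Because each operator owns its reference, a dispatcher can evaluate any step as op.reference(*inputs) without knowing which operator it holds, which is the property both new dispatch modes rely on.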

Used by the new 'reference' and 'compare' fusion dispatch modes.
- 'reference': pure-CPU evaluation of the runlist; each step calls
  op.reference(*inputs) on host-side torch.bfloat16 buffers. No NPU
  compilation or dispatch.

- 'compare': runs the separate-xclbin NPU pipeline (Phoenix path) and,
  after each step, runs op.reference() on the NPU-produced inputs and
  logs per-step max_abs / mean_abs / max_rel deviations. Because the
  reference is re-seeded from the NPU's actual inputs every step, each
  comparison reflects only the current operator's error (no
  accumulation).
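The two modes differ mainly in where each step's output comes from. The sketch below models a runlist as a list of (name, reference function, input buffer names, output buffer name) tuples over plain Python lists. That tuple layout, the device_step callback standing in for a separate-xclbin NPU dispatch, and the eps guard in the relative error are all assumptions for illustration, not the PR's fusion.py code.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
# Hypothetical runlist step: (name, reference fn, input buffer names, output buffer name).
Step = Tuple[str, Callable[..., Vector], Sequence[str], str]


def deviation(actual: Vector, expected: Vector, eps: float = 1e-6) -> Dict[str, float]:
    """Per-step error statistics; eps (assumed) guards max_rel against zero references."""
    abs_err = [abs(a - e) for a, e in zip(actual, expected)]
    rel_err = [d / max(abs(e), eps) for d, e in zip(abs_err, expected)]
    return {
        "max_abs": max(abs_err),
        "mean_abs": sum(abs_err) / len(abs_err),
        "max_rel": max(rel_err),
    }


def run_reference(runlist: List[Step], buffers: Dict[str, Vector]) -> Dict[str, Vector]:
    """'reference' mode: every step is evaluated on the host; no device is involved."""
    for _name, ref, ins, out in runlist:
        buffers[out] = ref(*(buffers[i] for i in ins))
    return buffers


def run_compare(runlist: List[Step], buffers: Dict[str, Vector], device_step):
    """'compare' mode: device outputs drive the pipeline; each step is checked on its own."""
    report = []
    for name, ref, ins, out in runlist:
        inputs = [buffers[i] for i in ins]
        buffers[out] = device_step(name, inputs)  # the device result feeds later steps
        expected = ref(*inputs)  # reference re-seeded with the device's own inputs
        report.append((name, deviation(buffers[out], expected)))
    return report


# Usage: a two-step add-then-double chain where the "device" adds 0.5 to every output.
RUNLIST: List[Step] = [
    ("add", lambda a, b: [x + y for x, y in zip(a, b)], ["x", "y"], "s"),
    ("double", lambda s: [2 * v for v in s], ["s"], "d"),
]
REFS = {name: ref for name, ref, _ins, _out in RUNLIST}


def noisy_device(name: str, inputs: List[Vector]) -> Vector:
    return [v + 0.5 for v in REFS[name](*inputs)]


report = run_compare(RUNLIST, {"x": [1.0, 2.0], "y": [3.0, 4.0]}, noisy_device)
```

Both steps in this usage report max_abs 0.5. Had the "double" reference been chained from the host-side "add" result instead of the device's, it would report 1.5, mixing the first operator's error into the second step's numbers.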

New callables: FusedReferenceCallable, FusedCompareCallable
(subclass of FusedXclbinCallable).

github-actions Bot (Contributor) commented May 12, 2026

CI Test Results

c05f5c9 (2026_05_12_21_59_06)

IRON - CI Summary

Examples

iron/applications/llama_3.2_1b
Test Krackan Status Krackan Phoenix Status Phoenix
test_llama_3_2_1b[llama_3.2_1b_prompt_1024_tokens_1] - - -
test_llama_3_2_1b[llama_3.2_1b_prompt_1024_tokens_40] - - -
test_llama_3_2_1b[llama_3.2_1b_prompt_13_tokens_1] - - -
test_llama_3_2_1b[llama_3.2_1b_prompt_13_tokens_40] - - -

Small

iron/operators/axpy

| Test | Krackan Latency (mean) | Phoenix Latency (mean) |
|---|---|---|
| test_axpy[input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0] | 135.66 | 245.44 |
| test_axpy[input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0] | 174.42 | 335.40 |
| test_axpy[input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0] | 154.00 | 305.72 |
| test_axpy[input_length_2048-num_aie_columns_8-tile_size_256-scalar_factor_3.0] | 169.64 | - |

iron/operators/dequant

| Test | Krackan Latency (mean) | Phoenix Latency (mean) |
|---|---|---|
| test_dequant[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32] | 170.44 | 397.82 |
| test_dequant[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32] | 171.16 | 727.04 |
| test_dequant[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32] | 184.62 | 403.52 |
| test_dequant[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32] | 175.34 | 807.60 |
| test_dequant[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32] | 169.62 | 469.04 |
| test_dequant[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32] | 188.38 | 441.84 |
| test_dequant[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-group_size_32] | 174.60 | - |
| test_dequant[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-group_size_32] | 208.66 | - |

iron/operators/elementwise_add

| Test | Krackan Latency (mean) | Phoenix Latency (mean) |
|---|---|---|
| test_elementwise_add[input_length_2048-num_aie_columns_1-tile_size_2048] | 140.32 | 350.68 |
| test_elementwise_add[input_length_2048-num_aie_columns_2-tile_size_1024] | 138.32 | 333.24 |
| test_elementwise_add[input_length_2048-num_aie_columns_4-tile_size_512] | 158.28 | 461.56 |
| test_elementwise_add[input_length_2048-num_aie_columns_8-tile_size_256] | 174.52 | - |

iron/operators/elementwise_mul

| Test | Krackan Latency (mean) | Phoenix Latency (mean) |
|---|---|---|
| test_elementwise_mul[input_length_2048-num_aie_columns_1-tile_size_2048] | 183.84 | 391.84 |
| test_elementwise_mul[input_length_2048-num_aie_columns_2-tile_size_1024] | 169.10 | 371.10 |
| test_elementwise_mul[input_length_2048-num_aie_columns_4-tile_size_512] | 192.22 | 369.26 |
| test_elementwise_mul[input_length_2048-num_aie_columns_8-tile_size_256] | 238.24 | - |

iron/operators/gelu

| Test | Krackan Latency (mean) | Phoenix Latency (mean) |
|---|---|---|
| test_gelu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | 167.16 | 379.36 |
| test_gelu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | 177.04 | 802.84 |
| test_gelu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | 162.18 | 478.56 |
| test_gelu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | 166.64 | 416.32 |
| test_gelu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | 152.12 | 421.86 |
| test_gelu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | 200.08 | 474.44 |
| test_gelu[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] | 193.36 | - |
| test_gelu[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] | 235.10 | - |

iron/operators/gemm

| Test | Krackan Latency (mean) | Phoenix Latency (mean) |
|---|---|---|
| test_gemm[M_1792-K_896-N_1152-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_64-k_32-n_48-trace_size_0-partition_N_1] | 2145.22 | - |
| test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1] | 265.08 | 496.38 |
| test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1] | 254.60 | 573.66 |
| test_gemm[M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1] | 48818.22 | 83454.00 |
| test_gemm[M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1] | 28679.68 | 25191.60 |
| test_gemm[M_2048-K_2048-N_2048-num_aie_columns_8-b_col_maj_True-c_col_maj_True-m_64-k_64-n_64-trace_size_0-partition_N_1] | 7715.68 | - |
| test_gemm[M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1] | 2115.94 | 3416.20 |
| test_gemm[M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4] | 3336.26 | 5995.50 |
| test_gemm[M_896-K_1792-N_640-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_32-k_64-n_80-trace_size_0-partition_N_1] | 1323.52 | - |

iron/operators/gemv

| Test | Krackan Latency (mean) | Phoenix Latency (mean) |
|---|---|---|
| test_gemv[M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128] | 0.23 | 0.11 |
| test_gemv[M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048] | 13.36 | 3.49 |
| test_gemv[M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024] | 24.27 | 6.79 |
| test_gemv[M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512] | 40.05 | 10.45 |
| test_gemv[M_2048-K_8192-num_aie_columns_8-tile_size_input_1-tile_size_output_256] | 42.85 | - |
| test_gemv[M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024] | 12.91 | 3.70 |
| test_gemv[M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024] | 23.45 | 6.60 |
| test_gemv[M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024] | 41.31 | 10.22 |
| test_gemv[M_8192-K_2048-num_aie_columns_8-tile_size_input_4-tile_size_output_1024] | 42.46 | - |

iron/operators/layer_norm

| Test | Krackan Latency (mean) | Phoenix Latency (mean) |
|---|---|---|
| test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | 194.80 | 321.44 |
| test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | 177.22 | 791.76 |
| test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | 189.84 | 460.52 |
| test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | 194.40 | 484.82 |
| test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | 192.34 | 409.58 |
| test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | 209.00 | 526.76 |
| test_layer_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] | 194.04 | - |
| test_layer_norm[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] | 222.02 | - |

iron/operators/mem_copy

| Test | Krackan Latency (mean) | Phoenix Latency (mean) |
|---|---|---|
| test_mem_copy[input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048] | 148.42 | 336.66 |
| test_mem_copy[input_length_2048-num_cores_16-num_channels_2-bypass_False-tile_size_128] | 224.84 | - |
| test_mem_copy[input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024] | 167.78 | 303.66 |
| test_mem_copy[input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024] | 197.58 | 309.82 |
| test_mem_copy[input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512] | 191.90 | 447.38 |
| test_mem_copy[input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512] | 231.82 | 370.10 |
| test_mem_copy[input_length_2048-num_cores_8-num_channels_1-bypass_False-tile_size_256] | 203.20 | - |
| test_mem_copy[input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256] | 188.32 | 480.56 |

iron/operators/mha

| Test | Krackan Latency (mean) | Phoenix Latency (mean) |
|---|---|---|
| test_mha[seq_len_16384-dim_64-num_heads_1-num_pipelines_8-num_kv_heads_0] | 40643.42 | - |

iron/operators/relu

| Test | Krackan Latency (mean) | Phoenix Latency (mean) |
|---|---|---|
| test_relu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | 172.76 | 383.68 |
| test_relu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | 163.60 | 371.28 |
| test_relu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | 166.76 | 447.74 |
| test_relu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | 164.38 | 329.04 |
| test_relu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | 194.34 | 458.94 |
| test_relu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | 180.20 | 489.92 |
| test_relu[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] | 204.60 | - |
| test_relu[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] | 214.52 | - |

iron/operators/rms_norm

| Test | Krackan Latency (mean) | Phoenix Latency (mean) |
|---|---|---|
| test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False] | 155.90 | 394.12 |
| test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True] | 170.62 | 419.82 |
| test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False] | 165.62 | 444.14 |
| test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True] | 175.94 | 802.88 |
| test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False] | 163.62 | 382.08 |
| test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True] | 227.66 | 433.54 |
| test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False] | 168.34 | 406.90 |
| test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True] | 175.84 | 373.72 |
| test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False] | 162.18 | 773.88 |
| test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True] | 203.02 | 469.62 |
| test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False] | 182.74 | 386.46 |
| test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_True] | 185.82 | - |
| test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_False] | 186.58 | - |
| test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_True] | 205.22 | - |
| test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-weighted_False] | 202.38 | - |

iron/operators/rope

| Test | Krackan Latency (mean) | Phoenix Latency (mean) |
|---|---|---|
| test_rope[rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0] | 162.36 | 431.66 |
| test_rope[rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0] | 160.08 | 498.84 |
| test_rope[rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0] | 144.02 | 387.22 |
| test_rope[rows_32-cols_512-angle_rows_32-aie_columns_8-method_type_0] | 194.76 | - |
| test_rope[rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0] | 175.76 | 337.78 |
| test_rope[rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0] | 161.30 | 323.46 |
| test_rope[rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0] | 159.32 | 699.72 |
| test_rope[rows_32-cols_512-angle_rows_8-aie_columns_8-method_type_0] | 233.48 | - |

iron/operators/sigmoid

| Test | Krackan Latency (mean) | Phoenix Latency (mean) |
|---|---|---|
| test_sigmoid[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | 135.62 | 320.10 |
| test_sigmoid[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | 134.94 | 368.54 |
| test_sigmoid[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | 160.98 | 384.18 |
| test_sigmoid[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | 182.58 | 342.66 |
| test_sigmoid[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | 158.30 | 397.14 |
| test_sigmoid[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | 168.38 | 676.80 |
| test_sigmoid[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] | 155.80 | - |
| test_sigmoid[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] | 225.16 | - |

iron/operators/silu

| Test | Krackan Latency (mean) | Phoenix Latency (mean) |
|---|---|---|
| test_silu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | 156.04 | 338.76 |
| test_silu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | 215.44 | 286.22 |
| test_silu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | 177.72 | 404.24 |
| test_silu[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] | 207.24 | - |

iron/operators/softmax

| Test | Krackan Latency (mean) | Phoenix Latency (mean) |
|---|---|---|
| test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024] | 185.34 | 479.56 |
| test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048] | 168.48 | 491.14 |
| test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512] | 186.16 | 698.44 |

iron/operators/swiglu_decode

| Test | Krackan Latency (mean) | Phoenix Latency (mean) |
|---|---|---|
| test_swiglu_decode[embedding_dim_1024-hidden_dim_3584] | 3698.89 | 5899.79 |
| test_swiglu_decode[embedding_dim_2048-hidden_dim_2048] | 4001.62 | 14132.67 |

iron/operators/swiglu_prefill

| Test | Krackan Latency (mean) | Phoenix Latency (mean) |
|---|---|---|
| test_swiglu_prefill[seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False] | 12300.83 | 23432.15 |

iron/operators/tanh

| Test | Krackan Latency (mean) | Phoenix Latency (mean) |
|---|---|---|
| test_tanh[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | 179.94 | 429.94 |
| test_tanh[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | 206.00 | 328.24 |
| test_tanh[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | 189.62 | 478.32 |
| test_tanh[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | 170.64 | 469.42 |
| test_tanh[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | 162.64 | 504.78 |
| test_tanh[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | 187.02 | 677.30 |
| test_tanh[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] | 178.08 | - |
| test_tanh[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] | 230.12 | - |

iron/operators/transpose

| Test | Krackan Latency (mean) | Phoenix Latency (mean) |
|---|---|---|
| test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8] | 220.22 | 472.76 |
| test_transpose[M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8] | 189.16 | 871.86 |

Krackan - Small

IRON

Tested on 2026_05_12_21_59_06 at commit c05f5c9.

iron/operators/axpy

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
|---|---|---|---|---|
| test_axpy[input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0] | ✅ 5/5 | 135.66 | 0.09 | n/a |
| test_axpy[input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0] | ✅ 5/5 | 174.42 | 0.08 | n/a |
| test_axpy[input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0] | ✅ 5/5 | 154.00 | 0.08 | n/a |
| test_axpy[input_length_2048-num_aie_columns_8-tile_size_256-scalar_factor_3.0] | ✅ 5/5 | 169.64 | 0.07 | n/a |

iron/operators/dequant

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
|---|---|---|---|---|
| test_dequant[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32] | ✅ 5/5 | 170.44 | 0.03 | n/a |
| test_dequant[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32] | ✅ 5/5 | 171.16 | 0.03 | n/a |
| test_dequant[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32] | ✅ 5/5 | 184.62 | 0.03 | n/a |
| test_dequant[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32] | ✅ 5/5 | 175.34 | 0.03 | n/a |
| test_dequant[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32] | ✅ 5/5 | 169.62 | 0.03 | n/a |
| test_dequant[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32] | ✅ 5/5 | 188.38 | 0.03 | n/a |
| test_dequant[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-group_size_32] | ✅ 5/5 | 174.60 | 0.03 | n/a |
| test_dequant[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-group_size_32] | ✅ 5/5 | 208.66 | 0.03 | n/a |

iron/operators/elementwise_add

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
|---|---|---|---|---|
| test_elementwise_add[input_length_2048-num_aie_columns_1-tile_size_2048] | ✅ 5/5 | 140.32 | 0.09 | n/a |
| test_elementwise_add[input_length_2048-num_aie_columns_2-tile_size_1024] | ✅ 5/5 | 138.32 | 0.09 | n/a |
| test_elementwise_add[input_length_2048-num_aie_columns_4-tile_size_512] | ✅ 5/5 | 158.28 | 0.08 | n/a |
| test_elementwise_add[input_length_2048-num_aie_columns_8-tile_size_256] | ✅ 5/5 | 174.52 | 0.07 | n/a |

iron/operators/elementwise_mul

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
|---|---|---|---|---|
| test_elementwise_mul[input_length_2048-num_aie_columns_1-tile_size_2048] | ✅ 5/5 | 183.84 | 0.07 | n/a |
| test_elementwise_mul[input_length_2048-num_aie_columns_2-tile_size_1024] | ✅ 5/5 | 169.10 | 0.07 | n/a |
| test_elementwise_mul[input_length_2048-num_aie_columns_4-tile_size_512] | ✅ 5/5 | 192.22 | 0.07 | n/a |
| test_elementwise_mul[input_length_2048-num_aie_columns_8-tile_size_256] | ✅ 5/5 | 238.24 | 0.05 | n/a |

iron/operators/gelu

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
|---|---|---|---|---|
| test_gelu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 167.16 | 0.05 | n/a |
| test_gelu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 177.04 | 0.05 | n/a |
| test_gelu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 162.18 | 0.05 | n/a |
| test_gelu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 166.64 | 0.05 | n/a |
| test_gelu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 152.12 | 0.05 | n/a |
| test_gelu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 200.08 | 0.04 | n/a |
| test_gelu[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] | ✅ 5/5 | 193.36 | 0.05 | n/a |
| test_gelu[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] | ✅ 5/5 | 235.10 | 0.04 | n/a |

iron/operators/gemm

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
|---|---|---|---|---|
| test_gemm[M_1792-K_896-N_1152-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_64-k_32-n_48-trace_size_0-partition_N_1] | ✅ 5/5 | 2145.22 | 4.41 | 1734.86 |
| test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1] | ✅ 5/5 | 265.08 | 0.90 | 38.44 |
| test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1] | ✅ 5/5 | 254.60 | 0.89 | 38.09 |
| test_gemm[M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1] | ✅ 5/5 | 48818.22 | 0.52 | 351.92 |
| test_gemm[M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1] | ✅ 5/5 | 28679.68 | 0.88 | 599.04 |
| test_gemm[M_2048-K_2048-N_2048-num_aie_columns_8-b_col_maj_True-c_col_maj_True-m_64-k_64-n_64-trace_size_0-partition_N_1] | ✅ 5/5 | 7715.68 | 3.26 | 2227.51 |
| test_gemm[M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1] | ✅ 5/5 | 2115.94 | 3.85 | 1008.93 |
| test_gemm[M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4] | ✅ 5/5 | 3336.26 | 0.39 | 21.10 |
| test_gemm[M_896-K_1792-N_640-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_32-k_64-n_80-trace_size_0-partition_N_1] | ✅ 5/5 | 1323.52 | 5.12 | 1582.14 |

iron/operators/gemv

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
|---|---|---|---|---|
| test_gemv[M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128] | ✅ 5/5 | n/a | 0.23 | 0.23 |
| test_gemv[M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048] | ✅ 5/5 | n/a | 13.36 | 13.36 |
| test_gemv[M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024] | ✅ 5/5 | n/a | 24.27 | 24.25 |
| test_gemv[M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512] | ✅ 5/5 | n/a | 40.05 | 40.02 |
| test_gemv[M_2048-K_8192-num_aie_columns_8-tile_size_input_1-tile_size_output_256] | ✅ 5/5 | n/a | 42.85 | 42.82 |
| test_gemv[M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024] | ✅ 5/5 | n/a | 12.91 | 12.91 |
| test_gemv[M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024] | ✅ 5/5 | n/a | 23.45 | 23.43 |
| test_gemv[M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024] | ✅ 5/5 | n/a | 41.31 | 41.29 |
| test_gemv[M_8192-K_2048-num_aie_columns_8-tile_size_input_4-tile_size_output_1024] | ✅ 5/5 | n/a | 42.46 | 42.43 |

iron/operators/layer_norm

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
|---|---|---|---|---|
| test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 194.80 | 0.04 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 177.22 | 0.05 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 189.84 | 0.04 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 194.40 | 0.04 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 192.34 | 0.04 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 209.00 | 0.04 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] | ✅ 5/5 | 194.04 | 0.04 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] | ✅ 5/5 | 222.02 | 0.04 | n/a |

iron/operators/mem_copy

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
|---|---|---|---|---|
| test_mem_copy[input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048] | ✅ 5/5 | 148.42 | 0.06 | n/a |
| test_mem_copy[input_length_2048-num_cores_16-num_channels_2-bypass_False-tile_size_128] | ✅ 5/5 | 224.84 | 0.04 | n/a |
| test_mem_copy[input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024] | ✅ 5/5 | 167.78 | 0.05 | n/a |
| test_mem_copy[input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024] | ✅ 5/5 | 197.58 | 0.04 | n/a |
| test_mem_copy[input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512] | ✅ 5/5 | 191.90 | 0.04 | n/a |
| test_mem_copy[input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512] | ✅ 5/5 | 231.82 | 0.04 | n/a |
| test_mem_copy[input_length_2048-num_cores_8-num_channels_1-bypass_False-tile_size_256] | ✅ 5/5 | 203.20 | 0.04 | n/a |
| test_mem_copy[input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256] | ✅ 5/5 | 188.32 | 0.04 | n/a |

iron/operators/mha

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
|---|---|---|---|---|
| test_mha[seq_len_16384-dim_64-num_heads_1-num_pipelines_8-num_kv_heads_0] | ✅ 5/5 | 40643.42 | 0.21 | n/a |

iron/operators/relu

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
|---|---|---|---|---|
| test_relu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 172.76 | 0.05 | n/a |
| test_relu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 163.60 | 0.05 | n/a |
| test_relu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 166.76 | 0.05 | n/a |
| test_relu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 164.38 | 0.05 | n/a |
| test_relu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 194.34 | 0.04 | n/a |
| test_relu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 180.20 | 0.05 | n/a |
| test_relu[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] | ✅ 5/5 | 204.60 | 0.04 | n/a |
| test_relu[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] | ✅ 5/5 | 214.52 | 0.04 | n/a |

iron/operators/rms_norm

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
|---|---|---|---|---|
| test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False] | ✅ 5/5 | 155.90 | 0.05 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True] | ✅ 5/5 | 170.62 | 0.08 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False] | ✅ 5/5 | 165.62 | 0.05 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True] | ✅ 5/5 | 175.94 | 0.06 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False] | ✅ 5/5 | 163.62 | 0.05 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True] | ✅ 5/5 | 227.66 | 0.05 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False] | ✅ 5/5 | 168.34 | 0.05 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True] | ✅ 5/5 | 175.84 | 0.05 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False] | ✅ 5/5 | 162.18 | 0.05 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True] | ✅ 5/5 | 203.02 | 0.05 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False] | ✅ 5/5 | 182.74 | 0.05 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_True] | ✅ 5/5 | 185.82 | 0.05 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_False] | ✅ 5/5 | 186.58 | 0.05 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_True] | ✅ 5/5 | 205.22 | 0.04 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-weighted_False] | ✅ 5/5 | 202.38 | 0.04 | n/a |

iron/operators/rope

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
|---|---|---|---|---|
| test_rope[rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0] | ✅ 5/5 | 162.36 | 0.61 | n/a |
| test_rope[rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0] | ✅ 5/5 | 160.08 | 0.65 | n/a |
| test_rope[rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0] | ✅ 5/5 | 144.02 | 0.70 | n/a |
| test_rope[rows_32-cols_512-angle_rows_32-aie_columns_8-method_type_0] | ✅ 5/5 | 194.76 | 0.51 | n/a |
| test_rope[rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0] | ✅ 5/5 | 175.76 | 0.43 | n/a |
| test_rope[rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0] | ✅ 5/5 | 161.30 | 0.47 | n/a |
| test_rope[rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0] | ✅ 5/5 | 159.32 | 0.48 | n/a |
| test_rope[rows_32-cols_512-angle_rows_8-aie_columns_8-method_type_0] | ✅ 5/5 | 233.48 | 0.33 | n/a |

iron/operators/sigmoid

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
|---|---|---|---|---|
| test_sigmoid[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 135.62 | 0.06 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 134.94 | 0.06 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 160.98 | 0.05 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 182.58 | 0.05 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 158.30 | 0.05 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 168.38 | 0.05 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] | ✅ 5/5 | 155.80 | 0.05 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] | ✅ 5/5 | 225.16 | 0.04 | n/a |

iron/operators/silu

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
|---|---|---|---|---|
| test_silu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 156.04 | 0.05 | n/a |
| test_silu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 215.44 | 0.04 | n/a |
| test_silu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 177.72 | 0.05 | n/a |
| test_silu[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] | ✅ 5/5 | 207.24 | 0.04 | n/a |

iron/operators/softmax

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
|---|---|---|---|---|
| test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024] | ✅ 5/5 | 185.34 | 0.71 | n/a |
| test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048] | ✅ 5/5 | 168.48 | 0.80 | n/a |
| test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 186.16 | 0.76 | n/a |

iron/operators/swiglu_decode

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
|---|---|---|---|---|
| test_swiglu_decode[embedding_dim_1024-hidden_dim_3584] | ✅ 5/5 | 3698.89 | 0.00 | n/a |
| test_swiglu_decode[embedding_dim_2048-hidden_dim_2048] | ✅ 5/5 | 4001.62 | 0.00 | n/a |

iron/operators/swiglu_prefill

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
|---|---|---|---|---|
| test_swiglu_prefill[seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False] | ✅ 5/5 | 12300.83 | 0.18 | n/a |

iron/operators/tanh

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
|---|---|---|---|---|
| test_tanh[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 179.94 | 0.05 | n/a |
| test_tanh[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 206.00 | 0.04 | n/a |
| test_tanh[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 189.62 | 0.04 | n/a |
| test_tanh[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 170.64 | 0.05 | n/a |
| test_tanh[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 162.64 | 0.05 | n/a |
| test_tanh[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 187.02 | 0.04 | n/a |
| test_tanh[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] | ✅ 5/5 | 178.08 | 0.05 | n/a |
| test_tanh[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] | ✅ 5/5 | 230.12 | 0.04 | n/a |

iron/operators/transpose

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
|---|---|---|---|---|
| test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8] | ✅ 5/5 | 220.22 | 2.47 | n/a |
| test_transpose[M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8] | ✅ 5/5 | 189.16 | 2.92 | n/a |

Trends:

IRON Trends

iron/operators/axpy

test_axpy[input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 (2026-05-12 21:53:42) | 0.11 (+36.64%) | 0.09 (+28.17%) | 0.09 (+27.37%) | 0.07 (+19.83%) | 0.01 (+66.48%) | 166.20 (-16.52%) | 135.66 (-21.36%) | 131.30 (-21.52%) | 109.30 (-26.84%) | 21.59 (+1.67%) |
| 5503a95 (2026-05-12 00:06:19) | 0.08 (n/a) | 0.07 (n/a) | 0.07 (n/a) | 0.06 (n/a) | 0.01 (n/a) | 199.10 (n/a) | 172.50 (n/a) | 167.30 (n/a) | 149.40 (n/a) | 21.24 (n/a) |

test_axpy[input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 (2026-05-12 21:53:42) | 0.11 (+13.82%) | 0.08 (+10.01%) | 0.08 (+21.32%) | 0.04 (-14.94%) | 0.02 (+30.32%) | 281.20 (+17.56%) | 174.42 (-5.62%) | 157.00 (-17.59%) | 113.10 (-12.19%) | 63.09 (+43.60%) |
| 5503a95 (2026-05-12 00:06:19) | 0.10 (n/a) | 0.07 (n/a) | 0.06 (n/a) | 0.05 (n/a) | 0.02 (n/a) | 239.20 (n/a) | 184.80 (n/a) | 190.50 (n/a) | 128.80 (n/a) | 43.93 (n/a) |

test_axpy[input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 (2026-05-12 21:53:42) | 0.09 (+33.30%) | 0.08 (+33.43%) | 0.09 (+36.94%) | 0.07 (+46.19%) | 0.01 (+17.52%) | 179.90 (-31.62%) | 154.00 (-25.45%) | 142.40 (-26.97%) | 138.40 (-24.99%) | 18.57 (-41.84%) |
| 5503a95 (2026-05-12 00:06:19) | 0.07 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.05 (n/a) | 0.01 (n/a) | 263.10 (n/a) | 206.56 (n/a) | 195.00 (n/a) | 184.50 (n/a) | 31.94 (n/a) |

test_axpy[input_length_2048-num_aie_columns_8-tile_size_256-scalar_factor_3.0]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.09 (+11.73%) | 0.07 (+8.04%) | 0.07 (+9.59%) | 0.06 (+14.83%) | 0.01 (+15.35%) | 192.70 (-12.92%) | 169.64 (-7.41%) | 166.30 (-8.73%) | 138.40 (-10.54%) | 21.75 (-10.62%) |
| 5503a95 | 2026-05-12 00:06:19 | 0.08 (n/a) | 0.07 (n/a) | 0.07 (n/a) | 0.06 (n/a) | 0.01 (n/a) | 221.30 (n/a) | 183.22 (n/a) | 182.20 (n/a) | 154.70 (n/a) | 24.34 (n/a) |

iron/operators/dequant

test_dequant[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.04 (-9.59%) | 0.03 (-4.16%) | 0.03 (-4.48%) | 0.02 (-10.20%) | 0.01 (-9.61%) | 216.20 (+11.33%) | 170.44 (+4.27%) | 158.40 (+4.69%) | 144.30 (+10.66%) | 30.07 (+6.92%) |
| 5503a95 | 2026-05-12 00:06:19 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 194.20 (n/a) | 163.46 (n/a) | 151.30 (n/a) | 130.40 (n/a) | 28.13 (n/a) |

test_dequant[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.04 (+0.33%) | 0.03 (+1.70%) | 0.03 (+15.89%) | 0.02 (-19.88%) | 0.01 (+36.40%) | 240.40 (+24.82%) | 171.16 (+0.48%) | 152.60 (-13.74%) | 140.40 (-0.28%) | 41.26 (+69.47%) |
| 5503a95 | 2026-05-12 00:06:19 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.00 (n/a) | 192.60 (n/a) | 170.34 (n/a) | 176.90 (n/a) | 140.80 (n/a) | 24.34 (n/a) |

test_dequant[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.04 (-17.18%) | 0.03 (-11.24%) | 0.03 (-12.26%) | 0.02 (-25.16%) | 0.01 (-7.69%) | 260.20 (+33.64%) | 184.62 (+13.98%) | 185.10 (+13.98%) | 140.30 (+20.74%) | 47.81 (+44.34%) |
| 5503a95 | 2026-05-12 00:06:19 | 0.05 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 194.70 (n/a) | 161.98 (n/a) | 162.40 (n/a) | 116.20 (n/a) | 33.12 (n/a) |

test_dequant[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.04 (+16.54%) | 0.03 (+9.30%) | 0.03 (+15.82%) | 0.02 (-2.95%) | 0.01 (+69.06%) | 216.60 (+3.04%) | 175.34 (-6.69%) | 167.10 (-13.64%) | 131.40 (-14.17%) | 34.74 (+52.84%) |
| 5503a95 | 2026-05-12 00:06:19 | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.00 (n/a) | 210.20 (n/a) | 187.92 (n/a) | 193.50 (n/a) | 153.10 (n/a) | 22.73 (n/a) |

test_dequant[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.05 (+43.78%) | 0.03 (+15.77%) | 0.03 (+12.40%) | 0.02 (-1.77%) | 0.01 (+130.45%) | 230.70 (+1.81%) | 169.62 (-7.45%) | 158.70 (-10.99%) | 102.20 (-30.43%) | 55.93 (+70.71%) |
| 5503a95 | 2026-05-12 00:06:19 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 226.60 (n/a) | 183.28 (n/a) | 178.30 (n/a) | 146.90 (n/a) | 32.76 (n/a) |

test_dequant[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.03 (-13.37%) | 0.03 (+0.68%) | 0.03 (+9.31%) | 0.02 (-1.36%) | 0.00 (-33.82%) | 210.70 (+1.40%) | 188.38 (-1.50%) | 184.50 (-8.48%) | 168.60 (+15.40%) | 20.29 (-21.03%) |
| 5503a95 | 2026-05-12 00:06:19 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.00 (n/a) | 207.80 (n/a) | 191.24 (n/a) | 201.60 (n/a) | 146.10 (n/a) | 25.69 (n/a) |

test_dequant[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-group_size_32]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.04 (+0.00%) | 0.03 (-0.24%) | 0.03 (-3.39%) | 0.03 (+2.45%) | 0.00 (-8.97%) | 200.60 (-2.43%) | 174.60 (-0.21%) | 167.20 (+3.53%) | 141.90 (+0.00%) | 25.43 (-12.13%) |
| 5503a95 | 2026-05-12 00:06:19 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 205.60 (n/a) | 174.96 (n/a) | 161.50 (n/a) | 141.90 (n/a) | 28.95 (n/a) |

test_dequant[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-group_size_32]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.03 (+1.19%) | 0.03 (+1.22%) | 0.03 (+2.57%) | 0.02 (+0.10%) | 0.00 (+16.21%) | 233.10 (-0.13%) | 208.66 (-1.01%) | 208.10 (-2.53%) | 181.20 (-1.20%) | 21.04 (+16.56%) |
| 5503a95 | 2026-05-12 00:06:19 | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.00 (n/a) | 233.40 (n/a) | 210.78 (n/a) | 213.50 (n/a) | 183.40 (n/a) | 18.05 (n/a) |

iron/operators/elementwise_add

test_elementwise_add[input_length_2048-num_aie_columns_1-tile_size_2048]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.10 (n/a) | 0.09 (n/a) | 0.09 (n/a) | 0.08 (n/a) | 0.01 (n/a) | 162.90 (n/a) | 140.32 (n/a) | 138.60 (n/a) | 125.50 (n/a) | 14.28 (n/a) |

test_elementwise_add[input_length_2048-num_aie_columns_2-tile_size_1024]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.10 (n/a) | 0.09 (n/a) | 0.09 (n/a) | 0.08 (n/a) | 0.01 (n/a) | 147.70 (n/a) | 138.32 (n/a) | 141.00 (n/a) | 118.70 (n/a) | 11.33 (n/a) |

test_elementwise_add[input_length_2048-num_aie_columns_4-tile_size_512]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.09 (n/a) | 0.08 (n/a) | 0.08 (n/a) | 0.07 (n/a) | 0.01 (n/a) | 179.30 (n/a) | 158.28 (n/a) | 159.40 (n/a) | 138.40 (n/a) | 17.70 (n/a) |

test_elementwise_add[input_length_2048-num_aie_columns_8-tile_size_256]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.09 (n/a) | 0.07 (n/a) | 0.07 (n/a) | 0.06 (n/a) | 0.01 (n/a) | 192.80 (n/a) | 174.52 (n/a) | 181.00 (n/a) | 141.80 (n/a) | 20.00 (n/a) |

iron/operators/elementwise_mul

test_elementwise_mul[input_length_2048-num_aie_columns_1-tile_size_2048]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.07 (n/a) | 0.07 (n/a) | 0.07 (n/a) | 0.06 (n/a) | 0.01 (n/a) | 213.50 (n/a) | 183.84 (n/a) | 178.60 (n/a) | 169.40 (n/a) | 17.08 (n/a) |

test_elementwise_mul[input_length_2048-num_aie_columns_2-tile_size_1024]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.08 (n/a) | 0.07 (n/a) | 0.07 (n/a) | 0.06 (n/a) | 0.01 (n/a) | 190.40 (n/a) | 169.10 (n/a) | 169.70 (n/a) | 147.50 (n/a) | 15.19 (n/a) |

test_elementwise_mul[input_length_2048-num_aie_columns_4-tile_size_512]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.09 (n/a) | 0.07 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.01 (n/a) | 217.00 (n/a) | 192.22 (n/a) | 204.60 (n/a) | 143.90 (n/a) | 29.36 (n/a) |

test_elementwise_mul[input_length_2048-num_aie_columns_8-tile_size_256]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 301.80 (n/a) | 238.24 (n/a) | 240.20 (n/a) | 186.30 (n/a) | 43.84 (n/a) |

iron/operators/gelu

test_gelu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 210.60 (n/a) | 167.16 (n/a) | 166.00 (n/a) | 142.10 (n/a) | 26.46 (n/a) |

test_gelu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 234.40 (n/a) | 177.04 (n/a) | 164.80 (n/a) | 131.90 (n/a) | 45.46 (n/a) |

test_gelu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 209.00 (n/a) | 162.18 (n/a) | 169.70 (n/a) | 123.70 (n/a) | 34.97 (n/a) |

test_gelu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.05 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.00 (n/a) | 180.20 (n/a) | 166.64 (n/a) | 163.40 (n/a) | 158.80 (n/a) | 8.38 (n/a) |

test_gelu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.06 (n/a) | 0.05 (n/a) | 0.06 (n/a) | 0.05 (n/a) | 0.01 (n/a) | 176.70 (n/a) | 152.12 (n/a) | 143.30 (n/a) | 133.30 (n/a) | 19.81 (n/a) |

test_gelu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 254.40 (n/a) | 200.08 (n/a) | 183.40 (n/a) | 176.50 (n/a) | 32.42 (n/a) |

test_gelu[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 293.50 (n/a) | 193.36 (n/a) | 175.70 (n/a) | 136.50 (n/a) | 61.14 (n/a) |

test_gelu[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 303.80 (n/a) | 235.10 (n/a) | 199.10 (n/a) | 174.10 (n/a) | 62.07 (n/a) |

iron/operators/gemm

test_gemm[M_1792-K_896-N_1152-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_64-k_32-n_48-trace_size_0-partition_N_1]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 4.89 (-4.78%) | 4.41 (-5.03%) | 4.21 (-8.36%) | 4.05 (-2.85%) | 0.39 (-4.98%) | 2321.60 (+2.93%) | 2145.22 (+5.28%) | 2236.20 (+9.13%) | 1923.40 (+5.02%) | 182.79 (+2.69%) | 1923.38 (-4.78%) | 1734.86 (-5.03%) | 1654.34 (-8.36%) | 1593.46 (-2.85%) | 152.10 (-4.98%) |
| 5503a95 | 2026-05-12 00:06:19 | 5.13 (n/a) | 4.64 (n/a) | 4.59 (n/a) | 4.17 (n/a) | 0.41 (n/a) | 2255.50 (n/a) | 2037.64 (n/a) | 2049.10 (n/a) | 1831.50 (n/a) | 177.99 (n/a) | 2019.91 (n/a) | 1826.68 (n/a) | 1805.34 (n/a) | 1640.13 (n/a) | 160.08 (n/a) |

test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 1.12 (-9.96%) | 0.90 (-19.64%) | 1.04 (-9.64%) | 0.62 (-29.67%) | 0.26 (+78.93%) | 358.10 (+42.22%) | 265.08 (+32.29%) | 211.80 (+10.66%) | 198.00 (+11.05%) | 84.90 (+183.87%) | 47.65 (-9.96%) | 38.44 (-19.64%) | 44.56 (-9.64%) | 26.36 (-29.67%) | 11.08 (+78.93%) |
| 5503a95 | 2026-05-12 00:06:19 | 1.24 (n/a) | 1.12 (n/a) | 1.16 (n/a) | 0.88 (n/a) | 0.15 (n/a) | 251.80 (n/a) | 200.38 (n/a) | 191.40 (n/a) | 178.30 (n/a) | 29.91 (n/a) | 52.92 (n/a) | 47.83 (n/a) | 49.32 (n/a) | 37.47 (n/a) | 6.19 (n/a) |

test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 1.12 (-22.07%) | 0.89 (-25.89%) | 0.92 (-26.30%) | 0.68 (-17.72%) | 0.16 (-34.35%) | 325.80 (+21.52%) | 254.60 (+33.21%) | 240.90 (+35.72%) | 198.00 (+28.32%) | 47.67 (+2.82%) | 47.66 (-22.07%) | 38.09 (-25.89%) | 39.18 (-26.30%) | 28.96 (-17.72%) | 6.91 (-34.35%) |
| 5503a95 | 2026-05-12 00:06:19 | 1.43 (n/a) | 1.20 (n/a) | 1.25 (n/a) | 0.82 (n/a) | 0.25 (n/a) | 268.10 (n/a) | 191.12 (n/a) | 177.50 (n/a) | 154.30 (n/a) | 46.37 (n/a) | 61.16 (n/a) | 51.39 (n/a) | 53.15 (n/a) | 35.20 (n/a) | 10.52 (n/a) |

test_gemm[M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.52 (-0.05%) | 0.52 (-0.02%) | 0.52 (+0.02%) | 0.51 (-0.13%) | 0.00 (+69.92%) | 48916.70 (+0.13%) | 48818.22 (+0.02%) | 48797.40 (-0.02%) | 48787.10 (+0.05%) | 55.22 (+70.12%) | 352.14 (-0.05%) | 351.92 (-0.02%) | 352.07 (+0.02%) | 351.21 (-0.13%) | 0.40 (+69.94%) |
| 5503a95 | 2026-05-12 00:06:19 | 0.52 (n/a) | 0.52 (n/a) | 0.52 (n/a) | 0.52 (n/a) | 0.00 (n/a) | 48854.90 (n/a) | 48807.76 (n/a) | 48807.50 (n/a) | 48763.20 (n/a) | 32.46 (n/a) | 352.31 (n/a) | 351.99 (n/a) | 351.99 (n/a) | 351.65 (n/a) | 0.23 (n/a) |

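The GEMM rows report latency, bandwidth, and throughput without stating units or formulas. The following sketch assumes latency is in microseconds, throughput is GFLOP/s computed as 2*M*K*N / latency, and bandwidth is GB/s computed as the bytes of A (MxK), B (KxN), and C (MxN) in bfloat16 (2 bytes per element) divided by latency. Those formulas are assumptions inferred from the numbers, not taken from the harness; they happen to reproduce the c05f5c9 row of the single-column 2048x2048x2048 test above, where the minimum latency pairs with the maximum throughput and bandwidth and vice versa. The helper names are hypothetical.

```python
def gemm_gflops(m: int, k: int, n: int, latency_us: float) -> float:
    """Throughput in GFLOP/s, assuming 2*M*K*N floating-point ops per GEMM."""
    flops = 2 * m * k * n
    return flops / (latency_us * 1e-6) / 1e9


def gemm_bandwidth_gbs(m: int, k: int, n: int, latency_us: float,
                       bytes_per_elem: int = 2) -> float:
    """Bandwidth in GB/s, assuming A, B and C each move once.

    bytes_per_elem=2 assumes bfloat16 operands.
    """
    total_bytes = bytes_per_elem * (m * k + k * n + m * n)
    return total_bytes / (latency_us * 1e-6) / 1e9


# M_2048-K_2048-N_2048 on 1 column, commit c05f5c9: min and max latency (us).
for latency in (48787.10, 48916.70):
    print(f"{latency:.2f} us -> "
          f"{gemm_gflops(2048, 2048, 2048, latency):.2f} GFLOP/s, "
          f"{gemm_bandwidth_gbs(2048, 2048, 2048, latency):.2f} GB/s")
```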
test_gemm[M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.88 (-1.47%) | 0.88 (-0.24%) | 0.88 (-0.60%) | 0.87 (+0.64%) | 0.00 (-65.07%) | 28832.10 (-0.64%) | 28679.68 (+0.23%) | 28723.80 (+0.61%) | 28514.50 (+1.49%) | 126.43 (-64.79%) | 602.50 (-1.47%) | 599.04 (-0.24%) | 598.11 (-0.60%) | 595.86 (+0.64%) | 2.64 (-65.07%) |
| 5503a95 | 2026-05-12 00:06:19 | 0.90 (n/a) | 0.88 (n/a) | 0.88 (n/a) | 0.87 (n/a) | 0.01 (n/a) | 29017.60 (n/a) | 28613.74 (n/a) | 28550.40 (n/a) | 28096.10 (n/a) | 359.06 (n/a) | 611.47 (n/a) | 600.48 (n/a) | 601.74 (n/a) | 592.05 (n/a) | 7.56 (n/a) |

test_gemm[M_2048-K_2048-N_2048-num_aie_columns_8-b_col_maj_True-c_col_maj_True-m_64-k_64-n_64-trace_size_0-partition_N_1]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 3.33 (-1.06%) | 3.26 (+0.65%) | 3.27 (+2.68%) | 3.15 (-0.79%) | 0.07 (-19.77%) | 7999.30 (+0.79%) | 7715.68 (-0.66%) | 7698.50 (-2.61%) | 7564.50 (+1.07%) | 174.54 (-18.40%) | 2271.11 (-1.06%) | 2227.51 (+0.65%) | 2231.59 (+2.68%) | 2147.68 (-0.79%) | 49.51 (-19.77%) |
| 5503a95 | 2026-05-12 00:06:19 | 3.36 (n/a) | 3.24 (n/a) | 3.18 (n/a) | 3.17 (n/a) | 0.09 (n/a) | 7936.30 (n/a) | 7767.18 (n/a) | 7905.10 (n/a) | 7484.70 (n/a) | 213.91 (n/a) | 2295.35 (n/a) | 2213.21 (n/a) | 2173.25 (n/a) | 2164.71 (n/a) | 61.71 (n/a) |

test_gemm[M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 4.24 (-0.73%) | 3.85 (-5.74%) | 4.08 (-2.87%) | 3.26 (-8.86%) | 0.41 (+43.02%) | 2474.00 (+9.71%) | 2115.94 (+6.67%) | 1977.10 (+2.95%) | 1900.90 (+0.74%) | 241.21 (+56.74%) | 1112.09 (-0.73%) | 1008.93 (-5.74%) | 1069.22 (-2.87%) | 854.45 (-8.86%) | 108.39 (+43.02%) |
| 5503a95 | 2026-05-12 00:06:19 | 4.27 (n/a) | 4.08 (n/a) | 4.20 (n/a) | 3.57 (n/a) | 0.29 (n/a) | 2255.00 (n/a) | 1983.64 (n/a) | 1920.40 (n/a) | 1887.00 (n/a) | 153.89 (n/a) | 1120.28 (n/a) | 1070.40 (n/a) | 1100.77 (n/a) | 937.46 (n/a) | 75.78 (n/a) |

test_gemm[M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.54 (+4.22%) | 0.39 (+8.87%) | 0.35 (+4.51%) | 0.29 (-4.78%) | 0.10 (+10.75%) | 4345.90 (+5.02%) | 3336.26 (-7.32%) | 3567.60 (-4.32%) | 2313.10 (-4.04%) | 784.30 (+13.14%) | 29.01 (+4.22%) | 21.10 (+8.87%) | 18.81 (+4.51%) | 15.44 (-4.78%) | 5.33 (+10.75%) |
| 5503a95 | 2026-05-12 00:06:19 | 0.52 (n/a) | 0.36 (n/a) | 0.33 (n/a) | 0.30 (n/a) | 0.09 (n/a) | 4138.30 (n/a) | 3599.66 (n/a) | 3728.60 (n/a) | 2410.60 (n/a) | 693.24 (n/a) | 27.84 (n/a) | 19.38 (n/a) | 18.00 (n/a) | 16.22 (n/a) | 4.81 (n/a) |

test_gemm[M_896-K_1792-N_640-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_32-k_64-n_80-trace_size_0-partition_N_1]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 6.64 (+1.04%) | 5.12 (+1.06%) | 4.74 (-3.55%) | 4.63 (+22.96%) | 0.86 (-16.45%) | 1437.80 (-18.68%) | 1323.52 (-2.44%) | 1404.70 (+3.68%) | 1002.00 (-1.03%) | 183.21 (-33.68%) | 2051.12 (+1.04%) | 1582.14 (+1.06%) | 1463.11 (-3.55%) | 1429.38 (+22.96%) | 264.88 (-16.45%) |
| 5503a95 | 2026-05-12 00:06:19 | 6.57 (n/a) | 5.07 (n/a) | 4.91 (n/a) | 3.76 (n/a) | 1.03 (n/a) | 1768.00 (n/a) | 1356.64 (n/a) | 1354.80 (n/a) | 1012.40 (n/a) | 276.26 (n/a) | 2030.00 (n/a) | 1565.55 (n/a) | 1517.00 (n/a) | 1162.48 (n/a) | 317.03 (n/a) |

iron/operators/gemv

test_gemv[M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.29 (+29.71%) | 0.23 (+19.04%) | 0.25 (+23.97%) | 0.17 (+20.34%) | 0.05 (+55.41%) | 0.29 (+29.71%) | 0.23 (+19.04%) | 0.24 (+23.97%) | 0.17 (+20.34%) | 0.05 (+55.41%) |
| 5503a95 | 2026-05-12 00:06:19 | 0.23 (n/a) | 0.19 (n/a) | 0.20 (n/a) | 0.14 (n/a) | 0.03 (n/a) | 0.22 (n/a) | 0.19 (n/a) | 0.20 (n/a) | 0.14 (n/a) | 0.03 (n/a) |

test_gemv[M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 13.71 (+4.43%) | 13.36 (+5.00%) | 13.25 (+3.94%) | 13.23 (+8.55%) | 0.20 (-41.77%) | 13.70 (+4.43%) | 13.36 (+5.00%) | 13.25 (+3.94%) | 13.23 (+8.55%) | 0.20 (-41.77%) |
| 5503a95 | 2026-05-12 00:06:19 | 13.13 (n/a) | 12.73 (n/a) | 12.75 (n/a) | 12.19 (n/a) | 0.35 (n/a) | 13.12 (n/a) | 12.72 (n/a) | 12.74 (n/a) | 12.18 (n/a) | 0.34 (n/a) |

test_gemv[M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 24.66 (-2.95%) | 24.27 (-0.58%) | 24.55 (+0.34%) | 23.21 (-1.04%) | 0.61 (-13.43%) | 24.64 (-2.95%) | 24.25 (-0.58%) | 24.54 (+0.34%) | 23.20 (-1.04%) | 0.61 (-13.43%) |
| 5503a95 | 2026-05-12 00:06:19 | 25.41 (n/a) | 24.41 (n/a) | 24.47 (n/a) | 23.46 (n/a) | 0.70 (n/a) | 25.39 (n/a) | 24.39 (n/a) | 24.46 (n/a) | 23.44 (n/a) | 0.70 (n/a) |

test_gemv[M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 41.80 (+3.01%) | 40.05 (+1.00%) | 40.31 (+1.89%) | 38.35 (-0.61%) | 1.32 (+71.15%) | 41.78 (+3.01%) | 40.02 (+1.00%) | 40.28 (+1.89%) | 38.32 (-0.61%) | 1.32 (+71.15%) |
| 5503a95 | 2026-05-12 00:06:19 | 40.58 (n/a) | 39.65 (n/a) | 39.56 (n/a) | 38.58 (n/a) | 0.77 (n/a) | 40.56 (n/a) | 39.63 (n/a) | 39.54 (n/a) | 38.56 (n/a) | 0.77 (n/a) |

test_gemv[M_2048-K_8192-num_aie_columns_8-tile_size_input_1-tile_size_output_256]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 44.18 (-0.43%) | 42.85 (-1.46%) | 42.98 (-1.14%) | 41.47 (-2.93%) | 1.20 (+104.42%) | 44.15 (-0.43%) | 42.82 (-1.46%) | 42.95 (-1.14%) | 41.45 (-2.93%) | 1.20 (+104.42%) |
| 5503a95 | 2026-05-12 00:06:19 | 44.37 (n/a) | 43.48 (n/a) | 43.47 (n/a) | 42.72 (n/a) | 0.59 (n/a) | 44.34 (n/a) | 43.46 (n/a) | 43.44 (n/a) | 42.70 (n/a) | 0.59 (n/a) |

test_gemv[M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 13.35 (-0.56%) | 12.91 (+2.10%) | 13.07 (-0.64%) | 12.37 (+14.51%) | 0.44 (-59.19%) | 13.35 (-0.56%) | 12.91 (+2.10%) | 13.06 (-0.64%) | 12.36 (+14.51%) | 0.44 (-59.19%) |
| 5503a95 | 2026-05-12 00:06:19 | 13.43 (n/a) | 12.65 (n/a) | 13.15 (n/a) | 10.80 (n/a) | 1.07 (n/a) | 13.42 (n/a) | 12.64 (n/a) | 13.14 (n/a) | 10.79 (n/a) | 1.07 (n/a) |

test_gemv[M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 24.89 (+1.84%) | 23.45 (-1.96%) | 24.03 (-0.32%) | 21.34 (-6.74%) | 1.36 (+114.37%) | 24.88 (+1.84%) | 23.43 (-1.96%) | 24.01 (-0.32%) | 21.33 (-6.74%) | 1.36 (+114.38%) |
| 5503a95 | 2026-05-12 00:06:19 | 24.44 (n/a) | 23.92 (n/a) | 24.10 (n/a) | 22.89 (n/a) | 0.64 (n/a) | 24.43 (n/a) | 23.90 (n/a) | 24.09 (n/a) | 22.87 (n/a) | 0.64 (n/a) |

test_gemv[M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 43.86 (+6.76%) | 41.31 (+4.30%) | 41.67 (+3.98%) | 38.29 (+2.03%) | 2.38 (+51.34%) | 43.83 (+6.76%) | 41.29 (+4.30%) | 41.65 (+3.98%) | 38.27 (+2.03%) | 2.38 (+51.34%) |
| 5503a95 | 2026-05-12 00:06:19 | 41.09 (n/a) | 39.61 (n/a) | 40.08 (n/a) | 37.53 (n/a) | 1.57 (n/a) | 41.06 (n/a) | 39.59 (n/a) | 40.05 (n/a) | 37.51 (n/a) | 1.57 (n/a) |

test_gemv[M_8192-K_2048-num_aie_columns_8-tile_size_input_4-tile_size_output_1024]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 44.90 (-2.62%) | 42.46 (+1.29%) | 42.85 (+4.40%) | 37.87 (-2.72%) | 2.73 (-1.32%) | 44.87 (-2.62%) | 42.43 (+1.29%) | 42.82 (+4.40%) | 37.85 (-2.72%) | 2.72 (-1.32%) |
| 5503a95 | 2026-05-12 00:06:19 | 46.11 (n/a) | 41.92 (n/a) | 41.04 (n/a) | 38.93 (n/a) | 2.76 (n/a) | 46.08 (n/a) | 41.89 (n/a) | 41.01 (n/a) | 38.91 (n/a) | 2.76 (n/a) |

iron/operators/layer_norm

test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.00 (n/a) | 229.10 (n/a) | 194.80 (n/a) | 191.30 (n/a) | 174.30 (n/a) | 20.73 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.07 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 202.90 (n/a) | 177.22 (n/a) | 192.60 (n/a) | 126.00 (n/a) | 32.21 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 234.20 (n/a) | 189.84 (n/a) | 188.60 (n/a) | 162.80 (n/a) | 28.12 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 238.90 (n/a) | 194.40 (n/a) | 184.80 (n/a) | 154.20 (n/a) | 32.42 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.06 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 247.00 (n/a) | 192.34 (n/a) | 184.00 (n/a) | 137.90 (n/a) | 44.01 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 259.40 (n/a) | 209.00 (n/a) | 200.20 (n/a) | 166.20 (n/a) | 36.60 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.05 (n/a) | 0.04 (n/a) | 0.05 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 299.80 (n/a) | 194.04 (n/a) | 166.60 (n/a) | 151.50 (n/a) | 61.71 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.04 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 276.20 (n/a) | 222.02 (n/a) | 217.20 (n/a) | 187.10 (n/a) | 36.41 (n/a) |

iron/operators/mem_copy

test_mem_copy[input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.07 (+32.14%) | 0.06 (+36.86%) | 0.06 (+36.55%) | 0.04 (+86.94%) | 0.01 (-17.54%) | 185.40 (-46.51%) | 148.42 (-30.86%) | 144.00 (-26.75%) | 120.70 (-24.33%) | 24.80 (-67.46%) |
| 5503a95 | 2026-05-12 00:06:19 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 346.60 (n/a) | 214.68 (n/a) | 196.60 (n/a) | 159.50 (n/a) | 76.20 (n/a) |

test_mem_copy[input_length_2048-num_cores_16-num_channels_2-bypass_False-tile_size_128]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.04 (-10.68%) | 0.04 (-2.12%) | 0.04 (-3.66%) | 0.03 (+48.61%) | 0.00 (-73.56%) | 235.10 (-32.71%) | 224.84 (-3.08%) | 229.90 (+3.79%) | 202.20 (+11.96%) | 13.22 (-80.68%) |
| 5503a95 | 2026-05-12 00:06:19 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 349.40 (n/a) | 231.98 (n/a) | 221.50 (n/a) | 180.60 (n/a) | 68.43 (n/a) |

test_mem_copy[input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.05 (-12.83%) | 0.05 (-6.34%) | 0.05 (-4.86%) | 0.04 (-4.42%) | 0.00 (-26.90%) | 184.80 (+4.58%) | 167.78 (+6.39%) | 172.10 (+5.13%) | 150.50 (+14.71%) | 15.05 (-11.72%) |
| 5503a95 | 2026-05-12 00:06:19 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.01 (n/a) | 176.70 (n/a) | 157.70 (n/a) | 163.70 (n/a) | 131.20 (n/a) | 17.04 (n/a) |

test_mem_copy[input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.06 (-8.70%) | 0.04 (-14.71%) | 0.04 (-20.26%) | 0.03 (-19.43%) | 0.01 (+24.71%) | 238.60 (+24.08%) | 197.58 (+19.08%) | 209.50 (+25.37%) | 148.60 (+9.51%) | 36.92 (+68.98%) |
| 5503a95 | 2026-05-12 00:06:19 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 192.30 (n/a) | 165.92 (n/a) | 167.10 (n/a) | 135.70 (n/a) | 21.85 (n/a) |

test_mem_copy[input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.05 (-19.88%) | 0.04 (-15.58%) | 0.04 (-14.48%) | 0.04 (-10.31%) | 0.01 (-33.91%) | 213.10 (+11.51%) | 191.90 (+17.47%) | 194.50 (+16.89%) | 152.50 (+24.80%) | 23.85 (-6.06%) |
| 5503a95 | 2026-05-12 00:06:19 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 191.10 (n/a) | 163.36 (n/a) | 166.40 (n/a) | 122.20 (n/a) | 25.39 (n/a) |

test_mem_copy[input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.05 (-23.34%) | 0.04 (-16.42%) | 0.04 (-9.05%) | 0.02 (-41.69%) | 0.01 (+0.97%) | 359.90 (+71.46%) | 231.82 (+24.01%) | 214.10 (+9.96%) | 176.10 (+30.44%) | 73.37 (+147.06%) |
| 5503a95 | 2026-05-12 00:06:19 | 0.06 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 209.90 (n/a) | 186.94 (n/a) | 194.70 (n/a) | 135.00 (n/a) | 29.70 (n/a) |

test_mem_copy[input_length_2048-num_cores_8-num_channels_1-bypass_False-tile_size_256]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.05 (-30.64%) | 0.04 (-7.70%) | 0.04 (+11.36%) | 0.04 (+20.84%) | 0.00 (-70.76%) | 231.00 (-17.26%) | 203.20 (+0.15%) | 202.10 (-10.22%) | 173.70 (+44.15%) | 22.28 (-64.33%) |
| 5503a95 | 2026-05-12 00:06:19 | 0.07 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 279.20 (n/a) | 202.90 (n/a) | 225.10 (n/a) | 120.50 (n/a) | 62.46 (n/a) |

test_mem_copy[input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256]

| Commit | Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 | 2026-05-12 21:53:42 | 0.06 (+5.14%) | 0.04 (-0.19%) | 0.04 (-5.67%) | 0.04 (+10.00%) | 0.01 (+7.57%) | 215.70 (-9.06%) | 188.32 (+0.18%) | 200.00 (+5.99%) | 137.20 (-4.85%) | 31.30 (-8.47%) |
| 5503a95 | 2026-05-12 00:06:19 | 0.06 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 237.20 (n/a) | 187.98 (n/a) | 188.70 (n/a) | 144.20 (n/a) | 34.19 (n/a) |

iron/operators/mha

test_mha[seq_len_16384-dim_64-num_heads_1-num_pipelines_8-num_kv_heads_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 21:53:42 | 0.21 (-0.71%) | 0.21 (-0.35%) | 0.21 (-0.29%) | 0.21 (-0.30%) | 0.00 (-42.31%) | 40727.30 (+0.30%) | 40643.42 (+0.35%) | 40693.90 (+0.29%) | 40536.60 (+0.72%) | 86.48 (-41.71%) |
| 5503a95 — 2026-05-12 00:06:19 | 0.21 (n/a) | 0.21 (n/a) | 0.21 (n/a) | 0.21 (n/a) | 0.00 (n/a) | 40604.80 (n/a) | 40502.20 (n/a) | 40576.30 (n/a) | 40247.10 (n/a) | 148.37 (n/a) |

iron/operators/rms_norm

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 21:53:42 | 0.07 (-2.69%) | 0.05 (-2.74%) | 0.05 (-15.46%) | 0.04 (-6.64%) | 0.01 (+8.01%) | 197.50 (+7.10%) | 155.90 (+3.37%) | 163.00 (+18.29%) | 123.40 (+2.75%) | 31.03 (+12.02%) |
| 5503a95 — 2026-05-12 00:06:19 | 0.07 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 184.40 (n/a) | 150.82 (n/a) | 137.80 (n/a) | 120.10 (n/a) | 27.70 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 21:53:42 | 0.10 (+12.00%) | 0.08 (+2.62%) | 0.08 (+2.37%) | 0.05 (-0.48%) | 0.02 (+50.07%) | 236.20 (+0.47%) | 170.62 (+0.27%) | 157.10 (-2.30%) | 124.50 (-10.69%) | 48.03 (+27.88%) |
| 5503a95 — 2026-05-12 00:06:19 | 0.09 (n/a) | 0.07 (n/a) | 0.08 (n/a) | 0.05 (n/a) | 0.01 (n/a) | 235.10 (n/a) | 170.16 (n/a) | 160.80 (n/a) | 139.40 (n/a) | 37.56 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 21:53:42 | 0.08 (+0.43%) | 0.05 (-18.31%) | 0.05 (-17.44%) | 0.04 (-22.12%) | 0.02 (+37.44%) | 213.00 (+28.39%) | 165.62 (+27.01%) | 157.90 (+21.09%) | 108.40 (-0.46%) | 44.10 (+86.57%) |
| 5503a95 — 2026-05-12 00:06:19 | 0.08 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.05 (n/a) | 0.01 (n/a) | 165.90 (n/a) | 130.40 (n/a) | 130.40 (n/a) | 108.90 (n/a) | 23.64 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 21:53:42 | 0.08 (+7.00%) | 0.06 (-7.36%) | 0.06 (-7.86%) | 0.05 (-17.96%) | 0.01 (+109.29%) | 214.80 (+21.91%) | 175.94 (+10.20%) | 170.70 (+8.52%) | 132.80 (-6.54%) | 30.79 (+136.33%) |
| 5503a95 — 2026-05-12 00:06:19 | 0.07 (n/a) | 0.06 (n/a) | 0.07 (n/a) | 0.06 (n/a) | 0.01 (n/a) | 176.20 (n/a) | 159.66 (n/a) | 157.30 (n/a) | 142.10 (n/a) | 13.03 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 21:53:42 | 0.06 (-16.00%) | 0.05 (-9.55%) | 0.05 (+1.66%) | 0.04 (-13.64%) | 0.01 (-13.09%) | 199.50 (+15.79%) | 163.62 (+10.67%) | 149.40 (-1.65%) | 136.80 (+19.06%) | 29.76 (+18.92%) |
| 5503a95 — 2026-05-12 00:06:19 | 0.07 (n/a) | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.01 (n/a) | 172.30 (n/a) | 147.84 (n/a) | 151.90 (n/a) | 114.90 (n/a) | 25.03 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 21:53:42 | 0.06 (-16.83%) | 0.05 (-17.85%) | 0.05 (-9.10%) | 0.03 (-27.77%) | 0.01 (-16.98%) | 305.80 (+38.43%) | 227.66 (+22.23%) | 226.50 (+10.00%) | 164.20 (+20.29%) | 50.80 (+37.83%) |
| 5503a95 — 2026-05-12 00:06:19 | 0.08 (n/a) | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.01 (n/a) | 220.90 (n/a) | 186.26 (n/a) | 205.90 (n/a) | 136.50 (n/a) | 36.86 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 21:53:42 | 0.06 (-16.68%) | 0.05 (-0.65%) | 0.05 (+18.42%) | 0.04 (+18.20%) | 0.01 (-42.41%) | 226.90 (-15.40%) | 168.34 (-6.00%) | 151.90 (-15.52%) | 132.40 (+20.04%) | 39.23 (-39.58%) |
| 5503a95 — 2026-05-12 00:06:19 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 268.20 (n/a) | 179.08 (n/a) | 179.80 (n/a) | 110.30 (n/a) | 64.93 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 21:53:42 | 0.07 (+2.72%) | 0.05 (+16.46%) | 0.05 (+21.31%) | 0.04 (+48.74%) | 0.01 (-36.05%) | 209.20 (-32.75%) | 175.84 (-19.32%) | 179.50 (-17.58%) | 131.50 (-2.59%) | 28.29 (-59.21%) |
| 5503a95 — 2026-05-12 00:06:19 | 0.07 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 311.10 (n/a) | 217.94 (n/a) | 217.80 (n/a) | 135.00 (n/a) | 69.35 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 21:53:42 | 0.07 (+8.83%) | 0.05 (+6.37%) | 0.05 (+2.83%) | 0.04 (+10.05%) | 0.01 (-4.87%) | 206.70 (-9.10%) | 162.18 (-6.90%) | 164.10 (-2.78%) | 124.00 (-8.08%) | 32.60 (-19.27%) |
| 5503a95 — 2026-05-12 00:06:19 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 227.40 (n/a) | 174.20 (n/a) | 168.80 (n/a) | 134.90 (n/a) | 40.37 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 21:53:42 | 0.07 (+2.43%) | 0.05 (+2.76%) | 0.05 (-8.34%) | 0.03 (+12.93%) | 0.02 (-2.63%) | 287.90 (-11.44%) | 203.02 (-4.54%) | 204.70 (+9.12%) | 130.90 (-2.39%) | 64.07 (-17.93%) |
| 5503a95 — 2026-05-12 00:06:19 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 325.10 (n/a) | 212.68 (n/a) | 187.60 (n/a) | 134.10 (n/a) | 78.07 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 21:53:42 | 0.07 (-1.89%) | 0.05 (-5.66%) | 0.05 (-3.76%) | 0.03 (-24.28%) | 0.02 (+28.19%) | 278.10 (+32.05%) | 182.74 (+11.26%) | 164.90 (+3.91%) | 117.20 (+1.91%) | 63.15 (+77.18%) |
| 5503a95 — 2026-05-12 00:06:19 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 210.60 (n/a) | 164.24 (n/a) | 158.70 (n/a) | 115.00 (n/a) | 35.64 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 21:53:42 | 0.06 (+2.02%) | 0.05 (+14.01%) | 0.05 (+23.94%) | 0.04 (+29.76%) | 0.01 (-10.08%) | 246.10 (-22.93%) | 185.82 (-14.48%) | 172.90 (-19.32%) | 143.90 (-1.98%) | 43.68 (-33.09%) |
| 5503a95 — 2026-05-12 00:06:19 | 0.06 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 319.30 (n/a) | 217.28 (n/a) | 214.30 (n/a) | 146.80 (n/a) | 65.29 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 21:53:42 | 0.07 (+31.29%) | 0.05 (+9.86%) | 0.05 (+3.71%) | 0.03 (+5.43%) | 0.01 (+50.07%) | 280.50 (-5.14%) | 186.58 (-6.78%) | 170.60 (-3.56%) | 124.60 (-23.84%) | 57.87 (+6.86%) |
| 5503a95 — 2026-05-12 00:06:19 | 0.05 (n/a) | 0.04 (n/a) | 0.05 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 295.70 (n/a) | 200.14 (n/a) | 176.90 (n/a) | 163.60 (n/a) | 54.15 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 21:53:42 | 0.05 (-22.62%) | 0.04 (-15.90%) | 0.05 (-6.28%) | 0.04 (-16.62%) | 0.00 (-34.84%) | 246.90 (+19.97%) | 205.22 (+18.27%) | 192.80 (+6.70%) | 182.00 (+29.26%) | 25.87 (+3.14%) |
| 5503a95 — 2026-05-12 00:06:19 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 205.80 (n/a) | 173.52 (n/a) | 180.70 (n/a) | 140.80 (n/a) | 25.09 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 21:53:42 | 0.06 (+37.41%) | 0.04 (+27.48%) | 0.04 (+11.27%) | 0.04 (+64.77%) | 0.01 (-6.57%) | 223.80 (-39.30%) | 202.38 (-24.77%) | 214.40 (-10.10%) | 144.30 (-27.23%) | 33.06 (-58.99%) |
| 5503a95 — 2026-05-12 00:06:19 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 368.70 (n/a) | 269.00 (n/a) | 238.50 (n/a) | 198.30 (n/a) | 80.61 (n/a) |

iron/operators/rope

test_rope[rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 21:53:42 | 0.71 (-13.57%) | 0.61 (-4.63%) | 0.63 (-10.78%) | 0.52 (+70.37%) | 0.08 (-62.75%) | 188.00 (-41.31%) | 162.36 (-6.41%) | 156.00 (+12.15%) | 138.70 (+15.68%) | 20.15 (-75.83%) |
| 5503a95 — 2026-05-12 00:06:19 | 0.82 (n/a) | 0.64 (n/a) | 0.71 (n/a) | 0.31 (n/a) | 0.20 (n/a) | 320.30 (n/a) | 173.48 (n/a) | 139.10 (n/a) | 119.90 (n/a) | 83.35 (n/a) |

test_rope[rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 21:53:42 | 0.85 (-3.90%) | 0.65 (-3.48%) | 0.71 (+4.65%) | 0.47 (-1.21%) | 0.17 (+13.87%) | 208.00 (+1.27%) | 160.08 (+5.36%) | 138.00 (-4.50%) | 116.20 (+4.03%) | 42.83 (+24.86%) |
| 5503a95 — 2026-05-12 00:06:19 | 0.88 (n/a) | 0.67 (n/a) | 0.68 (n/a) | 0.48 (n/a) | 0.14 (n/a) | 205.40 (n/a) | 151.94 (n/a) | 144.50 (n/a) | 111.70 (n/a) | 34.30 (n/a) |

test_rope[rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 21:53:42 | 0.88 (+26.49%) | 0.70 (+9.89%) | 0.68 (+4.52%) | 0.58 (+21.66%) | 0.11 (+23.69%) | 170.70 (-17.77%) | 144.02 (-9.11%) | 143.70 (-4.33%) | 111.50 (-20.98%) | 21.33 (-23.21%) |
| 5503a95 — 2026-05-12 00:06:19 | 0.70 (n/a) | 0.63 (n/a) | 0.65 (n/a) | 0.47 (n/a) | 0.09 (n/a) | 207.60 (n/a) | 158.46 (n/a) | 150.20 (n/a) | 141.10 (n/a) | 27.78 (n/a) |

test_rope[rows_32-cols_512-angle_rows_32-aie_columns_8-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 21:53:42 | 0.55 (-27.44%) | 0.51 (-15.01%) | 0.50 (-7.72%) | 0.45 (-14.37%) | 0.04 (-57.54%) | 220.20 (+16.76%) | 194.76 (+16.01%) | 197.10 (+8.42%) | 177.90 (+37.80%) | 16.84 (-31.76%) |
| 5503a95 — 2026-05-12 00:06:19 | 0.76 (n/a) | 0.60 (n/a) | 0.54 (n/a) | 0.52 (n/a) | 0.10 (n/a) | 188.60 (n/a) | 167.88 (n/a) | 181.80 (n/a) | 129.10 (n/a) | 24.68 (n/a) |

test_rope[rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 21:53:42 | 0.54 (-9.97%) | 0.43 (-22.87%) | 0.41 (-28.29%) | 0.34 (-28.83%) | 0.08 (+54.10%) | 216.40 (+40.52%) | 175.76 (+31.93%) | 179.20 (+39.46%) | 136.50 (+11.07%) | 30.07 (+137.56%) |
| 5503a95 — 2026-05-12 00:06:19 | 0.60 (n/a) | 0.56 (n/a) | 0.57 (n/a) | 0.48 (n/a) | 0.05 (n/a) | 154.00 (n/a) | 133.22 (n/a) | 128.50 (n/a) | 122.90 (n/a) | 12.66 (n/a) |

test_rope[rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 21:53:42 | 0.56 (+6.42%) | 0.47 (+3.71%) | 0.49 (-0.22%) | 0.37 (+37.61%) | 0.09 (-19.99%) | 197.40 (-27.32%) | 161.30 (-6.86%) | 151.90 (+0.26%) | 130.60 (-6.04%) | 30.22 (-45.95%) |
| 5503a95 — 2026-05-12 00:06:19 | 0.53 (n/a) | 0.45 (n/a) | 0.49 (n/a) | 0.27 (n/a) | 0.11 (n/a) | 271.60 (n/a) | 173.18 (n/a) | 151.50 (n/a) | 139.00 (n/a) | 55.91 (n/a) |

test_rope[rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 21:53:42 | 0.65 (+5.43%) | 0.48 (-1.55%) | 0.46 (-0.36%) | 0.36 (-13.83%) | 0.11 (+40.72%) | 206.70 (+16.06%) | 159.32 (+3.71%) | 159.00 (+0.38%) | 113.10 (-5.20%) | 34.18 (+56.91%) |
| 5503a95 — 2026-05-12 00:06:19 | 0.62 (n/a) | 0.49 (n/a) | 0.47 (n/a) | 0.41 (n/a) | 0.08 (n/a) | 178.10 (n/a) | 153.62 (n/a) | 158.40 (n/a) | 119.30 (n/a) | 21.79 (n/a) |

test_rope[rows_32-cols_512-angle_rows_8-aie_columns_8-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 21:53:42 | 0.42 (-17.59%) | 0.33 (-24.72%) | 0.32 (-33.51%) | 0.25 (-5.40%) | 0.08 (-22.20%) | 289.80 (+5.73%) | 233.48 (+31.14%) | 230.20 (+50.36%) | 175.20 (+21.41%) | 53.51 (-1.73%) |
| 5503a95 — 2026-05-12 00:06:19 | 0.51 (n/a) | 0.44 (n/a) | 0.48 (n/a) | 0.27 (n/a) | 0.10 (n/a) | 274.10 (n/a) | 178.04 (n/a) | 153.10 (n/a) | 144.30 (n/a) | 54.46 (n/a) |

iron/operators/softmax

test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 21:53:42 | 0.79 (-26.66%) | 0.71 (-26.39%) | 0.69 (-31.49%) | 0.64 (-25.34%) | 0.06 (-40.12%) | 206.30 (+33.96%) | 185.34 (+35.46%) | 189.70 (+46.04%) | 166.10 (+36.37%) | 15.49 (+7.46%) |
| 5503a95 — 2026-05-12 00:06:19 | 1.08 (n/a) | 0.97 (n/a) | 1.01 (n/a) | 0.85 (n/a) | 0.10 (n/a) | 154.00 (n/a) | 136.82 (n/a) | 129.90 (n/a) | 121.80 (n/a) | 14.41 (n/a) |

test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 21:53:42 | 1.07 (-5.08%) | 0.80 (-15.54%) | 0.73 (-23.74%) | 0.67 (-13.65%) | 0.17 (+19.17%) | 195.70 (+15.80%) | 168.48 (+19.90%) | 180.50 (+31.08%) | 122.70 (+5.32%) | 30.50 (+45.54%) |
| 5503a95 — 2026-05-12 00:06:19 | 1.12 (n/a) | 0.95 (n/a) | 0.95 (n/a) | 0.78 (n/a) | 0.14 (n/a) | 169.00 (n/a) | 140.52 (n/a) | 137.70 (n/a) | 116.50 (n/a) | 20.96 (n/a) |

test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 21:53:42 | 1.11 (-2.55%) | 0.76 (-24.79%) | 0.77 (-24.25%) | 0.54 (-36.90%) | 0.23 (+122.97%) | 242.40 (+58.43%) | 186.16 (+41.35%) | 170.00 (+32.09%) | 117.70 (+2.62%) | 52.50 (+273.10%) |
| 5503a95 — 2026-05-12 00:06:19 | 1.14 (n/a) | 1.00 (n/a) | 1.02 (n/a) | 0.86 (n/a) | 0.10 (n/a) | 153.00 (n/a) | 131.70 (n/a) | 128.70 (n/a) | 114.70 (n/a) | 14.07 (n/a) |

iron/operators/swiglu_decode

test_swiglu_decode[embedding_dim_1024-hidden_dim_3584]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 21:53:42 | 0.00 (+0.00%) | 0.00 (+3.70%) | 0.00 (+0.00%) | 0.00 (+11.11%) | 0.00 (-35.83%) | 4122.24 (-9.27%) | 3698.89 (-5.86%) | 3600.81 (-6.51%) | 3491.26 (+0.18%) | 259.12 (-44.49%) |
| 5503a95 — 2026-05-12 00:06:19 | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 4543.17 (n/a) | 3929.28 (n/a) | 3851.61 (n/a) | 3484.84 (n/a) | 466.81 (n/a) |

test_swiglu_decode[embedding_dim_2048-hidden_dim_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 21:53:42 | 0.00 (+0.00%) | 0.00 (-2.83%) | 0.00 (-9.09%) | 0.00 (+0.00%) | 0.00 (+19.68%) | 4578.54 (+1.81%) | 4001.62 (+2.95%) | 4079.32 (+8.04%) | 3563.54 (-0.61%) | 435.99 (+20.12%) |
| 5503a95 — 2026-05-12 00:06:19 | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 4497.33 (n/a) | 3887.13 (n/a) | 3775.82 (n/a) | 3585.56 (n/a) | 362.95 (n/a) |

iron/operators/swiglu_prefill

test_swiglu_prefill[seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 21:53:42 | 0.28 (+0.69%) | 0.18 (-20.13%) | 0.16 (-37.79%) | 0.15 (-1.72%) | 0.05 (-4.48%) | 14077.04 (+1.76%) | 12300.83 (+24.57%) | 13173.37 (+60.76%) | 7558.23 (-0.68%) | 2689.63 (-4.82%) |
| 5503a95 — 2026-05-12 00:06:19 | 0.28 (n/a) | 0.23 (n/a) | 0.26 (n/a) | 0.15 (n/a) | 0.06 (n/a) | 13833.24 (n/a) | 9874.95 (n/a) | 8194.32 (n/a) | 7610.28 (n/a) | 2825.75 (n/a) |

iron/operators/transpose

test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 21:53:42 | 3.22 (-7.82%) | 2.47 (-12.29%) | 2.39 (-9.48%) | 1.98 (-12.33%) | 0.53 (+7.14%) | 265.20 (+14.06%) | 220.22 (+15.30%) | 218.90 (+10.44%) | 162.70 (+8.47%) | 44.84 (+37.96%) |
| 5503a95 — 2026-05-12 00:06:19 | 3.50 (n/a) | 2.81 (n/a) | 2.65 (n/a) | 2.26 (n/a) | 0.49 (n/a) | 232.50 (n/a) | 191.00 (n/a) | 198.20 (n/a) | 150.00 (n/a) | 32.50 (n/a) |

test_transpose[M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 21:53:42 | 4.29 (+8.59%) | 2.92 (-18.72%) | 2.52 (-35.02%) | 2.31 (-19.08%) | 0.82 (+75.67%) | 227.10 (+23.56%) | 189.16 (+27.62%) | 208.20 (+53.88%) | 122.30 (-7.91%) | 43.14 (+99.13%) |
| 5503a95 — 2026-05-12 00:06:19 | 3.95 (n/a) | 3.59 (n/a) | 3.88 (n/a) | 2.85 (n/a) | 0.47 (n/a) | 183.80 (n/a) | 148.22 (n/a) | 135.30 (n/a) | 132.80 (n/a) | 21.66 (n/a) |

Krackan - Examples

IRON

Tested on 2026_05_12_22_09_10 at commit c05f5c9.

iron/applications/llama_3.2_1b

| Test | Checks | TTFT (mean) | TPS (mean) |
| --- | --- | --- | --- |
| test_llama_3_2_1b[llama_3.2_1b_prompt_1024_tokens_1] | ✅ 5/5 | 2.13 | n/a |
| test_llama_3_2_1b[llama_3.2_1b_prompt_1024_tokens_40] | ✅ 5/5 | 2.16 | 4.15 |
| test_llama_3_2_1b[llama_3.2_1b_prompt_13_tokens_1] | ✅ 5/5 | 2.09 | n/a |
| test_llama_3_2_1b[llama_3.2_1b_prompt_13_tokens_40] | ✅ 5/5 | 2.08 | 4.16 |

Trends:

IRON Trends

iron/applications/llama_3.2_1b

test_llama_3_2_1b[llama_3.2_1b_prompt_1024_tokens_1]

| Commit/Date | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
| --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 22:03:23 | 2.15 (-0.14%) | 2.13 (+0.10%) | 2.13 (-0.05%) | 2.11 (+0.19%) | 0.02 (-18.59%) |
| 5503a95 — 2026-05-11 23:56:25 | 2.15 (n/a) | 2.13 (n/a) | 2.13 (n/a) | 2.11 (n/a) | 0.02 (n/a) |

test_llama_3_2_1b[llama_3.2_1b_prompt_1024_tokens_40]

| Commit/Date | TPS (max) | TPS (mean) | TPS (median) | TPS (min) | TPS (stddev) | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 22:03:23 | 4.16 (-0.72%) | 4.15 (-0.52%) | 4.15 (-0.48%) | 4.15 (-0.24%) | 0.00 (-59.80%) | 2.28 (+1.61%) | 2.16 (+0.13%) | 2.13 (-0.42%) | 2.12 (+0.19%) | 0.07 (+35.94%) |
| 5503a95 — 2026-05-11 23:56:25 | 4.19 (n/a) | 4.17 (n/a) | 4.17 (n/a) | 4.16 (n/a) | 0.01 (n/a) | 2.24 (n/a) | 2.16 (n/a) | 2.14 (n/a) | 2.12 (n/a) | 0.05 (n/a) |

test_llama_3_2_1b[llama_3.2_1b_prompt_13_tokens_1]

| Commit/Date | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
| --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 22:03:23 | 2.10 (+0.38%) | 2.09 (-0.13%) | 2.09 (+0.10%) | 2.07 (-0.96%) | 0.01 (+285.40%) |
| 5503a95 — 2026-05-11 23:56:25 | 2.10 (n/a) | 2.09 (n/a) | 2.09 (n/a) | 2.09 (n/a) | 0.00 (n/a) |

test_llama_3_2_1b[llama_3.2_1b_prompt_13_tokens_40]

| Commit/Date | TPS (max) | TPS (mean) | TPS (median) | TPS (min) | TPS (stddev) | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 22:03:23 | 4.17 (-0.22%) | 4.16 (-0.23%) | 4.16 (-0.07%) | 4.14 (-0.26%) | 0.01 (-6.00%) | 2.10 (+0.38%) | 2.08 (+0.20%) | 2.09 (+0.63%) | 2.06 (-0.24%) | 0.02 (+42.19%) |
| 5503a95 — 2026-05-11 23:56:25 | 4.18 (n/a) | 4.17 (n/a) | 4.16 (n/a) | 4.15 (n/a) | 0.01 (n/a) | 2.10 (n/a) | 2.08 (n/a) | 2.08 (n/a) | 2.07 (n/a) | 0.01 (n/a) |

Phoenix - Small

IRON

Tested on 2026_05_12_22_01_30 at commit c05f5c9.

iron/operators/axpy

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_axpy[input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0] | ✅ 5/5 | 245.44 | 0.05 | n/a |
| test_axpy[input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0] | ✅ 5/5 | 335.40 | 0.04 | n/a |
| test_axpy[input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0] | ✅ 5/5 | 305.72 | 0.04 | n/a |

iron/operators/dequant

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_dequant[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32] | ✅ 5/5 | 397.82 | 0.02 | n/a |
| test_dequant[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32] | ✅ 5/5 | 727.04 | 0.01 | n/a |
| test_dequant[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32] | ✅ 5/5 | 403.52 | 0.01 | n/a |
| test_dequant[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32] | ✅ 5/5 | 807.60 | 0.01 | n/a |
| test_dequant[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32] | ✅ 5/5 | 469.04 | 0.01 | n/a |
| test_dequant[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32] | ✅ 5/5 | 441.84 | 0.01 | n/a |

iron/operators/elementwise_add

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_elementwise_add[input_length_2048-num_aie_columns_1-tile_size_2048] | ✅ 5/5 | 350.68 | 0.04 | n/a |
| test_elementwise_add[input_length_2048-num_aie_columns_2-tile_size_1024] | ✅ 5/5 | 333.24 | 0.04 | n/a |
| test_elementwise_add[input_length_2048-num_aie_columns_4-tile_size_512] | ✅ 5/5 | 461.56 | 0.03 | n/a |

iron/operators/elementwise_mul

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_elementwise_mul[input_length_2048-num_aie_columns_1-tile_size_2048] | ✅ 5/5 | 391.84 | 0.04 | n/a |
| test_elementwise_mul[input_length_2048-num_aie_columns_2-tile_size_1024] | ✅ 5/5 | 371.10 | 0.04 | n/a |
| test_elementwise_mul[input_length_2048-num_aie_columns_4-tile_size_512] | ✅ 5/5 | 369.26 | 0.04 | n/a |

iron/operators/gelu

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_gelu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 379.36 | 0.02 | n/a |
| test_gelu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 802.84 | 0.02 | n/a |
| test_gelu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 478.56 | 0.02 | n/a |
| test_gelu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 416.32 | 0.02 | n/a |
| test_gelu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 421.86 | 0.02 | n/a |
| test_gelu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 474.44 | 0.02 | n/a |

iron/operators/gemm

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1] | ✅ 5/5 | 496.38 | 0.46 | 19.72 |
| test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1] | ✅ 5/5 | 573.66 | 0.41 | 17.51 |
| test_gemm[M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1] | ✅ 5/5 | 83454.00 | 0.30 | 205.87 |
| test_gemm[M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1] | ✅ 5/5 | 25191.60 | 1.00 | 682.11 |
| test_gemm[M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1] | ✅ 5/5 | 3416.20 | 2.67 | 701.20 |
| test_gemm[M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4] | ✅ 5/5 | 5995.50 | 0.21 | 11.28 |

iron/operators/gemv

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_gemv[M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128] | ✅ 5/5 | n/a | 0.11 | 0.11 |
| test_gemv[M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048] | ✅ 5/5 | n/a | 3.49 | 3.49 |
| test_gemv[M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024] | ✅ 5/5 | n/a | 6.79 | 6.78 |
| test_gemv[M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512] | ✅ 5/5 | n/a | 10.45 | 10.44 |
| test_gemv[M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024] | ✅ 5/5 | n/a | 3.70 | 3.69 |
| test_gemv[M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024] | ✅ 5/5 | n/a | 6.60 | 6.60 |
| test_gemv[M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024] | ✅ 5/5 | n/a | 10.22 | 10.22 |

iron/operators/layer_norm

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 321.44 | 0.03 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 791.76 | 0.02 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 460.52 | 0.02 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 484.82 | 0.02 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 409.58 | 0.02 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 526.76 | 0.02 | n/a |

iron/operators/mem_copy

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_mem_copy[input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048] | ✅ 5/5 | 336.66 | 0.03 | n/a |
| test_mem_copy[input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024] | ✅ 5/5 | 303.66 | 0.03 | n/a |
| test_mem_copy[input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024] | ✅ 5/5 | 309.82 | 0.03 | n/a |
| test_mem_copy[input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512] | ✅ 5/5 | 447.38 | 0.02 | n/a |
| test_mem_copy[input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512] | ✅ 5/5 | 370.10 | 0.02 | n/a |
| test_mem_copy[input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256] | ✅ 5/5 | 480.56 | 0.02 | n/a |

iron/operators/relu

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_relu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 383.68 | 0.02 | n/a |
| test_relu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 371.28 | 0.03 | n/a |
| test_relu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 447.74 | 0.02 | n/a |
| test_relu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 329.04 | 0.03 | n/a |
| test_relu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 458.94 | 0.02 | n/a |
| test_relu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 489.92 | 0.02 | n/a |

iron/operators/rms_norm

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False] | ✅ 5/5 | 394.12 | 0.02 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True] | ✅ 5/5 | 419.82 | 0.03 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False] | ✅ 5/5 | 444.14 | 0.02 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True] | ✅ 5/5 | 802.88 | 0.02 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False] | ✅ 5/5 | 382.08 | 0.02 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True] | ✅ 5/5 | 433.54 | 0.03 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False] | ✅ 5/5 | 406.90 | 0.02 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True] | ✅ 5/5 | 373.72 | 0.03 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False] | ✅ 5/5 | 773.88 | 0.02 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True] | ✅ 5/5 | 469.62 | 0.02 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False] | ✅ 5/5 | 386.46 | 0.02 | n/a |

iron/operators/rope

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_rope[rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0] | ✅ 5/5 | 431.66 | 0.26 | n/a |
| test_rope[rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0] | ✅ 5/5 | 498.84 | 0.25 | n/a |
| test_rope[rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0] | ✅ 5/5 | 387.22 | 0.28 | n/a |
| test_rope[rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0] | ✅ 5/5 | 337.78 | 0.24 | n/a |
| test_rope[rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0] | ✅ 5/5 | 323.46 | 0.26 | n/a |
| test_rope[rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0] | ✅ 5/5 | 699.72 | 0.18 | n/a |

iron/operators/sigmoid

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_sigmoid[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 320.10 | 0.03 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 368.54 | 0.03 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 384.18 | 0.02 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 342.66 | 0.03 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 397.14 | 0.02 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 676.80 | 0.02 | n/a |

iron/operators/silu

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_silu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 338.76 | 0.03 | n/a |
| test_silu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 286.22 | 0.03 | n/a |
| test_silu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 404.24 | 0.02 | n/a |

iron/operators/softmax

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024] | ✅ 5/5 | 479.56 | 0.30 | n/a |
| test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048] | ✅ 5/5 | 491.14 | 0.30 | n/a |
| test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 698.44 | 0.30 | n/a |

iron/operators/swiglu_decode

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_swiglu_decode[embedding_dim_1024-hidden_dim_3584] | ✅ 5/5 | 5899.79 | 0.00 | n/a |
| test_swiglu_decode[embedding_dim_2048-hidden_dim_2048] | ✅ 5/5 | 14132.67 | 0.00 | n/a |

iron/operators/swiglu_prefill

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_swiglu_prefill[seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False] | ✅ 5/5 | 23432.15 | 0.09 | n/a |

iron/operators/tanh

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_tanh[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 429.94 | 0.02 | n/a |
| test_tanh[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 328.24 | 0.03 | n/a |
| test_tanh[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 478.32 | 0.02 | n/a |
| test_tanh[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 469.42 | 0.02 | n/a |
| test_tanh[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 504.78 | 0.02 | n/a |
| test_tanh[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 677.30 | 0.01 | n/a |

iron/operators/transpose

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8] | ✅ 5/5 | 472.76 | 1.22 | n/a |
| test_transpose[M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8] | ✅ 5/5 | 871.86 | 0.81 | n/a |

Trends:

IRON Trends

iron/operators/axpy

test_axpy[input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 21:58:37 | 0.05 (+9.72%) | 0.05 (+41.48%) | 0.05 (+28.33%) | 0.04 (+121.46%) | 0.00 (-65.87%) | 287.90 (-54.85%) | 245.44 (-37.71%) | 235.40 (-22.08%) | 229.20 (-8.83%) | 24.30 (-85.71%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 637.60 (n/a) | 394.04 (n/a) | 302.10 (n/a) | 251.40 (n/a) | 170.01 (n/a) |

test_axpy[input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c05f5c9 — 2026-05-12 21:58:37 | 0.05 (-11.01%) | 0.04 (-0.67%) | 0.04 (-14.02%) | 0.02 (+5.54%) | 0.01 (-34.30%) | 562.20 (-5.24%) | 335.40 (-6.39%) | 288.40 (+16.34%) | 265.50 (+12.36%) | 127.23 (-23.59%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.05 (n/a) | 0.04 (n/a) | 0.05 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 593.30 (n/a) | 358.30 (n/a) | 247.90 (n/a) | 236.30 (n/a) | 166.51 (n/a) |

test_axpy[input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0]

Commit/Date Bandwidth (max)Bandwidth (mean)Bandwidth (median)Bandwidth (min)Bandwidth (stddev)Latency (max)Latency (mean)Latency (median)Latency (min)Latency (stddev)
c05f5c9 — 2026-05-12 21:58:370.05 (+0.93%)0.04 (+27.38%)0.04 (+23.70%)0.03 (+46.94%)0.01 (-37.55%)433.20 (-31.94%)305.72 (-29.18%)284.10 (-19.15%)245.70 (-0.93%)73.94 (-59.33%)
5503a95 — 2026-05-11 23:50:480.05 (n/a)0.03 (n/a)0.03 (n/a)0.02 (n/a)0.01 (n/a)636.50 (n/a)431.70 (n/a)351.40 (n/a)248.00 (n/a)181.83 (n/a)
iron/operators/dequant

test_dequant[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.02 (-4.76%) | 0.02 (-4.12%) | 0.01 (-14.47%) | 0.01 (-11.82%) | 0.01 (+8.88%) | 545.30 (+13.39%) | 397.82 (+7.44%) | 482.40 (+16.92%) | 239.00 (+5.01%) | 147.04 (+20.12%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 480.90 (n/a) | 370.26 (n/a) | 412.60 (n/a) | 227.60 (n/a) | 122.40 (n/a) |

test_dequant[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.02 (+3.09%) | 0.01 (-15.61%) | 0.01 (-8.07%) | 0.00 (-74.55%) | 0.01 (+82.75%) | 1918.50 (+292.89%) | 727.04 (+81.28%) | 495.20 (+8.79%) | 275.00 (-3.00%) | 683.93 (+591.04%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 488.30 (n/a) | 401.06 (n/a) | 455.20 (n/a) | 283.50 (n/a) | 98.97 (n/a) |

test_dequant[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.02 (+2.40%) | 0.01 (-2.72%) | 0.01 (-1.50%) | 0.01 (-3.56%) | 0.01 (+15.85%) | 561.30 (+3.69%) | 403.52 (+5.93%) | 374.20 (+1.52%) | 257.70 (-2.35%) | 143.01 (+23.24%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 541.30 (n/a) | 380.92 (n/a) | 368.60 (n/a) | 263.90 (n/a) | 116.04 (n/a) |

test_dequant[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.02 (+14.22%) | 0.01 (+1.92%) | 0.01 (+9.81%) | 0.00 (-74.92%) | 0.01 (+94.91%) | 2437.70 (+298.77%) | 807.60 (+65.38%) | 430.30 (-8.95%) | 276.50 (-12.44%) | 916.51 (+665.22%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 611.30 (n/a) | 488.34 (n/a) | 472.60 (n/a) | 315.80 (n/a) | 119.77 (n/a) |

test_dequant[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.02 (+16.14%) | 0.01 (-17.13%) | 0.01 (-33.45%) | 0.01 (-40.38%) | 0.01 (+38.19%) | 820.00 (+67.72%) | 469.04 (+36.57%) | 418.60 (+50.31%) | 210.80 (-13.89%) | 237.07 (+99.27%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 488.90 (n/a) | 343.44 (n/a) | 278.50 (n/a) | 244.80 (n/a) | 118.97 (n/a) |

test_dequant[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.02 (-26.90%) | 0.01 (-5.54%) | 0.01 (+7.77%) | 0.01 (+3.83%) | 0.00 (-46.47%) | 574.80 (-3.69%) | 441.84 (-0.45%) | 460.10 (-7.22%) | 328.00 (+36.84%) | 99.37 (-28.46%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 596.80 (n/a) | 443.82 (n/a) | 495.90 (n/a) | 239.70 (n/a) | 138.91 (n/a) |

iron/operators/elementwise_add

test_elementwise_add[input_length_2048-num_aie_columns_1-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 503.40 (n/a) | 350.68 (n/a) | 276.40 (n/a) | 231.10 (n/a) | 139.43 (n/a) |

test_elementwise_add[input_length_2048-num_aie_columns_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 432.10 (n/a) | 333.24 (n/a) | 292.70 (n/a) | 260.50 (n/a) | 82.11 (n/a) |

test_elementwise_add[input_length_2048-num_aie_columns_4-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.08 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.03 (n/a) | 573.70 (n/a) | 461.56 (n/a) | 525.20 (n/a) | 152.70 (n/a) | 174.11 (n/a) |

iron/operators/elementwise_mul

test_elementwise_mul[input_length_2048-num_aie_columns_1-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.05 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 567.20 (n/a) | 391.84 (n/a) | 422.80 (n/a) | 240.30 (n/a) | 139.97 (n/a) |

test_elementwise_mul[input_length_2048-num_aie_columns_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 639.30 (n/a) | 371.10 (n/a) | 319.10 (n/a) | 239.30 (n/a) | 159.54 (n/a) |

test_elementwise_mul[input_length_2048-num_aie_columns_4-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 551.90 (n/a) | 369.26 (n/a) | 297.90 (n/a) | 248.90 (n/a) | 134.96 (n/a) |

iron/operators/gelu

test_gelu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 547.40 (n/a) | 379.36 (n/a) | 342.70 (n/a) | 237.80 (n/a) | 144.17 (n/a) |

test_gelu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.00 (n/a) | 0.01 (n/a) | 1919.90 (n/a) | 802.84 (n/a) | 522.20 (n/a) | 236.30 (n/a) | 659.41 (n/a) |

test_gelu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 600.90 (n/a) | 478.56 (n/a) | 489.20 (n/a) | 294.50 (n/a) | 130.07 (n/a) |

test_gelu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 583.90 (n/a) | 416.32 (n/a) | 424.60 (n/a) | 278.50 (n/a) | 117.08 (n/a) |

test_gelu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 569.30 (n/a) | 421.86 (n/a) | 367.70 (n/a) | 307.50 (n/a) | 121.01 (n/a) |

test_gelu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 633.40 (n/a) | 474.44 (n/a) | 479.90 (n/a) | 250.40 (n/a) | 140.77 (n/a) |

iron/operators/gemm

test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.56 (-11.90%) | 0.46 (+10.82%) | 0.49 (+25.27%) | 0.35 (+84.08%) | 0.10 (-44.06%) | 636.00 (-45.68%) | 496.38 (-21.08%) | 452.00 (-20.18%) | 393.80 (+13.52%) | 108.87 (-66.36%) | 23.97 (-11.90%) | 19.72 (+10.82%) | 20.88 (+25.27%) | 14.84 (+84.08%) | 4.08 (-44.06%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.64 (n/a) | 0.42 (n/a) | 0.39 (n/a) | 0.19 (n/a) | 0.17 (n/a) | 1170.80 (n/a) | 628.94 (n/a) | 566.30 (n/a) | 346.90 (n/a) | 323.64 (n/a) | 27.20 (n/a) | 17.80 (n/a) | 16.67 (n/a) | 8.06 (n/a) | 7.30 (n/a) |

test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.64 (+18.29%) | 0.41 (+9.69%) | 0.36 (-3.68%) | 0.31 (+139.67%) | 0.13 (-20.31%) | 713.40 (-58.28%) | 573.66 (-25.19%) | 619.80 (+3.80%) | 347.60 (-15.47%) | 138.69 (-74.27%) | 27.15 (+18.29%) | 17.51 (+9.69%) | 15.23 (-3.68%) | 13.23 (+139.67%) | 5.56 (-20.31%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.54 (n/a) | 0.37 (n/a) | 0.37 (n/a) | 0.13 (n/a) | 0.16 (n/a) | 1709.90 (n/a) | 766.82 (n/a) | 597.10 (n/a) | 411.20 (n/a) | 538.94 (n/a) | 22.95 (n/a) | 15.96 (n/a) | 15.81 (n/a) | 5.52 (n/a) | 6.98 (n/a) |

test_gemm[M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.31 (-1.57%) | 0.30 (-1.47%) | 0.30 (-1.77%) | 0.30 (-1.32%) | 0.00 (-16.32%) | 84328.60 (+1.34%) | 83454.00 (+1.49%) | 83538.40 (+1.81%) | 82419.20 (+1.60%) | 726.22 (-13.95%) | 208.44 (-1.57%) | 205.87 (-1.47%) | 205.65 (-1.77%) | 203.73 (-1.32%) | 1.80 (-16.32%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.31 (n/a) | 0.31 (n/a) | 0.31 (n/a) | 0.30 (n/a) | 0.00 (n/a) | 83216.60 (n/a) | 82225.06 (n/a) | 82057.10 (n/a) | 81121.80 (n/a) | 843.94 (n/a) | 211.78 (n/a) | 208.95 (n/a) | 209.36 (n/a) | 206.45 (n/a) | 2.15 (n/a) |

test_gemm[M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 1.01 (+0.38%) | 1.00 (+0.91%) | 1.01 (+1.42%) | 0.97 (-0.12%) | 0.02 (+18.10%) | 25894.60 (+0.12%) | 25191.60 (-0.90%) | 25032.10 (-1.40%) | 24923.40 (-0.37%) | 404.97 (+17.95%) | 689.31 (+0.38%) | 682.11 (+0.91%) | 686.31 (+1.42%) | 663.45 (-0.12%) | 10.77 (+18.10%) |
| 5503a95 — 2026-05-11 23:50:48 | 1.01 (n/a) | 0.99 (n/a) | 0.99 (n/a) | 0.97 (n/a) | 0.01 (n/a) | 25863.90 (n/a) | 25420.24 (n/a) | 25386.80 (n/a) | 25017.00 (n/a) | 343.35 (n/a) | 686.73 (n/a) | 675.93 (n/a) | 676.72 (n/a) | 664.24 (n/a) | 9.12 (n/a) |

test_gemm[M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 3.96 (-0.23%) | 2.67 (-2.58%) | 2.17 (+2.71%) | 1.63 (-11.02%) | 1.07 (-1.86%) | 4946.70 (+12.39%) | 3416.20 (+3.28%) | 3719.10 (-2.64%) | 2035.30 (+0.23%) | 1271.39 (+8.73%) | 1038.62 (-0.23%) | 701.20 (-2.58%) | 568.41 (+2.71%) | 427.34 (-11.02%) | 281.28 (-1.86%) |
| 5503a95 — 2026-05-11 23:50:48 | 3.97 (n/a) | 2.74 (n/a) | 2.11 (n/a) | 1.83 (n/a) | 1.09 (n/a) | 4401.30 (n/a) | 3307.84 (n/a) | 3819.90 (n/a) | 2030.60 (n/a) | 1169.34 (n/a) | 1041.02 (n/a) | 719.80 (n/a) | 553.40 (n/a) | 480.30 (n/a) | 286.60 (n/a) |

test_gemm[M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.24 (-9.26%) | 0.21 (-0.93%) | 0.21 (-10.08%) | 0.18 (+20.48%) | 0.02 (-53.88%) | 6845.60 (-17.00%) | 5995.50 (-2.35%) | 5927.90 (+11.21%) | 5298.30 (+10.20%) | 610.38 (-58.03%) | 12.67 (-9.26%) | 11.28 (-0.93%) | 11.32 (-10.08%) | 9.80 (+20.48%) | 1.13 (-53.88%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.26 (n/a) | 0.21 (n/a) | 0.23 (n/a) | 0.15 (n/a) | 0.05 (n/a) | 8247.30 (n/a) | 6139.54 (n/a) | 5330.30 (n/a) | 4807.80 (n/a) | 1454.42 (n/a) | 13.96 (n/a) | 11.39 (n/a) | 12.59 (n/a) | 8.14 (n/a) | 2.45 (n/a) |

iron/operators/gemv

test_gemv[M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.13 (+6.57%) | 0.11 (+62.72%) | 0.12 (+91.43%) | 0.07 (+113.87%) | 0.03 (-22.38%) | 0.13 (+6.57%) | 0.11 (+62.72%) | 0.12 (+91.43%) | 0.07 (+113.87%) | 0.03 (-22.38%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.12 (n/a) | 0.07 (n/a) | 0.06 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.12 (n/a) | 0.07 (n/a) | 0.06 (n/a) | 0.03 (n/a) | 0.03 (n/a) |

test_gemv[M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 3.87 (+0.39%) | 3.49 (-2.83%) | 3.46 (-3.34%) | 3.18 (-4.24%) | 0.32 (+66.26%) | 3.87 (+0.39%) | 3.49 (-2.83%) | 3.45 (-3.34%) | 3.18 (-4.24%) | 0.32 (+66.26%) |
| 5503a95 — 2026-05-11 23:50:48 | 3.86 (n/a) | 3.59 (n/a) | 3.58 (n/a) | 3.32 (n/a) | 0.20 (n/a) | 3.85 (n/a) | 3.59 (n/a) | 3.57 (n/a) | 3.32 (n/a) | 0.19 (n/a) |

test_gemv[M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 7.55 (+0.30%) | 6.79 (-4.87%) | 6.63 (-6.86%) | 5.92 (-9.93%) | 0.65 (+73.14%) | 7.55 (+0.30%) | 6.78 (-4.87%) | 6.63 (-6.86%) | 5.92 (-9.93%) | 0.65 (+73.14%) |
| 5503a95 — 2026-05-11 23:50:48 | 7.53 (n/a) | 7.13 (n/a) | 7.12 (n/a) | 6.57 (n/a) | 0.37 (n/a) | 7.53 (n/a) | 7.13 (n/a) | 7.12 (n/a) | 6.57 (n/a) | 0.37 (n/a) |

test_gemv[M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 13.40 (-3.32%) | 10.45 (+0.00%) | 10.26 (+4.58%) | 8.42 (+6.36%) | 1.91 (-26.08%) | 13.39 (-3.32%) | 10.44 (+0.00%) | 10.25 (+4.58%) | 8.42 (+6.36%) | 1.91 (-26.08%) |
| 5503a95 — 2026-05-11 23:50:48 | 13.86 (n/a) | 10.45 (n/a) | 9.81 (n/a) | 7.92 (n/a) | 2.58 (n/a) | 13.85 (n/a) | 10.44 (n/a) | 9.80 (n/a) | 7.91 (n/a) | 2.58 (n/a) |

test_gemv[M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 3.80 (+0.25%) | 3.70 (+4.86%) | 3.76 (+0.83%) | 3.48 (+13.51%) | 0.13 (-59.57%) | 3.80 (+0.25%) | 3.69 (+4.86%) | 3.76 (+0.83%) | 3.48 (+13.51%) | 0.13 (-59.57%) |
| 5503a95 — 2026-05-11 23:50:48 | 3.79 (n/a) | 3.52 (n/a) | 3.73 (n/a) | 3.07 (n/a) | 0.33 (n/a) | 3.79 (n/a) | 3.52 (n/a) | 3.73 (n/a) | 3.07 (n/a) | 0.33 (n/a) |

test_gemv[M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 7.59 (+5.11%) | 6.60 (-0.80%) | 6.54 (-4.43%) | 5.58 (-1.50%) | 0.95 (+60.64%) | 7.58 (+5.11%) | 6.60 (-0.80%) | 6.54 (-4.43%) | 5.58 (-1.50%) | 0.95 (+60.64%) |
| 5503a95 — 2026-05-11 23:50:48 | 7.22 (n/a) | 6.66 (n/a) | 6.85 (n/a) | 5.66 (n/a) | 0.59 (n/a) | 7.22 (n/a) | 6.65 (n/a) | 6.84 (n/a) | 5.66 (n/a) | 0.59 (n/a) |

test_gemv[M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 13.73 (-2.76%) | 10.22 (-4.75%) | 9.86 (-9.36%) | 7.14 (+1.41%) | 2.68 (+5.48%) | 13.72 (-2.76%) | 10.22 (-4.75%) | 9.85 (-9.36%) | 7.14 (+1.41%) | 2.68 (+5.48%) |
| 5503a95 — 2026-05-11 23:50:48 | 14.12 (n/a) | 10.73 (n/a) | 10.88 (n/a) | 7.04 (n/a) | 2.54 (n/a) | 14.11 (n/a) | 10.73 (n/a) | 10.87 (n/a) | 7.04 (n/a) | 2.54 (n/a) |

iron/operators/layer_norm

test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 611.70 (n/a) | 321.44 (n/a) | 268.30 (n/a) | 195.30 (n/a) | 166.50 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.00 (n/a) | 0.01 (n/a) | 2466.40 (n/a) | 791.76 (n/a) | 379.90 (n/a) | 270.00 (n/a) | 939.49 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 635.60 (n/a) | 460.52 (n/a) | 549.30 (n/a) | 251.00 (n/a) | 174.74 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 606.60 (n/a) | 484.82 (n/a) | 520.00 (n/a) | 323.00 (n/a) | 118.74 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 591.10 (n/a) | 409.58 (n/a) | 431.80 (n/a) | 237.00 (n/a) | 165.50 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 579.70 (n/a) | 526.76 (n/a) | 542.80 (n/a) | 451.20 (n/a) | 57.49 (n/a) |

iron/operators/mem_copy

test_mem_copy[input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.04 (+18.23%) | 0.03 (+28.71%) | 0.03 (+77.66%) | 0.01 (-16.36%) | 0.01 (+17.19%) | 638.00 (+19.57%) | 336.66 (-19.12%) | 283.70 (-43.71%) | 201.70 (-15.43%) | 173.47 (+23.91%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 533.60 (n/a) | 416.26 (n/a) | 504.00 (n/a) | 238.50 (n/a) | 139.99 (n/a) |

test_mem_copy[input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.04 (+25.55%) | 0.03 (+32.41%) | 0.03 (+13.09%) | 0.02 (+30.77%) | 0.01 (-5.58%) | 470.30 (-23.53%) | 303.66 (-28.29%) | 287.60 (-11.59%) | 229.50 (-20.34%) | 97.08 (-41.85%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.03 (n/a) | 0.02 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 615.00 (n/a) | 423.44 (n/a) | 325.30 (n/a) | 288.10 (n/a) | 166.95 (n/a) |

test_mem_copy[input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.04 (+17.18%) | 0.03 (+34.90%) | 0.03 (+75.39%) | 0.02 (+25.13%) | 0.01 (+20.09%) | 468.70 (-20.09%) | 309.82 (-25.91%) | 256.90 (-42.99%) | 227.00 (-14.63%) | 104.79 (-18.08%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 586.50 (n/a) | 418.16 (n/a) | 450.60 (n/a) | 265.90 (n/a) | 127.92 (n/a) |

test_mem_copy[input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.03 (+18.16%) | 0.02 (+0.86%) | 0.02 (-12.29%) | 0.01 (-2.88%) | 0.01 (+50.11%) | 587.60 (+2.98%) | 447.38 (+4.35%) | 459.00 (+14.01%) | 247.90 (-15.36%) | 146.74 (+34.00%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 570.60 (n/a) | 428.74 (n/a) | 402.60 (n/a) | 292.90 (n/a) | 109.50 (n/a) |

test_mem_copy[input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.03 (-17.61%) | 0.02 (-13.81%) | 0.03 (-5.70%) | 0.02 (-19.31%) | 0.01 (-14.93%) | 531.50 (+23.95%) | 370.10 (+16.90%) | 316.90 (+6.06%) | 234.20 (+21.35%) | 136.88 (+26.07%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 428.80 (n/a) | 316.60 (n/a) | 298.80 (n/a) | 193.00 (n/a) | 108.58 (n/a) |

test_mem_copy[input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.03 (-23.96%) | 0.02 (-3.02%) | 0.01 (+2.29%) | 0.01 (-1.30%) | 0.01 (-29.15%) | 632.60 (+1.33%) | 480.56 (-0.84%) | 549.80 (-2.24%) | 326.80 (+31.51%) | 140.90 (-9.83%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 624.30 (n/a) | 484.62 (n/a) | 562.40 (n/a) | 248.50 (n/a) | 156.26 (n/a) |

iron/operators/rms_norm

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.03 (-18.35%) | 0.02 (-11.06%) | 0.02 (-13.67%) | 0.02 (+12.21%) | 0.00 (-42.78%) | 491.80 (-10.87%) | 394.12 (+6.62%) | 413.80 (+15.85%) | 291.20 (+22.46%) | 75.02 (-38.50%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 551.80 (n/a) | 369.66 (n/a) | 357.20 (n/a) | 237.80 (n/a) | 121.97 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.06 (+53.84%) | 0.03 (+32.51%) | 0.03 (+33.79%) | 0.02 (-13.11%) | 0.02 (+140.51%) | 677.40 (+15.09%) | 419.82 (-14.24%) | 417.30 (-25.26%) | 222.50 (-35.02%) | 189.68 (+70.94%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.04 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 588.60 (n/a) | 489.52 (n/a) | 558.30 (n/a) | 342.40 (n/a) | 110.97 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.03 (-13.28%) | 0.02 (+8.80%) | 0.02 (+16.83%) | 0.01 (+7.30%) | 0.01 (-29.29%) | 585.90 (-6.81%) | 444.14 (-12.13%) | 446.10 (-14.41%) | 295.30 (+15.31%) | 118.37 (-19.99%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 628.70 (n/a) | 505.44 (n/a) | 521.20 (n/a) | 256.10 (n/a) | 147.95 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.04 (-45.40%) | 0.02 (-33.23%) | 0.02 (+6.23%) | 0.00 (-76.69%) | 0.01 (-43.91%) | 2495.90 (+328.92%) | 802.88 (+111.74%) | 412.90 (-5.86%) | 259.30 (+83.12%) | 949.62 (+425.33%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.07 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 581.90 (n/a) | 379.18 (n/a) | 438.60 (n/a) | 141.60 (n/a) | 180.76 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.04 (+24.39%) | 0.02 (+39.39%) | 0.03 (+54.02%) | 0.01 (+267.94%) | 0.01 (-8.03%) | 664.30 (-72.82%) | 382.08 (-53.57%) | 321.00 (-35.09%) | 224.50 (-19.62%) | 171.75 (-81.19%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.00 (n/a) | 0.01 (n/a) | 2444.20 (n/a) | 822.84 (n/a) | 494.50 (n/a) | 279.30 (n/a) | 913.16 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.04 (-37.87%) | 0.03 (-16.81%) | 0.03 (-8.61%) | 0.02 (-14.45%) | 0.01 (-38.61%) | 608.00 (+16.90%) | 433.54 (+16.68%) | 398.80 (+9.41%) | 272.80 (+60.94%) | 166.05 (+24.52%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.06 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 520.10 (n/a) | 371.56 (n/a) | 364.50 (n/a) | 169.50 (n/a) | 133.35 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.03 (-7.46%) | 0.02 (+42.33%) | 0.02 (+47.78%) | 0.02 (+275.37%) | 0.01 (-37.82%) | 527.30 (-73.36%) | 406.90 (-50.89%) | 432.30 (-32.33%) | 289.50 (+8.06%) | 110.38 (-83.57%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 0.01 (n/a) | 1979.40 (n/a) | 828.60 (n/a) | 638.80 (n/a) | 267.90 (n/a) | 671.83 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.04 (-17.05%) | 0.03 (+1.68%) | 0.03 (+4.17%) | 0.02 (+16.44%) | 0.01 (-28.34%) | 513.70 (-14.13%) | 373.72 (-8.50%) | 337.20 (-3.99%) | 235.30 (+20.54%) | 128.51 (-26.05%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.05 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 598.20 (n/a) | 408.44 (n/a) | 351.20 (n/a) | 195.20 (n/a) | 173.77 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.03 (-9.55%) | 0.02 (-12.34%) | 0.01 (-14.83%) | 0.00 (-73.45%) | 0.01 (+24.08%) | 2135.30 (+276.66%) | 773.88 (+77.41%) | 593.40 (+17.41%) | 236.10 (+10.53%) | 779.41 (+447.31%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.04 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 566.90 (n/a) | 436.22 (n/a) | 505.40 (n/a) | 213.60 (n/a) | 142.41 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.03 (-25.04%) | 0.02 (-1.77%) | 0.02 (+23.50%) | 0.02 (+4.95%) | 0.00 (-53.01%) | 599.00 (-4.71%) | 469.62 (-4.59%) | 448.40 (-19.03%) | 360.00 (+33.38%) | 88.31 (-38.32%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 628.60 (n/a) | 492.22 (n/a) | 553.80 (n/a) | 269.90 (n/a) | 143.18 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.04 (-5.90%) | 0.02 (+29.26%) | 0.02 (+44.81%) | 0.01 (+48.82%) | 0.01 (-6.39%) | 641.30 (-32.81%) | 386.46 (-27.32%) | 332.00 (-30.95%) | 231.90 (+6.28%) | 176.44 (-33.74%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.04 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 954.40 (n/a) | 531.76 (n/a) | 480.80 (n/a) | 218.20 (n/a) | 266.30 (n/a) |

iron/operators/rope

test_rope[rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0]

Commit/Date Bandwidth (max)Bandwidth (mean)Bandwidth (median)Bandwidth (min)Bandwidth (stddev)Latency (max)Latency (mean)Latency (median)Latency (min)Latency (stddev)
c05f5c9 — 2026-05-12 21:58:370.39 (+11.73%)0.26 (+5.54%)0.23 (-5.53%)0.16 (+10.76%)0.11 (+28.83%)614.20 (-9.72%)431.66 (-1.44%)422.30 (+5.87%)250.80 (-10.49%)177.83 (+6.69%)
5503a95 — 2026-05-11 23:50:480.35 (n/a)0.25 (n/a)0.25 (n/a)0.14 (n/a)0.09 (n/a)680.30 (n/a)437.96 (n/a)398.90 (n/a)280.20 (n/a)166.68 (n/a)

test_rope[rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0]

Commit/Date Bandwidth (max)Bandwidth (mean)Bandwidth (median)Bandwidth (min)Bandwidth (stddev)Latency (max)Latency (mean)Latency (median)Latency (min)Latency (stddev)
c05f5c9 — 2026-05-12 21:58:370.40 (+69.55%)0.25 (+25.99%)0.22 (+13.61%)0.11 (-35.88%)0.12 (+365.28%)924.70 (+55.96%)498.84 (-1.20%)445.60 (-11.99%)245.80 (-41.03%)277.18 (+312.94%)
5503a95 — 2026-05-11 23:50:480.24 (n/a)0.20 (n/a)0.19 (n/a)0.17 (n/a)0.03 (n/a)592.90 (n/a)504.90 (n/a)506.30 (n/a)416.80 (n/a)67.12 (n/a)

test_rope[rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0]

Commit/Date Bandwidth (max)Bandwidth (mean)Bandwidth (median)Bandwidth (min)Bandwidth (stddev)Latency (max)Latency (mean)Latency (median)Latency (min)Latency (stddev)
c05f5c9 — 2026-05-12 21:58:370.39 (+57.11%)0.28 (+40.72%)0.29 (+45.39%)0.16 (-7.47%)0.09 (+189.34%)617.90 (+8.06%)387.22 (-22.97%)344.40 (-31.22%)250.10 (-36.35%)145.33 (+102.21%)
5503a95 — 2026-05-11 23:50:480.25 (n/a)0.20 (n/a)0.20 (n/a)0.17 (n/a)0.03 (n/a)571.80 (n/a)502.68 (n/a)500.70 (n/a)392.90 (n/a)71.87 (n/a)

test_rope[rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.32 (+38.36%) | 0.24 (+64.27%) | 0.25 (+65.63%) | 0.15 (+285.43%) | 0.07 (+5.94%) | 506.50 (-74.06%) | 337.78 (-54.71%) | 295.00 (-39.62%) | 231.00 (-27.74%) | 115.58 (-82.96%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.23 (n/a) | 0.14 (n/a) | 0.15 (n/a) | 0.04 (n/a) | 0.07 (n/a) | 1952.40 (n/a) | 745.78 (n/a) | 488.60 (n/a) | 319.70 (n/a) | 678.37 (n/a) |

test_rope[rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.37 (+4.64%) | 0.26 (+9.83%) | 0.27 (+52.90%) | 0.12 (-23.74%) | 0.09 (-1.78%) | 596.60 (+31.12%) | 323.46 (-6.58%) | 276.40 (-34.61%) | 200.00 (-4.44%) | 158.96 (+31.74%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.35 (n/a) | 0.24 (n/a) | 0.17 (n/a) | 0.16 (n/a) | 0.10 (n/a) | 455.00 (n/a) | 346.24 (n/a) | 422.70 (n/a) | 209.30 (n/a) | 120.66 (n/a) |

test_rope[rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.30 (+10.79%) | 0.18 (-6.08%) | 0.17 (-16.70%) | 0.04 (-70.27%) | 0.10 (+42.71%) | 2055.00 (+236.33%) | 699.72 (+64.62%) | 428.00 (+20.06%) | 245.60 (-9.74%) | 764.35 (+351.19%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.27 (n/a) | 0.20 (n/a) | 0.21 (n/a) | 0.12 (n/a) | 0.07 (n/a) | 611.00 (n/a) | 425.06 (n/a) | 356.50 (n/a) | 272.10 (n/a) | 169.41 (n/a) |

iron/operators/softmax

test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.47 (+5.76%) | 0.30 (-2.15%) | 0.29 (-0.59%) | 0.20 (+18.70%) | 0.11 (-0.49%) | 645.10 (-15.74%) | 479.56 (-0.10%) | 451.10 (+0.58%) | 276.80 (-5.46%) | 143.10 (-22.24%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.45 (n/a) | 0.30 (n/a) | 0.29 (n/a) | 0.17 (n/a) | 0.11 (n/a) | 765.60 (n/a) | 480.06 (n/a) | 448.50 (n/a) | 292.80 (n/a) | 184.02 (n/a) |

test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.42 (-8.91%) | 0.30 (-9.53%) | 0.23 (-34.85%) | 0.20 (+9.84%) | 0.11 (+4.83%) | 647.50 (-8.96%) | 491.14 (+11.24%) | 576.00 (+53.48%) | 308.50 (+9.79%) | 166.39 (-1.86%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.47 (n/a) | 0.33 (n/a) | 0.35 (n/a) | 0.18 (n/a) | 0.11 (n/a) | 711.20 (n/a) | 441.52 (n/a) | 375.30 (n/a) | 281.00 (n/a) | 169.54 (n/a) |

test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.42 (-40.21%) | 0.30 (-32.09%) | 0.35 (-13.75%) | 0.07 (-75.94%) | 0.14 (-13.33%) | 1997.40 (+315.69%) | 698.44 (+113.81%) | 372.10 (+15.96%) | 311.10 (+67.26%) | 728.35 (+579.00%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.70 (n/a) | 0.44 (n/a) | 0.41 (n/a) | 0.27 (n/a) | 0.16 (n/a) | 480.50 (n/a) | 326.66 (n/a) | 320.90 (n/a) | 186.00 (n/a) | 107.27 (n/a) |

iron/operators/swiglu_decode

test_swiglu_decode[embedding_dim_1024-hidden_dim_3584]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.00 (+14.29%) | 0.00 (+105.88%) | 0.00 (+133.33%) | 0.00 (+200.00%) | 0.00 (-65.90%) | 6969.97 (-69.48%) | 5899.79 (-61.60%) | 5792.28 (-62.57%) | 5012.06 (-19.81%) | 701.19 (-88.35%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 22834.34 (n/a) | 15364.62 (n/a) | 15474.57 (n/a) | 6250.44 (n/a) | 6018.20 (n/a) |

test_swiglu_decode[embedding_dim_2048-hidden_dim_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.00 (+116.67%) | 0.00 (+34.62%) | 0.00 (+20.00%) | 0.00 (+0.00%) | 0.00 (+322.58%) | 21280.59 (+10.60%) | 14132.67 (-12.31%) | 13605.55 (-18.96%) | 6374.79 (-52.72%) | 5544.74 (+140.55%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 19240.68 (n/a) | 16117.52 (n/a) | 16788.38 (n/a) | 13481.94 (n/a) | 2305.00 (n/a) |

iron/operators/swiglu_prefill

test_swiglu_prefill[seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 0.14 (+0.15%) | 0.09 (-5.54%) | 0.08 (-7.30%) | 0.07 (-11.49%) | 0.02 (+17.40%) | 28083.64 (+12.98%) | 23432.15 (+7.46%) | 25027.98 (+7.92%) | 15513.38 (-0.13%) | 4832.92 (+31.54%) |
| 5503a95 — 2026-05-11 23:50:48 | 0.14 (n/a) | 0.10 (n/a) | 0.09 (n/a) | 0.08 (n/a) | 0.02 (n/a) | 24856.64 (n/a) | 21805.69 (n/a) | 23192.07 (n/a) | 15534.00 (n/a) | 3674.02 (n/a) |

iron/operators/transpose

test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 1.80 (+11.54%) | 1.22 (+12.02%) | 1.05 (-5.66%) | 0.77 (+374.90%) | 0.42 (-28.72%) | 684.70 (-78.94%) | 472.76 (-52.00%) | 501.00 (+5.99%) | 290.90 (-10.35%) | 156.77 (-87.66%) |
| 5503a95 — 2026-05-11 23:50:48 | 1.62 (n/a) | 1.09 (n/a) | 1.11 (n/a) | 0.16 (n/a) | 0.59 (n/a) | 3251.70 (n/a) | 984.96 (n/a) | 472.70 (n/a) | 324.50 (n/a) | 1270.69 (n/a) |

test_transpose[M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c05f5c9 — 2026-05-12 21:58:37 | 1.00 (-31.81%) | 0.81 (-24.94%) | 0.97 (+0.65%) | 0.25 (-71.70%) | 0.32 (+34.21%) | 2128.80 (+253.39%) | 871.86 (+72.43%) | 542.50 (-0.64%) | 525.70 (+46.64%) | 703.81 (+632.82%) |
| 5503a95 — 2026-05-11 23:50:48 | 1.46 (n/a) | 1.07 (n/a) | 0.96 (n/a) | 0.87 (n/a) | 0.24 (n/a) | 602.40 (n/a) | 505.64 (n/a) | 546.00 (n/a) | 358.50 (n/a) | 96.04 (n/a) |

Phoenix - Examples

IRON

Tested on 2026_05_12_21_54_08 at commit c05f5c9.

Trends: [figure: IRON Trends]
