LancerLab/kebab

Kebab

Hopper GEMM kernels, microbenchmarks, and tuning notes in one repo.

Kebab is a CUDA/CuTe playground for studying fast GEMM on NVIDIA Hopper. The source lives in kebab/; the repo root stays focused on build entrypoints, config, results, and docs.

  • CUDA GEMM versions side by side, from simpler baselines to deeper Hopper kernels
  • CuTe implementations, validation harnesses, and benchmark runners
  • Focused microbenchmarks for WGMMA, TMA copy paths, sparse MMA, and metadata packing
  • One config.yaml for reproducible runs

Requirements: an sm_90 / sm_90a GPU (the interesting path is Hopper-only), CUDA 12.x, and yaml-cpp.

Benchmark Snapshot

Generated by make bench-gemm with the checked-in config.yaml on NVIDIA H800 PCIe. FP16, mode RC, all numbers in TFLOP/s.

M = N = K   cuBLAS   v2      v3      v4      v5      v10     Best (TFLOP/s, % of cuBLAS)
2048        420.7    220.6   249.7   290.7   255.0   290.5   v4 (290.7, 69.1%)
4096        473.6    237.4   294.3   334.0   379.0   309.5   v5 (379.0, 80.0%)
8192        412.5    187.0   277.9   248.8   336.7   290.6   v5 (336.7, 81.6%)

The point of the repo is the ladder, not one magic kernel: each version exposes a different Hopper idea you can measure, compare, and inspect.
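As a rough illustration of how the snapshot numbers relate to each other, the sketch below recomputes the "Best" column from the table values and shows the usual way a GEMM timing becomes TFLOP/s. It assumes the conventional 2·M·N·K operation count; the repo's own benchmark runner may count or time things differently, and the function names here are hypothetical, not part of Kebab.

```python
# Table values copied from the benchmark snapshot (TFLOP/s, FP16, mode RC).
SNAPSHOT = {
    2048: {"cuBLAS": 420.7, "v2": 220.6, "v3": 249.7, "v4": 290.7, "v5": 255.0, "v10": 290.5},
    4096: {"cuBLAS": 473.6, "v2": 237.4, "v3": 294.3, "v4": 334.0, "v5": 379.0, "v10": 309.5},
    8192: {"cuBLAS": 412.5, "v2": 187.0, "v3": 277.9, "v4": 248.8, "v5": 336.7, "v10": 290.6},
}


def gemm_tflops(m, n, k, seconds):
    """Throughput in TFLOP/s, assuming 2*M*N*K floating-point operations per GEMM."""
    return 2.0 * m * n * k / seconds / 1e12


def best_kernel(row):
    """Return (version, TFLOP/s, percent of cuBLAS) for the fastest non-cuBLAS entry."""
    candidates = [(name, t) for name, t in row.items() if name != "cuBLAS"]
    name, tflops = max(candidates, key=lambda pair: pair[1])
    return name, tflops, round(100.0 * tflops / row["cuBLAS"], 1)


if __name__ == "__main__":
    for size, row in SNAPSHOT.items():
        print(size, best_kernel(row))
```

The percentage in the "Best" column is the kernel's throughput divided by the cuBLAS throughput at the same size, which is why it can drop at 2048 even though the absolute number is respectable.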

Usage

make setup
make bench-gemm

Results land in bench_results/. If you want different sizes, versions, or precisions, edit config.yaml.

Microbench Lab

  • make mbench-mma-wgmma - raw WGMMA behavior
  • make mbench-copy-gmem-to-smem-2d-tma-cute - TMA + CuTe copy path
  • make mbench-sparse-mma - Hopper 2:4 sparse MMA experiment
  • make mbench-cutlass-meta-probe - CUTLASS metadata packing sanity check
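For readers new to 2:4 structured sparsity, here is a minimal host-side sketch of the idea behind the sparse MMA and metadata probes: in every group of four values, two are kept and a 2-bit index records where each kept value sat. This is an illustration only. The function names are hypothetical, the bit order in `pack_metadata` is an arbitrary choice, and the layout that the hardware and CUTLASS actually expect is more involved and is not reproduced here.

```python
def compress_2_4(row):
    """Keep the 2 largest-magnitude values in every group of 4.

    Returns (values, indices). `values` has len(row) // 2 entries; `indices`
    holds the in-group position (0..3) of each kept value, i.e. 2 bits each.
    """
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    values, indices = [], []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Choose two positions by magnitude, then restore ascending order.
        by_magnitude = sorted(range(4), key=lambda i: -abs(group[i]))
        keep = sorted(by_magnitude[:2])
        values.extend(group[i] for i in keep)
        indices.extend(keep)
    return values, indices


def pack_metadata(indices):
    """Pack 2-bit indices into one integer, first index in the lowest bits.

    The ordering is illustrative, not the hardware or CUTLASS layout.
    """
    word = 0
    for pos, idx in enumerate(indices):
        word |= idx << (2 * pos)
    return word


def decompress_2_4(values, indices):
    """Rebuild the dense (pruned) row from kept values and their indices."""
    out = []
    for start in range(0, len(values), 2):
        group = [0.0] * 4
        for v, i in zip(values[start:start + 2], indices[start:start + 2]):
            group[i] = v
        out.extend(group)
    return out
```

A metadata sanity check in this spirit compresses a row, decompresses it, and confirms that the surviving values come back in their original positions.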

Layout

  • kebab/lib/cuda - CUDA GEMM versions and baselines
  • kebab/lib/cute - CuTe kernels
  • kebab/lib/benchmark - operator benchmarks
  • kebab/lib/microbench - focused kernel probes
  • docs/ - design notes and optimization writeups
  • config.yaml - benchmark/runtime config

License

MIT. See LICENSE.

About

Dissecting Hopper/Ampere performance with microbenchmarks and GEMM best practices.
