greatrobocreator

GopherGame Public

Shad wasm-homework: GopherGame, built with my own mini-GameEngine

Go

lavawolfiee/mini-flash-attention Public

Minimal FlashAttention in CUDA C++/CuTe: readable WMMA/CuTe kernels, no NxN workspace, up to 4.5x faster than naive PyTorch

Cuda · 23 stars · 1 fork