Compiler & GPU-kernel person in Shanghai — I mostly make tensors go faster.
Still looking for help getting rid of procrastination 🤔
- TileOPs — high-performance LLM operator library on TileLang
- TileRT — tile-based runtime for ultra-low-latency LLM inference
- nncase — end-to-end compiler for efficient LLM deployment (paper)
- handson-polyhedral — hands-on tutorials on polyhedral compilation
🔥 Latest from my blog — zhen8838.github.io
- Circle-Loss — TF2 implementation of CircleLoss
- AnimeStylized — AnimeGAN / white-box cartoonize
- K210_Yolo_framework — YOLOv3 on the K210 edge chip
- playground — code for fun