  • Tongji University
  • Shanghai

Zlatanwic/README.md

⚡ About Me ⚡

👋 Hi, I'm Kuo Li

I'm an undergraduate at Tongji University, exploring the intersection of Machine Learning Systems, LLM training / inference / serving, deep learning compilers, agent infrastructure systems, and operating systems.

My recent work focuses on:

  • 🚀 LLM-aware high-performance code generation
  • 🧠 KV cache optimization for long-context LLM inference
  • 🧩 Optimizing vLLM performance on RISC-V platforms
  • ⚙️ DL compilers, software-hardware co-design and co-optimization, and MegaKernels
  • 🛰️ Modern AI and agent systems from a systems-level perspective

I enjoy building low-level systems that make high-level intelligence run faster, cheaper, and more reliably.


🧬 Research & Engineering Constellation




🔥 Selected Work 🔥

🧠 SieveKV

Semantics-aware KV cache eviction for long-context LLM inference

  • Long-context LLM inference optimization
  • KV cache memory pressure reduction
  • Semantics-aware token retention and eviction
  • Serving-time efficiency and accuracy trade-off

⚡ Paged KV Cache CUDA Kernels

Fused CUDA kernels for efficient LLM decoding

  • Paged KV cache layout optimization
  • GPU memory access pattern analysis
  • Decode-stage kernel acceleration
  • CUDA-level performance engineering
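
For readers new to paged KV caches, here is a minimal sketch of the block-table indirection that a decode kernel has to resolve. It is plain Python rather than CUDA and is not taken from this project: `BLOCK_SIZE`, `gather_keys`, and the toy `pool` are hypothetical names chosen for illustration.

```python
from typing import List

BLOCK_SIZE = 4  # tokens per physical block (illustrative value, an assumption)


def gather_keys(pool: List[List[str]], block_table: List[int], seq_len: int) -> List[str]:
    """Collect the first seq_len cached entries of one sequence in logical order."""
    out = []
    for pos in range(seq_len):
        # A logical token position maps to (logical block, offset inside the block).
        logical_block, offset = divmod(pos, BLOCK_SIZE)
        # The block table translates the logical block to a physical block index.
        physical_block = block_table[logical_block]
        out.append(pool[physical_block][offset])
    return out


# Toy physical pool with three blocks. The sequence owns block 2, then block 0,
# so its tokens are not contiguous in memory.
pool = [
    ["k4", "k5", "", ""],
    ["x", "x", "x", "x"],
    ["k0", "k1", "k2", "k3"],
]
block_table = [2, 0]
gathered = gather_keys(pool, block_table, seq_len=6)
print(gathered)
```

A fused kernel would typically perform this lookup inside the attention loop instead of first materializing a contiguous copy, which is where the memory-traffic savings come from.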

🦀 NovaOS

Rust-based POSIX-compatible kernel for RISC-V64

  • Rust systems programming
  • RISC-V64 kernel development
  • POSIX-compatible OS components
  • Low-level runtime and memory management

Distributed Semantic Retrieval System

Chord-based distributed dense retrieval and RAG pipeline

  • Distributed hash table architecture
  • Dense vector retrieval
  • RAG pipeline construction
  • Semantic search infrastructure
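
As background on the Chord part, the sketch below shows the core placement rule of a Chord-style ring: a key is stored on the first node whose identifier is at or after the key's identifier, wrapping around at the end. It assumes a global view of the ring and omits finger tables, so it illustrates the rule rather than this project's implementation. `M`, `chord_id`, and `successor` are hypothetical names.

```python
import hashlib
from bisect import bisect_left
from typing import List

M = 16  # number of identifier bits (illustrative value, an assumption)


def chord_id(name: str) -> int:
    """Hash a node name or document key onto the 2**M identifier ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


def successor(ring: List[int], key_id: int) -> int:
    """Return the first node id >= key_id on a sorted ring, wrapping to the start."""
    i = bisect_left(ring, key_id)
    return ring[i % len(ring)]


# Three nodes with hand-picked ids so the lookups are easy to follow.
ring = sorted([100, 2000, 40000])
print(successor(ring, 150))     # owned by node 2000
print(successor(ring, 50000))   # wraps around to node 100
print(successor(ring, chord_id("doc-42")))  # owner of a hashed document key
```

In a dense-retrieval setting, each node could hold the vector shard for the keys it owns. A real Chord deployment resolves the successor in O(log N) hops through finger tables instead of a sorted global list.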

🛠️ Tech Stack Galaxy

Languages

AI Systems & Frameworks

Systems & Tooling


📊 GitHub Analytics








🏆 Trophies & Honors


🥇 National First Prize

Global Campus AI Algorithm Challenge

🥈 International Silver Medal

iGEM


🐍 Contribution Snake

[Image: GitHub contribution grid snake animation]

🌌 Current Orbit

MLSys Full Stack        ████████████████████░   95%
LLM Inference Serving   ███████████████████░░   90%
CUDA Kernel Design      ████████████████░░░░░   80%
DL Compiler             ███████████████░░░░░░   75%
RISC-V Systems          ███████████████░░░░░░   75%
Agent Infrastructure    ██████████████░░░░░░░   70%

Pinned

  1. Kong-Debugger (Public)

    Kong ("空") Debugger: a lightweight project that rewrites gdb (GNU Debugger) in Rust, focused on memory safety, concurrency safety, and performance, with AI-assisted features

    Rust 1

  2. Fin-RAG (Public)

    A RAG-based financial question-answering system that builds a local knowledge base on a hybrid dual index

    Python 2

  3. Wechat-Read-MCP-in-Rust (Public)

    An MCP written in Rust for scraping WeChat Official Accounts, with an anti-crawler bypass mechanism for browser-based scraping

    Rust 12 3

  4. yzfly/Awesome-MCP-ZH (Public)

    Curated MCP resources, MCP guides, Claude MCP, MCP Servers, MCP Clients

    7.2k 574

  5. Fused-Kernel-for-Paged-attention (Public)

    Targets long-context LLM decoding: implements and analyzes a block-gather CUDA kernel for the paged KV cache, and validates how fused attention reduces intermediate GPU memory traffic and improves decode throughput.

    Python

  6. SJTU-IPADS/SkVM (Public)

    The Language Virtual Machine for Agent Skills

    TypeScript 489 46