A Go-based LLM serving control plane that models token-aware scheduling, request lifecycle, streaming metrics, prefix cache, and KV block pressure around mock inference backends.
Updated Jun 10, 2026 · Go
🏆[ICML 2026 Spotlight] Official implementation of "DecodeShare: Tracing the Shared Subspace of LLM Decode-Time Decisions"
A datacenter-scale simulation framework for energy- and SLO-aware LLM inference serving — non-invasive extension of CloudSim Plus
A curated map of AFD, PD disaggregation, KV-cache systems, MoE serving, and re-aggregation baselines for LLM serving.