prefill-decode

Here are 4 public repositories matching this topic...

qujing226 / mini-llm-serve

A Go-based LLM serving control plane that models token-aware scheduling, request lifecycle, streaming metrics, prefix cache, and KV block pressure around mock inference backends.

scheduler distributed-computing inference golang-server mlsys ai-infra dynamic-batching llm-serving llm-inference connectrpc prefix-cache prefill-decode

Updated Jun 10, 2026
Go

Zishan-Shao / decodeshare

🏆[ICML 2026 Spotlight] Official implementation of "DecodeShare: Tracing the Shared Subspace of LLM Decode-Time Decisions"

protocol large large-language-models mechanistic-interpretability activation-steering prefill-decode

Updated May 28, 2026
Python

FUJIANUT / CloudSimLLM

A datacenter-scale simulation framework for energy- and SLO-aware LLM inference serving, built as a non-invasive extension of CloudSim Plus.

simulation gpu cloud-computing splitwise autoscaling cloudsim datacenter kv-cache large-language-models llm carbon-aware prefill-decode

Updated May 4, 2026
Java

chenxuniu / awesome-disaggregated-llm-serving

A curated map of AFD, PD disaggregation, KV-cache systems, MoE serving, and re-aggregation baselines for LLM serving.

moe awesome-list disaggregation afd mixture-of-experts kv-cache llm-serving inference-systems prefill-decode

Updated May 15, 2026
Python
