doc: prefill/decode performance technique survey (June 2026) #43

Closed
hvasconcelos wants to merge 1 commit into master from claude/mlx-forge-performance-research-zcml6p

@hvasconcelos
Owner

Surveys 2024-2026 inference-performance techniques applicable to the
engine, prioritized by evidence on Apple Silicon, implementation effort,
and compatibility with the exactness invariants. Covers wired-memory
limits, sampler compilation, n-gram and MTP (multi-token prediction)
speculative decoding, token-budget chunked prefill, packed prefill,
fused Metal kernels, and paged attention, plus the status of the MLX
framework (v0.31.2 is current; there is no fused quantized-KV SDPA
kernel upstream).

https://claude.ai/code/session_0119yHPn3SDzSACP7Cy4V2kM

hvasconcelos deleted the claude/mlx-forge-performance-research-zcml6p branch on June 13, 2026, at 15:59.