doc: prefill/decode performance technique survey (June 2026) by hvasconcelos · Pull Request #43 · hvasconcelos/libmlxforge

hvasconcelos · 2026-06-12T19:43:09Z

Survey of 2024-2026 inference-performance techniques applicable to the
engine, prioritized by Apple-Silicon evidence, effort, and compatibility
with the exactness invariants. Covers wired-memory limits, sampler
compilation, n-gram and MTP speculative decoding, token-budget chunked
prefill, packed prefill, fused Metal kernels, and paged attention, plus
MLX framework status (v0.31.2 is current; no fused quantized-KV SDPA
upstream).

https://claude.ai/code/session_0119yHPn3SDzSACP7Cy4V2kM


hvasconcelos closed this Jun 13, 2026

hvasconcelos deleted the claude/mlx-forge-performance-research-zcml6p branch June 13, 2026 15:59

hvasconcelos wants to merge 1 commit into master from claude/mlx-forge-performance-research-zcml6p
