A deterministic PyTorch autograd verification trap for catching silent KV-cache routing and block-alignment failures in vLLM and SGLang serving infrastructure.
Updated Jun 7, 2026 - Python
Systematic VLA training optimization on 2× RTX 3090. WebDataset + FlashAttention-2 + FSDP → 3.3× throughput, 26% VRAM reduction. Profiler traces and W&B report linked. Reproducible in one command.
Regression-safe evaluation framework for RAG systems with faithfulness and coverage-based deployment gating.