fix: tolerate no_compactable_groups skip in strict compaction mode by yjl-001 · Pull Request #16 · WeianMao/triattention

yjl-001 · 2026-06-01T02:42:56Z

Problem

When running a vLLM server with the TriAttention runtime enabled, the server crashes after roughly 2000 decode steps with:

RuntimeError: TRIATTN_FATAL_TRITON_SCORING_REQUIRED:unexpected_skip:req=...:step=2129:reason=no_compactable_groups

Root Cause

vLLM V1's async scheduling can cause the scheduler's estimated KV cache length to drift from the worker's actual block allocation state. When the scheduler signals compression but the worker finds all KV cache groups have empty block IDs or zero effective tokens, run_group_compaction_pipeline returns no_compactable_groups. This benign race condition was treated as a fatal error because it wasn't in the allowed skip reasons set.
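The worker-side condition described above can be sketched as follows. This is a minimal illustration, not the actual body of run_group_compaction_pipeline: the `KVGroup` type, its fields, and `find_skip_reason` are hypothetical names chosen for this example. Only the skip reason string comes from the error message.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class KVGroup:
    """Hypothetical view of one KV cache group as seen by the worker."""
    block_ids: List[int] = field(default_factory=list)
    effective_tokens: int = 0


def find_skip_reason(groups: List[KVGroup]) -> Optional[str]:
    """Return a skip reason if nothing can be compacted, else None.

    A group counts as compactable only if it has allocated blocks and
    at least one effective token (assumed criteria, per the description).
    """
    compactable = [g for g in groups if g.block_ids and g.effective_tokens > 0]
    if not compactable:
        return "no_compactable_groups"
    return None


# The scheduler asked for compression, but the worker's groups are empty.
stale = [KVGroup(), KVGroup(block_ids=[7], effective_tokens=0)]
healthy = [KVGroup(block_ids=[3, 4], effective_tokens=32)]
```

Under these assumptions, `find_skip_reason(stale)` yields the skip reason while `find_skip_reason(healthy)` yields `None`, which is the distinction the scheduler's drifted estimate cannot see.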

The issue is in triattention/vllm/runtime/runner.py line 53:

self._strict_no_downgrade = bool(self.config.enable_experimental_kv_compaction)

When _strict_no_downgrade is True (default), any skip reason not in _allowed_strict_skip_reasons raises a fatal RuntimeError.
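A rough sketch of this strict-mode check, for readers unfamiliar with the runner. The class `StrictSkipPolicy`, the method `handle_skip`, and the reason `placeholder_allowed_reason` are hypothetical; the real runner.py may be organized differently. The error string follows the format shown in the crash message.

```python
from typing import Set

FATAL_PREFIX = "TRIATTN_FATAL_TRITON_SCORING_REQUIRED:unexpected_skip"


class StrictSkipPolicy:
    """Hypothetical stand-in for the runner's strict-mode skip handling."""

    def __init__(self, strict_no_downgrade: bool, allowed_reasons: Set[str]) -> None:
        self.strict_no_downgrade = strict_no_downgrade
        self.allowed_reasons = set(allowed_reasons)

    def handle_skip(self, req_id: str, step: int, reason: str) -> str:
        """Return the reason if the skip is tolerated, else raise."""
        if not self.strict_no_downgrade or reason in self.allowed_reasons:
            return reason
        raise RuntimeError(
            f"{FATAL_PREFIX}:req={req_id}:step={step}:reason={reason}"
        )


# Before the fix: the reason is missing from the allowed set, so it is fatal.
before = StrictSkipPolicy(True, {"placeholder_allowed_reason"})
# After the fix: the reason is allowed and the skip is tolerated.
after = StrictSkipPolicy(
    True, {"placeholder_allowed_reason", "no_compactable_groups"}
)
```

With `before`, calling `handle_skip` with `no_compactable_groups` raises; with `after`, the same call returns normally.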

Changes

runner.py: Add "no_compactable_groups" to _allowed_strict_skip_reasons so the condition is treated as a benign skip rather than a fatal error
state.py: Set last_compression_step in mark_compression_skipped() to enable the existing batch_queue_dedup guard, preventing tight retry loops when a skip occurs on consecutive steps
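The state.py change can be illustrated with a small sketch. The PR does not show how the batch_queue_dedup guard is implemented, so the guard below (`should_attempt_compression` with a `min_gap` window) is an assumption made for illustration, as is the `RequestCompressionState` class. Only `mark_compression_skipped` and `last_compression_step` are names taken from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestCompressionState:
    """Hypothetical per-request state; field names follow the PR text."""
    last_compression_step: Optional[int] = None
    last_skip_reason: Optional[str] = None

    def mark_compression_skipped(self, step: int, reason: str) -> None:
        self.last_skip_reason = reason
        # The change: record the step on a skip too, so that a
        # step-based dedup guard also sees skipped attempts.
        self.last_compression_step = step


def should_attempt_compression(
    state: RequestCompressionState, step: int, min_gap: int = 2
) -> bool:
    """Assumed dedup guard: wait at least min_gap steps between attempts."""
    if state.last_compression_step is None:
        return True
    return step - state.last_compression_step >= min_gap


state = RequestCompressionState()
state.mark_compression_skipped(step=2129, reason="no_compactable_groups")
retry_next_step = should_attempt_compression(state, step=2130)
retry_later = should_attempt_compression(state, step=2131)
```

In this sketch the attempt at step 2130 is suppressed and the one at step 2131 is allowed. If a skip did not update `last_compression_step`, the guard would have nothing to compare against and every step would retry.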

Verification

The error was observed on a production vLLM 0.21.0 deployment serving the Qwen3-8B model with TRIATTN_RUNTIME_KV_BUDGET=2048 and 8 concurrent chat requests.

🤖 Generated with Claude Code

…n loop

When vLLM V1 async scheduling causes the scheduler's KV estimate to lag behind the worker's actual block state, run_group_compaction_pipeline may find no compactable groups. Previously this was treated as a fatal error in strict mode, crashing the server.

Changes:
- Add 'no_compactable_groups' to allowed_strict_skip_reasons so the condition is treated as a benign skip rather than a fatal error
- Set last_compression_step in mark_compression_skipped so the existing batch_queue_dedup guard can prevent tight retry loops

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

yjl-001 wants to merge 1 commit into WeianMao:main from yjl-001:fix/allow-no-compactable-groups-skip
