fix: tolerate no_compactable_groups skip in strict compaction mode #16

Open
yjl-001 wants to merge 1 commit into WeianMao:main from yjl-001:fix/allow-no-compactable-groups-skip

yjl-001 commented Jun 1, 2026

Problem

When running a vLLM server with the TriAttention runtime enabled, the server crashes after ~2000 decode steps with:

RuntimeError: TRIATTN_FATAL_TRITON_SCORING_REQUIRED:unexpected_skip:req=...:step=2129:reason=no_compactable_groups

Root Cause

vLLM V1's async scheduling can cause the scheduler's estimated KV cache length to drift from the worker's actual block allocation state. When the scheduler signals compression but the worker finds that every KV cache group has empty block IDs or zero effective tokens, run_group_compaction_pipeline returns no_compactable_groups. This benign race condition was treated as a fatal error because no_compactable_groups was not in the _allowed_strict_skip_reasons set.
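To make the skip condition concrete, here is a minimal sketch of the check described above. It is not the code in run_group_compaction_pipeline: the `KVGroup` type, its fields, and the `compaction_skip_reason` helper are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class KVGroup:
    """Hypothetical stand-in for one KV cache group as seen by the worker."""
    block_ids: List[int] = field(default_factory=list)
    effective_tokens: int = 0


def compaction_skip_reason(groups: List[KVGroup]) -> Optional[str]:
    """Return a skip reason, or None when at least one group can be compacted.

    Assumption: a group is compactable only if it has block IDs and a
    positive effective token count, mirroring the PR description.
    """
    compactable = [g for g in groups if g.block_ids and g.effective_tokens > 0]
    return None if compactable else "no_compactable_groups"
```

Under this reading, a scheduler estimate that runs ahead of the worker can trigger compaction on a step where the worker has nothing to compact yet, which is a timing artifact rather than a scoring failure.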

The issue is in triattention/vllm/runtime/runner.py line 53:

self._strict_no_downgrade = bool(self.config.enable_experimental_kv_compaction)

When _strict_no_downgrade is True (default), any skip reason not in _allowed_strict_skip_reasons raises a fatal RuntimeError.
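The strict-mode behaviour can be sketched roughly as follows. This is an illustrative model, not the contents of runner.py: the `StrictSkipGate` class and `handle_skip` method are hypothetical, and the error string is only modelled on the message quoted under Problem.

```python
from typing import Set


class StrictSkipGate:
    """Hypothetical model of the strict-mode skip check; not the project's class."""

    def __init__(self, strict_no_downgrade: bool, allowed_skip_reasons: Set[str]) -> None:
        self._strict_no_downgrade = strict_no_downgrade
        self._allowed_strict_skip_reasons = set(allowed_skip_reasons)

    def handle_skip(self, req_id: str, step: int, reason: str) -> str:
        # In strict mode, only explicitly allowed reasons may skip compaction.
        if self._strict_no_downgrade and reason not in self._allowed_strict_skip_reasons:
            raise RuntimeError(
                "TRIATTN_FATAL_TRITON_SCORING_REQUIRED:unexpected_skip:"
                f"req={req_id}:step={step}:reason={reason}"
            )
        # Otherwise the skip is accepted and the caller continues decoding.
        return reason
```

With a gate built as `StrictSkipGate(True, {"placeholder_reason"})`, a `no_compactable_groups` skip raises; once that reason is included in the allowed set, the same call returns normally. `placeholder_reason` is a made-up entry, since the PR does not list the existing allowed reasons.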

Changes

  1. runner.py: Add "no_compactable_groups" to _allowed_strict_skip_reasons so the condition is treated as a benign skip rather than a fatal error

  2. state.py: Set last_compression_step in mark_compression_skipped() to enable the existing batch_queue_dedup guard, preventing tight retry loops when a skip occurs on consecutive steps
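The second change can be sketched as below. The PR does not show the batch_queue_dedup condition, so the guard here is a guess at its shape: `RequestState`, `should_attempt_compression`, and the `min_gap` parameter are all hypothetical. Only the idea of recording last_compression_step inside mark_compression_skipped comes from the description.

```python
from typing import Optional


class RequestState:
    """Hypothetical per-request compression state; not the class in state.py."""

    def __init__(self) -> None:
        self.last_compression_step: Optional[int] = None
        self.skip_count = 0

    def mark_compression_skipped(self, step: int, reason: str) -> None:
        self.skip_count += 1
        # The change described in the PR: record the step on skips too,
        # so a step-based dedup guard sees that this step was handled.
        self.last_compression_step = step


def should_attempt_compression(state: RequestState, step: int, min_gap: int = 1) -> bool:
    """Assumed dedup guard: suppress attempts within min_gap steps of the last one."""
    if state.last_compression_step is None:
        return True
    return step - state.last_compression_step >= min_gap
```

In this model, a skip recorded at step 2129 blocks a repeated attempt for the same step, while a later step is allowed again once it is at least `min_gap` steps away. Without the assignment in `mark_compression_skipped`, the guard would never see the skipped step and every retry would pass through.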

Verification

The error was observed on a production vLLM 0.21.0 deployment with the Qwen3-8B model, TRIATTN_RUNTIME_KV_BUDGET=2048, handling 8 concurrent chat requests.

🤖 Generated with Claude Code

…n loop

When vLLM V1 async scheduling causes the scheduler's KV estimate to
lag behind the worker's actual block state, run_group_compaction_pipeline
may find no compactable groups. Previously this was treated as a fatal
error in strict mode, crashing the server.

Changes:
- Add 'no_compactable_groups' to _allowed_strict_skip_reasons so the
  condition is treated as a benign skip rather than a fatal error
- Set last_compression_step in mark_compression_skipped so the
  existing batch_queue_dedup guard can prevent tight retry loops

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>