AML-4269: Add Location Support to Qwen3.5 #10

Open
anvdn wants to merge 2 commits into v0.5.11+hs from v0.5.11+hs3

@anvdn anvdn commented Jun 3, 2026

Attention Heatmap Support for Qwen3.5 + refactor

Summary

Extends attention heatmap capture to Qwen3.5 (hybrid linear/full-attention model) and refactors the existing heatmap code so any model can opt in with minimal boilerplate.

Changes

  • New AttentionHeatmapQueryRecorderMixin (hs/attention_heatmap.py): centralizes query-buffer allocation and per-layer query recording. Qwen2-VL, Qwen3-VL, and Qwen3.5 now share this implementation instead of duplicating it.
  • Qwen3.5 integration (models/qwen3_5.py): full-attention decoder layers now return the query tensor q; linear-attention layers are skipped. The model registers itself via the mixin.
  • Layer selection API change (server_args.py): replaced the attention_heatmap_layer_start / _layer_end range with a flexible attention_heatmap_layer_ids: list[int]. This is required for hybrid models where capturable layers aren't contiguous.
  • Scheduler hybrid-pool support (scheduler_output_processor_mixin.py): when the KV cache is a HybridLinearKVPool, key tensors are looked up via full_attention_layer_id_mapping instead of raw layer id. The model now owns the canonical list of recorded layer ids.
  • Version bump to 0.5.11+hs3.
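
To illustrate why an explicit attention_heatmap_layer_ids list suits a hybrid model better than a start/end range, here is a minimal sketch in plain Python. It is not the code from this PR: the helper names, the layer_types list, and the 8-layer layout are assumptions made for the example. Only the idea of translating a global layer id into a full-attention index (as full_attention_layer_id_mapping does in the description above) comes from the PR text.

```python
from typing import Dict, List


def build_full_attention_mapping(layer_types: List[str]) -> Dict[int, int]:
    """Map each global layer id to its index among full-attention layers only.

    Hypothetical helper: stands in for a mapping such as
    full_attention_layer_id_mapping on a hybrid KV pool.
    """
    mapping: Dict[int, int] = {}
    for layer_id, kind in enumerate(layer_types):
        if kind == "full_attention":
            mapping[layer_id] = len(mapping)
    return mapping


def resolve_recorded_layers(requested_ids: List[int], mapping: Dict[int, int]) -> List[int]:
    """Keep only requested layers that are full-attention (order kept, duplicates dropped)."""
    seen = set()
    recorded: List[int] = []
    for layer_id in requested_ids:
        if layer_id in mapping and layer_id not in seen:
            seen.add(layer_id)
            recorded.append(layer_id)
    return recorded


# Assumed layout for illustration: every 4th layer is full attention.
layer_types = (["linear_attention"] * 3 + ["full_attention"]) * 2

mapping = build_full_attention_mapping(layer_types)
# Layer 2 is linear attention, so it is skipped rather than recorded.
recorded = resolve_recorded_layers([2, 3, 7], mapping)
# Index into the full-attention key storage, not the raw layer id.
kv_slots = [mapping[layer_id] for layer_id in recorded]
```

In this toy layout the capturable layers are 3 and 7, which no contiguous range can select without also covering linear-attention layers, and their key tensors would live at slots 0 and 1 rather than at 3 and 7.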

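The opt-in pattern described for the mixin could look roughly like the sketch below. This is an assumption-laden illustration, not the contents of hs/attention_heatmap.py: QueryRecorderMixinSketch, its method names, and ToyHybridModel are all hypothetical, and plain Python lists stand in for tensors so the example runs without any dependencies.

```python
from typing import Dict, List, Optional


class QueryRecorderMixinSketch:
    """Hypothetical stand-in for AttentionHeatmapQueryRecorderMixin."""

    def init_query_recorder(self, layer_ids: List[int]) -> None:
        # The model owns the canonical list of recorded layer ids.
        self.recorded_layer_ids = list(layer_ids)
        # One buffer slot per recorded layer, filled during forward.
        self._query_buffer: Dict[int, Optional[list]] = {i: None for i in layer_ids}

    def record_query(self, layer_id: int, q: Optional[list]) -> None:
        # Ignore layers that return no query or were not selected.
        if q is None or layer_id not in self._query_buffer:
            return
        self._query_buffer[layer_id] = q

    def recorded_queries(self) -> Dict[int, list]:
        return {i: q for i, q in self._query_buffer.items() if q is not None}


class ToyHybridModel(QueryRecorderMixinSketch):
    """Toy model: full-attention layers return a query, linear layers return None."""

    def __init__(self, layer_types: List[str], heatmap_layer_ids: List[int]) -> None:
        self.layer_types = layer_types
        capturable = [i for i in heatmap_layer_ids if layer_types[i] == "full_attention"]
        self.init_query_recorder(capturable)

    def forward(self, x: List[float]) -> List[float]:
        for layer_id, kind in enumerate(self.layer_types):
            # Fake "query": the input shifted by the layer id.
            q = [v + layer_id for v in x] if kind == "full_attention" else None
            self.record_query(layer_id, q)
        return x


model = ToyHybridModel(
    ["linear_attention", "full_attention", "linear_attention", "full_attention"],
    heatmap_layer_ids=[0, 1, 3],
)
model.forward([1.0, 2.0])
queries = model.recorded_queries()
```

Keeping allocation and recording in one mixin means a model only has to call the init method and pass each layer's q through record_query, which is the kind of minimal boilerplate the summary describes.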