feat(sc): uSystolic stoc_len halving + per-row QK granularity #4
Open
heroarmor wants to merge 1 commit into
- SCController.halve flag (--sc_halve / SC_HALVE env): run bipolar SC matmuls
at stoc_len/2 via halve_bipolar_stoc_len, no accuracy loss; no-op for
non-bipolar modes and the noise surrogate
- SCAttention/SCMlp route through partial(sc_matmul, halve_bipolar_stoc_len=True)
when halve is set
- --sc_qk_granularity {per_head,per_row}: per-row QK scaling to match the AV path
- kernel_launch_counter: point DiT ckpt at the turbo path
- bump scmp_kernels submodule to a576b83 (on upstream/main)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
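The per_head versus per_row distinction in the commit message can be pictured with a small sketch. It assumes that "QK scaling" means normalizing by a max-abs factor so values fit the SC input range; that assumption, and the helper names `qk_scales` and `normalize`, are illustrative and not taken from the repository.

```python
def qk_scales(q_head, granularity="per_head"):
    """Return one max-abs scale per row of a single head's Q (or K) matrix.

    q_head is a list of rows, one row per token. Hypothetical helper.
    """
    row_max = [max(abs(x) for x in row) for row in q_head]
    if granularity == "per_row":
        # Each row gets its own scale.
        return row_max
    if granularity == "per_head":
        # Every row shares the largest magnitude found in the head.
        return [max(row_max)] * len(q_head)
    raise ValueError(f"unknown granularity: {granularity}")


def normalize(q_head, granularity="per_head"):
    """Divide each row by its scale; all-zero rows are left unchanged."""
    scales = qk_scales(q_head, granularity)
    return [[x / (s or 1.0) for x in row] for row, s in zip(q_head, scales)]


q = [[0.5, -0.25], [4.0, 1.0]]
print(qk_scales(q, "per_head"))  # one shared scale for the head
print(qk_scales(q, "per_row"))   # one scale per row
print(normalize(q, "per_row"))   # small-magnitude rows now use the full range
```

With a shared per-head scale, a row of small values is squeezed into a narrow part of the range whenever another row in the same head has a large outlier; a per-row scale avoids that at the cost of storing one factor per row.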
@Allenjin123 PTAL when you have a chance: same uSystolic halving work as the llama/spec-decode PRs, on the Q-DiT SC path. (Couldn't add you via the reviewer field from my fork.)
GIVE ME THE WRITE ACCESS PLS
Adds the uSystolic/HUB sign-magnitude stoc_len halving trick and a per-row QK granularity option to the Q-DiT SC integration.
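One way to see why a sign-magnitude representation can halve the stream length is to compare the values each encoding can represent. A bipolar stream of length N with k ones encodes (2k - N)/N, while a sign bit plus a unipolar magnitude stream of length N/2 encodes ±m/(N/2). The sketch below checks that both give the same grid of N+1 values. It only illustrates this representational argument, with hypothetical function names; it is not the scmp_kernels implementation and does not model the matmul itself.

```python
from fractions import Fraction


def bipolar_levels(n):
    """Values representable by a bipolar stream of length n: (2k - n) / n."""
    return {Fraction(2 * k - n, n) for k in range(n + 1)}


def sign_magnitude_levels(n):
    """Values representable by a sign bit plus a unipolar stream of length n."""
    magnitudes = [Fraction(m, n) for m in range(n + 1)]
    return {sign * m for sign in (1, -1) for m in magnitudes}


stoc_len = 64  # illustrative value, not a default from the PR
full = bipolar_levels(stoc_len)
halved = sign_magnitude_levels(stoc_len // 2)
print(full == halved)  # same grid of values, spaced 2 / stoc_len apart
```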
Changes
- SCController.halve flag, surfaced as --sc_halve (and SC_HALVE=1 env). When set, bipolar SC matmuls run at stoc_len/2 via halve_bipolar_stoc_len=True, with no accuracy loss. No-op for non-bipolar modes and the noise surrogate.
- SCAttention/SCMlp route through partial(sc_matmul, halve_bipolar_stoc_len=True) when halve is enabled.
- --sc_qk_granularity {per_head,per_row}: per-row QK scaling to match the AV path (default stays per_head).
- tools/kernel_launch_counter.py: point the DiT ckpt at the turbo path.
- Bump scmp_kernels submodule to a576b83 (already on upstream/main).

Debug/scratch tools were intentionally left out of this PR.
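The routing described in the changes can be sketched roughly as follows. Everything here is a stand-in written for illustration: `sc_matmul` only reports the stream length it would use, `SCControllerSketch` is not the real SCController, and the mode names and the way `SC_HALVE` is parsed are assumptions.

```python
import os
from dataclasses import dataclass
from functools import partial


def sc_matmul(a, b, stoc_len=64, mode="bipolar", halve_bipolar_stoc_len=False):
    """Stand-in for the real kernel: returns the stream length it would use."""
    if halve_bipolar_stoc_len and mode == "bipolar":
        return stoc_len // 2
    return stoc_len  # halving is a no-op for non-bipolar modes


@dataclass
class SCControllerSketch:
    """Hypothetical controller holding only the halve flag."""
    halve: bool = False

    def matmul_fn(self):
        # Bind the keyword once so call sites stay unchanged.
        if self.halve:
            return partial(sc_matmul, halve_bipolar_stoc_len=True)
        return sc_matmul


def halve_from_env(environ=os.environ):
    """Assumed parsing: the flag is on only when SC_HALVE is exactly '1'."""
    return environ.get("SC_HALVE", "0") == "1"


ctrl = SCControllerSketch(halve=halve_from_env({"SC_HALVE": "1"}))
matmul = ctrl.matmul_fn()
print(matmul(None, None, stoc_len=64))                   # bipolar: halved
print(matmul(None, None, stoc_len=64, mode="unipolar"))  # unchanged
```

Binding the keyword with `partial` keeps the attention and MLP call sites identical whether or not halving is enabled.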
🤖 Generated with Claude Code