transformers for quant and full with unified example for smol lm3 #2

Closed
DrJesseGlass wants to merge 9 commits into downstream from examples/add/smolLM3

Conversation

@DrJesseGlass

Owner

No description provided.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +345 to +348
let scale = 1.0 / (self.head_dim as f64).sqrt();
// Make q contiguous before matmul to avoid stride mismatch
let q = q.contiguous()?;
let attn_weights = (q.matmul(&k.t()?)? * scale)?;

P1: Use proper transpose for attention matmul

The quantized SmolLM3 attention builds Q/K/V as 4D tensors (B, num_heads, seq_len, head_dim) but computes scores with q.matmul(&k.t()?). Tensor::t() only handles 2D tensors, so with 4D k this call will fail at runtime and the quantized model cannot run. The full-precision path uses k.transpose(2, 3)? instead, which is the needed permutation to produce (B, H, L, L) attention scores.
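For reference, the score computation under discussion can be sketched for a single head in plain Rust, without candle. This is only an illustration of the arithmetic: `attention_scores` is a hypothetical helper, and the nested `Vec<f64>` layout is an assumption made for readability, not the tensor layout used in the PR.

```rust
/// Scaled dot-product attention scores for one head (illustrative sketch).
///
/// `q` and `k` are row-major matrices of shape [seq_len][head_dim].
/// The result has shape [seq_len][seq_len], with
/// scores[i][j] = (q[i] . k[j]) / sqrt(head_dim).
/// Taking the dot product of row i of `q` with row j of `k` is the same
/// as multiplying `q` by `k` with its last two dimensions swapped.
fn attention_scores(q: &[Vec<f64>], k: &[Vec<f64>]) -> Vec<Vec<f64>> {
    // head_dim is read from the first query row; an empty input yields an empty result.
    let head_dim = q.first().map_or(0, |row| row.len());
    let scale = 1.0 / (head_dim as f64).sqrt();
    q.iter()
        .map(|qi| {
            k.iter()
                .map(|kj| qi.iter().zip(kj).map(|(a, b)| a * b).sum::<f64>() * scale)
                .collect()
        })
        .collect()
}
```

With `seq_len = L` and `head_dim = D`, the inputs are L x D and the output is L x L, which matches the per-head slice of the (B, H, L, L) score tensor described above.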

Useful? React with 👍 / 👎.

Owner Author

// From candle's tensor.rs
pub fn t(&self) -> Result<Tensor> {
    let rank = self.rank();
    if rank < 2 { /* error */ }
    else { self.transpose(rank - 2, rank - 1) } // Same as transpose(2, 3) for 4D!
}

So you are wrong. LOL
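The equivalence can be checked at the shape level with a small standalone sketch. It models only dimension bookkeeping, not candle's actual implementation or strides; `transpose_dims`, `t_dims`, and `matmul_dims` are hypothetical helper names, and the example shape (1, 4, 7, 64) is an arbitrary illustration.

```rust
/// Swap two dimensions of a shape, as a tensor transpose would.
/// Returns None if either index is out of range.
fn transpose_dims(shape: &[usize], d1: usize, d2: usize) -> Option<Vec<usize>> {
    if d1 >= shape.len() || d2 >= shape.len() {
        return None;
    }
    let mut out = shape.to_vec();
    out.swap(d1, d2);
    Some(out)
}

/// Swap the last two dimensions for any rank >= 2,
/// following the logic of the `t()` snippet quoted above.
fn t_dims(shape: &[usize]) -> Option<Vec<usize>> {
    let rank = shape.len();
    if rank < 2 {
        return None;
    }
    transpose_dims(shape, rank - 2, rank - 1)
}

/// Result shape of a batched matmul: leading (batch) dims must match
/// and the inner dimensions must agree. No broadcasting is modeled.
fn matmul_dims(lhs: &[usize], rhs: &[usize]) -> Option<Vec<usize>> {
    let rank = lhs.len();
    if rank < 2 || rhs.len() != rank {
        return None;
    }
    if lhs[..rank - 2] != rhs[..rank - 2] || lhs[rank - 1] != rhs[rank - 2] {
        return None;
    }
    let mut out = lhs[..rank - 1].to_vec();
    out.push(rhs[rank - 1]);
    Some(out)
}
```

For a 4D shape such as `[1, 4, 7, 64]`, `t_dims` and `transpose_dims(shape, 2, 3)` return the same result, and multiplying the original shape by the transposed one gives `[1, 4, 7, 7]`, i.e. (B, H, L, L).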

@DrJesseGlass DrJesseGlass changed the base branch from downstream-transformers to downstream December 30, 2025 21:33