add dot.rs; rm 4 dupe dot impl; migrate dot_32 call site; lean dispat… by DrJesseGlass · Pull Request #29 · DrJesseGlass/candle

DrJesseGlass · 2026-05-06T12:34:04Z

…ch standard cpu_flash

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: eaa72f15a7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-06T12:38:01Z

+            s += q[i] * k[i] as f32
+                + q[i + 1] * k[i + 1] as f32
+                + q[i + 2] * k[i + 2] as f32
+                + q[i + 3] * k[i + 3] as f32;


Preserve f64 precision in CPU flash-attn dot path

The new DotF32 implementation for f64 downcasts every k element to f32 before the multiply-accumulate. This is a precision regression from the previous vec_dot::<T>(q_row, k_row).to_f32() behavior, which accumulated in f64 and cast only once at the end. It materially changes logits for the DType::F64 CPU flash-attention path (used in candle-transformers/src/models/qwen3.rs), especially for larger head dimensions where rounding error compounds, and effectively removes the numerical benefit of the f64 path.
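The difference described above can be reproduced in isolation. The following is a minimal sketch, not code from the PR: `dot_downcast` and `dot_f64_then_cast` are hypothetical names standing in for the flagged path and the earlier accumulate-in-f64 behavior, and the unrolling from the diff is omitted for brevity.

```rust
/// Hypothetical stand-in for the flagged path: each f64 key element is
/// rounded to f32 before the multiply, and the running sum is kept in f32.
fn dot_downcast(q: &[f32], k: &[f64]) -> f32 {
    // `as` binds tighter than `*`, so this is qi * (ki as f32).
    q.iter().zip(k).map(|(&qi, &ki)| qi * ki as f32).sum()
}

/// Hypothetical stand-in for the earlier behavior: multiply and accumulate
/// in f64, then narrow to f32 once at the end.
fn dot_f64_then_cast(q: &[f32], k: &[f64]) -> f32 {
    let s: f64 = q.iter().zip(k).map(|(&qi, &ki)| qi as f64 * ki).sum();
    s as f32
}
```

With `q = [1.0, 1.0]` and `k = [16_777_217.0, -16_777_216.0]`, the first key value (2^24 + 1) is not representable in f32, so the downcast path loses the difference between the two terms while the f64 path keeps it.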

Useful? React with 👍 / 👎.

DrJesseGlass · 2026-05-06T12:55:36Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2927574de3

chatgpt-codex-connector · 2026-05-06T13:02:44Z

+                for t in 0..d {
+                    q_f32[t] = q_row[t].to_f32().unwrap_or(0.0);
+                }


Preserve f64 query precision in generic CPU flash path

The generic causal kernel now downcasts each q_row element to f32 before the dot product (q_f32[t] = q_row[t].to_f32()), so flash_attn::<f64> no longer computes q·k with full f64 query precision. In the previous implementation, vec_dot::<T>(q_row, k_row) kept f64 values through multiplication/accumulation and only converted after the dot. This is a numerical regression for real f64 callers (e.g. the CPU DType::F64 path), and can change attention weights for high-dynamic-range inputs.
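One way to avoid narrowing the query row is to widen both operands to a common accumulator type and cast only the final sum. The sketch below illustrates that idea under stated assumptions: `WidenF64`, `dot_wide`, and `dot_q_narrowed` are hypothetical names introduced here, not the PR's API, and the sketch does not reflect how the PR ultimately handled f64.

```rust
/// Hypothetical trait (not from the PR): element types that can be widened
/// losslessly to f64 for accumulation.
trait WidenF64: Copy {
    fn widen(self) -> f64;
}

impl WidenF64 for f32 {
    fn widen(self) -> f64 {
        self as f64
    }
}

impl WidenF64 for f64 {
    fn widen(self) -> f64 {
        self
    }
}

/// Dot product that keeps both operands at full precision and narrows once.
fn dot_wide<T: WidenF64>(q_row: &[T], k_row: &[T]) -> f32 {
    debug_assert_eq!(q_row.len(), k_row.len());
    let mut acc = 0.0f64;
    for (&q, &k) in q_row.iter().zip(k_row) {
        acc += q.widen() * k.widen();
    }
    acc as f32
}

/// The flagged pattern, reduced to its essence: the query row is copied
/// into an f32 scratch buffer before the dot product.
fn dot_q_narrowed(q_row: &[f64], k_row: &[f64]) -> f32 {
    let q_f32: Vec<f32> = q_row.iter().map(|&q| q as f32).collect();
    let acc: f64 = q_f32.iter().zip(k_row).map(|(&q, &k)| q as f64 * k).sum();
    acc as f32
}
```

For `q_row = [16_777_217.0, -16_777_216.0]` and `k_row = [1.0, 1.0]`, `dot_wide` preserves the unit difference, whereas `dot_q_narrowed` rounds the first query value to 2^24 and the terms cancel. The trade-off is that an f64 accumulator costs throughput on the f32 path, which may or may not be acceptable for a given kernel.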


add dot.rs; rm 4 dupe dot impl; migrate dot_32 call site; lean dispat…

eaa72f1

…ch standard cpu_flash

chatgpt-codex-connector Bot reviewed May 6, 2026

softmax helper; impl in 13 locations; dot error compound bug resolved

2927574

chatgpt-codex-connector Bot reviewed May 6, 2026

DrJesseGlass added 3 commits May 6, 2026 09:22

drop f64 path from qwen3 dispatch

76bb0a4

rm 2 line internal comment

52cc1ff

fmt standard and dot

170757a

DrJesseGlass wants to merge 5 commits into downstream from cpu-flash-standardize
