Qwen3 embed #17
* Implement the new Google model
* Fix model
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 647657e777
    // Pad or truncate embeds to seq_len
    let embed_len = embeds.dim(0)?;
    if embed_len >= seq_len {
        return embeds.narrow(0, 0, seq_len)?.unsqueeze(0);
Place modality embeddings at masked token indices
`broadcast_embed_to_mask` currently pads or truncates `embeds` starting at position 0 and returns the result as a full-length sequence, but `forward_multimodal` later applies it through a sparse token mask (`mask_expanded * image_embeds_broadcast`). When the image/audio placeholder tokens do not form a prefix starting at index 0 (e.g., BOS or text tokens precede the modality tokens), the masked positions pick up zeros or the wrong embedding rows, so the modality tokens are replaced incorrectly.
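One possible shape for the fix, sketched with plain `Vec`s rather than candle tensors so the indexing is easy to follow. The helper name `scatter_embeds_at_mask` and the choice to return an error on a count mismatch (instead of padding or truncating) are assumptions for illustration, not code from this PR. The idea is to walk the mask and consume one encoder row per masked position, so row `i` of `embeds` lands on the `i`-th placeholder token wherever it sits in the sequence.

```rust
/// Hypothetical helper (not part of the PR): place encoder rows at the
/// positions where `mask` is true and leave every other position as zeros.
///
/// `embeds` is `[n_modality_tokens][hidden]`, `mask` is `[seq_len]`.
/// Returns a `[seq_len][hidden]` buffer.
fn scatter_embeds_at_mask(
    embeds: &[Vec<f32>],
    mask: &[bool],
    hidden: usize,
) -> Result<Vec<Vec<f32>>, String> {
    // Assumption: a mismatch is a caller bug, so report it rather than
    // silently padding or truncating.
    let n_masked = mask.iter().filter(|&&m| m).count();
    if n_masked != embeds.len() {
        return Err(format!(
            "got {} embedding rows for {} masked tokens",
            embeds.len(),
            n_masked
        ));
    }

    let mut out = vec![vec![0.0f32; hidden]; mask.len()];
    let mut next = 0; // index of the next unused encoder row
    for (pos, &is_masked) in mask.iter().enumerate() {
        if is_masked {
            if embeds[next].len() != hidden {
                return Err(format!("row {} has the wrong hidden size", next));
            }
            out[pos].copy_from_slice(&embeds[next]);
            next += 1;
        }
    }
    Ok(out)
}
```

With a mask of `[false, true, true, false]` and two encoder rows, the rows end up at positions 1 and 2 rather than 0 and 1, which is the case the prefix-based padding gets wrong.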
        return padded.unsqueeze(0);
    }

    Ok(zeros)
Support batched multimodal embedding broadcast
For `b_sz > 1`, `broadcast_embed_to_mask` unconditionally returns an all-zero tensor, so every masked image/audio token in a batched request is zeroed instead of receiving encoder features. This effectively drops all multimodal information whenever the batch size is greater than 1.
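A rough sketch of what batched handling could look like, again using plain `Vec`s instead of candle tensors. The name `scatter_embeds_batched` is hypothetical, and the sketch assumes the encoder rows arrive as one flat list in batch-major order (all rows for sequence 0, then sequence 1, and so on); the PR may lay them out differently. A single cursor over the rows is advanced across every sequence's mask, so each batch item receives its own features instead of zeros.

```rust
/// Hypothetical helper (not part of the PR): scatter a flat list of encoder
/// rows into a `[b_sz][seq_len][hidden]` buffer, one row per masked token.
///
/// Assumption: `embeds` is ordered batch-major, matching the order in which
/// masked positions are visited below.
fn scatter_embeds_batched(
    embeds: &[Vec<f32>],
    masks: &[Vec<bool>],
    hidden: usize,
) -> Result<Vec<Vec<Vec<f32>>>, String> {
    let mut rows = embeds.iter(); // shared cursor across the whole batch
    let mut out = Vec::with_capacity(masks.len());

    for mask in masks {
        let mut seq = vec![vec![0.0f32; hidden]; mask.len()];
        for (pos, &is_masked) in mask.iter().enumerate() {
            if !is_masked {
                continue;
            }
            let row = rows
                .next()
                .ok_or_else(|| "fewer embedding rows than masked tokens".to_string())?;
            if row.len() != hidden {
                return Err("embedding row has the wrong hidden size".to_string());
            }
            seq[pos].copy_from_slice(row);
        }
        out.push(seq);
    }

    // Leftover rows mean the mask and the encoder output disagree.
    if rows.next().is_some() {
        return Err("more embedding rows than masked tokens".to_string());
    }
    Ok(out)
}
```

Sequences in the same batch can carry different numbers of placeholder tokens under this layout, since the cursor only advances on masked positions.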