EncodeMistralText: move tokens to text_encoder.device before forward by genno-whittlery · Pull Request #51 · Nerogar/mgds

genno-whittlery · 2026-05-11T13:18:47Z

Summary

EncodeMistralText.get_item now moves tokens and attention_mask to self.text_encoder.device before the forward call. One-line fix (counting the device move + the existing-mask guard).

Why

Upstream pipeline stages land tokens on cuda:0 by default. When the Mistral text encoder is hosted on CPU — for example, on model-parallel setups where a single 32 GB consumer card can't fit Mistral-3-Small-24B's ~48 GB bf16 footprint — the existing forward call dies with:

RuntimeError: Expected all tensors to be on the same device, but found at
least two devices, cuda:0 and cpu! (when checking argument for argument index
in method wrapper_CUDA__index_select)

.to(device) is a no-op when the tensor is already on that device, so the single-GPU happy-path is unaffected.

Where this came up

Building a dual-GPU model-parallel FLUX.2 LoRA training path for OneTrainer on 2× RTX 5090 — the Mistral TE has to stay on CPU because neither GPU can fit half the FLUX.2 transformer and the TE. With the TE on CPU, EncodeMistralText.get_item is the last device-mismatch bug before the forward succeeds. Full context: https://github.com/genno-whittlery/flux2-dual-gpu-lora (OneTrainer port section); paired with Nerogar/OneTrainer#1450.

Test plan

OneTrainer dual-GPU FLUX.2 LoRA run on 2× RTX 5090 with Mistral TE on CPU — forward succeeds, full 32-step training epoch completes
Single-GPU OneTrainer FLUX.2 run unchanged (.to(te_device) is no-op when devices match)

Upstream pipeline stages land tokens (and the attention mask) on cuda:0 by default. When the Mistral text encoder is hosted on CPU -- for model-parallel setups where a single 32 GB consumer card can't fit Mistral-24B's ~48 GB bf16 -- the existing forward call hits a device-mismatch error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select) Move tokens and tokens_attention_mask to self.text_encoder.device just before the forward. .to(device) is a no-op when the tensors are already on that device, so the existing single-GPU happy-path is unaffected. Surfaced while bringing up dual-GPU FLUX.2 LoRA training in OneTrainer on 2x RTX 5090, where the Mistral TE has to stay on CPU and the transformer gets split across the two GPUs. Documented at https://github.com/genno-whittlery/flux2-dual-gpu-lora (OneTrainer port section).

Filed PR Nerogar/OneTrainer#1450 was closed by a maintainer who flagged that the body referenced issue #588 as "open since 2024-11", but the issue had actually been closed 2025-09-24. Closure happened on that factual error -- the dual-GPU implementation and its validation evidence weren't reviewed. The branch (genno-whittlery/OneTrainer:dual-gpu-flux2) is unchanged and still the canonical place to consume the OneTrainer port. Same for the mgds companion fix (genno-whittlery/mgds:te-device-fix) -- that PR (Nerogar/mgds#51) remains open separately. Dropped the "closes #588" claim from the README description.

dxqb · 2026-05-15T07:29:21Z

for this error to be triggered, that would require that the tokenizer is on another device than the text encoder. I don't see a usecase for that - why isn't the tokenizer also on CPU?

is Mistral fast enough on CPU?

genno-whittlery mentioned this pull request May 11, 2026

Add dual-GPU model-parallel path for FLUX.2 LoRA training (closes #588) Nerogar/OneTrainer#1450

Closed

4 tasks

genno-whittlery force-pushed the te-device-fix branch from c2c0562 to e03c769 Compare May 11, 2026 16:39

dxqb added the question Further information is requested label May 15, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

EncodeMistralText: move tokens to text_encoder.device before forward#51

EncodeMistralText: move tokens to text_encoder.device before forward#51
genno-whittlery wants to merge 1 commit into
Nerogar:masterfrom
genno-whittlery:te-device-fix

genno-whittlery commented May 11, 2026 •

edited

Loading

Uh oh!

dxqb commented May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

genno-whittlery commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why

Where this came up

Test plan

Uh oh!

dxqb commented May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

genno-whittlery commented May 11, 2026 •

edited

Loading