[omni] Qwen3-Omni thinker: HF bridge + /generate rollout adapter by yxs · Pull Request #1 · yxs/miles

yxs · 2026-06-14T17:07:33Z

Summary

RL loop wiring the Qwen3-Omni-30B-A3B thinker (text MoE) into miles GRPO, against sglang-omni's rollout /generate (sgl-project/sglang-omni#785) and distributed weight-sync (sgl-project/sglang-omni#784) endpoints. Text-only thinker path.

Components

tools/extract_qwen3_omni_thinker.py: extracts the standalone Qwen3-MoE text backbone from the composite omni checkpoint (drops the audio/visual weights and sets model_type=qwen3_moe) so the existing bridge can load it.
.../megatron_to_hf/qwen3omni_moe.py: Megatron→HF converter that emits body.-prefixed names (the namespace that the distributed weight-sync in sgl-project/sglang-omni#784 demuxes on); selected with --model-name qwen3omni_moe.
miles/rollout/generate_hub/omni_thinker.py: thin rollout adapter that reuses miles' payload/parse and adds the omni fields (output_modalities=["text"], return_omni_rollout=False, repetition_penalty=1.0).
Example GRPO script + Megatron model args + fast tests.
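
A minimal sketch of the adapter's payload step, for illustration only. The function name `build_omni_payload`, the `input_ids` / `sampling_params` / `return_logprob` request layout, and the placement of `repetition_penalty` inside `sampling_params` are assumptions, not the PR's actual code; only the three omni field values come from the component list above.

```python
from typing import Any


def build_omni_payload(input_ids: list[int], sampling_params: dict[str, Any]) -> dict[str, Any]:
    """Build a text-only /generate request body (hypothetical layout)."""
    params = dict(sampling_params)  # copy so the caller's dict is not mutated
    # Forced to 1.0: the trainer recompute can't replay a repetition penalty.
    # Placing it under sampling_params is an assumption.
    params["repetition_penalty"] = 1.0
    return {
        "input_ids": list(input_ids),
        "sampling_params": params,
        "return_logprob": True,  # assumed flag name
        # Omni-specific fields named in the PR description.
        "output_modalities": ["text"],
        "return_omni_rollout": False,
    }


payload = build_omni_payload([1, 2, 3], {"temperature": 1.0, "repetition_penalty": 1.2})
```

Overriding `repetition_penalty` in a copy, instead of rejecting a non-default value, is one possible design; an assertion would also match the fail-loud approach used elsewhere in this PR.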

Notes

The omni /generate emits temp-1 (pre-temperature) full-vocab logprobs. This was measured directly: with greedy decoding, the returned logprobs are identical at temperature 1.0 and 2.0. miles' train recompute divides logits by rollout_temperature, so the two agree only at temp=1; the adapter therefore asserts rollout_temperature == 1.0. repetition_penalty is forced to 1.0 because the trainer recompute can't replay a repetition penalty.
Guards: the omni path asserts against MoE/indexer replay and image inputs. The server forward-declares both, so the adapter fails loudly rather than silently.
Off-policy until sgl-project/sglang-omni#784 (distributed weight-sync): point --sglang-router-ip/port at a standalone omni server; TIS + --get-mismatch-metrics absorb the gap.
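
The temperature note can be illustrated with a toy calculation. This is a sketch under stated assumptions, not code from the PR: the logits are made up, and `recompute_gap` and `assert_supported_sampling` are hypothetical helpers. It models the server as always reporting temp-1 logprobs and the trainer as dividing logits by `rollout_temperature` before the log-softmax.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Log-probabilities of softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_norm for s in scaled]


def recompute_gap(logits, token, rollout_temperature):
    """Trainer logprob minus server logprob for one token."""
    server_lp = log_softmax(logits)[token]  # temp-1, whatever was requested
    trainer_lp = log_softmax(logits, rollout_temperature)[token]
    return trainer_lp - server_lp


def assert_supported_sampling(rollout_temperature):
    """Hypothetical guard mirroring the adapter's temperature assertion."""
    assert rollout_temperature == 1.0, (
        "omni /generate returns temp-1 logprobs; rollout_temperature must be 1.0"
    )


logits = [2.0, 1.0, 0.0]  # made-up values for illustration
greedy_token = logits.index(max(logits))
gap_at_1 = recompute_gap(logits, greedy_token, 1.0)  # zero: conventions agree
gap_at_2 = recompute_gap(logits, greedy_token, 2.0)  # nonzero: conventions differ
```

The gap is zero only when `rollout_temperature` is 1.0, which is why a hard assertion is a reasonable choice over a silent rescale.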

Verification

tests/fast/test_qwen3_omni_thinker.py: 13 passed (ion-b200 miles image), covering extraction, the body.* converter, and the adapter contract + guards.
Full cross-server E2E: real omni_thinker.generate against a live sgl-project/sglang-omni#785 omni server (Qwen3-Omni-30B-A3B on B200). An input_ids rollout returns text, output_token_logprobs length == completion_tokens, cached_tokens is an int (no crash), finish_reason maps to status, and weight_version is echoed; the logprob convention was measured (temp-1).
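
The E2E checks listed above could be expressed as a small contract check along these lines. This is a hedged sketch: the `meta_info` nesting, the `finish_reason["type"]` shape, the status strings, and the sample response are all assumptions made for illustration, not the server's documented schema or real output.

```python
# Hypothetical finish_reason -> status mapping; miles' real status names may differ.
STATUS_BY_FINISH_REASON = {
    "stop": "completed",
    "length": "truncated",
    "abort": "aborted",
}


def check_generate_response(response: dict) -> str:
    """Validate the fields the E2E test inspects and return a status string."""
    meta = response["meta_info"]  # assumed nesting
    assert isinstance(response["text"], str)
    # One logprob entry per generated token.
    assert len(meta["output_token_logprobs"]) == meta["completion_tokens"]
    assert isinstance(meta["cached_tokens"], int)
    assert "weight_version" in meta
    return STATUS_BY_FINISH_REASON[meta["finish_reason"]["type"]]


# Made-up response used only to exercise the check.
fake_response = {
    "text": "hi",
    "meta_info": {
        "output_token_logprobs": [(-0.1, 11, None), (-0.2, 12, None)],
        "completion_tokens": 2,
        "cached_tokens": 0,
        "finish_reason": {"type": "stop"},
        "weight_version": "default",
    },
}
status = check_generate_response(fake_response)
```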

yxs force-pushed the yxs/miles-omni-thinker-grpo branch 3 times, most recently from 4add1c5 to 4629316 on June 17, 2026 10:31

[omni] Qwen3-Omni thinker RL bridge + thin rollout glue (first omni<->miles loop)

First omni<->miles RL loop on the Qwen3-Omni-30B-A3B thinker (text MoE).

6c60ccd

yxs force-pushed the yxs/miles-omni-thinker-grpo branch from 4629316 to 6c60ccd on June 17, 2026 10:38

[omni] enforce rollout_temperature==1 (omni emits temp-1 logprobs)

35735df

yxs changed the title ~~[omni] Qwen3-Omni thinker RL bridge + thin rollout glue~~ [omni] Qwen3-Omni thinker: HF bridge + /generate rollout adapter Jun 17, 2026

yxs requested review from Hayden727 and JingwenGu0829 June 17, 2026 15:11

yxs wants to merge 2 commits into main from yxs/miles-omni-thinker-grpo