[omni] Qwen3-Omni thinker: HF bridge + /generate rollout adapter #1

Open
yxs wants to merge 2 commits into main from yxs/miles-omni-thinker-grpo

yxs commented Jun 14, 2026

Owner

Summary

Wires the Qwen3-Omni-30B-A3B thinker (text MoE) into the miles GRPO RL loop, against two sglang-omni endpoints: the rollout /generate endpoint (sgl-project/sglang-omni#785) and the distributed weight-sync endpoint (sgl-project/sglang-omni#784). Text-only thinker path.

Components

  • tools/extract_qwen3_omni_thinker.py: extracts the standalone Qwen3-MoE text backbone from the composite omni checkpoint (drops the audio and visual components, sets model_type=qwen3_moe) so that the existing bridge can load it.
  • .../megatron_to_hf/qwen3omni_moe.py: Megatron→HF converter that emits body.-prefixed parameter names (the namespace that sgl-project/sglang-omni#784, "[RL] distributed weight-sync", demuxes on); selected with --model-name qwen3omni_moe.
  • miles/rollout/generate_hub/omni_thinker.py: thin rollout adapter that reuses miles' payload construction and response parsing, and adds the omni fields (output_modalities=["text"], return_omni_rollout=False, repetition_penalty=1.0).
  • Example GRPO script + Megatron model args + fast tests.
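
The adapter's contribution on top of miles' standard payload can be pictured as a small merge step. The sketch below is illustrative only: the helper name `with_omni_fields` is hypothetical, and placing the three fields at the top level of the request body is an assumption (the description does not say whether they sit there or inside the sampling parameters). Only the field names and values come from the component list above.

```python
import copy

# Field names and values are taken from the PR description.
# ASSUMPTION: they are placed at the top level of the /generate payload.
OMNI_TEXT_ONLY_FIELDS = {
    "output_modalities": ["text"],
    "return_omni_rollout": False,
    "repetition_penalty": 1.0,
}


def with_omni_fields(base_payload: dict) -> dict:
    """Return a copy of a /generate payload with the omni fields applied.

    Hypothetical helper; the real adapter may merge these differently.
    """
    payload = copy.deepcopy(base_payload)
    # Deep-copy the constants as well, so callers cannot mutate the shared list.
    payload.update(copy.deepcopy(OMNI_TEXT_ONLY_FIELDS))
    return payload
```

Returning a copy rather than mutating the input keeps the base payload reusable across requests, which matters if the same payload template is shared by several rollouts.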

Notes

  • The omni /generate endpoint emits temp-1 (pre-temperature) full-vocab logprobs. This was measured directly: with greedy decoding, the returned logprobs are identical at temperature 1.0 and 2.0. miles' training-side recompute divides logits by rollout_temperature, so the two agree only at temp=1, and the adapter therefore asserts rollout_temperature == 1.0. repetition_penalty is forced to 1.0 because the trainer recompute can't replay a repetition penalty.
  • Guards: the omni path raises an assertion if MoE/indexer replay or image inputs are requested. The server forward-declares both, so the adapter fails loudly rather than silently.
  • Off-policy until sgl-project/sglang-omni#784 ([RL] distributed weight-sync): point --sglang-router-ip/port at a standalone omni server; TIS + --get-mismatch-metrics absorb the gap.
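
The temperature mismatch in the first note can be reproduced with a few lines of arithmetic. This is a minimal sketch, not code from miles or sglang-omni: the function names are hypothetical, and it assumes the rollout side reports log-softmax of the raw logits while the trainer applies log-softmax to logits divided by rollout_temperature, as the note describes.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def rollout_logprob(logits, token):
    # Pre-temperature convention: the sampling temperature is not applied.
    return log_softmax(logits)[token]


def trainer_logprob(logits, token, rollout_temperature):
    # Trainer-side recompute: logits are divided by the rollout temperature.
    scaled = [x / rollout_temperature for x in logits]
    return log_softmax(scaled)[token]


def check_rollout_temperature(rollout_temperature):
    # Hypothetical guard mirroring the assertion described in the note.
    assert rollout_temperature == 1.0, (
        "rollout logprobs are pre-temperature; "
        "trainer recompute only matches at rollout_temperature == 1.0"
    )
```

For example, with logits `[2.0, 1.0, 0.0]` and token 0, `rollout_logprob` and `trainer_logprob(..., 1.0)` coincide, while `trainer_logprob(..., 2.0)` gives a different value, which is the gap the guard rules out.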

Verification

  • tests/fast/test_qwen3_omni_thinker.py: 13 passed (ion-b200 miles image). Covers extraction, the body.* converter, and the adapter contract + guards.
  • Full cross-server E2E: the real omni_thinker.generate run against a live omni server with sgl-project/sglang-omni#785 ([RL] add Miles-compatible /generate rollout endpoint), serving Qwen3-Omni-30B-A3B on B200. Checked: an input_ids rollout returns text, output_token_logprobs length == completion_tokens, cached_tokens is an int (no crash), finish_reason→status, weight_version echoed; logprob convention measured (temp-1).
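
The E2E checks listed above amount to a contract on the /generate response, which can be written down as a small validator. This is a sketch under assumptions: the response layout (a `text` field plus a `meta_info` dict with a nested `finish_reason["type"]`), the status strings, and the helper name `check_generate_response` are all hypothetical and are not confirmed by the PR. Only the checked properties themselves come from the verification list.

```python
# ASSUMPTION: finish-reason types and status names are placeholders,
# not the actual values used by the omni server or by miles.
FINISH_REASON_TO_STATUS = {
    "stop": "completed",
    "length": "truncated",
    "abort": "aborted",
}


def check_generate_response(resp: dict) -> str:
    """Validate an assumed /generate response and return a rollout status."""
    meta = resp["meta_info"]

    # The rollout must return text.
    assert isinstance(resp["text"], str)

    # One logprob entry per generated token.
    assert len(meta["output_token_logprobs"]) == meta["completion_tokens"]

    # cached_tokens must be a plain integer (bool is excluded explicitly).
    cached = meta["cached_tokens"]
    assert isinstance(cached, int) and not isinstance(cached, bool)

    # The server is expected to echo the weight version it served with.
    assert "weight_version" in meta

    # Map the finish reason onto a status string.
    return FINISH_REASON_TO_STATUS[meta["finish_reason"]["type"]]
```

An unknown finish reason raises `KeyError` rather than defaulting to a status, in the same fail-loud spirit as the guards in the Notes section.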

yxs force-pushed the yxs/miles-omni-thinker-grpo branch 3 times, most recently from 4add1c5 to 4629316 (June 17, 2026 10:31)
…>miles loop)

First omni<->miles RL loop on the Qwen3-Omni-30B-A3B thinker (text MoE).
yxs force-pushed the yxs/miles-omni-thinker-grpo branch from 4629316 to 6c60ccd (June 17, 2026 10:38)
yxs changed the title from "[omni] Qwen3-Omni thinker RL bridge + thin rollout glue" to "[omni] Qwen3-Omni thinker: HF bridge + /generate rollout adapter" (Jun 17, 2026)
yxs requested review from Hayden727 and JingwenGu0829 (June 17, 2026 15:11)