[bugfix] fix process_weights_after_loading & non_thinking_prefix by hjh0119 · Pull Request #9519 · modelscope/ms-swift

hjh0119 · 2026-06-09T04:02:28Z

The current weight synchronization pipeline may invoke finish_vllm_weight_reload multiple times within a single complete weight synchronization cycle, which is non-idempotent.
Missing non_thinking_prefix on trainer side when enable_thinking=False

gemini-code-assist

Code Review

This pull request refactors the vLLM weight reloading process to run process_weights_after_loading once after all weight buckets or groups have been loaded, rather than after each individual load. It introduces a new /process_weights_after_loading/ endpoint to trigger this on all workers and updates finish_vllm_weight_reload to delegate to vLLM's built-in processing when model_config and target_device are available. The review feedback suggests improving performance by using asyncio.gather to concurrently await worker responses instead of a sequential loop, and simplifying a ternary hasattr check to a more Pythonic getattr call.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

hjh0119 · 2026-06-09T07:00:41Z

/gemini review

gemini-code-assist

Code Review

This pull request refactors the weight reloading mechanism for vLLM. Instead of calling weight reloading immediately after each weight chunk or bucket load inside try-finally blocks, it defers the call to process_weights_after_loading until after all weights have been loaded. Additionally, finish_vllm_weight_reload is updated to delegate to vLLM's built-in process_weights_after_loading when model_config and target_device are available, falling back to the FusedMoE-only path otherwise. New endpoints and client methods are introduced to coordinate this process across workers in server mode. I have no feedback to provide as there are no review comments.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

… fix-sync-0609

hjh0119 · 2026-06-09T12:04:20Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces support for injecting a non-thinking prefix during token-in-token-out re-encoding in GKD and GRPO trainers when thinking is disabled, and refactors vLLM weight reloading to run once after all weight buckets are loaded. It also updates the padding of routed experts in GRPO to expand the last entry. However, several critical issues were identified: a NameError in swift/ray/megatron/gkd_trainer.py where non_thinking_prefix_ids is undefined, a TypeError in swift/rlhf_trainers/gkd_trainer.py due to an unsupported keyword argument in build_teacher_infer_request, and shape mismatch bugs in both grpo_trainer.py and megatron_worker.py where left-padding incorrectly applies both left and right padding simultaneously.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

I am having trouble creating individual review comments. Click here to see my feedback.

swift/ray/megatron/gkd_trainer.py (288-293)

NameError at Runtime

non_thinking_prefix_ids is referenced here but is never defined within the _fetch_teacher_from_replicas method. This will raise a NameError at runtime when OPSD is enabled with teacher replicas.

To fix this, define non_thinking_prefix_ids by calling get_non_thinking_prefix_ids(self.template) before using it.

                opsd_item = build_opsd_teacher_data([item])[0]
                if opsd_item.get('response_token_ids'):
                    non_thinking_prefix_ids = get_non_thinking_prefix_ids(self.template)
                    opsd_item['messages'] = replace_assistant_response_with_ids(
                        copy.deepcopy(opsd_item['messages']),
                        opsd_item['response_token_ids'],
                        non_thinking_prefix_ids=non_thinking_prefix_ids)

swift/rlhf_trainers/gkd_trainer.py (623-625)

TypeError at Runtime

build_teacher_infer_request is defined in swift/rlhf_trainers/utils.py as def build_teacher_infer_request(data: Dict) -> 'RolloutInferRequest': and does not accept the non_thinking_prefix_ids keyword argument. Passing it here will cause a TypeError at runtime.

To fix this, you should update the signature of build_teacher_infer_request in swift/rlhf_trainers/utils.py to accept non_thinking_prefix_ids and pass it to replace_assistant_response_with_ids inside it.

swift/megatron/trainers/grpo_trainer.py (370-377)

Shape Mismatch Bug in Left Padding

When padding_right is False (left padding), the code prepends left_pad (of length padding_len) to padded_tail. However, padded_tail already has last_entry (of length padding_len) appended to its right. This results in a total sequence length of padding_to + padding_len instead of the expected padding_to, which will cause a shape mismatch error downstream.

To fix this, avoid appending last_entry to the right when left-padding. The padding should only be prepended to the original routed_experts tensor.

                    if padding_right:
                        last_entry = routed_experts[-1:].expand(padding_len, -1, -1)
                        padding_routed_experts = torch.cat([routed_experts, last_entry], dim=0)
                    else:
                        left_pad = torch.zeros(padding_len, *routed_experts.shape[1:], dtype=routed_experts.dtype)
                        padding_routed_experts = torch.cat([left_pad, routed_experts], dim=0)

swift/ray/megatron/megatron_worker.py (600-607)

Shape Mismatch Bug in Left Padding

When padding_right is False (left padding), the code prepends left_pad (of length pad_len) to padded. However, padded already has last_entry (of length pad_len) appended to its right. This results in a total sequence length of target_len + pad_len instead of the expected target_len, causing a shape mismatch error.

To fix this, only prepend left_pad to the original routed tensor when left-padding, without appending last_entry to the right.

        if padding_right:
            last_entry = routed[-1:].expand(pad_len, *routed.shape[1:])
            return torch.cat([routed, last_entry], dim=0)
        else:
            left_pad = torch.zeros(pad_len, *routed.shape[1:], dtype=routed.dtype)
            return torch.cat([left_pad, routed], dim=0)

hjh0119 · 2026-06-09T12:49:33Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces support for injecting non-thinking prefixes during token-in-token-out re-encoding when thinking is disabled, and refactors the vLLM weight reloading process to trigger process_weights_after_loading once after all weight buckets are loaded. The review feedback highlights several improvement opportunities: ensuring finish_vllm_weight_reload falls back to the FusedMoE-only path if the built-in vLLM method fails or lacks configuration, adding defensive checks when accessing template attributes in get_non_thinking_prefix_ids, and replacing the fragile next(llm_model.parameters()).device call with more robust device references (self.accelerator.device or self.device).

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-09T12:51:25Z

+def get_non_thinking_prefix_ids(template) -> Optional[List[int]]:
+    """Return the token ids of the non-thinking prefix (e.g. '<think>\n\n</think>\n\n').
+
+    When enable_thinking=False, the rollout engine injects this prefix into the prompt, so it
+    is part of the forwarded sequence (and routed_experts), but the generated response_token_ids
+    do NOT contain it. Token-in-token-out re-encoding must re-add it (masked out of the loss) to
+    keep the trainer/teacher sequence aligned with the rollout sequence. Returns None when the
+    prefix is not applicable (thinking enabled, or template has no non_thinking_prefix).
+    """
+    non_thinking_prefix = template.template_meta.non_thinking_prefix
+    if template.enable_thinking is False and non_thinking_prefix:
+        return template.tokenizer.encode(non_thinking_prefix, add_special_tokens=False)
+    return None


To prevent potential AttributeError or TypeError if template is None or if certain attributes like template_meta or enable_thinking are missing, use defensive getattr checks and early returns.

def get_non_thinking_prefix_ids(template) -> Optional[List[int]]: """Return the token ids of the non-thinking prefix (e.g. '<think>\\n\\n</think>\\n\\n'). When enable_thinking=False, the rollout engine injects this prefix into the prompt, so it is part of the forwarded sequence (and routed_experts), but the generated response_token_ids do NOT contain it. Token-in-token-out re-encoding must re-add it (masked out of the loss) to keep the trainer/teacher sequence aligned with the rollout sequence. Returns None when the prefix is not applicable (thinking enabled, or template has no non_thinking_prefix). """ if template is None: return None template_meta = getattr(template, "template_meta", None) if template_meta is None: return None non_thinking_prefix = getattr(template_meta, "non_thinking_prefix", None) if getattr(template, "enable_thinking", None) is False and non_thinking_prefix: return template.tokenizer.encode(non_thinking_prefix, add_special_tokens=False) return None

fix

4968cf5

gemini-code-assist Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread swift/pipelines/infer/rollout.py Outdated

Comment thread swift/rlhf_trainers/rollout_mixin.py Outdated

gemini

c06f819

gemini-code-assist Bot reviewed Jun 9, 2026

View reviewed changes

hjh0119 added 2 commits June 9, 2026 19:54

fix

7a0bd0e

Merge branch 'fix-sync-0609' of https://github.com/hjh0119/swift into…

8249267

… fix-sync-0609

hjh0119 changed the title ~~[bugfix] fix process_weights_after_loading~~ [bugfix] fix process_weights_after_loading & non_thinking_prefix Jun 9, 2026

revert

190d7b7

revert

1599573

gemini-code-assist Bot reviewed Jun 9, 2026

View reviewed changes

hjh0119 added 4 commits June 9, 2026 20:21

fix

432d9c2

clean over-protect getattr

24590cb

clean

7cb6a1f

clean

c9a4674

gemini-code-assist Bot reviewed Jun 9, 2026

View reviewed changes

fix

a2eb6b7

Jintao-Huang approved these changes Jun 9, 2026

View reviewed changes

hjh0119 merged commit aa02800 into modelscope:main Jun 9, 2026
3 checks passed

hjh0119 deleted the fix-sync-0609 branch June 9, 2026 14:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[bugfix] fix process_weights_after_loading & non_thinking_prefix#9519

[bugfix] fix process_weights_after_loading & non_thinking_prefix#9519
hjh0119 merged 11 commits into
modelscope:mainfrom
hjh0119:fix-sync-0609

hjh0119 commented Jun 9, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Uh oh!

hjh0119 commented Jun 9, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

hjh0119 commented Jun 9, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

hjh0119 commented Jun 9, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

gemini-code-assist Bot Jun 9, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

hjh0119 commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

hjh0119 commented Jun 9, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

hjh0119 commented Jun 9, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

swift/ray/megatron/gkd_trainer.py (288-293)

NameError at Runtime

swift/rlhf_trainers/gkd_trainer.py (623-625)

TypeError at Runtime

swift/megatron/trainers/grpo_trainer.py (370-377)

Shape Mismatch Bug in Left Padding

swift/ray/megatron/megatron_worker.py (600-607)

Shape Mismatch Bug in Left Padding

Uh oh!

hjh0119 commented Jun 9, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

gemini-code-assist Bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

hjh0119 commented Jun 9, 2026 •

edited

Loading