Skip to content

[bugfix] fix process_weights_after_loading & non_thinking_prefix#9519

Merged
hjh0119 merged 11 commits into
modelscope:mainfrom
hjh0119:fix-sync-0609
Jun 9, 2026
Merged

[bugfix] fix process_weights_after_loading & non_thinking_prefix#9519
hjh0119 merged 11 commits into
modelscope:mainfrom
hjh0119:fix-sync-0609

Conversation

@hjh0119

@hjh0119 hjh0119 commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator
  1. The current weight synchronization pipeline may invoke finish_vllm_weight_reload multiple times within a single complete weight synchronization cycle, which is non-idempotent.
  2. Missing non_thinking_prefix on trainer side when enable_thinking=False

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the vLLM weight reloading process to run process_weights_after_loading once after all weight buckets or groups have been loaded, rather than after each individual load. It introduces a new /process_weights_after_loading/ endpoint to trigger this on all workers and updates finish_vllm_weight_reload to delegate to vLLM's built-in processing when model_config and target_device are available. The review feedback suggests improving performance by using asyncio.gather to concurrently await worker responses instead of a sequential loop, and simplifying a ternary hasattr check to a more Pythonic getattr call.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread swift/pipelines/infer/rollout.py Outdated
Comment thread swift/rlhf_trainers/rollout_mixin.py Outdated
@hjh0119

hjh0119 commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the weight reloading mechanism for vLLM. Instead of calling weight reloading immediately after each weight chunk or bucket load inside try-finally blocks, it defers the call to process_weights_after_loading until after all weights have been loaded. Additionally, finish_vllm_weight_reload is updated to delegate to vLLM's built-in process_weights_after_loading when model_config and target_device are available, falling back to the FusedMoE-only path otherwise. New endpoints and client methods are introduced to coordinate this process across workers in server mode. I have no feedback to provide as there are no review comments.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

@hjh0119 hjh0119 changed the title [bugfix] fix process_weights_after_loading [bugfix] fix process_weights_after_loading & non_thinking_prefix Jun 9, 2026
@hjh0119

hjh0119 commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for injecting a non-thinking prefix during token-in-token-out re-encoding in GKD and GRPO trainers when thinking is disabled, and refactors vLLM weight reloading to run once after all weight buckets are loaded. It also updates the padding of routed experts in GRPO to expand the last entry. However, several critical issues were identified: a NameError in swift/ray/megatron/gkd_trainer.py where non_thinking_prefix_ids is undefined, a TypeError in swift/rlhf_trainers/gkd_trainer.py due to an unsupported keyword argument in build_teacher_infer_request, and shape mismatch bugs in both grpo_trainer.py and megatron_worker.py where left-padding incorrectly applies both left and right padding simultaneously.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

I am having trouble creating individual review comments. Click here to see my feedback.

swift/ray/megatron/gkd_trainer.py (288-293)

critical

NameError at Runtime

non_thinking_prefix_ids is referenced here but is never defined within the _fetch_teacher_from_replicas method. This will raise a NameError at runtime when OPSD is enabled with teacher replicas.

To fix this, define non_thinking_prefix_ids by calling get_non_thinking_prefix_ids(self.template) before using it.

                opsd_item = build_opsd_teacher_data([item])[0]
                if opsd_item.get('response_token_ids'):
                    non_thinking_prefix_ids = get_non_thinking_prefix_ids(self.template)
                    opsd_item['messages'] = replace_assistant_response_with_ids(
                        copy.deepcopy(opsd_item['messages']),
                        opsd_item['response_token_ids'],
                        non_thinking_prefix_ids=non_thinking_prefix_ids)

swift/rlhf_trainers/gkd_trainer.py (623-625)

critical

TypeError at Runtime

build_teacher_infer_request is defined in swift/rlhf_trainers/utils.py as def build_teacher_infer_request(data: Dict) -> 'RolloutInferRequest': and does not accept the non_thinking_prefix_ids keyword argument. Passing it here will cause a TypeError at runtime.

To fix this, you should update the signature of build_teacher_infer_request in swift/rlhf_trainers/utils.py to accept non_thinking_prefix_ids and pass it to replace_assistant_response_with_ids inside it.

swift/megatron/trainers/grpo_trainer.py (370-377)

high

Shape Mismatch Bug in Left Padding

When padding_right is False (left padding), the code prepends left_pad (of length padding_len) to padded_tail. However, padded_tail already has last_entry (of length padding_len) appended to its right. This results in a total sequence length of padding_to + padding_len instead of the expected padding_to, which will cause a shape mismatch error downstream.

To fix this, avoid appending last_entry to the right when left-padding. The padding should only be prepended to the original routed_experts tensor.

                    if padding_right:
                        last_entry = routed_experts[-1:].expand(padding_len, -1, -1)
                        padding_routed_experts = torch.cat([routed_experts, last_entry], dim=0)
                    else:
                        left_pad = torch.zeros(padding_len, *routed_experts.shape[1:], dtype=routed_experts.dtype)
                        padding_routed_experts = torch.cat([left_pad, routed_experts], dim=0)

swift/ray/megatron/megatron_worker.py (600-607)

high

Shape Mismatch Bug in Left Padding

When padding_right is False (left padding), the code prepends left_pad (of length pad_len) to padded. However, padded already has last_entry (of length pad_len) appended to its right. This results in a total sequence length of target_len + pad_len instead of the expected target_len, causing a shape mismatch error.

To fix this, only prepend left_pad to the original routed tensor when left-padding, without appending last_entry to the right.

        if padding_right:
            last_entry = routed[-1:].expand(pad_len, *routed.shape[1:])
            return torch.cat([routed, last_entry], dim=0)
        else:
            left_pad = torch.zeros(pad_len, *routed.shape[1:], dtype=routed.dtype)
            return torch.cat([left_pad, routed], dim=0)

@hjh0119

hjh0119 commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for injecting non-thinking prefixes during token-in-token-out re-encoding when thinking is disabled, and refactors the vLLM weight reloading process to trigger process_weights_after_loading once after all weight buckets are loaded. The review feedback highlights several improvement opportunities: ensuring finish_vllm_weight_reload falls back to the FusedMoE-only path if the built-in vLLM method fails or lacks configuration, adding defensive checks when accessing template attributes in get_non_thinking_prefix_ids, and replacing the fragile next(llm_model.parameters()).device call with more robust device references (self.accelerator.device or self.device).

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread swift/rlhf_trainers/utils.py
Comment on lines +732 to +744
def get_non_thinking_prefix_ids(template) -> Optional[List[int]]:
"""Return the token ids of the non-thinking prefix (e.g. '<think>\n\n</think>\n\n').

When enable_thinking=False, the rollout engine injects this prefix into the prompt, so it
is part of the forwarded sequence (and routed_experts), but the generated response_token_ids
do NOT contain it. Token-in-token-out re-encoding must re-add it (masked out of the loss) to
keep the trainer/teacher sequence aligned with the rollout sequence. Returns None when the
prefix is not applicable (thinking enabled, or template has no non_thinking_prefix).
"""
non_thinking_prefix = template.template_meta.non_thinking_prefix
if template.enable_thinking is False and non_thinking_prefix:
return template.tokenizer.encode(non_thinking_prefix, add_special_tokens=False)
return None

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To prevent potential AttributeError or TypeError if template is None or if certain attributes like template_meta or enable_thinking are missing, use defensive getattr checks and early returns.

def get_non_thinking_prefix_ids(template) -> Optional[List[int]]:
    """Return the token ids of the non-thinking prefix (e.g. '<think>\\n\\n</think>\\n\\n').

    When enable_thinking=False, the rollout engine injects this prefix into the prompt, so it
    is part of the forwarded sequence (and routed_experts), but the generated response_token_ids
    do NOT contain it. Token-in-token-out re-encoding must re-add it (masked out of the loss) to
    keep the trainer/teacher sequence aligned with the rollout sequence. Returns None when the
    prefix is not applicable (thinking enabled, or template has no non_thinking_prefix).
    """
    if template is None:
        return None
    template_meta = getattr(template, "template_meta", None)
    if template_meta is None:
        return None
    non_thinking_prefix = getattr(template_meta, "non_thinking_prefix", None)
    if getattr(template, "enable_thinking", None) is False and non_thinking_prefix:
        return template.tokenizer.encode(non_thinking_prefix, add_special_tokens=False)
    return None

Comment thread swift/rlhf_trainers/rollout_mixin.py
Comment thread swift/megatron/trainers/rollout_mixin.py Outdated
@hjh0119 hjh0119 merged commit aa02800 into modelscope:main Jun 9, 2026
3 checks passed
@hjh0119 hjh0119 deleted the fix-sync-0609 branch June 9, 2026 14:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants