
[misc] chore: sync latest training code #69

Merged
yyDing1 merged 1 commit into main from sync/yy-dev-nonseed on Jun 24, 2026

yyDing1 (Collaborator) commented Jun 24, 2026

What does this PR do?

Same as the title: syncs the latest training code.

Checklist Before Starting

  • Search for similar PRs or issues and paste at least one relevant link here: ...
  • Format the PR title as [{modules}] {type}: {description} (checked by CI)
    • {modules} may include core, interaction, model, env, tools, deployment, reward, dashboard, docs, examples, data, train, ci, build, deps, misc
    • If this PR involves multiple modules, separate them with commas, e.g. [interaction, tools, docs]
    • {type} must be one of feat, fix, refactor, chore, test
    • If this PR breaks an API, config contract, workflow, or other compatibility boundary, add [BREAKING] to the beginning of the title
    • For a stacked PR series, you may prepend a progress marker such as [1/N]
    • Example: [BREAKING][deployment, docs] feat: simplify runtime env configuration
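The title rules above are regular enough to check locally before pushing. The following is a minimal sketch of such a check, assuming the rules are exactly as listed in this template. It is not the project's CI check, and the name `check_title`, the regex, and the choice to accept `[BREAKING]` and `[k/N]` prefixes in either order are assumptions.

```python
import re

# Allowed values copied from the PR template checklist.
MODULES = {
    "core", "interaction", "model", "env", "tools", "deployment", "reward",
    "dashboard", "docs", "examples", "data", "train", "ci", "build", "deps", "misc",
}
TYPES = {"feat", "fix", "refactor", "chore", "test"}

# Optional prefix tags ([BREAKING] and/or a [k/N] progress marker, in any order),
# then "[modules] type: description". Accepting either prefix order is an assumption.
TITLE_RE = re.compile(
    r"^(?P<prefixes>(?:\[(?:BREAKING|\d+/(?:\d+|N))\]\s*)*)"
    r"\[(?P<modules>[^\]]+)\] "
    r"(?P<type>[a-z]+): "
    r"(?P<desc>\S.*)$"
)


def check_title(title: str) -> list:
    """Return a list of problems; an empty list means the title looks valid."""
    m = TITLE_RE.match(title)
    if m is None:
        return ["title does not match '[{modules}] {type}: {description}'"]
    problems = []
    modules = [mod.strip() for mod in m.group("modules").split(",")]
    unknown = [mod for mod in modules if mod not in MODULES]
    if unknown:
        problems.append(f"unknown module(s): {', '.join(unknown)}")
    if m.group("type") not in TYPES:
        problems.append(f"unknown type: {m.group('type')}")
    return problems


if __name__ == "__main__":
    for title in (
        "[misc] chore: sync latest training code",
        "[BREAKING][deployment, docs] feat: simplify runtime env configuration",
        "[misc] update: sync code",
    ):
        print(title, "->", check_title(title) or "ok")
```

Returning a list of problems rather than a boolean lets a hook report every issue in one pass.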

Test

List the checks you ran. If CI coverage is not practical for this change, describe the manual validation or experiment results.

API and Usage Example

Show any public interface changes or updated usage examples if relevant.

# Add a short example here when the PR changes public behavior

Design & Code Changes

Summarize the approach for non-trivial changes and call out important implementation details or trade-offs.

Checklist Before Submitting

  • Read the Contribute Guide
  • Run pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
  • Add or update docs/examples for user-facing changes
  • Add tests or explain why tests are not practical
  • Confirm the PR title matches the required format
  • Confirm the placeholder text in this template has been replaced with real content

yyDing1 merged commit a066304 into main on Jun 24, 2026
3 checks passed

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces several updates across agent interaction, training, and data preprocessing. In parallel_infer.py, an optional --result-path argument is added to save inference results to a JSON file, and a potential division-by-zero error is resolved when calculating the mean RM score. In train_qwen3_moe.sh, configuration parameters for Decoupled PPO, Rollout Correction, and MoE Router Replay are parameterized and integrated, alongside additional timeout and checkpoint configurations. In swe_rebench.py, the dataset source is updated to nebius/SWE-rebench using the filtered split. The reviewer feedback suggests specifying encoding="utf-8" when writing the JSON result file to ensure cross-platform consistency, and quoting the array argument in the bash script to prevent shell globbing.
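The summary describes two changes to parallel_infer.py without showing them. A minimal sketch of what such changes commonly look like, assuming the RM scores are a plain list of floats; the helper names `mean_rm_score` and `build_parser`, and returning 0.0 for an empty list, are hypothetical and not taken from the PR.

```python
import argparse


def mean_rm_score(scores: list) -> float:
    # Without this guard, sum([]) / len([]) raises ZeroDivisionError when no
    # samples were scored. Returning 0.0 is an assumption; None or NaN would
    # also work if callers need to tell "no data" apart from "mean score 0".
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="parallel inference (sketch)")
    # Optional flag: when omitted, args.result_path is None and no file is written.
    parser.add_argument(
        "--result-path",
        type=str,
        default=None,
        help="optional path of a JSON file to save the inference results to",
    )
    return parser
```

argparse maps `--result-path` to the attribute `result_path`, so the caller can write the JSON file only when `args.result_path is not None`.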

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +133 to +134
with open(result_path, "w") as f:
    json.dump(result, f, indent=2)


Priority: medium

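When opening files for writing text, it is highly recommended to specify encoding="utf-8". Without it, open() falls back to a platform- and locale-dependent default encoding (e.g., often cp1252 on Windows vs. UTF-8 on most Linux systems), so the same code can behave differently across environments.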

Suggested change
- with open(result_path, "w") as f:
-     json.dump(result, f, indent=2)
+ with open(result_path, "w", encoding="utf-8") as f:
+     json.dump(result, f, indent=2)
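To see when the explicit encoding actually matters, the sketch below writes the same payload twice. By default json.dump uses ensure_ascii=True and escapes non-ASCII characters, so the file is pure ASCII whatever the platform encoding; with ensure_ascii=False the raw characters are written, and the file handle's encoding then decides the bytes. The helper names `write_result`/`read_result` and the payload are illustrative only.

```python
import json
import os
import tempfile


def write_result(result: dict, result_path: str, ensure_ascii: bool = True) -> None:
    # Pin the encoding: without it, open() uses a platform/locale-dependent default.
    with open(result_path, "w", encoding="utf-8") as f:
        json.dump(result, f, indent=2, ensure_ascii=ensure_ascii)


def read_result(result_path: str) -> dict:
    # Read back with the same explicit encoding used for writing.
    with open(result_path, "r", encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    # Hypothetical payload containing a non-ASCII character.
    sample = {"response": "café", "rm_score": 0.5}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "result.json")
        for ascii_only in (True, False):
            write_result(sample, path, ensure_ascii=ascii_only)
            with open(path, "rb") as f:
                raw = f.read()
            # ensure_ascii=True writes "é" as the escape \u00e9 (pure ASCII);
            # ensure_ascii=False writes its UTF-8 bytes b"\xc3\xa9".
            print(ascii_only, raw.isascii(), read_result(path) == sample)
```

With the default ensure_ascii=True the explicit encoding is cheap insurance; it becomes essential as soon as ensure_ascii=False is used.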

actor_rollout_ref.rollout.enable_rollout_routing_replay=${enable_rollout_routing_replay} \
actor_rollout_ref.actor.entropy_coeff=0 \
actor_rollout_ref.actor.loss_agg_mode=${loss_agg_mode} \
+actor_rollout_ref.actor.checkpoint.save_contents=['model','hf_model'] \


Priority: medium

In bash, an unquoted argument containing square brackets [ and ] (like ['model','hf_model']) is subject to pathname expansion (globbing): if a file in the current directory matches the pattern, the shell replaces the argument with that filename. To prevent unexpected shell expansion and ensure Hydra receives the list exactly as written, please quote the entire argument.

Suggested change
- +actor_rollout_ref.actor.checkpoint.save_contents=['model','hf_model'] \
+ +actor_rollout_ref.actor.checkpoint.save_contents="['model','hf_model']" \
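Both effects the reviewer describes can be reproduced from Python. In the sketch below, shlex.split stands in for the shell's quote removal and fnmatch for its pathname expansion; this approximates bash rather than running it, and the stray filename is hypothetical. Unquoted, the shell strips the single quotes, and a file whose name matches the bracket expression would replace the whole argument; with the suggested double quotes, the value reaches Hydra unchanged.

```python
import fnmatch
import shlex

# The override from train_qwen3_moe.sh, unquoted and with the suggested double quotes.
UNQUOTED = "+actor_rollout_ref.actor.checkpoint.save_contents=['model','hf_model']"
QUOTED = "+actor_rollout_ref.actor.checkpoint.save_contents=\"['model','hf_model']\""


def shell_arg(word: str) -> str:
    # shlex.split in POSIX mode applies the shell's quote removal to one word.
    # It does NOT perform pathname expansion (globbing).
    (arg,) = shlex.split(word)
    return arg


def unquoted_glob_hits(word: str, cwd_files: list) -> list:
    # Approximates bash globbing for a word whose brackets are UNQUOTED:
    # "[model,hf_model]" is a bracket expression matching ONE character from the set.
    pattern = shell_arg(word)
    return [name for name in cwd_files if fnmatch.fnmatchcase(name, pattern)]


if __name__ == "__main__":
    print(shell_arg(UNQUOTED))  # single quotes stripped by the shell
    print(shell_arg(QUOTED))    # single quotes preserved inside the double quotes
    # A hypothetical stray file whose name happens to match the pattern.
    stray = "+actor_rollout_ref.actor.checkpoint.save_contents=m"
    print(unquoted_glob_hits(UNQUOTED, [stray, "train.log"]))
```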


Labels: None yet

Projects: None yet


1 participant