[misc] chore: sync latest training code #69
Code Review
This pull request introduces several updates across agent interaction, training, and data preprocessing. In parallel_infer.py, an optional --result-path argument is added to save inference results to a JSON file, and a potential division-by-zero error is resolved when calculating the mean RM score. In train_qwen3_moe.sh, configuration parameters for Decoupled PPO, Rollout Correction, and MoE Router Replay are parameterized and integrated, alongside additional timeout and checkpoint configurations. In swe_rebench.py, the dataset source is updated to nebius/SWE-rebench using the filtered split. The reviewer feedback suggests specifying encoding="utf-8" when writing the JSON result file to ensure cross-platform consistency, and quoting the array argument in the bash script to prevent shell globbing.
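The two `parallel_infer.py` changes described in the summary (an optional `--result-path` flag and a guard against dividing by zero when averaging RM scores) can be sketched as follows. This is a minimal illustration, not the PR's actual code: the helper names `mean_rm_score`, `save_result`, and `parse_args`, the result dictionary layout, and the choice of `0.0` as the fallback for an empty score list are all hypothetical assumptions.

```python
import argparse
import json
from typing import Optional, Sequence


def mean_rm_score(scores: Sequence[float]) -> float:
    """Average reward-model score (hypothetical helper)."""
    # Guard against an empty list instead of raising ZeroDivisionError.
    # Assumption: 0.0 is the fallback; the real fix may choose differently.
    return sum(scores) / len(scores) if scores else 0.0


def save_result(result: dict, result_path: Optional[str]) -> None:
    """Write the result as JSON only when a path was given (hypothetical helper)."""
    if result_path is None:
        return  # flag omitted: keep the old behavior and write nothing
    with open(result_path, "w", encoding="utf-8") as f:
        json.dump(result, f, indent=2)


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # Optional flag; argparse exposes it as args.result_path (None if omitted).
    parser.add_argument("--result-path", type=str, default=None)
    return parser.parse_args(argv)
```

In a script, `save_result({"mean_rm_score": mean_rm_score(scores)}, parse_args().result_path)` would then write the file only when `--result-path` is passed, and would report `0.0` rather than crash when no scores were collected.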
```python
with open(result_path, "w") as f:
    json.dump(result, f, indent=2)
```
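When opening a file for writing text, it is highly recommended to specify encoding="utf-8". Without it, Python falls back to the platform's locale encoding (for example, often cp1252 on Windows versus UTF-8 on most Linux systems), so the same code can produce different bytes on different platforms and environments.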
Suggested change:
```diff
-with open(result_path, "w") as f:
+with open(result_path, "w", encoding="utf-8") as f:
     json.dump(result, f, indent=2)
```
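A small round-trip check illustrates the point. Note that `json.dump` escapes non-ASCII characters by default (`ensure_ascii=True`), so with the current call the file is pure ASCII and the explicit encoding acts mainly as a safeguard; it becomes essential if non-ASCII text is written as-is. The sketch below assumes a hypothetical result dictionary and uses `ensure_ascii=False` purely for illustration; `write_result`, `read_result`, and `roundtrip` are hypothetical helpers.

```python
import json
import os
import tempfile


def write_result(result: dict, path: str) -> None:
    # Explicit UTF-8: the bytes on disk no longer depend on the platform locale.
    # ensure_ascii=False keeps characters like "é" instead of \u00e9 escapes.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(result, f, indent=2, ensure_ascii=False)


def read_result(path: str) -> dict:
    # Read back with the same explicit encoding used for writing.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def roundtrip(result: dict) -> dict:
    # Write to a temporary file and read it back.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "result.json")
        write_result(result, path)
        return read_result(path)
```

Because both sides pin the encoding, `roundtrip({"note": "résumé ✓"})` returns the original dictionary on any platform.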
```bash
    actor_rollout_ref.rollout.enable_rollout_routing_replay=${enable_rollout_routing_replay} \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.actor.loss_agg_mode=${loss_agg_mode} \
    +actor_rollout_ref.actor.checkpoint.save_contents=['model','hf_model'] \
```
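In bash, an unquoted argument containing square brackets, such as ['model','hf_model'], is subject to pathname expansion (globbing): the brackets form a character class, so if a file whose name matches the pattern exists in the working directory, the shell substitutes that filename for the argument. The shell also strips the inner single quotes, so Hydra receives [model,hf_model] rather than the text as written. To prevent unexpected shell expansion and ensure robust parsing by Hydra, please quote the entire argument.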
Suggested change:
```diff
-+actor_rollout_ref.actor.checkpoint.save_contents=['model','hf_model'] \
++actor_rollout_ref.actor.checkpoint.save_contents="['model','hf_model']" \
```
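How the shell treats the two spellings can be approximated in Python. `shlex.split` applies POSIX-style tokenization and quote removal (but no globbing), and `fnmatch` shows which filename the unquoted brackets would match as a pattern. This is an approximation of bash behavior for illustration, not a call into bash itself; by default bash leaves the argument unchanged when nothing matches, which is why the unquoted form usually works.

```python
import fnmatch
import shlex

PREFIX = "+actor_rollout_ref.actor.checkpoint.save_contents="
UNQUOTED = PREFIX + "['model','hf_model']"
QUOTED = PREFIX + "\"['model','hf_model']\""

# Quote removal as a POSIX shell would do it.
# Unquoted: the inner single quotes disappear -> ...=[model,hf_model]
unquoted_arg = shlex.split(UNQUOTED)[0]
# Double-quoted: the single quotes survive -> ...=['model','hf_model']
quoted_arg = shlex.split(QUOTED)[0]

# Unquoted, "[model,hf_model]" is a glob character class that matches exactly
# ONE character from {m, o, d, e, l, ',', h, f, _}. A file named PREFIX + "m"
# in the working directory would therefore replace the whole argument.
pattern = PREFIX + "[model,hf_model]"
```

Here `fnmatch.fnmatchcase(PREFIX + "m", pattern)` is true, while the literal text after quote removal does not match the pattern at all; the double-quoted form is never treated as a pattern.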
What does this PR do?
As the title says: syncs the latest training code.
Checklist Before Starting
- Format the PR title as `[{modules}] {type}: {description}` (checked by CI)
  - `{modules}` may include `core`, `interaction`, `model`, `env`, `tools`, `deployment`, `reward`, `dashboard`, `docs`, `examples`, `data`, `train`, `ci`, `build`, `deps`, `misc`, or several, like `[interaction, tools, docs]`
  - `{type}` must be one of `feat`, `fix`, `refactor`, `chore`, `test`
  - For breaking changes, add `[BREAKING]` to the beginning of the title
  - Example: `[1/N][BREAKING][deployment, docs] feat: simplify runtime env configuration`

Test
API and Usage Example
```python
# Add a short example here when the PR changes public behavior
```

Design & Code Changes
Checklist Before Submitting
```bash
pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
```