[misc] chore: sync latest training code by yyDing1 · Pull Request #69 · verl-project/uni-agent

yyDing1 · 2026-06-24T16:34:01Z

What does this PR do?

As the title says: sync the latest training code.

Checklist Before Starting

Search for similar PRs or issues and paste at least one relevant link here: ...
Format the PR title as [{modules}] {type}: {description} (checked by CI)
- {modules} may include core, interaction, model, env, tools, deployment, reward, dashboard, docs, examples, data, train, ci, build, deps, misc
- If this PR involves multiple modules, separate them with , like [interaction, tools, docs]
- {type} must be one of feat, fix, refactor, chore, test
- If this PR breaks an API, config contract, workflow, or other compatibility boundary, add [BREAKING] to the beginning of the title
- For a stacked PR series, you may prepend a progress marker such as [1/N]
- Example: [BREAKING][deployment, docs] feat: simplify runtime env configuration

Test

List the checks you ran. If CI coverage is not practical for this change, describe the manual validation or experiment results.

API and Usage Example

Show any public interface changes or updated usage examples if relevant.

# Add a short example here when the PR changes public behavior

Design & Code Changes

Summarize the approach for non-trivial changes and call out important implementation details or trade-offs.

Checklist Before Submitting

Read the Contribute Guide
Run pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
Add or update docs/examples for user-facing changes
Add tests or explain why tests are not practical
Confirm the PR title matches the required format
Confirm the placeholder text in this template has been replaced with real content

gemini-code-assist

Code Review

This pull request introduces several updates across agent interaction, training, and data preprocessing:
- parallel_infer.py: adds an optional --result-path argument that saves inference results to a JSON file, and resolves a potential division-by-zero error when calculating the mean RM score.
- train_qwen3_moe.sh: parameterizes and integrates the configuration for Decoupled PPO, Rollout Correction, and MoE Router Replay, alongside additional timeout and checkpoint configurations.
- swe_rebench.py: updates the dataset source to nebius/SWE-rebench, using the filtered split.

The reviewer feedback suggests specifying encoding="utf-8" when writing the JSON result file to ensure cross-platform consistency, and quoting the array argument in the bash script to prevent shell globbing.
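As a rough illustration of the two parallel_infer.py changes summarized above, the sketch below guards a mean against an empty score list and declares an optional `--result-path` flag. The actual diff is not shown in this conversation, so the function names `mean_rm_score` and `build_parser` are hypothetical and the real implementation may differ.

```python
import argparse
from typing import Optional, Sequence


def mean_rm_score(scores: Sequence[float]) -> Optional[float]:
    """Return the mean of the scores, or None when there is nothing to average.

    Hypothetical helper: checking for an empty sequence first avoids the
    ZeroDivisionError that sum(scores) / len(scores) would raise.
    """
    if not scores:
        return None
    return sum(scores) / len(scores)


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser with an optional path for saving results as JSON."""
    parser = argparse.ArgumentParser(description="inference driver (sketch)")
    parser.add_argument(
        "--result-path",
        default=None,
        help="if set, write the inference results to this JSON file",
    )
    return parser
```

Returning `None` for an empty input is only one possible convention; returning `0.0` or skipping the report would also avoid the error.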


gemini-code-assist · 2026-06-24T16:35:30Z

+        with open(result_path, "w") as f:
+            json.dump(result, f, indent=2)


When opening files for writing text, it is highly recommended to specify encoding="utf-8" to ensure consistent behavior across different platforms and environments (e.g., Windows vs. Linux).

Suggested change

-        with open(result_path, "w") as f:
-            json.dump(result, f, indent=2)
+        with open(result_path, "w", encoding="utf-8") as f:
+            json.dump(result, f, indent=2)
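A minimal round-trip sketch of the suggestion, assuming a plain dict result; `save_result` and `load_result` are hypothetical names, not functions from the PR. With the default `ensure_ascii=True`, `json.dump` escapes non-ASCII characters, so the explicit encoding matters most once `ensure_ascii=False` is used, as it is here.

```python
import json


def save_result(result: dict, result_path: str) -> None:
    """Write the result as UTF-8 JSON regardless of the platform default encoding."""
    with open(result_path, "w", encoding="utf-8") as f:
        # ensure_ascii=False writes non-ASCII characters as-is, so the
        # file encoding has to be pinned explicitly.
        json.dump(result, f, indent=2, ensure_ascii=False)


def load_result(result_path: str) -> dict:
    """Read the JSON back with the same explicit encoding."""
    with open(result_path, "r", encoding="utf-8") as f:
        return json.load(f)
```

Reading with the same explicit encoding keeps the round trip independent of the locale on either side.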

gemini-code-assist · 2026-06-24T16:35:30Z

+    actor_rollout_ref.rollout.enable_rollout_routing_replay=${enable_rollout_routing_replay} \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.actor.loss_agg_mode=${loss_agg_mode} \
+    +actor_rollout_ref.actor.checkpoint.save_contents=['model','hf_model'] \


In bash, passing unquoted arguments containing square brackets [ and ] (like ['model','hf_model']) can trigger shell globbing (pattern matching) if any matching files exist in the directory. To prevent unexpected shell expansion and ensure robust parsing by Hydra, please quote the entire argument.

Suggested change

-    +actor_rollout_ref.actor.checkpoint.save_contents=['model','hf_model'] \
+    +actor_rollout_ref.actor.checkpoint.save_contents="['model','hf_model']" \
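The effect described above can be approximated in Python. This is only a sketch: `shlex.split` stands in for the shell's quote removal and `fnmatch` for its pattern matching, neither of which reproduces bash exactly, and the stray file name is a hypothetical example.

```python
import fnmatch
import shlex

KEY = "+actor_rollout_ref.actor.checkpoint.save_contents"

# The two ways of writing the override on the command line.
unquoted_line = KEY + "=['model','hf_model']"
quoted_line = KEY + "=\"['model','hf_model']\""

# POSIX-style quote removal: the unquoted form loses its inner single
# quotes, while the double-quoted form keeps them.
unquoted_arg = shlex.split(unquoted_line)[0]
quoted_arg = shlex.split(quoted_line)[0]

# Unquoted, the bracket part acts as a glob character class, so a file
# whose name ends in a single character from that class would match.
stray_file = KEY + "=m"  # hypothetical file in the working directory
would_glob = fnmatch.fnmatchcase(stray_file, unquoted_arg)

print(unquoted_arg)
print(quoted_arg)
print(would_glob)
```

Such a file is unlikely to exist in practice, but quoting removes the dependency on directory contents entirely.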

update

aa64294

yyDing1 merged commit a066304 into main Jun 24, 2026
3 checks passed

gemini-code-assist Bot reviewed Jun 24, 2026

yyDing1 merged 1 commit into main from sync/yy-dev-nonseed
