[data, rollout, worker] feat: add Open-R1 multimodal and TinyLLaVA-Video-R1 preprocessing and training scripts by lihanwen7 · Pull Request #6849 · verl-project/verl

lihanwen7 · 2026-06-25T13:28:05Z

What does this PR do?

Add GRPO training support for two multimodal datasets:

lmms-lab/multimodal-open-r1-8k-verified — image math reasoning, <think>/<answer> format
Zhang199/TinyLLaVA-Video-R1-training-data — video multiple-choice QA

bugfixes included:

verl/workers/engine_workers.py — add maybe_fix_3d_position_ids inside train_mini_batch loop to fix _ragged_idx corruption

Checklist Before Starting

Search for similar PRs:
https://github.com/verl-project/verl/pulls?q=is%3Apr+openr1mm
https://github.com/verl-project/verl/pulls?q=is%3Apr+tinyllava

Test

Both datasets trained on GPU and NPU. NPU training curves:

lmms-lab/multimodal-open-r1-8k-verified:

Zhang199/TinyLLaVA-Video-R1-training-data:

API and Usage Example

# Preprocess
python examples/data_preprocess/openr1mm.py --local_save_dir ~/data/openr1mm
python examples/data_preprocess/tinyllava_video_r1.py \
    --data_dir ~/data/tinyllava-video-r1 --local_save_dir ~/data/tinyllava_video_r1

# Train
bash examples/grpo_trainer/run_qwen3_5_2b_openr1_fsdp.sh
bash examples/grpo_trainer/run_qwen3_5_2b_video_fsdp.sh

Design & Code Changes

examples/data_preprocess/openr1mm.py — HF dataset → verl parquet, image bytes preserved
examples/data_preprocess/tinyllava_video_r1.py — JSONL → verl parquet, video file paths
examples/grpo_trainer/run_qwen3_5_2b_openr1_fsdp.sh — training script for image dataset
examples/grpo_trainer/run_qwen3_5_2b_video_fsdp.sh — training script for video dataset
verl/workers/engine_workers.py — maybe_fix_3d_position_ids in train_mini_batch loop

Checklist Before Submitting

Read the Contribute Guide
Apply pre-commit checks
Request CI in ci-request

gemini-code-assist

Code Review

This pull request adds preprocessing scripts and GRPO training configurations for the Open-R1 multimodal math and TinyLLaVA-Video-R1 datasets, along with related fixes in the agent loop and engine workers. The review feedback highlights critical issues: the referenced custom reward score scripts are missing from the repository, and the Open-R1 preprocessing script needs to disable automatic image decoding and explicitly select columns to prevent saving large, decoded images in the output Parquet files.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

…ets, and implement reward scoring functions

lihanwen7 requested review from ArronHZG, PeterSH6, ji-huazhong, tardis-key, vermouth1992, wucong25 and wuxibin89 as code owners June 25, 2026 13:28

gemini-code-assist Bot reviewed Jun 25, 2026

View reviewed changes

Comment thread examples/grpo_trainer/run_qwen3_5_2b_openr1_fsdp.sh

Comment thread examples/grpo_trainer/run_qwen3_5_2b_video_fsdp.sh

Comment thread examples/data_preprocess/openr1mm.py

Comment thread examples/data_preprocess/openr1mm.py

lihanwen7 force-pushed the feat/openr1mm-tinyllava branch 2 times, most recently from 50fc30a to db5abb4 Compare June 26, 2026 10:41

feat: add preprocessing scripts for Open-R1 and TinyLLaVA-Video datas…

555a539

…ets, and implement reward scoring functions

lihanwen7 force-pushed the feat/openr1mm-tinyllava branch from db5abb4 to 555a539 Compare June 26, 2026 10:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[data, rollout, worker] feat: add Open-R1 multimodal and TinyLLaVA-Video-R1 preprocessing and training scripts#6849

[data, rollout, worker] feat: add Open-R1 multimodal and TinyLLaVA-Video-R1 preprocessing and training scripts#6849
lihanwen7 wants to merge 1 commit into
verl-project:mainfrom
lihanwen7:feat/openr1mm-tinyllava

lihanwen7 commented Jun 25, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

lihanwen7 commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Checklist Before Starting

Test

API and Usage Example

Design & Code Changes

Checklist Before Submitting

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

lihanwen7 commented Jun 25, 2026 •

edited

Loading