[data, rollout, worker] feat: add Open-R1 multimodal and TinyLLaVA-Video-R1 preprocessing and training scripts#6849
Conversation
There was a problem hiding this comment.
Code Review
This pull request adds preprocessing scripts and GRPO training configurations for the Open-R1 multimodal math and TinyLLaVA-Video-R1 datasets, along with related fixes in the agent loop and engine workers. The review feedback highlights critical issues: the referenced custom reward score scripts are missing from the repository, and the Open-R1 preprocessing script needs to disable automatic image decoding and explicitly select columns to prevent saving large, decoded images in the output Parquet files.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
50fc30a to
db5abb4
Compare
…ets, and implement reward scoring functions
db5abb4 to
555a539
Compare
What does this PR do?
Add GRPO training support for two multimodal datasets:
lmms-lab/multimodal-open-r1-8k-verified— image math reasoning,<think>/<answer>formatZhang199/TinyLLaVA-Video-R1-training-data— video multiple-choice QAbugfixes included:
verl/workers/engine_workers.py— addmaybe_fix_3d_position_idsinsidetrain_mini_batchloop to fix_ragged_idxcorruptionChecklist Before Starting
https://github.com/verl-project/verl/pulls?q=is%3Apr+openr1mm
https://github.com/verl-project/verl/pulls?q=is%3Apr+tinyllava
Test
Both datasets trained on GPU and NPU. NPU training curves:
lmms-lab/multimodal-open-r1-8k-verified:Zhang199/TinyLLaVA-Video-R1-training-data:API and Usage Example
Design & Code Changes
examples/data_preprocess/openr1mm.py— HF dataset → verl parquet, image bytes preservedexamples/data_preprocess/tinyllava_video_r1.py— JSONL → verl parquet, video file pathsexamples/grpo_trainer/run_qwen3_5_2b_openr1_fsdp.sh— training script for image datasetexamples/grpo_trainer/run_qwen3_5_2b_video_fsdp.sh— training script for video datasetverl/workers/engine_workers.py—maybe_fix_3d_position_idsintrain_mini_batchloopChecklist Before Submitting