json/jsonl example of selected 3000 examples from V-Interaction-400K for RL training

Hi team,

I'm currently working on RL fine-tuning using the V-Interaction-400K dataset, and I encountered a critical format issue that blocks the training process. I would greatly appreciate your help with the following:

Background:
During the SFT phase, I used a single-turn conversation format for the messages field, and the model produced the expected responses without any errors.
However, when I switch to RL fine-tuning, keeping the same single-turn structure but replacing the assistant content with the solution field from the dataset, training consistently fails with errors related to message structure validation and reward calculation.
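
To make the transformation described above concrete, here is a minimal sketch of that step. It is illustrative only, not the exact script from my pipeline: the function name `sft_to_rl_record` is hypothetical, and it assumes the solution is available as a plain string taken from the dataset's solution field.

```python
import copy


def sft_to_rl_record(sft_record, solution):
    """Return a copy of a single-turn SFT record with the assistant content
    replaced by `solution` (hypothetical helper, for illustration only)."""
    rl_record = copy.deepcopy(sft_record)  # leave the SFT record untouched
    for message in rl_record["messages"]:
        if message["role"] == "assistant":
            message["content"] = solution
    return rl_record


# Placeholder record in the same shape as the SFT example in this issue.
sft_example = {
    "messages": [
        {"role": "user", "content": "<image>\n[Problem description]"},
        {
            "role": "assistant",
            "content": "<think>[reasoning]</think>\n<answer>[final answer]</answer>",
        },
    ]
}

rl_example = sft_to_rl_record(sft_example, "<answer>[final answer]</answer>")
```

Only the assistant content changes; the roles, the user turn, and the number of messages stay exactly as in the SFT record.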

Core Question:
Could you confirm the correct messages format for V-Interaction-400K in RL training? Specifically:
Should it use a single-turn conversation (as in SFT) or a multi-turn conversation?
If single-turn is required, are there any differences from the SFT format (e.g., role naming, content structure, or additional fields)?

Example of My Current Single-Turn Format (SFT-Successful)
For reference, here's the format that worked in SFT:

{
  "messages": [
    {
      "role": "user",
      "content": "<image>\n[Problem description with mathematical expressions and choices]"
    },
    {
      "role": "assistant",
      "content": "<think>[Geometric reasoning process]\n\n<code>\n'''python\n[Image processing/geometry visualization code]\n'''\n</code>\n<sandbox_output><image></sandbox_output>\n\n[Calculation and conclusion logic]\n[Reasoning content]\n</think>\n<answer>[final answer]</answer>"
    }
  ]
}
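
For anyone trying to reproduce this, the structure above can be checked with a small script like the one below. This is only a sketch of the checks I would expect to pass for the SFT-style record; the function name `check_single_turn` and the specific rules (exactly two messages, user then assistant, a `<think>` block followed by an `<answer>` block) are assumptions inferred from the example, not the repository's actual validation logic.

```python
import re

# Assumed shape of the assistant content: <think>...</think> then <answer>...</answer>.
ASSISTANT_PATTERN = re.compile(
    r"<think>.*</think>\s*<answer>.*</answer>\s*$", re.DOTALL
)


def check_single_turn(record):
    """Return a list of problems found in a single-turn record (empty if none)."""
    messages = record.get("messages")
    if not isinstance(messages, list) or len(messages) != 2:
        return ["messages must be a list with exactly two entries"]

    problems = []
    for message, expected_role in zip(messages, ["user", "assistant"]):
        if not isinstance(message, dict):
            problems.append("each message must be an object")
            continue
        if message.get("role") != expected_role:
            problems.append(f"expected role {expected_role!r}, got {message.get('role')!r}")
        if not isinstance(message.get("content"), str):
            problems.append(f"{expected_role} content must be a string")
    if problems:
        return problems

    if "<image>" not in messages[0]["content"]:
        problems.append("user content has no <image> placeholder")
    if not ASSISTANT_PATTERN.match(messages[1]["content"]):
        problems.append("assistant content is not <think>...</think><answer>...</answer>")
    return problems


example = {
    "messages": [
        {"role": "user", "content": "<image>\n[Problem description]"},
        {
            "role": "assistant",
            "content": "<think>[reasoning]</think>\n<answer>[final answer]</answer>",
        },
    ]
}

print(check_single_turn(example))
```

A record that passes these checks in SFT but still fails in RL would suggest the RL pipeline expects something beyond this structure, which is what I am hoping to have confirmed.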
