This repository contains training utilities for DualAxisRM.
The code evaluates spoken dialogue along two axes:

Response Relevance: whether the reply is logically consistent and topically appropriate
Interactional Fluency: whether turn-taking is natural, without long pauses or extended overlap

The final label is binary:

0: poor interaction
1: strong interaction
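
The binary label above can be represented in code as a small enumeration. The following is a minimal sketch rather than the repository's actual implementation: the names `InteractionLabel` and `parse_label` are hypothetical, and only the values 0 and 1 and their meanings come from this README.

```python
from enum import IntEnum


class InteractionLabel(IntEnum):
    """Binary overall label. Names are hypothetical; values follow the README."""

    POOR = 0    # poor interaction
    STRONG = 1  # strong interaction


def parse_label(value):
    """Convert an int or a numeric string such as "1" into an InteractionLabel.

    Raises ValueError for anything other than 0 or 1.
    """
    # Reject bools and non-int/str inputs explicitly so that True or 0.7
    # are not silently coerced into a label.
    if isinstance(value, bool) or not isinstance(value, (int, str)):
        raise ValueError(f"overall label must be 0 or 1, got {value!r}")
    try:
        return InteractionLabel(int(value))
    except ValueError as exc:
        raise ValueError(f"overall label must be 0 or 1, got {value!r}") from exc
```

Using an `IntEnum` keeps the label comparable with plain integers, so `parse_label("1") == 1` still holds when the value is written back to JSON.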

Repository layout:

DualAxisRM/
├── examples/
│   └── data/
├── scripts/
├── src/
│   └── dual_axis_rm/
└── tools/

Install the dependencies, then the package itself in editable mode:

pip install -r requirements.txt
pip install -e .

Each input line in examples/data/source.example.jsonl follows this schema:
{
  "audio": "relative/or/absolute/path/to/dialogue.wav",
  "overall_score": 0,
  "response_think": "The response stays coherent and answers the previous turn directly.",
  "fluency_think": "Turn-taking is natural, with no harmful overlap or long silence."
}

Build SFT data:
python tools/build_dataset.py \
  --input examples/data/source.example.jsonl \
  --output data/train_sft.jsonl \
  --mode sft

Build GRPO data:
python tools/build_dataset.py \
  --input examples/data/source.example.jsonl \
  --output data/train_grpo.jsonl \
  --mode grpo

Run SFT training:

MODEL_PATH=Qwen/Qwen2.5-Omni-7B \
DATASET_PATH=data/train_sft.jsonl \
OUTPUT_DIR=outputs/sft \
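bash scripts/train_sft.sh

Run GRPO training from an SFT checkpoint (checkpoint-xxx is a placeholder for the actual checkpoint directory):

MODEL_PATH=outputs/sft/checkpoint-xxx \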
DATASET_PATH=data/train_grpo.jsonl \
OUTPUT_DIR=outputs/grpo \
bash scripts/train_grpo.sh

Run inference with a GRPO checkpoint on a validation set:

MODEL_PATH=outputs/grpo/checkpoint-xxx \
VAL_DATASET=data/val.jsonl \
bash scripts/infer.sh
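
Before running the pipeline above, it can help to check that every source line matches the input schema described earlier. The sketch below is not part of the repository: `validate_record` and `validate_file` are hypothetical helpers, the field names and types are taken from the schema in this README, and the restriction of `overall_score` to 0 or 1 is an assumption based on the statement that the final label is binary.

```python
import json

# Field names and types follow the schema shown in this README.
REQUIRED_FIELDS = {
    "audio": str,
    "overall_score": int,
    "response_think": str,
    "fluency_think": str,
}


def validate_record(record):
    """Return a list of problems found in one parsed record (empty if none)."""
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
            continue
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            problems.append(f"{name} should be {expected_type.__name__}")
    score = record.get("overall_score")
    # Assumption: overall_score carries the binary label (0 or 1).
    if isinstance(score, int) and not isinstance(score, bool) and score not in (0, 1):
        problems.append("overall_score must be 0 or 1")
    return problems


def validate_file(path):
    """Yield (line_number, problems) for every invalid line in a JSONL file."""
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # skip blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                yield line_number, [f"invalid JSON: {exc.msg}"]
                continue
            if not isinstance(record, dict):
                yield line_number, ["line is not a JSON object"]
                continue
            problems = validate_record(record)
            if problems:
                yield line_number, problems
```

For example, iterating over `validate_file("examples/data/source.example.jsonl")` and printing each `(line_number, problems)` pair lists the lines that need fixing; no output means every line passed these checks.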