I have carefully gone through the run_emilia.sh and run_libritts.sh scripts and noticed a difference in how checkpoint averaging and inference are handled for the ZipVoice-Distill model.
In the run_emilia.sh script, I do not see any model averaging step applied to the checkpoints generated from the 2nd stage of ZipVoice-Distill training, and no averaged model is used for either ZipVoice-Distill inference or ONNX export. Instead, it appears that a specific checkpoint (checkpoint-2000.pt) is used directly for both ONNX export and inference.
In the run_libritts.sh script, however, the workflow is different. After the 2nd stage of ZipVoice-Distill training, the generated checkpoints are averaged, and the resulting averaged model is then used for ZipVoice-Distill inference.
Could you please clarify:
- Why is checkpoint averaging not performed (or not used) in run_emilia.sh?
- Is there a specific reason for using checkpoint-2000.pt directly for ONNX export and inference instead of an averaged model?
Additionally, I would appreciate clarification on the difference between a PyTorch model (.pt) and the exported ONNX model in this pipeline.