For Common Voice, download from: https://commonvoice.mozilla.org/en/datasets
Some audio files in Common Voice are broken; use validated_common_voice.py to keep only the valid ones. Before running it, set root_dir, language, and split in that Python file.
For NTUML2021, download from: https://huggingface.co/datasets/ky552/ML2021_ASR_ST
For Fisher, download from: https://catalog.ldc.upenn.edu/LDC2010S01
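The Common Voice filtering step mentioned above can be pictured roughly as follows. This is only a minimal sketch of the idea, not the contents of validated_common_voice.py: the function name `filter_readable` and the `load` parameter are hypothetical, and it assumes that a broken clip is one the audio decoder refuses to open.

```python
from pathlib import Path


def filter_readable(paths, load):
    """Split `paths` into (valid, broken) by trying to decode each file.

    `load` is any callable that raises an exception on an unreadable
    file (an assumption of this sketch), for example librosa.load.
    """
    valid, broken = [], []
    for path in paths:
        try:
            load(str(path))  # decoding is the only check performed here
        except Exception:
            broken.append(Path(path))
        else:
            valid.append(Path(path))
    return valid, broken
```

With librosa installed, a call might look like `filter_readable(Path(root_dir).rglob("*.mp3"), librosa.load)`, where root_dir points at the downloaded Common Voice folder. Decoding every clip is slow, so the result is probably worth caching to a file.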
We recommend creating a Python 3.10 virtual environment with conda:
conda create --name csstllm python=3.10 -y
conda activate csstllm
cd xtuner
pip install -e '.[all]'
pip install -U openai-whisper
pip install evaluate
pip install sacrebleu
pip install jiwer==3.1.0
pip install peft==0.12.0
pip install torch==2.4.0
pip install torchvision==0.19.0
pip install datasets==2.21.0
pip install librosa==0.11.0 soundfile==0.13.0
pip install deepspeed==0.17.4

Taking NTUML2021 as an example:
NPROC_PER_NODE=4 xtuner train workspace/9b_llama3_chat_stage1_ntuml.py --deepspeed deepspeed_zero2
NPROC_PER_NODE=4 xtuner train workspace/9b_llama3_chat_stage2_ntuml.py --deepspeed deepspeed_zero2
NPROC_PER_NODE=4 xtuner train workspace/9b_llama3_chat_stage3_ntuml.py --deepspeed deepspeed_zero2
NPROC_PER_NODE=4 xtuner train workspace/9b_llama3_chat_stage4_ntuml.py --deepspeed deepspeed_zero2

Make sure to replace root_dir in the Python file.
NPROC_PER_NODE=4 xtuner test workspace/9b_llama3_chat_stage4_ntuml.py --checkpoint work_dir/9b_llama3_chat_stage4_ntuml/epoch_1.pth/mp_rank_00_model_states.pt
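If typing the four training commands by hand is inconvenient, they can be driven from a small script. The sketch below only rebuilds the `xtuner train` invocations listed above. The helper names `train_command` and `run_stages` are hypothetical, and it assumes the stages are meant to run in order, each one starting only after the previous one succeeded.

```python
import os
import subprocess

# Config naming pattern taken from the NTUML2021 commands above.
CONFIG_TEMPLATE = "workspace/9b_llama3_chat_stage{stage}_ntuml.py"


def train_command(stage):
    """Return the argv list for one training stage."""
    return [
        "xtuner", "train", CONFIG_TEMPLATE.format(stage=stage),
        "--deepspeed", "deepspeed_zero2",
    ]


def run_stages(stages=(1, 2, 3, 4), nproc_per_node=4, dry_run=True):
    """Run the stages in order; with dry_run=True only print the commands."""
    env = dict(os.environ, NPROC_PER_NODE=str(nproc_per_node))
    commands = [train_command(stage) for stage in stages]
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True raises CalledProcessError on a non-zero exit,
            # so later stages are skipped if an earlier one fails.
            subprocess.run(argv, env=env, check=True)
    return commands
```

Calling `run_stages()` prints the four commands without executing anything; `run_stages(dry_run=False)` launches them. The test command is left out on purpose, since its checkpoint path depends on which epoch you want to evaluate.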