I ran the official cache-aware streaming demo script for speech recognition testing, as follows:
python examples/asr/asr_cache_aware_streaming/speech_to_text_cache_aware_streaming_infer.py model_path=/root/.cache/modelscope/hub/models/nv-community/nemotron-3.5-asr-streaming-0.6b/nemotron-3.5-asr-streaming-0.6b.nemo audio_file=/opt/ai/liujixin/ASR/case5_sohu.wav batch_size=1 target_lang=auto att_context_size="[56,13]" strip_lang_tags=true compare_vs_offline=true amp=true debug_mode=true
Audio file: case5_sohu.wav
ASR output result:
The ground truth is as follows:
I'm not sure whether this is a problem with my invocation or a limitation of the model itself.
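One way to narrow this down is to measure the gap with a number. The command above already sets compare_vs_offline=true, so both a streaming and an offline transcript should be available. Scoring each against the ground truth could show whether the errors come from streaming (which points at settings such as att_context_size) or also appear offline (which points at the model). Below is a minimal sketch, assuming plain-text transcripts. It computes character error rate (Levenshtein distance divided by reference length) and ignores whitespace, which suits Chinese text. The function name and the sample strings are hypothetical placeholders, not my actual results.

```python
def char_error_rate(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance / number of reference characters.

    Hypothetical helper for illustration; whitespace is removed first,
    since Chinese transcripts have no reliable word boundaries.
    """
    ref = "".join(reference.split())
    hyp = "".join(hypothesis.split())
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] = edit distance between the previous ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (or match if equal)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder strings; replace with the real ground truth and outputs.
    reference = "今天天气很好"
    streaming = "今天天汽很"
    offline = "今天天气很好"
    print(f"streaming CER: {char_error_rate(reference, streaming):.3f}")
    print(f"offline CER:   {char_error_rate(reference, offline):.3f}")
```

With these placeholders the streaming hypothesis has one substitution and one deletion, so its CER is 2/6 (about 0.333), while the offline CER is 0.0. A pattern like that would suggest the streaming configuration rather than the model itself.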