The Nemotron-3.5-ASR-Streaming-0.6B model has poor Chinese speech recognition performance.

I ran the official cache-aware streaming inference example to test speech recognition, as follows:
python examples/asr/asr_cache_aware_streaming/speech_to_text_cache_aware_streaming_infer.py model_path=/root/.cache/modelscope/hub/models/nv-community/nemotron-3.5-asr-streaming-0.6b/nemotron-3.5-asr-streaming-0.6b.nemo audio_file=/opt/ai/liujixin/ASR/case5_sohu.wav batch_size=1 target_lang=auto att_context_size="[56,13]" strip_lang_tags=true compare_vs_offline=true amp=true debug_mode=true

Audio file:
[case5_sohu.wav](https://github.com/user-attachments/files/29081475/case5_sohu.wav)

ASR output:

<img width="1812" height="406" alt="Image" src="https://github.com/user-attachments/assets/ea485950-3cdd-419a-80d4-5f829a786627" />

Ground truth:

<img width="1340" height="288" alt="Image" src="https://github.com/user-attachments/assets/f1d47181-3038-478f-ad5d-d3d545801b23" />
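
To put a number on the mismatch between the two screenshots, a character error rate (CER) could be computed between the ground truth and the ASR output. This is only a minimal sketch: `edit_distance` and `cer` are hypothetical helper names, not part of NeMo, and the strings in the usage lines are placeholders rather than the actual transcripts from this issue. It strips whitespace but does not normalize punctuation, which would also need handling for a fair comparison.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (character missing from hyp)
                curr[j - 1] + 1,     # insertion (extra character in hyp)
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    # Drop all whitespace so spacing differences are not counted as errors.
    ref = "".join(ref.split())
    hyp = "".join(hyp.split())
    if not ref:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Placeholder strings for illustration only, not the real transcripts.
    reference = "今天天气很好"
    hypothesis = "今天天汽很好"
    print(f"CER: {cer(reference, hypothesis):.3f}")
```

Character-level scoring is used here because Chinese text has no whitespace word boundaries, so a word-level error rate would depend on a separate segmentation step.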


I'm not sure if it's a problem with my invocation or a performance flaw in the model itself.

The Nemotron-3.5-ASR-Streaming-0.6B model has poor Chinese speech recognition performance. #15809