
The Nemotron-3.5-ASR-Streaming-0.6B model has poor Chinese speech recognition performance. #15809

Description

@liujixingit

I ran the official cache-aware streaming inference example to test speech recognition, as follows:
python examples/asr/asr_cache_aware_streaming/speech_to_text_cache_aware_streaming_infer.py model_path=/root/.cache/modelscope/hub/models/nv-community/nemotron-3.5-asr-streaming-0.6b/nemotron-3.5-asr-streaming-0.6b.nemo audio_file=/opt/ai/liujixin/ASR/case5_sohu.wav batch_size=1 target_lang=auto att_context_size="[56,13]" strip_lang_tags=true compare_vs_offline=true amp=true debug_mode=true

Audio file: case5_sohu.wav
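Before blaming the model, it can help to rule out an input-format problem. The sketch below reads the WAV header with Python's standard `wave` module and flags a sample rate other than 16 kHz or a non-mono file. The 16 kHz mono expectation is an assumption based on typical NeMo ASR checkpoints, not something confirmed for this model, and the example script may already resample or downmix internally. `describe_wav` and `check_asr_input` are hypothetical helper names.

```python
import sys
import wave


def describe_wav(path: str) -> dict:
    """Return basic header info for a PCM WAV file using only the stdlib."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wf.getsampwidth(),
            "duration_s": wf.getnframes() / rate,
        }


def check_asr_input(info: dict, expected_rate: int = 16000) -> list:
    """List mismatches against an ASSUMED 16 kHz mono input format."""
    problems = []
    if info["sample_rate"] != expected_rate:
        problems.append(
            f"sample rate is {info['sample_rate']} Hz, expected {expected_rate} Hz"
        )
    if info["channels"] != 1:
        problems.append(f"{info['channels']} channels, expected mono")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_wav.py case5_sohu.wav
    info = describe_wav(sys.argv[1])
    print(info)
    for problem in check_asr_input(info):
        print("WARNING:", problem)
```

If a warning appears, resampling the file to 16 kHz mono (for example with `ffmpeg -i in.wav -ar 16000 -ac 1 out.wav`) and rerunning the same command is a cheap way to separate input issues from model accuracy.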

ASR output:

[Screenshot of the ASR output (image not reproduced)]

Ground truth:

[Screenshot of the ground-truth transcript (image not reproduced)]

I'm not sure whether this is caused by how I invoked the script or by a limitation of the model itself.
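To make "poor performance" concrete, the gap between the output and the ground truth can be expressed as a character error rate (CER), which is the usual metric for Mandarin because the text has no whitespace word boundaries. Below is a minimal standalone sketch: a character-level Levenshtein distance divided by the reference length. The punctuation and whitespace stripping in `normalize` is an assumed normalization, not NeMo's scorer, and all function names are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted per character."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: ref char missing from hyp
                curr[j - 1] + 1,         # insertion: extra char in hyp
                prev[j - 1] + (r != h),  # substitution (or match at cost 0)
            )
        prev = curr
    return prev[-1]


def normalize(text: str) -> str:
    """Drop whitespace and common CJK/ASCII punctuation (assumed normalization)."""
    drop = set(" \t\n，。！？、；：“”‘’（）,.!?;:'\"()")
    return "".join(ch for ch in text if ch not in drop)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate = edit distance / reference length, after normalization."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference is empty after normalization")
    return edit_distance(ref, hyp) / len(ref)
```

With the two transcripts from the screenshots pasted in as strings, `print(f"CER = {cer(ground_truth, asr_output):.2%}")` gives a single number that is easier to compare across runs, for example streaming versus offline (`compare_vs_offline=true`) or different `att_context_size` values.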
