Fine-tuning Offline FastConformer to Streaming Architecture

I am trying to adapt nvidia/stt_ar_fastconformer_hybrid_large_pc_v1.0, an offline model (att_context_size: [-1, -1]), to a streaming architecture (att_context_size: [70, 13], att_context_style: chunked_limited). I train with speech_to_text_hybrid_rnnt_ctc_bpe.py and load weights through init_from_pretrained_model: only the encoder is loaded, and the decoder and joint are reinitialized from scratch.
The RNNT decoder collapses immediately to predicting only blanks and never recovers, so val_wer stays at exactly 1.000 in every training run, regardless of hyperparameters.
This is the init_from_pretrained_model section of my config:

```
init_from_pretrained_model:
  model0:
    name: nvidia/stt_ar_fastconformer_hybrid_large_pc_v1.0
    include:
      - encoder
    exclude:
      - conv.batch_norm
      - pre_encode.out.weight
```
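
For reference, the target streaming settings described above could be written in the model config roughly as follows. This is only a sketch: the two values come from the description, while the nesting under `model.encoder` is an assumption based on the stock FastConformer hybrid configs, and every other encoder key is omitted.

```
# Sketch only: the streaming attention settings named in the description.
# Placement under model.encoder is assumed; other encoder keys are left out.
model:
  encoder:
    att_context_size: [70, 13]          # [left, right] context; the offline model uses [-1, -1]
    att_context_style: chunked_limited  # chunked attention with limited left context
```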
@pzelasko 
@chtruong814 
@nithinraok 

Fine-tuning Offline FastConformer to Streaming Architecture #15819
