issue in librispeech recipe, Transducer+Ctc model (streaming)

Hi, 
I just found some unusual behavior after rebuilding a fresh environment (torch 2.8, CUDA 12.6).

When training a Transducer+CTC model with `Eden(... , warmup_start=0.1)`, the training diverges
after 100-200 steps of the 1st epoch.

The issue disappears:
- when removing the CTC loss
- or when setting `warmup_start` back to its original value, `Eden(... , warmup_start=0.5)` (even for Transducer+CTC training)
   - update: this only postpones the divergence to epoch 3 (librispeech 100h train task);
     right before diverging, the grad norm is huge:
     `2026-06-23 17:37:35,479 WARNING [optim.py:588] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.559e+07 2.033e+08 4.885e+08 1.750e+09 8.677e+10, threshold=9.769e+08, percent-clipped=13.0`
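
To make the difference between the two settings concrete, here is a minimal sketch of the warm-up scaling. It assumes that Eden ramps the learning-rate multiplier linearly from `warmup_start` to 1.0 over `warmup_batches` steps, and that `warmup_batches` is 500. Both are assumptions about the scheduler, not something verified against the recipe, and the function name `eden_warmup_factor` is hypothetical. The batch- and epoch-dependent decay that Eden also applies is left out.

```python
def eden_warmup_factor(
    batch: int,
    warmup_start: float,
    warmup_batches: float = 500.0,  # assumed value, check the recipe
) -> float:
    """Assumed warm-up multiplier applied on top of the base learning rate.

    Linear ramp from `warmup_start` at batch 0 to 1.0 at `warmup_batches`.
    """
    if batch >= warmup_batches:
        return 1.0
    return warmup_start + (1.0 - warmup_start) * (batch / warmup_batches)


if __name__ == "__main__":
    # Compare the two settings around the steps where the divergence shows up.
    for batch in (0, 100, 200, 500):
        low = eden_warmup_factor(batch, warmup_start=0.1)
        high = eden_warmup_factor(batch, warmup_start=0.5)
        print(f"batch={batch:4d}  start=0.1: {low:.2f}  start=0.5: {high:.2f}")
```

Under these assumptions, at steps 100-200 the multiplier is roughly 0.28-0.46 with `warmup_start=0.1`, compared to 0.60-0.70 with `warmup_start=0.5`.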

It seems that ScaledAdam does not cope well with the early CTC gradients when the learning rate is too small.
Should some CTC loss warm-up be introduced for Transducer+CTC training?
(assuming the model should first learn something from the Transducer loss...)
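
One possible shape for such a warm-up is sketched below: the CTC loss weight is ramped linearly from 0 to its final value, so that the early updates are driven mostly by the Transducer loss. This is only an illustration of the idea, not code from the recipe. The names `ctc_scale_at` and `combined_loss` and the values `final_scale=0.2` and `ramp_steps=2000` are all hypothetical.

```python
def ctc_scale_at(step: int, final_scale: float = 0.2, ramp_steps: int = 2000) -> float:
    """Hypothetical CTC loss weight: linear ramp from 0 to `final_scale`."""
    if ramp_steps <= 0:
        # No ramp requested, use the final weight from the start.
        return final_scale
    return final_scale * min(1.0, step / ramp_steps)


def combined_loss(
    transducer_loss: float,
    ctc_loss: float,
    step: int,
    final_scale: float = 0.2,
    ramp_steps: int = 2000,
) -> float:
    """Transducer loss plus the CTC loss weighted by the ramped scale."""
    scale = ctc_scale_at(step, final_scale=final_scale, ramp_steps=ramp_steps)
    return transducer_loss + scale * ctc_loss


if __name__ == "__main__":
    # Scalar losses are used here for illustration; tensors would work the same way.
    for step in (0, 500, 1000, 2000, 5000):
        print(step, ctc_scale_at(step), combined_loss(10.0, 5.0, step))
```

With this ramp the CTC term contributes nothing at step 0 and reaches its full weight only after `ramp_steps` updates. Whether that is enough to avoid the divergence would need to be checked experimentally.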

Have you seen something similar?

With kind regards
Karel from BUT

issue in librispeech recipe, Transducer+Ctc model (streaming) #2092
