issue in librispeech recipe, Transducer+Ctc model (streaming) #2092

Description

@KarelVesely84

Hi,
I just found some unusual behavior after rebuilding a fresh environment (torch 2.8, CUDA 12.6).

When training the Transducer+Ctc model with Eden(... , warmup_start=0.1), the training diverges
after 100-200 steps of the 1st epoch.

The issue disappears:

  • when the Ctc loss is removed
  • or when warmup_start is set back to its original value, Eden(... , warmup_start=0.5) (even for Transducer+Ctc training)
    • update: this only postpones the divergence to epoch 3 (librispeech 100h train task);
      right before diverging, the grad norm is huge:
      2026-06-23 17:37:35,479 WARNING [optim.py:588] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.559e+07 2.033e+08 4.885e+08 1.750e+09 8.677e+10, threshold=9.769e+08, percent-clipped=13.0
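
To make the difference between the two warmup_start values concrete, here is a minimal sketch of the warmup part of the schedule only. It assumes the warmup factor rises linearly from warmup_start to 1.0 over warmup_batches steps; the function name warmup_factor and the value warmup_batches=500 are assumptions for illustration, not taken from this issue, and the batch/epoch decay terms of Eden are left out.

```python
def warmup_factor(batch: int, warmup_start: float, warmup_batches: float = 500.0) -> float:
    """Multiplier applied to the base learning rate during warmup.

    Assumption: linear ramp from `warmup_start` at batch 0 to 1.0 at
    `warmup_batches`; 500 is an assumed value, not read from the recipe.
    """
    if batch >= warmup_batches:
        return 1.0
    return warmup_start + (1.0 - warmup_start) * (batch / warmup_batches)


if __name__ == "__main__":
    # Compare the two settings around the steps where the divergence shows up.
    for start in (0.1, 0.5):
        factors = [round(warmup_factor(b, start), 3) for b in (0, 100, 200, 500)]
        print(f"warmup_start={start}: {factors}")
```

Under these assumptions, the learning rate at steps 100-200 with warmup_start=0.1 is roughly half to two thirds of what it is with warmup_start=0.5.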

It seems that ScaledAdam does not like the early Ctc gradients when the learning rate is too small.
Should some Ctc loss warm-starting be introduced for Transducer+Ctc training?
(assuming the model should first learn something from the Transducer loss...)
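
A minimal sketch of what such a Ctc warm-start could look like: the Ctc weight is ramped linearly from 0 up to its final value, so the early updates are driven mostly by the Transducer loss. The names ctc_scale_at and combined_loss, and the values target_scale=0.2 and warmup_batches=2000, are hypothetical and chosen only for illustration; this is not the recipe's current behavior.

```python
def ctc_scale_at(
    batch_idx: int,
    target_scale: float = 0.2,     # assumed final Ctc weight
    warmup_batches: int = 2000,    # assumed ramp length, would need tuning
    start_scale: float = 0.0,
) -> float:
    """Linearly ramp the Ctc loss weight from start_scale to target_scale."""
    frac = min(max(batch_idx / warmup_batches, 0.0), 1.0)
    return start_scale + (target_scale - start_scale) * frac


def combined_loss(transducer_loss, ctc_loss, batch_idx: int):
    """Transducer loss plus the warm-started Ctc term.

    Works with plain floats or with torch tensors, since it only uses
    `+` and `*`.
    """
    return transducer_loss + ctc_scale_at(batch_idx) * ctc_loss


if __name__ == "__main__":
    for step in (0, 500, 1000, 2000, 5000):
        print(step, ctc_scale_at(step))
```

In a training loop, combined_loss would replace the fixed-weight sum of the two losses, with batch_idx taken from the global batch counter.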

Have you seen something similar?

With kind regards
Karel from BUT
