aishell1 zipformer recipe does not work #2069

Description

@menggedu

According to the README, the training command is:

export CUDA_VISIBLE_DEVICES="0,1"
./zipformer/train.py \
  --world-size 2 \
  --num-epochs 60 \
  --start-epoch 1 \
  --use-fp16 1 \
  --context-size 1 \
  --enable-musan 0 \
  --exp-dir zipformer/exp \
  --max-duration 500 \
  --base-lr 0.045 \
  --lr-batches 7500 \
  --lr-epochs 18 \
  --spec-aug-time-warp-factor 20 \
  --use-ctc 1 \
  --use-cr-ctc 1 \
  --use-transducer 0 \
  --enable-spec-aug 0 \
  --cr-loss-scale 0.2

However, even after I decreased base-lr to 0.02, the following error still appeared:
========================= NOTE =========================
If you see this error, it means that the gradient scale is too small.

    The default base_lr is 0.045 / 0.05 (depends on which recipe you are 
    using), this is an empirical value obtained mostly using 4 * 32GB V100 
    GPUs with a max_duration of approx. 1,000. 
    The proper value of base_lr may vary depending on the number of GPUs 
    and the value of max-duration you are using. 

    To fix this issue, you may need to adjust the value of base_lr accordingly.

    We would suggest you to decrease the value of base_lr by 0.005 (e.g., 
    from 0.045 to 0.04), and try again. If the error still exists, you may 
    repeat the process until base_lr hits 0.02. (Note that this will lead to 
    certain loss of performance, but it should work. You can compensate this by
    increasing the num_epochs.)
    
    If the error still exists, you could try to seek help by raising an issue, 
    with a detailed description of (a) your computational resources, (b) the 
    base_lr and (c) the max_duration you are using, (d) detailed configuration 
    of your model.
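For reference, the retry schedule described in the note, and the difference between the reference setup and the one in this issue, can be sketched in a few lines of Python. This is only an illustration: the helper names `base_lr_candidates` and `audio_seconds_per_step` are hypothetical and not part of the recipe, and the second helper assumes that `--max-duration` applies per GPU, so that the audio processed per step scales with the number of GPUs.

```python
def base_lr_candidates(start=0.045, floor=0.02, step=0.005):
    """Return the base_lr values to try, from `start` down to `floor`."""
    # Work in integer thousandths to avoid floating-point drift.
    start_k, floor_k, step_k = (round(v * 1000) for v in (start, floor, step))
    return [k / 1000 for k in range(start_k, floor_k - 1, -step_k)]


def audio_seconds_per_step(num_gpus, max_duration):
    """Approximate seconds of audio per step, summed over all GPUs.

    Assumption: max_duration is a per-GPU limit.
    """
    return num_gpus * max_duration


# Values to try in order, as suggested by the note.
print(base_lr_candidates())

# Reference setup from the note (4 GPUs, max_duration ~1000)
# versus the setup in this issue (2 GPUs, max_duration 500).
print(audio_seconds_per_step(4, 1000), audio_seconds_per_step(2, 500))
```

Under that assumption, the setup in this issue processes roughly a quarter of the audio per step compared with the reference setup, which may be relevant when choosing base_lr. The note itself only says the proper value "may vary" and does not give a scaling rule.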
