Add mls_english and multi_ja_en recipes #2
Open
kinanmartin wants to merge 146 commits into
…ed symlink to librispeech
…just datamodule to load from manifest files
This reverts commit ba603e0.
…ultiDatasetAsrDataModule, not tested yet
…re.sh from commit 547f5c5
…tructure, added script to update cutset paths. WIP
…the multilingual training recipe directory structure
…o make dev and test splits have matching sizes to reazonspeech
Musan mls clean final
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
I found some inf values in tot_score that prevent the model from converging; adding an inf mask makes training more stable.
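The idea behind the inf mask can be sketched in a few lines. This is a minimal illustration in plain Python, not the code from the commit: the function name `masked_mean_loss` and the choice of averaging the negated scores are assumptions made for the example. In a PyTorch training loop the same filtering would typically be done on tensors with `torch.isfinite`.

```python
import math
from typing import List, Tuple


def masked_mean_loss(tot_scores: List[float]) -> Tuple[float, int]:
    """Average the negated scores, skipping any non-finite entries.

    Hypothetical helper: returns (loss, number_of_scores_kept).
    A single inf or NaN score would otherwise make the whole batch
    loss non-finite and corrupt the gradient update.
    """
    finite = [s for s in tot_scores if math.isfinite(s)]
    if not finite:
        # Nothing usable in this batch; report a zero loss and zero count
        # so the caller can decide to skip the update.
        return 0.0, 0
    loss = -sum(finite) / len(finite)
    return loss, len(finite)


if __name__ == "__main__":
    # One utterance has a -inf score and is excluded from the average.
    print(masked_mean_loss([-2.0, float("-inf"), -4.0]))
```

Returning the count of kept scores lets the caller log how many utterances were dropped, which helps tell an occasional bad sample from a systematic problem.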
This PR contains the new code for two icefall recipes:

- `mls_english`
- `multi_ja_en`

To get the big picture of how each recipe works, please first look at the `prepare.sh` script in each recipe: the `mls_english` prepare script and the `multi_ja_en` prepare script.

The `mls_english` prepare script downloads the parler-tts/mls_eng dataset from Hugging Face, computes features, creates `lhotse` manifests, and trains a BPE tokenizer. The `mls_english` training code (`zipformer/train.py`) trains a model in a similar way to the `reazonspeech` recipe.

The `multi_ja_en` prepare script depends on the objects created by both the `mls_english` prepare script and the `reazonspeech` prepare script. It creates symlinks to the features computed by each of those recipes, then creates new `lhotse` manifests so that training can use those features correctly.
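The symlink-and-manifest step described above can be sketched as follows. This is a rough illustration using only the standard library, not the PR's actual script: the helper names `link_features` and `rewrite_manifest`, the directory layout, the recipe names used in the demo, and the flat `storage_path` field are all assumptions made for the example. Real `lhotse` manifests have a richer schema and are normally read and written through `lhotse` itself.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def link_features(source_dirs: Dict[str, Path], dest_dir: Path) -> List[Path]:
    """Symlink each source recipe's feature files into dest_dir.

    Link names are prefixed with the recipe name so that files from
    different recipes cannot collide.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for recipe, src in source_dirs.items():
        for feat_file in sorted(p for p in src.iterdir() if p.is_file()):
            link = dest_dir / f"{recipe}_{feat_file.name}"
            if link.is_symlink() or link.exists():
                link.unlink()  # replace stale links so reruns are safe
            link.symlink_to(feat_file.resolve())
            links.append(link)
    return links


def rewrite_manifest(lines: List[str], recipe: str, dest_dir: Path) -> List[str]:
    """Point each JSONL record's (hypothetical) storage_path at the new link."""
    out = []
    for line in lines:
        record = json.loads(line)
        name = Path(record["storage_path"]).name
        record["storage_path"] = str(dest_dir / f"{recipe}_{name}")
        out.append(json.dumps(record))
    return out


def demo() -> List[str]:
    """Build two fake feature directories and link them into a third."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        sources = {}
        for recipe in ("mls_english", "reazonspeech"):
            src = root / recipe / "fbank"
            src.mkdir(parents=True)
            (src / "feats_0.bin").write_bytes(b"\x00")
            sources[recipe] = src
        links = link_features(sources, root / "multi_ja_en" / "fbank")
        return [p.name for p in links]


if __name__ == "__main__":
    print(demo())
```

Symlinking avoids recomputing or copying large feature files, at the cost of requiring manifests whose paths match the new location, which is why the manifests are regenerated rather than reused.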