Rework PIT example train.py and data.py #125
| audio_keys = ['observation', 'speech_source'] | ||
| def prepare_dataset(db, dataset_name: str, batch_size, prefetch=True, shuffle=True): | ||
| """ | ||
| Prepares the dataset for the training process (loading audio data, SFTF) |
| shuffle: should the data be shuffeled | ||
|
|
||
| Returns: | ||
| desired dataset of the database in prepared for the training |
Something is wrong with the grammar in this sentence ("in prepared for the training").
| _config: Configuration dict of the experiment | ||
| _run: Run object of the current run of the experiment | ||
|
|
||
| Returns: |
This can be left out when there is no return value.
| None | ||
| """ | ||
| init(_config, _run) | ||
| (trainer, train_dataset, validate_dataset) = prepare(_config) |
The parentheses on the left-hand-side are redundant
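A minimal sketch of the suggested form. The `prepare` function below is a hypothetical stand-in that returns placeholder strings instead of the real trainer and datasets, so only the unpacking syntax is illustrated:

```python
def prepare(config):
    # Hypothetical stand-in for the real prepare(): returns three placeholders.
    return 'trainer', 'train_dataset', 'validate_dataset'


# Tuple unpacking works without parentheses on the left-hand side.
trainer, train_dataset, validate_dataset = prepare({})
```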
| # Test run to detects possible errors in the trainer/datasets | ||
| trainer.test_run(train_dataset, validate_dataset) | ||
|
|
||
| # path where the checkpoints of the training are stored |
This comment starts with a lowercase letter, while the others start with an uppercase one. Stick to one style (I prefer uppercase).
| if shuffle: | ||
| dataset = dataset.shuffle(reshuffle=True) | ||
|
|
||
| #Splitting the dataset in batches and sorting the frames in the batch |
Better write "... and sorts examples in a batch w.r.t. their duration" or something similar.
The frames themselves are not sorted
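To illustrate the distinction, here is a small sketch of sorting the examples of one batch by duration. It is plain Python and not the code from this PR; the `num_samples` key and the helper name are assumptions made for the illustration:

```python
def sort_batch_by_duration(batch, key='num_samples'):
    """Return the examples of a batch ordered from longest to shortest.

    The key name 'num_samples' is an assumption for this sketch.
    """
    return sorted(batch, key=lambda example: example[key], reverse=True)


# Three toy examples of different length; the frames inside each example
# are left untouched, only the order of the examples changes.
batch = [
    {'example_id': 'a', 'num_samples': 16000},
    {'example_id': 'b', 'num_samples': 48000},
    {'example_id': 'c', 'num_samples': 32000},
]
sorted_batch = sort_batch_by_duration(batch)
```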
| def pre_batch_transform(inputs, return_keys=None): | ||
| def pre_batch_transform(inputs): | ||
| """ | ||
| Prepares the data through creating a dictionary with various data, which is computed through STFT. |
"... by creating a dictionary with all data that is necessary for the model (e.g. STFT of observation)"
| """ Prepares the train and validation dataset from the database object """ | ||
| def prepare(_config): | ||
| """ | ||
| Preparation of the train and validation datasets for the training and initialisation of the padertorch trainer, |
We try to stick to American English: initialisation -> initialization
| database_json = _config['database_json'] | ||
|
|
||
| sacred.commands.print_config(_run) | ||
| # Initialisation of the trainer |
| checkpoint_path = trainer.checkpoint_dir / 'ckpt_latest.pth' | ||
|
|
||
| # Start of the training | ||
| trainer.register_validation_hook(validate_dataset) |
Could you repeat the most important default arguments of the validation hook, so that it becomes clear which options can easily be modified for the validation (number of checkpoints, metric for the best checkpoint, ...)?
|
Now that the |