This repository was archived by the owner on Apr 29, 2021. It is now read-only.

Per-feature normalization#23

Open
julianmack wants to merge 4 commits into
masterfrom
per_feat_norm
Conversation

@julianmack

Contributor

Very small PR adding per-feature normalization.

@julianmack julianmack requested a review from samgd January 15, 2020 16:33
pre_process_step {
stage: TRAIN_AND_EVAL;
standardize {
norm_type: ALL_FEATURES;

Contributor Author


I.e. this is left as ALL_FEATURES so that behaviour doesn't change, although it's likely that setting it to PER_FEATURE will be better.

@samgd samgd left a comment

Contributor


Will per-tensor ("all_features") normalisation ever be preferable over per-feature (computed over a single batch)? If not it could be removed entirely to simplify the code base.

It also might be interesting to compare computing the mean and std over the entire dataset rather than each batch?

@julianmack

julianmack commented Jan 24, 2020

Contributor Author

> It also might be interesting to compare computing the mean and std over the entire dataset rather than each batch?

The current implementation actually computes the mean and std over each sample, not over each batch. Each feature is normalised over time within the sequence, which gives a loose approximation to per-speaker normalisation (where each sample is assumed to come from a different speaker).
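A minimal sketch of the difference between the two modes for a single sample. It uses plain Python lists in place of tensors so that it runs without any dependencies. The `sample[feature][time]` layout, the function names and `EPS` are assumptions for illustration and are not taken from the codebase.

```python
from statistics import fmean, pstdev
from typing import List

Sample = List[List[float]]  # hypothetical layout: sample[feature][time]
EPS = 1e-8  # assumed small constant to avoid division by zero


def normalise_per_feature(sample: Sample) -> Sample:
    """Standardise each feature row over its own time axis."""
    out = []
    for row in sample:
        mean, std = fmean(row), pstdev(row)
        out.append([(x - mean) / (std + EPS) for x in row])
    return out


def normalise_all_features(sample: Sample) -> Sample:
    """Standardise with a single mean/std over every value in the sample."""
    flat = [x for row in sample for x in row]
    mean, std = fmean(flat), pstdev(flat)
    return [[(x - mean) / (std + EPS) for x in row] for row in sample]


# Two features on very different scales, three time steps each.
sample = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
per_feature = normalise_per_feature(sample)
all_features = normalise_all_features(sample)
```

With per-feature normalisation every row ends up with zero mean regardless of its original scale. With a single mean/std for the whole sample, the small-scale feature stays offset from zero.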

So there are three options:

  1. Per sample - current
  2. Per batch
  3. Per whole dataset

I've had a preliminary think about how we might implement 2. and 3. I think it's worth discussing which we would like, because both will require relatively large changes to the codebase.

2

Should be easy-ish to implement - although it will require re-thinking the preprocessing pipeline as all transforms are applied per-sample at the moment in the dataset class:

class LibriSpeech(Dataset):
    ...
    def __getitem__(self, index: int) -> Tuple[torch.Tensor, str]:
        ...
        if self._transform is not None:
            audio = self._transform(audio)        # <- all preprocessing applied here

I could move some (all?) of the preprocessing steps to the seq_to_seq_collate_fn to achieve normalization per batch?
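As a rough sketch of what per-batch normalisation inside a collate step might look like: the function below pools the frames of every sample in the batch before computing per-feature statistics. It is not the existing seq_to_seq_collate_fn. The function name, the `sample[feature][time]` list layout and `EPS` are assumptions, and padding is left out.

```python
from math import sqrt
from typing import List

Sample = List[List[float]]  # hypothetical layout: sample[feature][time]
EPS = 1e-8  # assumed small constant to avoid division by zero


def normalise_batch_per_feature(batch: List[Sample]) -> List[Sample]:
    """Standardise each feature with statistics pooled over the whole batch."""
    n_features = len(batch[0])
    means, stds = [], []
    for f in range(n_features):
        # Pool every frame of feature f across all samples, before any padding.
        values = [x for sample in batch for x in sample[f]]
        mean = sum(values) / len(values)
        var = sum((x - mean) ** 2 for x in values) / len(values)
        means.append(mean)
        stds.append(sqrt(var))
    return [
        [[(x - means[f]) / (stds[f] + EPS) for x in sample[f]]
         for f in range(n_features)]
        for sample in batch
    ]


# Two samples of different lengths, one feature each.
batch = [[[0.0, 2.0]], [[4.0, 6.0, 8.0]]]
normalised = normalise_batch_per_feature(batch)
```

Computing the statistics before padding keeps padded frames from skewing the mean and std. Individual samples are no longer zero-mean, only the batch as a whole.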

3

Should be O.K. to implement, but I think it will require running the train_loader over all samples at the beginning of training. It's not feasible to precompute and hardcode these statistics, as we would need them for each combination of {dataset, subset, FeatExtractionType, number_features, win_len, hop_len}, etc. This would add quite a lot of complexity when we only want to run evaluation on a model, since we would still need to build and run the train loaders to get the mean/std.

One way of avoiding this eval-time faff could be to have the model remember the averages when self.training == True, effectively adding a normalisation layer at the start of every model.
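One possible shape for such a layer, sketched in plain Python rather than as a torch module: it accumulates per-feature sums while a `training` flag is set and reuses the frozen statistics otherwise. The class name, the list-based `sample[feature][time]` layout and `EPS` are hypothetical. The sum-of-squares variance is chosen for brevity and is less numerically stable than a Welford-style update.

```python
from math import sqrt
from typing import List

EPS = 1e-8  # assumed small constant to avoid division by zero


class RunningFeatureNorm:
    """Hypothetical input layer that tracks per-feature statistics while training."""

    def __init__(self, n_features: int) -> None:
        self.training = True
        self.count = 0                       # frames seen so far
        self.total = [0.0] * n_features      # running sum of x per feature
        self.total_sq = [0.0] * n_features   # running sum of x**2 per feature

    def __call__(self, sample: List[List[float]]) -> List[List[float]]:
        if self.training:
            # Statistics are only updated in training mode.
            self.count += len(sample[0])
            for f, row in enumerate(sample):
                self.total[f] += sum(row)
                self.total_sq[f] += sum(x * x for x in row)
        if self.count == 0:
            raise RuntimeError("no statistics collected yet")
        out = []
        for f, row in enumerate(sample):
            mean = self.total[f] / self.count
            var = max(self.total_sq[f] / self.count - mean * mean, 0.0)
            out.append([(x - mean) / (sqrt(var) + EPS) for x in row])
        return out


norm = RunningFeatureNorm(n_features=1)
norm([[0.0, 2.0]])           # training: statistics are updated
norm([[4.0, 6.0, 8.0]])
norm.training = False        # evaluation: statistics are frozen
eval_out = norm([[4.0]])
```

Because the statistics live on the layer, they would be saved and restored with the model, so evaluation would not need to build the train loaders.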

conclusion

If we're not sure which we expect to be best, it might be worth running some training experiments before proceeding with a proper implementation.

and also

> Will per-tensor ("all_features") normalisation ever be preferable over per-feature (computed over a single batch)? If not it could be removed entirely to simplify the code base.

Yes, I agree: it will never be better. I'll remove it once we decide on a route above.

@julianmack julianmack requested a review from samgd January 24, 2020 14:51