Per-feature normalization #23
pre_process_step {
  stage: TRAIN_AND_EVAL;
  standardize {
    norm_type: ALL_FEATURES;
i.e. so that behaviour doesn't change, although setting this to PER_FEATURE will likely be better.
samgd
left a comment
Will per-tensor ("all_features") normalisation ever be preferable to per-feature (computed over a single batch)? If not, it could be removed entirely to simplify the codebase.
It might also be interesting to compare computing the mean and std over the entire dataset rather than over each batch.
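To make the distinction concrete, here is a minimal sketch of the two modes that `norm_type` selects between, applied to a single sample. It is written in plain Python so that it stays dependency-free. The function names, the `[n_features][n_timesteps]` layout and the `EPS` constant are assumptions for illustration, not the repository's actual implementation.

```python
from statistics import fmean, pstdev
from typing import List

# Hypothetical layout for one sample: [n_features][n_timesteps].
Features = List[List[float]]

EPS = 1e-8  # assumed small constant to avoid division by zero


def standardize_all_features(x: Features) -> Features:
    """Normalise with a single mean/std computed over every value in the sample."""
    flat = [v for row in x for v in row]
    mean, std = fmean(flat), pstdev(flat)
    return [[(v - mean) / (std + EPS) for v in row] for row in x]


def standardize_per_feature(x: Features) -> Features:
    """Normalise each feature row with its own mean/std computed over time."""
    out = []
    for row in x:
        mean, std = fmean(row), pstdev(row)
        out.append([(v - mean) / (std + EPS) for v in row])
    return out


# Two features on very different scales.
sample = [[1.0, 2.0, 3.0], [100.0, 200.0, 300.0]]
per_tensor = standardize_all_features(sample)
per_feature = standardize_per_feature(sample)
```

With per-feature statistics both rows end up on the same scale, whereas with a single shared mean and std the small-scale feature is squashed into a narrow band below zero.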
The current implementation actually computes the mean and std over each sample, not over each batch. Each feature is normalised over time in the sequence, which gives a loose approximation to [...] So there are three options:
I've had a preliminary think about how we might implement 2. and 3. I think it's worth discussing which we would like because both will require relatively large changes to the codebase.

2.

Should be easy-ish to implement, although it will require re-thinking the preprocessing pipeline, as all transforms are currently applied per-sample in the dataset class:

```python
class LibriSpeech(Dataset):
    ...
    def __getitem__(self, index: int) -> Tuple[torch.Tensor, str]:
        ...
        if self._transform is not None:
            audio = self._transform(audio)  # <- all preprocessing applied here
```

I could move some (all?) of the preprocessing steps to the [...]

3.

Should be O.K. to implement but will require running the [...]

A method of avoiding this eval-time faff could be to get the model to remember averages when [...]

Conclusion

If we're not sure which we expect to be best, it might be worth running some training experiments before proceeding with a proper implementation.

and also
Yes, I agree - it will never be better. I'll remove it after we decide on a route above.
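As a rough aid for the decision, the three scopes raised in this thread (statistics per sample, per batch, or over the entire dataset) can be sketched with one shared helper. This is an illustrative sketch in plain Python only: every name here (`feature_stats`, `apply_stats`, `normalize_per_sample`, `normalize_per_batch`) and the `[n_features][n_timesteps]` layout are hypothetical, and how these scopes map onto the numbered options above is an assumption.

```python
import math
from typing import Iterable, List, Tuple

# Hypothetical layout for one sample: [n_features][n_timesteps].
Sample = List[List[float]]

EPS = 1e-8  # assumed small constant to avoid division by zero


def feature_stats(samples: Iterable[Sample]) -> Tuple[List[float], List[float]]:
    """Per-feature mean and population std pooled over all given samples.

    Accumulates sums and sums of squares in one pass. This is simple but can
    lose precision for large values; a real implementation might prefer
    Welford's algorithm.
    """
    count: List[int] = []
    total: List[float] = []
    total_sq: List[float] = []
    for sample in samples:
        if not count:
            n = len(sample)
            count, total, total_sq = [0] * n, [0.0] * n, [0.0] * n
        for f, row in enumerate(sample):
            count[f] += len(row)
            total[f] += sum(row)
            total_sq[f] += sum(v * v for v in row)
    means = [t / c for t, c in zip(total, count)]
    stds = [
        math.sqrt(max(sq / c - m * m, 0.0))
        for sq, c, m in zip(total_sq, count, means)
    ]
    return means, stds


def apply_stats(sample: Sample, means: List[float], stds: List[float]) -> Sample:
    """Standardise each feature row with the supplied statistics."""
    return [
        [(v - m) / (s + EPS) for v in row]
        for row, m, s in zip(sample, means, stds)
    ]


def normalize_per_sample(sample: Sample) -> Sample:
    """Statistics from this sample only (what the thread says happens today)."""
    return apply_stats(sample, *feature_stats([sample]))


def normalize_per_batch(batch: List[Sample]) -> List[Sample]:
    """Statistics pooled over every sample in the batch."""
    means, stds = feature_stats(batch)
    return [apply_stats(s, means, stds) for s in batch]


# Two samples with one feature each.
batch = [[[1.0, 2.0, 3.0]], [[5.0, 6.0, 7.0]]]
per_sample = [normalize_per_sample(s) for s in batch]
per_batch = normalize_per_batch(batch)

# Dataset-level scope: compute the statistics once over the training data and
# reuse the same fixed values for every sample, including at eval time.
train_means, train_stds = feature_stats(batch)
eval_sample = apply_stats([[4.0, 4.0, 4.0]], train_means, train_stds)
```

Per-sample statistics fit inside a per-sample transform, per-batch statistics need to run after samples are collated, and dataset-level statistics need a separate pass over the training data whose results are stored and reused.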
V. small PR adding per-feature normalization