Batch norm#11
Conversation
julianmack
left a comment
There was a problem hiding this comment.
Hi @giuseppeCoccia - just a single comment here on protos (I was browsing). Will look at this in more detail next week if sam hasn't by then
| if self.batch_norm is not None: | ||
| # Collapses input of dim T*N*H to (T*N)*H and gives it to a batch |
There was a problem hiding this comment.
What about if the input to the rnn has dim=4?
Also - these comments are confusing in the batch_first=True case
There was a problem hiding this comment.
if the input has dim=4 then it would be necessary to use a batchnorm2d or change/collapse the input dimensions. I assumed that the input was always with dim=3 because it was explicitly said in the arg description of the forward function, but maybe it was a wrong assumption. Have you encountered situations in which the input dim is 4?
Regarding the comments, I have slightly changed them to clarify that it doesn't matter if batch_first is True or False because the first two dimensions are both collapsed into a single one. Let me know if they are still confusing.
There was a problem hiding this comment.
Ok you're absolutely right - dim=4 shouldn't work according to the docstring so the api shouldn't support this - my mistake.
The batch_first comments are v clear now!
julianmack
left a comment
There was a problem hiding this comment.
Hi @giuseppeCoccia - I have made quite a large high-level request for changes here. I think we should discuss with @samgd first because I may be wrong!
| FullyConnected( | ||
| (fully_connected): Sequential( | ||
| (0): Linear(in_features=32, out_features=64, bias=True) | ||
| (1): ReLU() | ||
| (2): Linear(in_features=64, out_features=64, bias=True) | ||
| (3): ReLU() | ||
| (4): Linear(in_features=64, out_features=16, bias=True) |
There was a problem hiding this comment.
We have delegated quite a lot of responsibility to the builders instead of our custom torch.nn.Modules here. In my opinion building the multiple layers should take place in the modules themselves meaning the builders change very little (apart from passing the batch_norm bool). So for example the new FullyConnected model def would look something like:
FullyConnected(
(fully_connected): Sequential(
(0): BatchNorm
(1): Linear(in_features=32, out_features=64, bias=True)
(2): ReLU()
etc
(i.e. more similar to the original implementation).
This is quite a big change so it may be best to check with @samgd first. In any case, my reasoning is as follows:
With the proposed design, we don't have full control over the FullyConnected output as Sequential is the top level module (i.e. we have Sequential( (0): FullyConnected(...). Requiring all modules to be sequential may make it harder to add features in future (e.g. parallel filters).
A concrete example of this is in the RNN class, where we use the hidden state outputs for rnnt decoding. With the proposed Sequential design, there isn't a class that will house the functions to collate and parse the hidden state across multiple layers (eg see deepseech_internal: https://github.com/MyrtleSoftware/deepspeech_internal/blob/98420a20b405389fbebbff5afd0dd71e523f2ba7/src/deepspeech/networks/utils.py#L326)
There was a problem hiding this comment.
Sticking to the standard PyTorch RNN API that includes num_layers is a good idea - it means the classes (implementation) can be used in a standalone manner without having to deal with builders/protobufs (e.g. creating an RNN layer in a notebook).
Keeping the implementation contained in one place may lead to less bugs (single responsibility). What currently happens if dropout = 0.5 is passed to LSTM?
There was a problem hiding this comment.
I have just changed both the RNN and the FullyConnected building phases. Now everything is done inside the RNN and FullyConnected classes. Their forward functions are now a bit longer because it is necessary to check the type of each single layer in a for loop (whether it is a batch_norm, rnn, etc.) in order to run them with the correct input sizes (or to pad the input sequences)
samgd
left a comment
There was a problem hiding this comment.
See comments - pushing num_layers into the RNN and FullyConnected layers feels like the best solution.
|
TODO: this needs to be rebased on |
Added batch normalization to rnn, convolution and fully connected layers + Fixed all tests.
The way all layers are built seems to be changed a lot but it is simply because it is now necessary to build separately all layers in order to insert batch norm between them.
Example: instead of calling torch.nn.LSTM once and passing the number of layers as an argument, it is now necessary to call this function num_layers time.