Skip to content
This repository was archived by the owner on Apr 29, 2021. It is now read-only.

Batch norm#11

Open
giuseppeCoccia wants to merge 9 commits into
masterfrom
batch_norm
Open

Batch norm#11
giuseppeCoccia wants to merge 9 commits into
masterfrom
batch_norm

Conversation

@giuseppeCoccia

Copy link
Copy Markdown
Contributor

Added batch normalization to rnn, convolution and fully connected layers + Fixed all tests.
The way all layers are built seems to be changed a lot but it is simply because it is now necessary to build separately all layers in order to insert batch norm between them.
Example: instead of calling torch.nn.LSTM once and passing the number of layers as an argument, it is now necessary to call this function num_layers time.

@giuseppeCoccia giuseppeCoccia requested a review from samgd November 19, 2019 12:37

@julianmack julianmack left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @giuseppeCoccia - just a single comment here on protos (I was browsing). Will look at this in more detail next week if sam hasn't by then

Comment thread src/myrtlespeech/protos/deep_speech_2.proto Outdated
Comment thread src/myrtlespeech/model/rnn.py Outdated
Comment on lines +143 to +144
if self.batch_norm is not None:
# Collapses input of dim T*N*H to (T*N)*H and gives it to a batch

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about if the input to the rnn has dim=4?

Also - these comments are confusing in the batch_first=True case

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the input has dim=4 then it would be necessary to use a batchnorm2d or change/collapse the input dimensions. I assumed that the input was always with dim=3 because it was explicitly said in the arg description of the forward function, but maybe it was a wrong assumption. Have you encountered situations in which the input dim is 4?

Regarding the comments, I have slightly changed them to clarify that it doesn't matter if batch_first is True or False because the first two dimensions are both collapsed into a single one. Let me know if they are still confusing.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok you're absolutely right - dim=4 shouldn't work according to the docstring so the api shouldn't support this - my mistake.

The batch_first comments are v clear now!

@julianmack julianmack left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @giuseppeCoccia - I have made quite a large high-level request for changes here. I think we should discuss with @samgd first because I may be wrong!

Comment on lines -39 to -45
FullyConnected(
(fully_connected): Sequential(
(0): Linear(in_features=32, out_features=64, bias=True)
(1): ReLU()
(2): Linear(in_features=64, out_features=64, bias=True)
(3): ReLU()
(4): Linear(in_features=64, out_features=16, bias=True)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have delegated quite a lot of responsibility to the builders instead of our custom torch.nn.Modules here. In my opinion building the multiple layers should take place in the modules themselves meaning the builders change very little (apart from passing the batch_norm bool). So for example the new FullyConnected model def would look something like:

FullyConnected(
          (fully_connected): Sequential(
            (0): BatchNorm
            (1): Linear(in_features=32, out_features=64, bias=True)
            (2): ReLU()
            etc

(i.e. more similar to the original implementation).

This is quite a big change so it may be best to check with @samgd first. In any case, my reasoning is as follows:

With the proposed design, we don't have full control over the FullyConnected output as Sequential is the top level module (i.e. we have Sequential( (0): FullyConnected(...). Requiring all modules to be sequential may make it harder to add features in future (e.g. parallel filters).
A concrete example of this is in the RNN class, where we use the hidden state outputs for rnnt decoding. With the proposed Sequential design, there isn't a class that will house the functions to collate and parse the hidden state across multiple layers (eg see deepseech_internal: https://github.com/MyrtleSoftware/deepspeech_internal/blob/98420a20b405389fbebbff5afd0dd71e523f2ba7/src/deepspeech/networks/utils.py#L326)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sticking to the standard PyTorch RNN API that includes num_layers is a good idea - it means the classes (implementation) can be used in a standalone manner without having to deal with builders/protobufs (e.g. creating an RNN layer in a notebook).

Keeping the implementation contained in one place may lead to less bugs (single responsibility). What currently happens if dropout = 0.5 is passed to LSTM?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have just changed both the RNN and the FullyConnected building phases. Now everything is done inside the RNN and FullyConnected classes. Their forward functions are now a bit longer because it is necessary to check the type of each single layer in a for loop (whether it is a batch_norm, rnn, etc.) in order to run them with the correct input sizes (or to pad the input sequences)

@samgd samgd left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments - pushing num_layers into the RNN and FullyConnected layers feels like the best solution.

@julianmack

Copy link
Copy Markdown
Contributor

TODO: this needs to be rebased on master - involving a non-trivial refactor.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants