Batch norm by giuseppeCoccia · Pull Request #11 · MyrtleSoftware/myrtlespeech

giuseppeCoccia · 2019-11-19T12:37:22Z

Added batch normalization to the RNN, convolution and fully connected layers, and fixed all tests.
The layer-building code looks heavily changed, but only because each layer must now be built separately so that batch norm can be inserted between them.
Example: instead of calling torch.nn.LSTM once and passing the number of layers as an argument, it is now necessary to call it num_layers times.
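
The change described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual builder: the function name `build_lstm_stack`, its parameters, and the choice to place `BatchNorm1d` before each LSTM are all assumptions made for the example.

```python
import torch


def build_lstm_stack(input_size, hidden_size, num_layers, batch_norm=True):
    """Hypothetical builder: one single-layer LSTM per layer.

    When batch_norm is True, a BatchNorm1d over the feature dimension is
    placed before each LSTM (an assumption for this sketch).
    """
    layers = torch.nn.ModuleList()
    for i in range(num_layers):
        # Only the first layer sees the raw input size.
        in_size = input_size if i == 0 else hidden_size
        if batch_norm:
            layers.append(torch.nn.BatchNorm1d(in_size))
        layers.append(torch.nn.LSTM(in_size, hidden_size, num_layers=1))
    return layers


# Before: a single call owns every layer, leaving no place for batch norm.
single_call = torch.nn.LSTM(input_size=8, hidden_size=16, num_layers=3)

# After: num_layers separate calls, with batch norm in between.
stack = build_lstm_stack(input_size=8, hidden_size=16, num_layers=3)
layer_types = [type(layer).__name__ for layer in stack]
```

Here `layer_types` alternates between `BatchNorm1d` and `LSTM`, which is why the set of built layers looks so different even though the network itself is the same apart from the added normalization.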

julianmack

Hi @giuseppeCoccia - just a single comment here on protos (I was browsing). Will look at this in more detail next week if sam hasn't by then

julianmack · 2019-11-29T13:12:44Z

+        if self.batch_norm is not None:
+            # Collapses input of dim T*N*H to (T*N)*H and gives it to a batch


What if the input to the RNN has dim=4?

Also, these comments are confusing in the batch_first=True case.

giuseppeCoccia · Dec 2, 2019

If the input has dim=4, it would be necessary to use a BatchNorm2d or to change/collapse the input dimensions. I assumed that the input always has dim=3 because the argument description of the forward function says so explicitly, but maybe that was a wrong assumption. Have you encountered situations in which the input dim is 4?

Regarding the comments, I have changed them slightly to clarify that it doesn't matter whether batch_first is True or False, because the first two dimensions are collapsed into a single one either way. Let me know if they are still confusing.

julianmack · Dec 2, 2019

OK, you're absolutely right: dim=4 shouldn't work according to the docstring, so the API shouldn't support it. My mistake.

The batch_first comments are very clear now!
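
The point about batch_first can be checked with a small sketch. It assumes a `BatchNorm1d` over the feature dimension H in training mode; the variable names are illustrative and not taken from the PR. Because batch norm statistics are computed over all T*N rows, the order in which the first two dimensions are flattened should not change the result.

```python
import torch

torch.manual_seed(0)
T, N, H = 5, 4, 8

# Training mode (the default): normalises using the statistics of the batch.
bn = torch.nn.BatchNorm1d(H)

x_tnh = torch.randn(T, N, H)   # batch_first=False layout
x_nth = x_tnh.transpose(0, 1)  # same data in batch_first=True layout

# Collapse the first two dimensions, normalise, then restore the shape.
out_tnh = bn(x_tnh.reshape(T * N, H)).reshape(T, N, H)
out_nth = bn(x_nth.reshape(N * T, H)).reshape(N, T, H)

# Each (t, n) position gets the same normalised features in both layouts.
same = torch.allclose(out_tnh, out_nth.transpose(0, 1), atol=1e-5)
```

The tolerance in `allclose` only absorbs floating-point differences from summing the rows in a different order.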

julianmack

Hi @giuseppeCoccia - I have made quite a large high-level request for changes here. I think we should discuss with @samgd first because I may be wrong!

julianmack · 2019-12-02T12:47:53Z

-        FullyConnected(
-          (fully_connected): Sequential(
-            (0): Linear(in_features=32, out_features=64, bias=True)
-            (1): ReLU()
-            (2): Linear(in_features=64, out_features=64, bias=True)
-            (3): ReLU()
-            (4): Linear(in_features=64, out_features=16, bias=True)


We have delegated quite a lot of responsibility to the builders here instead of to our custom torch.nn.Modules. In my opinion, building the multiple layers should take place in the modules themselves, so that the builders change very little (apart from passing the batch_norm bool). For example, the new FullyConnected model definition would look something like:

FullyConnected(
  (fully_connected): Sequential(
    (0): BatchNorm
    (1): Linear(in_features=32, out_features=64, bias=True)
    (2): ReLU()
    etc.

(i.e. more similar to the original implementation).
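
A minimal sketch of what this suggestion could look like is given below. The constructor arguments (`num_hidden_layers`, `hidden_size`, `batch_norm`) and the exact position of each `BatchNorm1d` are assumptions for illustration; the real myrtlespeech class may differ.

```python
import torch


class FullyConnected(torch.nn.Module):
    """Hypothetical sketch: all layers are built inside the module."""

    def __init__(self, in_features, out_features, num_hidden_layers,
                 hidden_size, batch_norm=False):
        super().__init__()
        layers = []
        sizes = [in_features] + [hidden_size] * num_hidden_layers
        for n_in, n_out in zip(sizes[:-1], sizes[1:]):
            if batch_norm:
                layers.append(torch.nn.BatchNorm1d(n_in))
            layers.append(torch.nn.Linear(n_in, n_out))
            layers.append(torch.nn.ReLU())
        # Output layer: no activation afterwards.
        if batch_norm:
            layers.append(torch.nn.BatchNorm1d(sizes[-1]))
        layers.append(torch.nn.Linear(sizes[-1], out_features))
        # Sequential is an implementation detail owned by FullyConnected.
        self.fully_connected = torch.nn.Sequential(*layers)

    def forward(self, x):
        # x: (batch, in_features)
        return self.fully_connected(x)


fc = FullyConnected(32, 16, num_hidden_layers=2, hidden_size=64,
                    batch_norm=True)
out = fc(torch.randn(4, 32))
```

With this layout a builder would only need to forward the `batch_norm` flag, and `FullyConnected` stays the top-level module rather than being wrapped in an outer `Sequential`.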

This is quite a big change so it may be best to check with @samgd first. In any case, my reasoning is as follows:

With the proposed design, we don't have full control over the FullyConnected output, because Sequential is the top-level module (i.e. we have Sequential((0): FullyConnected(...))). Requiring all modules to be sequential may make it harder to add features in future (e.g. parallel filters).
A concrete example of this is the RNN class, where we use the hidden state outputs for RNN-T decoding. With the proposed Sequential design, there is no class to house the functions that collate and parse the hidden state across multiple layers (e.g. see deepspeech_internal: https://github.com/MyrtleSoftware/deepspeech_internal/blob/98420a20b405389fbebbff5afd0dd71e523f2ba7/src/deepspeech/networks/utils.py#L326).

Sticking to the standard PyTorch RNN API that includes num_layers is a good idea: it means the classes (the implementation) can be used standalone without having to deal with builders/protobufs (e.g. creating an RNN layer in a notebook).

Keeping the implementation contained in one place may lead to fewer bugs (single responsibility). What currently happens if dropout = 0.5 is passed to LSTM?
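
One possible reading of the dropout question, sketched under the assumption that a builder forwards `dropout` to a separate single-layer `torch.nn.LSTM` for each layer: PyTorch applies this dropout only between the layers of one LSTM call, so with `num_layers=1` it has no effect and PyTorch emits a warning at construction time. Whether the PR's builder actually forwards the argument this way is not stated in the thread.

```python
import warnings

import torch

# Record warnings raised while constructing a single-layer LSTM with dropout.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    torch.nn.LSTM(input_size=8, hidden_size=16, num_layers=1, dropout=0.5)

messages = [str(w.message) for w in caught]
dropout_warned = any("dropout" in message for message in messages)
```

If dropout between layers is wanted in a per-layer design, it would have to be added explicitly (for example as a separate `torch.nn.Dropout` between the LSTM calls), which is one more piece of logic that is easier to keep correct inside a single class.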

I have just changed the building phase of both RNN and FullyConnected. Everything is now done inside the RNN and FullyConnected classes. Their forward functions are a bit longer, because a for loop has to check the type of each layer (batch norm, RNN, etc.) in order to run it with the correct input sizes (or to pad the input sequences).
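
A forward function of the kind described here might look like the sketch below. It is an assumption-laden illustration rather than the code in the PR: the class layout, the `batch_norm` flag and the returned list of hidden states are hypothetical, and sequence padding/packing is left out.

```python
import torch


class RNN(torch.nn.Module):
    """Hypothetical sketch: per-layer LSTMs with optional batch norm."""

    def __init__(self, input_size, hidden_size, num_layers, batch_norm=False):
        super().__init__()
        self.layers = torch.nn.ModuleList()
        for i in range(num_layers):
            in_size = input_size if i == 0 else hidden_size
            if batch_norm:
                self.layers.append(torch.nn.BatchNorm1d(in_size))
            self.layers.append(torch.nn.LSTM(in_size, hidden_size))

    def forward(self, x):
        # x: (T, N, H), as the forward docstring in the thread requires dim=3.
        hidden_states = []
        for layer in self.layers:
            if isinstance(layer, torch.nn.BatchNorm1d):
                # Collapse (T, N, H) to (T*N, H), normalise, restore the shape.
                t, n, h = x.shape
                x = layer(x.reshape(t * n, h)).reshape(t, n, h)
            else:
                x, hid = layer(x)
                # Collate one (h_n, c_n) pair per LSTM layer.
                hidden_states.append(hid)
        return x, hidden_states


rnn = RNN(input_size=8, hidden_size=16, num_layers=3, batch_norm=True)
out, hidden = rnn(torch.randn(5, 4, 8))
```

Keeping the loop inside the class also gives a natural home for collating the per-layer hidden states, which is the concern raised earlier about RNN-T decoding.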

samgd

See comments - pushing num_layers into the RNN and FullyConnected layers feels like the best solution.

julianmack · 2020-01-31T10:57:33Z

TODO: this needs to be rebased on master - involving a non-trivial refactor.

giuseppeCoccia added 5 commits November 18, 2019 16:50

Added batch norm to rnn layers

b094e65

Added batch norm to fully connected layers

fc76b98

Added batch norm to convolution layers

f6c8062

Fix all tests, after having integrating batch norm

d52cda5

Fix line length error

2f58d3e

giuseppeCoccia requested a review from samgd November 19, 2019 12:37

julianmack reviewed Nov 29, 2019

Comment thread src/myrtlespeech/protos/deep_speech_2.proto Outdated

julianmack reviewed Nov 29, 2019

giuseppeCoccia added 2 commits December 2, 2019 10:41

change id number of batch_norm in proto file

9d4bd49

Change comment for batch norm in rnn layers

22d9932

julianmack reviewed Dec 2, 2019

samgd suggested changes Dec 2, 2019

giuseppeCoccia added 2 commits December 5, 2019 10:07

Push RNN building phase inside RNN class

14e72dc

Push FullyConnected building phase inside FullyConnected class

3800762

julianmack added the blocked label Jan 15, 2020

giuseppeCoccia wants to merge 9 commits into master from batch_norm

3 participants
