Hi. I think there's a problem with the query scaling in the transformer's multi-head attention.
When I run UNMT, I get an exception in NMT/src/modules/multihead_attention.py at line 97:
line 97: `q *= self.scaling`
line 30: `self.scaling = self.head_dim**-0.5`
I couldn't find the reason, so I changed line 97 to
line 97: `q = q / math.sqrt(self.head_dim)`
and it worked.
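The report doesn't pin down the cause. One plausible explanation, assumed here and not confirmed by a traceback, is that `q *= self.scaling` modifies `q` in place. PyTorch autograd raises a RuntimeError when a tensor needed for the backward pass, or a view returned by a multi-output op such as `chunk`, is modified in place. Both `q = q * self.scaling` and `q = q / math.sqrt(self.head_dim)` are out-of-place and give the same values. The sketch below uses plain Python lists instead of tensors (an assumption so it runs without PyTorch), and `scale_query` is a hypothetical helper. It only shows that the two scaling factors agree and what the out-of-place pattern looks like.

```python
import math

def scale_query(q, head_dim):
    """Hypothetical helper: return a NEW list with each element of q
    multiplied by head_dim ** -0.5.

    The input list is left untouched, mirroring the out-of-place
    `q = q * self.scaling` rather than the in-place `q *= self.scaling`.
    """
    scaling = head_dim ** -0.5
    return [x * scaling for x in q]

head_dim = 64
# head_dim ** -0.5 and 1 / sqrt(head_dim) agree up to float rounding.
assert math.isclose(head_dim ** -0.5, 1 / math.sqrt(head_dim))

q = [8.0, 16.0, -4.0]
scaled = scale_query(q, head_dim)
print(scaled)  # [1.0, 2.0, -0.5]
print(q)       # [8.0, 16.0, -4.0] (original is not modified)
```

In the actual tensor code, the equivalent out-of-place change would be `q = q * self.scaling`, which keeps the precomputed factor from line 30.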