Could you explain why you use pe_scaler and ple_scaler in the forward pass of the Encoder class in kingcrab.py?
In particular, why do you choose the form
pe_scaler = 2**(1-self.pos_scaler)**2
and
ple_scaler = 2**(1-self.pos_scaler_log)**2?
I don't really understand why one needs these two scalers (and also self.emb_scaler) in the first place and why you chose the above exponential forms for them.
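
To make the question concrete, here is a small sketch of how I read the expression. It relies only on Python's `**` being right-associative, so `2**(1-x)**2` parses as `2**((1-x)**2)`. The plain float `x` stands in for `self.pos_scaler` or `self.pos_scaler_log`, and the helper names `scaler` and `scaler_left_grouped` are mine, not from kingcrab.py.

```python
def scaler(x: float) -> float:
    """Evaluate 2**(1-x)**2 exactly as written in the question."""
    # `**` is right-associative, so this is 2 ** ((1 - x) ** 2).
    return 2 ** (1 - x) ** 2


def scaler_left_grouped(x: float) -> float:
    """The other possible reading, (2 ** (1 - x)) ** 2, for comparison."""
    return (2 ** (1 - x)) ** 2


if __name__ == "__main__":
    # x is a stand-in for the parameter; these values are illustrative only.
    for x in (0.0, 0.5, 1.0, 2.0, 3.0):
        print(x, scaler(x), scaler_left_grouped(x))
```

If this reading is right, the factor equals 1 when the parameter is 1, is symmetric around 1, and never drops below 1. Is that the intended behaviour?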