The paper claims that positional encodings are embedded after encoding the input using a 2-layer Transformer.

However, upon reviewing the code, I couldn't find any line or module where the positional encodings are explicitly added to the input embeddings before or during the Transformer processing.

    def forward(self, x_num, x_cat, timesteps):
        e = self.tokenizer(x_num, x_cat)
        decoder_input = e[:, 1:, :]  # ignore the first CLS token.
        y = self.encoder(decoder_input)
        pred_y = self.mlp(y.reshape(y.shape[0], -1), timesteps)
        pred_e = self.decoder(pred_y.reshape(*y.shape))
        x_num_pred, x_cat_pred = self.detokenizer(pred_e)
        x_cat_pred = torch.cat(x_cat_pred, dim=-1) if len(x_cat_pred)>0 else torch.zeros_like(x_cat).to(x_num_pred.dtype)

        return x_num_pred, x_cat_pred

I see that `PositionalEmbedding` is defined in the file, but it is used only inside the `MLPDiffusion` class, where it provides noise-level conditioning.

I can't find any line where positional encodings are added to the input embeddings before they are passed into the `mlp` module. Could you clarify whether positional information is handled differently from what the paper states?
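
For reference, this is roughly the kind of step I expected to find between `self.encoder` and `self.mlp`. It is only a framework-free sketch: the fixed sinusoidal scheme is my assumption (the paper may intend a learned encoding instead), and `sinusoidal_positions` and `add_positions` are hypothetical names that do not exist in the repository.

```python
import math


def sinusoidal_positions(num_tokens, dim):
    """Return a num_tokens x dim table of fixed sinusoidal position encodings."""
    table = []
    for pos in range(num_tokens):
        row = []
        for i in range(dim):
            # Each pair of channels (2k, 2k+1) shares one frequency.
            angle = pos / (10000 ** ((i - i % 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def add_positions(tokens):
    """Add position encodings element-wise to one sample of shape (num_tokens, dim)."""
    pe = sinusoidal_positions(len(tokens), len(tokens[0]))
    return [[t + p for t, p in zip(tok, enc)] for tok, enc in zip(tokens, pe)]


# Hypothetical stand-in for the encoder output `y` of a single row:
# 3 column tokens with 4 channels each, all zeros so the result is the encoding itself.
y = [[0.0] * 4 for _ in range(3)]
y_with_pos = add_positions(y)
```

In the actual model this would presumably be a single tensor addition on `y` before the reshape that feeds `self.mlp`, but I could not find an equivalent line in `forward`.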
Referenced code: TabDiff/tabdiff/modules/main_modules.py, lines 90 to 99 at commit 03bdfd0