Mismatch Between Paper and Code on Positional Encoding Usage #15

@prateekgargX

The paper claims that positional encodings are embedded after encoding the input using a 2-layer Transformer.

However, upon reviewing the code, I couldn't find any line or module where the positional encodings are explicitly added to the input embeddings before or during the Transformer processing.

def forward(self, x_num, x_cat, timesteps):
    e = self.tokenizer(x_num, x_cat)
    decoder_input = e[:, 1:, :]  # ignore the first CLS token.
    y = self.encoder(decoder_input)
    pred_y = self.mlp(y.reshape(y.shape[0], -1), timesteps)
    pred_e = self.decoder(pred_y.reshape(*y.shape))
    x_num_pred, x_cat_pred = self.detokenizer(pred_e)
    x_cat_pred = torch.cat(x_cat_pred, dim=-1) if len(x_cat_pred)>0 else torch.zeros_like(x_cat).to(x_num_pred.dtype)
    return x_num_pred, x_cat_pred

I see that PositionalEmbedding is defined in the same file, but it is used only inside the MLPDiffusion class, where it provides noise-level conditioning from the timesteps rather than encoding token positions.
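
To make the distinction concrete, here is a minimal sketch of a sinusoidal timestep embedding, written in plain Python so it runs without PyTorch. The function name `timestep_embedding`, the `max_period` value, and the cos/sin layout are assumptions for illustration only; the repository's PositionalEmbedding may differ in its details.

```python
import math


def timestep_embedding(t, dim, max_period=10000.0):
    """Embed a scalar noise level `t` as `dim` sinusoidal features (hypothetical helper)."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    half = dim // 2
    # Geometrically spaced frequencies, starting at 1 and decaying toward 1 / max_period.
    freqs = [max_period ** (-i / half) for i in range(half)]
    # First half holds cosines, second half holds sines.
    return [math.cos(t * f) for f in freqs] + [math.sin(t * f) for f in freqs]


# The vector depends only on the noise level, so every token (column) in a
# sample receives the same conditioning signal and token order is not encoded.
emb = timestep_embedding(0.5, dim=8)
print(len(emb))
```

An embedding of this kind tells the network how noisy the input is, but it carries no information about which column a token came from.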

There doesn't seem to be any line where positional encodings are added to the input embeddings before passing them into the mlp module. Could you clarify if positional information is handled differently than stated in the paper?
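
For reference, this is roughly the kind of step I expected to find before `self.encoder`: a per-position vector added to each token embedding. The sketch below uses the standard sinusoidal formula in plain Python, with nested lists standing in for a `(tokens, dim)` tensor. The names `sinusoidal_table` and `add_positional_encoding` are hypothetical, and the paper may intend a learned encoding instead.

```python
import math


def sinusoidal_table(num_tokens, dim):
    """Return a num_tokens x dim table: sin on even indices, cos on odd ones."""
    table = []
    for pos in range(num_tokens):
        row = []
        for k in range(dim):
            # Each sin/cos pair (indices 2i and 2i + 1) shares one frequency.
            angle = pos / (10000.0 ** ((k - k % 2) / dim))
            row.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def add_positional_encoding(tokens):
    """Add a position-dependent vector to each token embedding (hypothetical helper)."""
    table = sinusoidal_table(len(tokens), len(tokens[0]))
    return [[x + p for x, p in zip(tok, pe)] for tok, pe in zip(tokens, table)]


# Three identical token embeddings become distinguishable after the addition.
tokens = [[0.0, 0.0, 0.0, 0.0] for _ in range(3)]
encoded = add_positional_encoding(tokens)
print(encoded[0] != encoded[1])
```

In the `forward` shown above, the equivalent would be an addition applied to `decoder_input` before it is passed to the encoder, and I could not find one.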
