Mismatch Between Paper and Code on Positional Encoding Usage

According to the paper, positional encodings are embedded after the input is encoded by a 2-layer Transformer:
<img width="664" alt="Image" src="https://github.com/user-attachments/assets/baeca328-0eb9-401e-acc3-0384c227acd8" />

However, upon reviewing the code, I couldn't find any line or module where the positional encodings are explicitly added to the input embeddings before or during the Transformer processing.

https://github.com/MinkaiXu/TabDiff/blob/03bdfd0fe06b43fd45951b97fe988d17d0b842c1/tabdiff/modules/main_modules.py#L90-L99

I see that `PositionalEmbedding` is defined in the file, but it is used only inside the `MLPDiffusion` class, where it provides noise-level conditioning.
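
One way to double-check a claim like this is to scan a module's syntax tree for call sites of a given class. The sketch below is only an illustration: `SOURCE` is a toy stand-in rather than the real `main_modules.py`, and `ToyModel`, `noise_embed`, and `find_call_sites` are hypothetical names I made up for the example.

```python
import ast

# Toy stand-in for a module. The class bodies are placeholders, not TabDiff code.
SOURCE = '''
class PositionalEmbedding:
    pass

class MLPDiffusion:
    def __init__(self):
        self.noise_embed = PositionalEmbedding()

class ToyModel:
    def __init__(self):
        self.mlp = MLPDiffusion()
'''

def find_call_sites(source, target):
    """Return (enclosing_class, line_number) for every call to `target`."""
    tree = ast.parse(source)
    sites = []
    for cls in ast.walk(tree):
        if not isinstance(cls, ast.ClassDef):
            continue
        for node in ast.walk(cls):
            if not isinstance(node, ast.Call):
                continue
            func = node.func
            # Handle both `Name(...)` and `module.Name(...)` style calls.
            name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
            if name == target:
                sites.append((cls.name, node.lineno))
    return sites

print(find_call_sites(SOURCE, "PositionalEmbedding"))
```

Running the same function on the real file contents (read with `open(...).read()`) would list every class that instantiates `PositionalEmbedding`, assuming the class is always called by that name and not through an alias.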
 
There doesn't seem to be any line where positional encodings are added to the input embeddings before they are passed into the `mlp` module. Could you clarify whether positional information is handled differently than stated in the paper? For reference, this is the `forward` method I am looking at:

	def forward(self, x_num, x_cat, timesteps):
		e = self.tokenizer(x_num, x_cat)
		decoder_input = e[:, 1:, :] # ignore the first CLS token.
		y = self.encoder(decoder_input)
		pred_y = self.mlp(y.reshape(y.shape[0], -1), timesteps)
		pred_e = self.decoder(pred_y.reshape(*y.shape))
		x_num_pred, x_cat_pred = self.detokenizer(pred_e)
		x_cat_pred = torch.cat(x_cat_pred, dim=-1) if len(x_cat_pred)>0 else torch.zeros_like(x_cat).to(x_num_pred.dtype)

		return x_num_pred, x_cat_pred
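
To make concrete what I was expecting to find, here is a framework-free sketch of explicitly adding positional encodings to per-column token embeddings. This is not TabDiff code: the sinusoidal form is an assumption on my part (the paper may use a different scheme), and `sinusoidal_encoding` and `add_positional_encoding` are hypothetical helper names.

```python
import math

def sinusoidal_encoding(num_tokens, dim):
    """Build a sinusoidal table with one `dim`-sized vector per token position."""
    table = []
    for pos in range(num_tokens):
        row = []
        for i in range(dim):
            # Even indices use sin, odd indices use cos, sharing one frequency per pair.
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

def add_positional_encoding(tokens):
    """Add the position vector to each token embedding (tokens: [num_tokens][dim])."""
    pe = sinusoidal_encoding(len(tokens), len(tokens[0]))
    return [[t + p for t, p in zip(tok, pos)] for tok, pos in zip(tokens, pe)]

# Three all-zero "column tokens" of width 4, so the output equals the table itself.
tokens = [[0.0] * 4 for _ in range(3)]
out = add_positional_encoding(tokens)
print(out[0])
```

In the `forward` method above, I would have expected an equivalent elementwise addition on the token tensor somewhere around the `encoder` call, but I could not find one.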
