Thank you for your insightful work!
I have a question regarding the sampling process.
In LDM, the image latent produced by the VAE is regularized toward N(0, I) via a KL loss, so initializing it from N(0, I) during sampling is consistent with that prior.
However, DINO feature maps used as conditioning are not regularized to follow any prior distribution.
During sampling, without a reference image, should we also initialize the DINO features from N(0, I)?
Wouldn’t this lead to a distribution mismatch, since DINO features are not trained to follow a Gaussian prior?
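To make the concern concrete, one way to gauge the gap would be to compare the per-channel moments of real DINO feature maps against those of N(0, I). Below is a minimal sketch of that check. It assumes the feature map is flattened into C channels, each a plain list of H*W values; the helper names `channel_stats` and `gaussian_mismatch` are hypothetical and not part of any codebase.

```python
import statistics
from typing import List, Tuple


def channel_stats(features: List[List[float]]) -> List[Tuple[float, float]]:
    """Return (mean, population std) for each channel.

    Assumed layout (hypothetical): features[c] holds the H*W values of channel c.
    """
    return [(statistics.fmean(ch), statistics.pstdev(ch)) for ch in features]


def gaussian_mismatch(features: List[List[float]]) -> Tuple[float, float]:
    """Largest per-channel deviation from N(0, 1) moments.

    Returns (max |mean|, max |std - 1|) over all channels; both are 0
    when every channel already has zero mean and unit variance.
    """
    stats = channel_stats(features)
    max_mean_dev = max(abs(m) for m, _ in stats)
    max_std_dev = max(abs(s - 1.0) for _, s in stats)
    return max_mean_dev, max_std_dev


# Toy example: channel 0 has mean 2 and std 1, channel 1 has mean 0 and std 1.
toy = [[1.0, 3.0], [-1.0, 1.0]]
print(gaussian_mismatch(toy))
```

Only the first two moments are checked here, so a small score would not by itself show that the features are Gaussian. It would only indicate that their scale and offset roughly match the N(0, I) initialization.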