Hi FlexTok authors,
Thanks a lot for the great work and for releasing the codebase.
I found that training directly with FSQ can often lead to an underutilized codebook.
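One possible mechanism behind low utilization is a scale mismatch: if the encoder's pre-quantization activations are small relative to the FSQ grid, most channels round to the same level and only a handful of codes ever appear. Below is a minimal NumPy sketch of FSQ-style bounding and rounding plus a utilization/perplexity metric. The levels `[8, 5, 5, 5]`, the helper names (`fsq_quantize`, `codes_to_indices`, `codebook_stats`) and the synthetic Gaussian inputs are assumptions for illustration only, not FlexTok's configuration or code.

```python
import numpy as np

def fsq_quantize(z, levels, eps=1e-3):
    """Round each channel of z (shape [..., d]) to one of levels[i] integers."""
    levels = np.asarray(levels)
    half_l = (levels - 1) * (1 + eps) / 2
    # Even level counts get a half-step offset so the grid is {-L/2, ..., L/2 - 1}
    offset = np.where(levels % 2 == 0, 0.5, 0.0)
    # arctanh shift makes z = 0 map exactly to the integer 0
    shift = np.arctanh(offset / half_l)
    bounded = np.tanh(z + shift) * half_l - offset
    # In training, a straight-through estimator would pass gradients through this round
    return np.round(bounded)

def codes_to_indices(q, levels):
    """Map per-channel integer codes to a single flat codebook index."""
    levels = np.asarray(levels)
    digits = (q + levels // 2).astype(np.int64)  # shift each channel to 0..levels[i]-1
    basis = np.concatenate([[1], np.cumprod(levels[:-1])])
    return (digits * basis).sum(axis=-1)

def codebook_stats(indices, codebook_size):
    """Fraction of codes used at least once, and perplexity of the code distribution."""
    counts = np.bincount(indices.ravel(), minlength=codebook_size)
    probs = counts / counts.sum()
    nz = probs[probs > 0]
    utilization = (counts > 0).mean()
    perplexity = float(np.exp(-(nz * np.log(nz)).sum()))
    return utilization, perplexity

# Hypothetical demo: compare small vs. unit-scale pre-quantization activations
rng = np.random.default_rng(0)
levels = [8, 5, 5, 5]
size = int(np.prod(levels))  # 1000 codes
results = {}
for scale in (0.05, 1.0):
    z = scale * rng.standard_normal((4096, len(levels)))
    idx = codes_to_indices(fsq_quantize(z, levels), levels)
    results[scale] = codebook_stats(idx, size)
    print(f"scale={scale}: utilization={results[scale][0]:.2%}, "
          f"perplexity={results[scale][1]:.1f}")
```

With the small scale, almost every sample collapses onto the center code, which is one way an FSQ codebook can end up underutilized even though FSQ has no learned codebook entries to "die".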
During training, did you find any augmentation or normalization techniques necessary to improve codebook usage and convergence?
For instance, what values did you use for the following regularization/augmentation hyperparameters?
drop_quant_p: float,
corrupt_tokens_p: float,
min_corrupt_tokens_p: Optional[float],
apply_corrupt_tokens_p: float,
And does the model require REPA-style regularization during training?
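For reference, REPA-style regularization adds an auxiliary loss that aligns a trainable projection of the model's intermediate hidden states with features from a frozen pretrained encoder (e.g. DINOv2), using per-token cosine similarity. The NumPy sketch below shows only that loss term; the two-layer SiLU projector, the feature dimensions, and the helper names `mlp_project`/`repa_loss` are assumptions for illustration, and nothing here reflects how (or whether) FlexTok uses such a loss. It also assumes hidden states and target features are already aligned token-for-token.

```python
import numpy as np

def mlp_project(h, w1, b1, w2, b2):
    """Hypothetical two-layer MLP projector with SiLU activation."""
    x = h @ w1 + b1
    x = x / (1 + np.exp(-x))  # SiLU: x * sigmoid(x)
    return x @ w2 + b2

def repa_loss(hidden, target, params, eps=1e-8):
    """Negative mean per-token cosine similarity between projected hidden states and target features."""
    proj = mlp_project(hidden, *params)
    proj = proj / (np.linalg.norm(proj, axis=-1, keepdims=True) + eps)
    tgt = target / (np.linalg.norm(target, axis=-1, keepdims=True) + eps)
    cos = (proj * tgt).sum(axis=-1)  # shape [batch, tokens]
    return -cos.mean()

# Hypothetical shapes: batch 2, 256 tokens, model width 64, encoder feature width 32
rng = np.random.default_rng(0)
hidden = rng.standard_normal((2, 256, 64))
target = rng.standard_normal((2, 256, 32))  # stand-in for frozen encoder features
params = (
    0.1 * rng.standard_normal((64, 128)), np.zeros(128),
    0.1 * rng.standard_normal((128, 32)), np.zeros(32),
)
loss = repa_loss(hidden, target, params)
print(f"REPA-style loss: {loss:.4f}")  # lies in [-1, 1]; lower means better alignment
```

In practice this term would be added to the main training loss with some weight, and that weight and the layer it taps are exactly the kind of details the question above is asking about.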
Thank you.