Improve codebook utilization for FSQ

Hi FlexTok authors,

Thanks a lot for the great work and for releasing the codebase.
I found that training directly with FSQ often leads to an underutilized codebook.
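
To make "underutilized" concrete, here is a minimal sketch of one way to measure it from a batch of quantized token indices. The function names and the example FSQ levels are my own assumptions for illustration, not part of the FlexTok codebase.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Sequence


def fsq_codebook_size(levels: Sequence[int]) -> int:
    """Implicit FSQ codebook size: the product of the per-dimension levels."""
    return math.prod(levels)


def codebook_stats(indices: Iterable[int], codebook_size: int) -> Dict[str, float]:
    """Fraction of codes used and perplexity of the empirical code distribution."""
    counts = Counter(indices)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no token indices given")
    probs = [c / total for c in counts.values()]
    entropy = -sum(p * math.log(p) for p in probs)
    return {
        "usage": len(counts) / codebook_size,  # share of codes seen at least once
        "perplexity": math.exp(entropy),       # effective number of codes in use
    }


# Hypothetical example: levels chosen only for illustration.
size = fsq_codebook_size([8, 8, 8, 5, 5, 5])
stats = codebook_stats([0, 1, 2, 3, 3, 3], size)
print(size, stats)
```

A usage fraction well below 1, or a perplexity much smaller than the codebook size, over a large validation set is what I mean by an underutilized codebook.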

Did you find any augmentation or normalization techniques necessary during training to improve codebook usage and convergence?
For instance, what values did you use for the following regularization/augmentation hyperparameters?

```
drop_quant_p: float,
corrupt_tokens_p: float,
min_corrupt_tokens_p: Optional[float],
apply_corrupt_tokens_p: float,
```

And does the model require REPA-style regularization during training?

Thank you.
