How to obtain the pretraining data?

Thanks for your great work! I noticed your paper mentions: "We use a processed subset containing 456K molecules from the ChEMBL database [24] for pretraining." Could you please release your pretraining data, or give detailed instructions on how to obtain it? Thanks!