
TacoGFN: Target Conditioned GFlowNet for Structure-based Drug Design

Update

2025-07-15: We have made architectural improvements to TacoGFN, extending it to 3D generation and ensuring molecular synthesizability. We recommend checking out our new repository, CGFlow (ICML '25).

Accepted at TMLR (Transactions on Machine Learning Research) and spotlighted at the NeurIPS GenBio Workshop 2023 [arxiv].

Official GitHub repository for TacoGFN: Target Conditioned GFlowNet for Structure-based Drug Design by Tony Shen, Seonghwan Seo, Grayson Lee, Mohit Pandey, Jason Smith, Artem Cherkasov, Woo Youn Kim and Martin Ester.

We frame structure-based drug design as a reinforcement learning task, where the goal is to search the wider chemical space for molecules with desirable properties rather than to fit a training data distribution. We propose TacoGFN, a Generative Flow Network conditioned on protein pocket structure, which uses binding affinity, drug-likeness and synthesizability measures as its reward.

Empirically, our method outperforms state-of-the-art methods on the CrossDocked2020 benchmark for every molecular property (Vina score, QED, SA), while reducing generation time by multiple orders of magnitude. TacoGFN achieves a median docking score of −8.82 and a Novel Hit Rate of 52.63%.
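
To make the reward description above concrete, here is a minimal sketch of how three property scores could be folded into a single scalar reward. It is not the reward used in the paper: the multiplicative form, the `max_affinity` normalization constant, the function names, and the assumption that QED and SA are already scaled to [0, 1] with higher being better are all illustrative choices.

```python
def clip01(x: float) -> float:
    """Clamp a value to the closed interval [0, 1]."""
    return max(0.0, min(1.0, x))


def combined_reward(
    docking_score: float,
    qed: float,
    sa: float,
    max_affinity: float = 12.0,  # hypothetical normalization constant (kcal/mol)
) -> float:
    """Illustrative multi-objective reward (an assumption, not the paper's formula).

    docking_score: docking score in kcal/mol, where more negative is better.
    qed, sa: assumed to be pre-normalized to [0, 1], higher is better.
    """
    # Flip the sign so that stronger predicted binding gives a larger term.
    affinity_term = clip01(-docking_score / max_affinity)
    # A product means a molecule must do reasonably well on all three terms.
    return affinity_term * clip01(qed) * clip01(sa)


print(combined_reward(-6.0, 0.5, 0.5))
```

A product rather than a sum is used here so that a molecule scoring zero on any one property receives zero reward; a weighted sum would be an equally valid choice for a sketch like this.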

If you have any problems or need help with the code, please open an issue or contact tonyzshen@gmail.com.

Setup

Before running any scripts, please download the necessary package:

https://figshare.com/s/2738ce20d82463227113

This package includes:

  • trained model weights
  • pre-computed pharmacophores in lmdb
  • saved pocket-graphs in lmdb
  • misc files (data splitting, avg vina score, pocket centroid, generated molecules)

Add the current directory to PYTHONPATH:

export PYTHONPATH=.

Please also set up the Conda environment and install the necessary dependencies:

conda env create -f environment.yml
conda activate tacogfn
cd src/molvoxel
pip install -e .

Training TacoGFN

If you wish to re-train the model, we provide the hyperparameter (HPS) files for the 3 model variants presented in our paper:

  • TacoGFN: hps/crossdocked_mo_256.json
  • TacoGFN (ZINCDock-15M): hps/zinc_mo_256.json
  • TacoGFN no pocket conditioning: hps/zinc_mo_256_noph.json

Note: TacoGFN_ranked is the same model as TacoGFN - we just generate 500 instead of 100 molecules at inference time, and rank by predicted docking score. The inference script takes care of that.

python3 src/tacogfn/tasks/pharmaco_frag.py --hps_path "$HPS_PATH"
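
The ranking step mentioned in the TacoGFN_ranked note above can be sketched as follows. This is only an illustration of the idea, not the repository's inference code: the function name is hypothetical, the inputs are assumed to be parallel lists of SMILES strings and predicted docking scores, and keeping the best 100 of the 500 generated molecules is an assumption about what happens after ranking.

```python
from typing import List, Sequence, Tuple


def rank_by_predicted_docking(
    smiles: Sequence[str],
    predicted_scores: Sequence[float],
    keep: int = 100,  # assumed cutoff; the README only says molecules are ranked
) -> List[Tuple[str, float]]:
    """Sort molecules by predicted docking score and keep the best `keep`.

    Docking scores are in kcal/mol, so lower (more negative) is better.
    """
    if len(smiles) != len(predicted_scores):
        raise ValueError("smiles and predicted_scores must have the same length")
    # Ascending sort puts the most negative (best) scores first.
    ranked = sorted(zip(smiles, predicted_scores), key=lambda pair: pair[1])
    return ranked[:keep]


# Toy example with made-up scores.
top = rank_by_predicted_docking(["CCO", "c1ccccc1", "CC(=O)O"], [-7.0, -9.5, -8.0], keep=2)
print(top)
```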

Generating molecules and computing metrics

If you just wish to generate molecules and evaluate them, we also provide trained model files. The following script re-generates molecules and computes metrics on them (docking needs to be computed separately).

bash scripts/generate_and_evaluate.sh

Note: if you have re-trained a model, you can specify your model path to generate molecules and evaluate its performance. Set $NUM_PER_POCKET to 100 for normal runs. To run TacoGFN_ranked, set $NUM_PER_POCKET to 500.

python3 src/tasks/generate_molecules.py \
    --model_path "$MODEL_PATH" \
    --num_per_pocket "$NUM_PER_POCKET" \
    --comment "${COMMENT}"

python3 src/tasks/evaluate_molecules.py \
    --molecules_path "misc/generated_molecules/1.0_1.0_${NUM_PER_POCKET}_${COMMENT}.json"
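
The path passed to `--molecules_path` above follows a fixed naming pattern, and a small helper can keep the two commands in sync. The sketch below only mirrors the pattern shown in the command; the helper name is hypothetical, and the `1.0_1.0` prefix is copied verbatim because its meaning is not stated here.

```python
from pathlib import PurePosixPath


def generated_molecules_path(
    num_per_pocket: int,
    comment: str,
    prefix: str = "1.0_1.0",  # copied from the command above; meaning not documented here
) -> PurePosixPath:
    """Build the JSON path that matches the pattern used by the evaluation command."""
    filename = f"{prefix}_{num_per_pocket}_{comment}.json"
    return PurePosixPath("misc") / "generated_molecules" / filename


print(generated_molecules_path(100, "retrained"))
```

For a TacoGFN_ranked run, the same helper would be called with `num_per_pocket=500`.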

Aggregating and displaying metrics

To display the metrics, we provide the generated molecules from our model and the baseline models in misc/evaluations. The following script computes the metrics used in Table 1 and Table 2:

bash scripts/see_all_results.sh

Note: if you have generated molecules from a trained model, please compute docking scores using QVina 2.1 first. Then you can run the following:

python3 src/tasks/aggergate_evals.py --eval_path "$EVAL_FILE"

Citations

@article{
        shen2024tacogfn,
        title={Taco{GFN}: Target-conditioned {GF}lowNet for Structure-based Drug Design},
        author={Tony Shen and Seonghwan Seo and Grayson Lee and Mohit Pandey and Jason R Smith and Artem Cherkasov and Woo Youn Kim and Martin Ester},
        journal={Transactions on Machine Learning Research},
        issn={2835-8856},
        year={2024},
        url={https://openreview.net/forum?id=N8cPv95zOU},
}

This project modifies the GFlowNet library for graph and molecular data.
