GMC-MPNN for BBBP Datasets

This repository provides code for hyperparameter optimization, feature extraction, training, and evaluation of the GMC-MPNN model, specifically designed for predicting blood-brain barrier permeability (BBBP).

Installation

Install Chemprop following the official guide: 🔗 https://chemprop.readthedocs.io/en/latest/installation.html

📚 Additional Dependencies

pip install pandas numpy scipy scikit-learn biopandas rdkit

Depending on your system, rdkit may need to be installed through Conda instead of pip:

conda install -c conda-forge rdkit

We recommend setting up a clean environment using conda or virtualenv, and ensuring all dependencies are satisfied as specified in the Chemprop documentation.

1. Hyperparameter Optimization

To search for optimal hyperparameters using Chemprop's CLI:

chemprop hpopt \
    --data-path <path_to_dataset.csv> \
    --task-type <classification|regression> \
    --search-parameter-keywords all \
    --split-type SCAFFOLD_BALANCED \
    --hpopt-save-dir <path_to_output_dir> \
    --raytune-num-gpus 1

2. GGL Feature Extraction

To compute GGL-based ligand features:

python <script_path> -k <kernel_index> -c <cutoff> -f <csv_file> -dd <data_folder> -fd <feature_folder>

Example:

python get_ggl_ligand_features.py -k 1551 -c 12.0 -f dataset.csv -dd ./mol2_files -fd ./features
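Because Section 4 refers to a best kernel per seed, features are presumably extracted for more than one kernel index. The sketch below is a hypothetical helper, not part of this repository, that assembles the command shown above for a list of kernel indices. It uses only the flags documented here; the function names, the `dry_run` switch, and the idea of looping over kernels this way are assumptions made for illustration.

```python
import subprocess


def build_ggl_command(kernel_index, cutoff, csv_file, data_folder, feature_folder,
                      script="get_ggl_ligand_features.py"):
    """Return the argument list for one feature-extraction run.

    Only the flags documented in this README (-k, -c, -f, -dd, -fd) are used.
    """
    return [
        "python", script,
        "-k", str(kernel_index),
        "-c", str(cutoff),
        "-f", csv_file,
        "-dd", data_folder,
        "-fd", feature_folder,
    ]


def run_kernel_sweep(kernel_indices, dry_run=True, **kwargs):
    """Build (and optionally execute) one command per kernel index.

    With dry_run=True nothing is executed; the commands are only returned,
    which makes it easy to inspect them before launching real jobs.
    """
    commands = [build_ggl_command(k, **kwargs) for k in kernel_indices]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing run
    return commands


# Dry run using the values from the example above.
commands = run_kernel_sweep(
    [1551],
    cutoff=12.0,
    csv_file="dataset.csv",
    data_folder="./mol2_files",
    feature_folder="./features",
)
print(" ".join(commands[0]))
```

Keeping the dry run as the default means nothing is launched until the printed commands have been checked against the actual script.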

On SLURM:

To run the extraction as a SLURM job, submit the provided job script:

sbatch extract_ggl_features.sh

3. Model Training

To train models with multiple seeds (0-4) and automatically average test results:

python train.py \
    --training_script <training_script> \
    --input_path <path_to_dataset.csv> \
    --features_folder <path_to_features> \
    --results_path <path_to_results> \
    --seeds 0 1 2 3 4 \
    --target_columns <target_column_name>

Example for B3DB_cls:

python train.py \
    --training_script train_b3db_cls.py \
    --input_path /path/to/B3DB_cls.csv \
    --features_folder /path/to/features/B3DB_cls \
    --results_path /path/to/results/B3DB_cls/multi_seed \
    --seeds 0 1 2 3 4 \
    --target_columns labels \
    --batch_size 32 \
    --max_epochs 100 \
    --split_type SCAFFOLD_BALANCED

The script trains one model per seed and then averages the test results across all seeds.
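As a rough illustration of the averaging step, the sketch below aggregates per-seed test metrics into a mean and sample standard deviation. It is not the code used by train.py: the function name, the dictionary layout, and the metric name `auc` are assumptions, and the numbers are placeholders rather than results from this work.

```python
from statistics import mean, stdev


def average_over_seeds(per_seed_metrics):
    """Collapse {seed: {metric: value}} into {metric: (mean, std)}.

    The standard deviation is the sample standard deviation; it is reported
    as 0.0 when only a single seed is available.
    """
    metric_names = sorted({name for m in per_seed_metrics.values() for name in m})
    summary = {}
    for name in metric_names:
        values = [m[name] for m in per_seed_metrics.values() if name in m]
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[name] = (mean(values), spread)
    return summary


# Placeholder values for illustration only; these are not reported results.
placeholder = {0: {"auc": 0.90}, 1: {"auc": 0.92}, 2: {"auc": 0.94}}
summary = average_over_seeds(placeholder)
print(summary)
```

Reporting the spread alongside the mean helps show how sensitive the model is to the random seed.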

Parallel Training with SLURM

To submit parallel training jobs:

sbatch train.sh

Update train.sh with your dataset-specific paths and configuration before submitting.

4. Reproducing Results

To reproduce results by training only the best kernel for each seed:

python test.py \
    --dataset <dataset_name> \
    --input_path <path_to_dataset.csv> \
    --features_folder <path_to_features> \
    --results_path <path_to_results>

Example for B3DB_cls:

python test.py \
    --dataset B3DB_cls \
    --input_path /path/to/B3DB_cls.csv \
    --features_folder /path/to/features/B3DB_cls \
    --results_path /path/to/results/B3DB_cls/test

The script automatically selects the best kernel for each seed, so no kernel index needs to be passed.

5. Data Access

We provide the following for reproducibility and testing:

✅ All datasets
✅ GGL feature files (.npz)

📥 Access via OneDrive
🔗 http://bit.ly/4558Ovg
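After downloading, a feature file can be inspected with NumPy before it is passed to training. The sketch below lists the arrays stored in an .npz archive together with their shapes. The array names inside the provided files are not documented here, so the helper discovers them instead of assuming any; the function name and the demo array name `features` are hypothetical.

```python
import os
import tempfile

import numpy as np


def summarize_npz(path):
    """Return {array_name: shape} for every array stored in an .npz archive."""
    with np.load(path) as archive:
        return {name: archive[name].shape for name in archive.files}


# Demo on a throwaway archive; "features" is a placeholder array name.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.npz")
    np.savez(demo_path, features=np.zeros((3, 5)))
    demo_summary = summarize_npz(demo_path)

print(demo_summary)
```

For a downloaded file, call `summarize_npz` with its path and check that the number of rows matches the number of molecules in the corresponding dataset CSV.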

Contact

For questions or support, please contact: 📧 ducnguyen@utk.edu

