This repository is for the integrative models of the HDAC1 corepressor complexes NuRD, Sin3A, and coREST, based on data from chemical crosslinking, cryo-EM maps, X-ray crystallography, homology modeling using Modeller, and structure prediction using AlphaFold. It contains the input data, the scripts for data preprocessing, modeling, and analysis, and the results, including bead models and localization probability density maps. The modeling was performed using IMP (Integrative Modeling Platform).
These integrative structures will be deposited in the PDB-Dev database with accession codes TBD.
- input: contains the subdirectories with the input data used for modeling all the corepressor complexes.
- scripts: contains all the scripts used for preprocessing, modeling, and analysis of the models.
- results: contains the models and the localization probability densities of the top cluster of each corepressor complex.
- test: contains scripts for testing the sampling.
For structures predicted by AlphaFold2, only high-confidence regions (pLDDT > 70 and PAE < 5 Å) were used. The following script extracts the high-confidence regions:
python get_high_confidence_region_from_AF2.py af2_struct.cif af2_struct.json
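The extraction step can be illustrated with a minimal sketch. It assumes the per-residue pLDDT values and the PAE matrix have already been parsed from the .cif and .json files, and it assumes one particular way of applying the PAE criterion (mean PAE within each contiguous pLDDT-filtered segment); get_high_confidence_region_from_AF2.py may apply the cutoffs differently. All function names are hypothetical.

```python
"""Hypothetical sketch of selecting high-confidence AlphaFold2 segments.

Assumptions (not taken from get_high_confidence_region_from_AF2.py):
  * plddt is a per-residue list and pae is an N x N matrix in angstroms,
    already parsed from the .cif and .json files;
  * residues with pLDDT > 70 are grouped into contiguous segments, and a
    segment is kept only if the mean PAE among its own residues is < 5 A.
"""
from typing import List, Sequence, Tuple


def contiguous_segments(mask: Sequence[bool]) -> List[Tuple[int, int]]:
    """Return inclusive, 1-based (start, end) runs where mask is True."""
    segments = []
    start = None
    for i, ok in enumerate(mask):
        if ok and start is None:
            start = i  # a new run begins at 0-based index i
        elif not ok and start is not None:
            segments.append((start + 1, i))  # run covered 0-based start..i-1
            start = None
    if start is not None:  # run extends to the last residue
        segments.append((start + 1, len(mask)))
    return segments


def high_confidence_segments(
    plddt: Sequence[float],
    pae: Sequence[Sequence[float]],
    plddt_cutoff: float = 70.0,
    pae_cutoff: float = 5.0,
) -> List[Tuple[int, int]]:
    """Keep pLDDT-filtered segments whose internal mean PAE is below pae_cutoff."""
    mask = [value > plddt_cutoff for value in plddt]
    kept = []
    for start, end in contiguous_segments(mask):
        idx = range(start - 1, end)  # back to 0-based matrix indices
        values = [pae[i][j] for i in idx for j in idx]
        if sum(values) / len(values) < pae_cutoff:
            kept.append((start, end))
    return kept
```

For example, pLDDT values [50, 80, 90, 85, 40, 95, 92] give the candidate segments 2-4 and 6-7; each candidate is then discarded if its internal mean PAE is 5 Å or more.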
When a protein has multiple paralogs, crosslinks (XLs) from all paralogs were mapped onto the paralog with the highest number of XLs. The following script generates the mapped XLs:
python paralog_alignment.py
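A minimal sketch of the mapping idea follows. The crosslink tuple layout, the residue_maps dictionary (assumed to come from a pairwise sequence alignment computed elsewhere), and the choice to drop crosslinks on unaligned residues are assumptions, not details of paralog_alignment.py. All function names are hypothetical.

```python
"""Hypothetical sketch of mapping paralog crosslinks onto one reference paralog.

Assumptions (not taken from paralog_alignment.py):
  * a crosslink is a tuple (protein1, residue1, protein2, residue2);
  * residue_maps[(source, target)] maps residue numbers of `source` to the
    aligned residue numbers of `target` (alignment computed elsewhere);
  * crosslinks involving unaligned residues are dropped.
"""
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

XL = Tuple[str, int, str, int]
ResidueMaps = Dict[Tuple[str, str], Dict[int, int]]


def choose_target(paralogs: Sequence[str], xls: Sequence[XL]) -> str:
    """Return the paralog involved in the most crosslinks (first listed wins ties)."""
    counts: Counter = Counter()
    for p1, _, p2, _ in xls:
        for protein in {p1, p2} & set(paralogs):  # count each XL once per paralog
            counts[protein] += 1
    return max(paralogs, key=lambda p: counts[p])


def map_site(protein: str, residue: int, paralogs: Sequence[str], target: str,
             residue_maps: ResidueMaps) -> Optional[Tuple[str, int]]:
    """Map one crosslinked residue onto the target paralog, or None if unaligned."""
    if protein not in paralogs or protein == target:
        return protein, residue
    mapped = residue_maps.get((protein, target), {}).get(residue)
    return None if mapped is None else (target, mapped)


def map_crosslinks(xls: Sequence[XL], paralogs: Sequence[str],
                   residue_maps: ResidueMaps) -> Tuple[str, List[XL]]:
    target = choose_target(paralogs, xls)
    mapped: List[XL] = []
    for p1, r1, p2, r2 in xls:
        site1 = map_site(p1, r1, paralogs, target, residue_maps)
        site2 = map_site(p2, r2, paralogs, target, residue_maps)
        if site1 is None or site2 is None:
            continue  # at least one residue has no counterpart in the target
        mapped.append((site1[0], site1[1], site2[0], site2[1]))
    return target, mapped
```

Choosing the target by counting, rather than fixing it in advance, follows the rule stated above: the paralog with the most crosslinks becomes the reference.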
EM maps, where available, were converted to Gaussian Mixture Models (GMMs), which were used as input for the modeling. The following script generates GMMs for an input EM map:
./create_gmm.sh EM_map.mrc threshold
The threshold can be obtained from the Validation section of the EMDB entry for the specific EM map. The minimum number of Gaussians that yields a cross-correlation of >0.95 with the original EM map is used.
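The selection rule (the smallest number of Gaussians whose GMM reaches a cross-correlation above 0.95) can be sketched as a loop over candidate component counts. Here render_gmm is a hypothetical placeholder for fitting a GMM with n components and sampling it on the map's voxel grid, and the cross-correlation is taken to be the Pearson correlation over voxels; create_gmm.sh may use a different CC definition or voxel mask.

```python
"""Hypothetical sketch of choosing the number of Gaussians for an EM map GMM.

Assumptions (not taken from create_gmm.sh):
  * em_density holds the map's voxel values (flattened) and render_gmm(n)
    returns the density of a fitted n-component GMM on the same voxels;
    fitting and rendering happen elsewhere (render_gmm is a placeholder);
  * cross-correlation is the Pearson correlation over all voxels.
"""
import math
from typing import Callable, Iterable, Optional, Sequence


def cross_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("densities must be sampled on the same grid")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b))
    return num / den if den > 0 else 0.0


def select_num_gaussians(
    em_density: Sequence[float],
    render_gmm: Callable[[int], Sequence[float]],
    candidates: Iterable[int],
    cc_cutoff: float = 0.95,
) -> Optional[int]:
    """Return the smallest candidate count whose GMM has CC > cc_cutoff, else None."""
    for n in sorted(candidates):
        if cross_correlation(em_density, render_gmm(n)) > cc_cutoff:
            return n
    return None
```

Candidate counts are often tried in increasing order (for example, powers of two); because the loop returns the first count that passes, the result is the smallest candidate meeting the cutoff, not necessarily the smallest possible integer.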
To run the sampling, run the modeling script:
./run_modeling.sh
To run the analysis, run the end-to-end analysis script:
python end_to_end_analysis.py
The above script performs the following steps:
Good-scoring models were selected using pmi_analysis (please refer to the pmi_analysis tutorial for a more detailed explanation) together with our variable_filter_v1.py script. The following scripts were used:
- run_analysis_trajectories.py
- variable_filter_v1.py, applied to the major cluster if the number of models exceeds 30000.
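The README does not describe how variable_filter_v1.py chooses models, so the sketch below shows only one plausible scheme, which is an assumption: keep models whose every restraint score is at most the mean plus k standard deviations, and lower k until no more than 30000 models remain. Lower scores are treated as better, following the usual IMP convention. The function name and the k grid are hypothetical.

```python
"""Hypothetical sketch of a score filter that caps the number of models.

The README does not state the criterion used by variable_filter_v1.py.
Assumed scheme, for illustration only: keep models whose every restraint
score is <= mean + k * SD for that restraint, lowering k in steps until at
most max_models models remain (lower scores are better).
"""
import statistics
from typing import Dict, List, Optional, Sequence

Scores = Dict[str, Dict[str, float]]  # model id -> restraint name -> score


def variable_filter(
    scores: Scores,
    max_models: int = 30000,
    k_values: Optional[Sequence[float]] = None,
) -> List[str]:
    if len(scores) <= max_models:
        return list(scores)  # nothing to filter
    if k_values is None:
        k_values = [x / 10 for x in range(30, -31, -1)]  # 3.0, 2.9, ..., -3.0
    restraints = list(next(iter(scores.values())))
    stats = {}
    for r in restraints:
        values = [s[r] for s in scores.values()]
        stats[r] = (statistics.mean(values), statistics.pstdev(values))
    kept: List[str] = list(scores)
    for k in k_values:  # from least to most strict
        kept = [
            model
            for model, s in scores.items()
            if all(s[r] <= mean + k * sd for r, (mean, sd) in stats.items())
        ]
        if len(kept) <= max_models:
            break
    return kept  # strictest attempt if the cap was never reached
```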
The selected good-scoring models were then extracted using run_extract_models.py.
A density_{}.txt file was created for each complex (NuRD, coREST, and Sin3A). This file specifies the domains into which the proteins are split for visualizing the localization probability densities. Finally, sampling exhaustiveness tests were performed using imp-sampcon.
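The density file can be sketched as follows, assuming the imp-sampcon convention of a Python file that defines a dictionary named density_custom_ranges, with each density name mapped to a list of (first residue, last residue, protein) selections. Verify the format against the imp-sampcon documentation; the helper functions are hypothetical, and the domain names and residue ranges are placeholders, not those of the actual density_{}.txt files.

```python
"""Hypothetical sketch of writing and reading a density ranges file.

Assumption: imp-sampcon reads a Python file defining a dictionary named
density_custom_ranges (density name -> list of selections). The names and
ranges below are placeholders, not the NuRD, coREST, or Sin3A domains.
"""
import os
import pprint
import tempfile
from typing import Dict, List, Tuple

Ranges = Dict[str, List[Tuple[int, int, str]]]


def write_density_file(path: str, ranges: Ranges) -> None:
    with open(path, "w") as fh:
        # pformat emits a valid Python literal for dicts of lists of tuples
        fh.write("density_custom_ranges = " + pprint.pformat(ranges) + "\n")


def read_density_file(path: str) -> Ranges:
    namespace: dict = {}
    with open(path) as fh:
        exec(fh.read(), namespace)  # only run on trusted, local files
    return namespace["density_custom_ranges"]


def roundtrip(ranges: Ranges) -> Ranges:
    """Write ranges to a temporary file and read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "density_example.txt")
        write_density_file(path, ranges)
        return read_density_file(path)


# Placeholder domains: one protein split into two densities.
example = {
    "ProteinA_N": [(1, 150, "ProteinA")],
    "ProteinA_C": [(151, 300, "ProteinA")],
}
```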
Compute crosslink violations using the get_xl_viol_validation_set.py script.
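A minimal sketch of a violation count follows. It assumes each model provides a bead coordinate per crosslinked residue and that a crosslink is satisfied if it is within the distance cutoff in at least one model of the cluster; the cutoff depends on the crosslinker and is left to the caller. These choices and the function names are assumptions, not taken from get_xl_viol_validation_set.py.

```python
"""Hypothetical sketch of counting crosslink violations over a model ensemble.

Assumptions (not taken from get_xl_viol_validation_set.py):
  * each model maps (protein, residue) to a bead center (x, y, z) in angstroms;
  * a crosslink is satisfied if its distance is within the cutoff in at least
    one model; the cutoff depends on the crosslinker and is caller-supplied.
"""
import math
from typing import Dict, List, Sequence, Tuple

Site = Tuple[str, int]
Model = Dict[Site, Tuple[float, float, float]]
XL = Tuple[str, int, str, int]


def xl_distance(model: Model, xl: XL) -> float:
    p1, r1, p2, r2 = xl
    return math.dist(model[(p1, r1)], model[(p2, r2)])


def violated_crosslinks(models: Sequence[Model], xls: Sequence[XL], cutoff: float) -> List[XL]:
    """Return crosslinks whose minimum distance over all models exceeds cutoff."""
    return [xl for xl in xls if min(xl_distance(m, xl) for m in models) > cutoff]


def violation_fraction(models: Sequence[Model], xls: Sequence[XL], cutoff: float) -> float:
    return len(violated_crosslinks(models, xls, cutoff)) / len(xls)
```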
Create contact maps for the component proteins in the complex using the contact_maps_all_pairs_surface.py script. The proteins to be considered are specified as the lists protein1 and protein2.
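The contact computation can be sketched with bead surface distances (center distance minus both radii). The bead tuple layout, the contact cutoff, counting only inter-protein pairs, and the function names are assumptions; only the protein1 and protein2 lists come from the description of contact_maps_all_pairs_surface.py.

```python
"""Hypothetical sketch of inter-protein contact frequencies from bead models.

Assumptions (not taken from contact_maps_all_pairs_surface.py):
  * a bead is (protein, first residue, last residue, center (x, y, z), radius);
  * surface distance = center distance minus both radii, floored at zero;
  * two beads are in contact when that distance is <= cutoff (caller-supplied),
    and only pairs from different proteins are counted.
"""
import math
from itertools import product
from typing import Dict, List, Sequence, Tuple

Bead = Tuple[str, int, int, Tuple[float, float, float], float]
Key = Tuple[Tuple[str, int, int], Tuple[str, int, int]]


def surface_distance(b1: Bead, b2: Bead) -> float:
    return max(math.dist(b1[3], b2[3]) - b1[4] - b2[4], 0.0)


def contact_frequencies(
    models: Sequence[List[Bead]],
    protein1: Sequence[str],
    protein2: Sequence[str],
    cutoff: float,
) -> Dict[Key, float]:
    """Fraction of models in which each protein1-protein2 bead pair is in contact."""
    counts: Dict[Key, int] = {}
    for beads in models:
        beads1 = [b for b in beads if b[0] in protein1]
        beads2 = [b for b in beads if b[0] in protein2]
        for a, b in product(beads1, beads2):
            if a[0] == b[0]:
                continue  # skip intra-protein pairs
            if surface_distance(a, b) <= cutoff:
                key = ((a[0], a[1], a[2]), (b[0], b[1], b[2]))
                counts[key] = counts.get(key, 0) + 1
    return {key: n / len(models) for key, n in counts.items()}
```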
Obtain domainwise precision using PrISM.
For the simulations, the results directory contains a subdirectory for each complex, comprising:
- contact_maps: contact maps for the component proteins of the complex.
- models_and_densities: sampcon output for the largest cluster.
- prism: PrISM output.
- xl_violations: logs of crosslink violations.
Author(s): Jules Nde*, Kartik Majila*, Rosalyn C. Zimmermann, Cassandra Kempf, Ying Zhang, Joseph Cesare, Janet L. Thornton, Jerry L. Workman, Laurence Florens, Shruthi Viswanath, Michael P. Washburn
Date: September 12th, 2023
License: CC BY-SA 4.0
This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
Last known good IMP version: Not tested
Testable: Yes
Parallelizeable: Yes
Publications:
