Multi-complex Integrative Structure Determination of the HDAC1/2 Interactome

This repository contains the integrative models of the HDAC1 corepressor complexes (NuRD, Sin3A, and CoREST), based on data from chemical crosslinking, cryo-EM maps, X-ray crystallography, homology modeling with Modeller, and structure prediction with AlphaFold. It contains the input data, the scripts for data preprocessing and modeling, and the results, including bead models and localization probability density maps. The modeling was performed using IMP (Integrative Modeling Platform).

These integrative structures will be deposited in the PDB-Dev database; accession codes are TBD.

Directory structure

input : contains the subdirectories for the input data used for modeling all the corepressor complexes.
scripts : contains all the scripts used for pre-processing, modeling and analysis of the models.
results : contains the models and the localization probability densities of the top cluster of the corepressor complexes.
test : scripts for testing the sampling.

Protocol

Preprocessing

For structures predicted by AlphaFold2, only high-confidence regions (pLDDT > 70 and PAE < 5 Å) were used. The following script extracts these regions:
```
python get_high_confidence_region_from_AF2.py af2_struct.cif af2_struct.json
```
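The selection criterion above can be illustrated with a minimal sketch. It is not the code of get_high_confidence_region_from_AF2.py; the function names, the `min_len` parameter, and the way the PAE cutoff is applied to residue pairs are assumptions made for illustration only. Input parsing is omitted: per-residue pLDDT values and the PAE matrix are assumed to be already loaded into Python lists.

```python
from typing import List, Tuple

PLDDT_CUTOFF = 70.0  # from the text: keep residues with pLDDT > 70
PAE_CUTOFF = 5.0     # from the text: keep residue pairs with PAE < 5


def confident_segments(plddt: List[float], min_len: int = 1) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) runs of residues with pLDDT above the cutoff.

    `min_len` is a hypothetical knob for dropping very short runs.
    """
    segments = []
    start = None
    for i, score in enumerate(plddt, start=1):
        if score > PLDDT_CUTOFF:
            if start is None:
                start = i  # open a new run
        elif start is not None:
            if i - start >= min_len:  # run covers residues start..i-1
                segments.append((start, i - 1))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and len(plddt) - start + 1 >= min_len:
        segments.append((start, len(plddt)))
    return segments


def confident_pair(pae: List[List[float]], i: int, j: int) -> bool:
    """True if the PAE between 1-based residues i and j is below the cutoff in both directions.

    Requiring both directions is an assumption of this sketch.
    """
    return pae[i - 1][j - 1] < PAE_CUTOFF and pae[j - 1][i - 1] < PAE_CUTOFF
```

For example, `confident_segments([50, 80, 90, 60, 75, 71, 70])` returns `[(2, 3), (5, 6)]`, since a pLDDT of exactly 70 does not pass the strict cutoff.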
When a protein had multiple paralogs, crosslinks (XLs) from all paralogs were mapped to the paralog with the highest number of XLs.
The following script generates the mapped XLs:
```
python paralog_alignment.py
```
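The mapping rule can be sketched as follows. This is an illustrative outline, not the contents of paralog_alignment.py: the tuple layout of a crosslink, the function names, and the `residue_maps` dictionary (assumed to come from a pairwise sequence alignment computed elsewhere) are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Crosslink = Tuple[str, int, str, int]  # (protein1, residue1, protein2, residue2)


def pick_reference_paralog(xls: List[Crosslink], paralogs: List[str]) -> str:
    """Return the paralog that appears in the largest number of crosslinks."""
    counts = Counter()
    for p1, _, p2, _ in xls:
        for prot in {p1, p2}:  # count each crosslink once per protein
            if prot in paralogs:
                counts[prot] += 1
    # Ties are broken by the order of `paralogs` (an arbitrary choice for this sketch).
    return max(paralogs, key=lambda p: counts[p])


def remap_crosslinks(
    xls: List[Crosslink],
    reference: str,
    residue_maps: Dict[str, Dict[int, int]],
) -> List[Crosslink]:
    """Rename paralog residues to their aligned positions in the reference paralog.

    residue_maps[paralog][residue] gives the equivalent reference residue.
    Crosslinks with a residue that has no aligned position are dropped.
    """
    mapped = []
    for p1, r1, p2, r2 in xls:
        ends = []
        for prot, res in ((p1, r1), (p2, r2)):
            if prot in residue_maps:
                res = residue_maps[prot].get(res)  # None if unaligned
                prot = reference
            ends.append((prot, res))
        if all(res is not None for _, res in ends):
            mapped.append((ends[0][0], ends[0][1], ends[1][0], ends[1][1]))
    return mapped
```

With toy input in which HDAC1 has two crosslinks and HDAC2 has one, `pick_reference_paralog` selects HDAC1, and the HDAC2 crosslink is renumbered onto HDAC1 through the alignment-derived map.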
EM maps, where available, were converted to Gaussian mixture models (GMMs), which were used as input for the modeling. The following script generates the GMM for an input EM map:
```
./create_gmm.sh EM_map.mrc threshold
```
The threshold can be obtained from the Validation section of the EMDB entry for the specific EM map. The minimum number of Gaussians that yields a cross-correlation of >0.95 with the original EM map is used.
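The rule for choosing the number of Gaussians can be written as a short helper. This is a sketch under the assumption that GMMs with several candidate numbers of Gaussians have already been generated and that their cross-correlations with the original map are known; the function name and the input dictionary are hypothetical and not part of create_gmm.sh.

```python
from typing import Dict, Optional


def pick_num_gaussians(cc_by_n: Dict[int, float], cutoff: float = 0.95) -> Optional[int]:
    """Return the smallest number of Gaussians whose cross-correlation exceeds the cutoff.

    cc_by_n maps a candidate number of Gaussians to the cross-correlation
    between the resulting GMM and the original EM map.
    Returns None if no candidate passes.
    """
    passing = [n for n, cc in cc_by_n.items() if cc > cutoff]
    return min(passing) if passing else None
```

If no candidate passes the cutoff, the helper returns `None`, which signals that GMMs with more Gaussians need to be generated and scored.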

Sampling

To run the sampling, run the modeling script:

./run_modeling.sh

Analysis

To run the analysis, run the end-to-end analysis script:

python end_to_end_analysis.py

The above script performs the following steps:

1. Getting the good-scoring models

Good-scoring models were selected using pmi_analysis (please refer to the pmi_analysis tutorial for a more detailed explanation) along with our variable_filter_v1.py script.

The following scripts were used:

run_analysis_trajectories.py
variable_filter_v1.py, applied to the major cluster if the number of models exceeds 30000.
The selected good-scoring models were then extracted using run_extract_models.py.

2. Running the sampling exhaustiveness tests (Sampcon)

A density_{}.txt file was created for each complex ({} = Nurd, corest, Sin3a). This file specifies the domains to be split for visualizing the localization probability densities. Finally, sampling exhaustiveness tests were performed using imp-sampcon.

3. Analysing the major cluster

Compute crosslink violations using the get_xl_viol_validation_set.py script.
Create contact maps for the component proteins in the complex using the contact_maps_all_pairs_surface.py script. The proteins to be considered are specified as the lists protein1 and protein2.
Obtain domain-wise precision using PrISM.
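As an illustration of the first step, the sketch below flags crosslinks that are violated across a cluster of models. It assumes a common convention in integrative modeling: a crosslink counts as satisfied if at least one model in the cluster places the crosslinked residues within a distance cutoff. Whether get_xl_viol_validation_set.py uses this convention is not stated here, and the function names, the input layout, and the cutoff value in the usage note are hypothetical.

```python
from typing import Dict, List


def violated_crosslinks(distances: Dict[str, List[float]], cutoff: float) -> List[str]:
    """Return the crosslinks that no model in the cluster satisfies.

    distances[xl] lists the distance (in Angstrom) between the crosslinked
    residues in each model of the cluster. A crosslink is treated as violated
    when even its smallest distance exceeds the cutoff (assumed convention).
    """
    return [xl for xl, per_model in distances.items() if min(per_model) > cutoff]


def violation_fraction(distances: Dict[str, List[float]], cutoff: float) -> float:
    """Fraction of crosslinks violated by the whole cluster (0.0 for empty input)."""
    if not distances:
        return 0.0
    return len(violated_crosslinks(distances, cutoff)) / len(distances)
```

With toy distances `{"xl_a": [40.0, 28.0], "xl_b": [50.0, 45.0]}` and an illustrative cutoff of 35.0, only `xl_b` is reported as violated, giving a violation fraction of 0.5.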

Results

For the simulations, the results directory contains a subdirectory for each complex, comprising:

contact_maps : Directory containing the contact maps for the component proteins of the complex.
models_and_densities : Directory containing sampcon output for the largest cluster.
prism : Directory containing the PrISM output.
xl_violations : Directory containing the logs for crosslink violations.

Information

Author(s): Jules Nde*, Kartik Majila*, Rosalyn C. Zimmermann, Cassandra Kempf, Ying Zhang, Joseph Cesare, Janet L. Thornton, Jerry L. Workman, Laurence Florens, Shruthi Viswanath, Michael P. Washburn
Date: September 12th, 2023
License: CC BY-SA 4.0 This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
Last known good IMP version: Not tested
Testable: Yes
Parallelizeable: Yes
Publications:
