NaegleLab/DANSy_Applications

DANSy_Applications

Here, we provide example applications of our Domain Architecture Network Syntax (DANSy) to the whole human proteome, post-translational modification (PTM) systems, and fusion genes. The results of these applications are summarized in our bioRxiv paper, and here we provide the code to reproduce the results and figures in the manuscript.

How to cite: Please cite our bioRxiv paper

DANSy Overview

Overview of the general workflow

Getting started

For this work, we recommend creating a local copy of this repository by copying the following into your terminal:

git clone https://github.com/NaegleLab/DANSy_Applications

We recommend creating a virtual environment using:

conda env create -f dansy_apps.yml

Activate the environment using conda activate dansy_apps for specific scripts or select the dansy_apps kernel for Jupyter notebooks.

Proteome Reference File

DANSy relies on reference files generated by CoDIAC. We have provided the reference files for the analysis conducted in our manuscript in the data folder. This version of the reference file was generated on March 24th, 2026, and will be the default file used across all applications.

If you wish to generate a new reference file to use for analysis, you will need to take the following steps. First, download the SwissProt MetaData file from Gencode into your local copy of this repo. Then, modify the whole_proteome_reference.py file by changing the reference file suffix variable to the current date, and verify that the Gencode file name is correct. Finally, activate your dansy_apps environment and run the following code. (Note: This can take up to 2 hours after a fresh install, as it will also establish a BioMart SQLite database for use in other notebooks.)

conda activate dansy_apps
python scripts/whole_proteome_reference.py
conda deactivate

If you are going to use the new build, you will then have to tell each of the notebooks to use it by adjusting the dansy.import_reference_files() commands to look for your reference file instead, e.g. import_proteome_files(ref_file_dir='data/Current_Human_Proteome',ref_file_suffix=new_fetch_data_suffix).

Additional Datasets

For more specific analysis, the following datasets are recommended to be downloaded from their source.

Analysis | Dataset | Source/Code
Fusion Gene Analysis | ChimerSeq Excel File | From ChimerDB
PTM Systems | Provided in the Multispecies Reference Files | Imported from UniProt using the reference proteome fetching script and reference file generating script
Cancer Cell Line Encyclopedia (CCLE) | Fusion and CRISPR Screen Data | From the DepMap project. We used the 26Q1 files. We specifically downloaded the 1) OmicsFusionFilteredSupplementary.csv, 2) Models.csv, and 3) CRISPRGeneEffect.csv files.
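
Before running the fusion notebooks, it can help to confirm that the three DepMap files are in place. The following is a minimal sketch, not part of this repository: the helper name missing_depmap_files and the data/DepMap folder are assumptions for illustration, so adjust the path to wherever you saved the downloads.

```python
from pathlib import Path

# File names as listed for the DepMap download; the folder is an assumption.
DEPMAP_FILES = (
    "OmicsFusionFilteredSupplementary.csv",
    "Models.csv",
    "CRISPRGeneEffect.csv",
)


def missing_depmap_files(data_dir):
    """Return the expected DepMap file names that are not found in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in DEPMAP_FILES if not (data_dir / name).is_file()]


if __name__ == "__main__":
    # "data/DepMap" is a hypothetical location, not one defined by the repo.
    missing = missing_depmap_files("data/DepMap")
    if missing:
        print("Missing DepMap files:", ", ".join(missing))
    else:
        print("All DepMap files found.")
```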

What is provided and how to run the code.

We have provided several Jupyter notebooks that serve as example applications of DANSy. Below are short summaries of their applications and the results, which are discussed in our manuscript.

For custom uses of DANSy, please visit our DANSy repository, which provides more general use cases and information on how to get started with the DANSy package for new datasets.

Complete human proteome analysis

We analyze the domain architectures across the human proteome and broadly characterize the resulting network and the information encoded by versions of the network that differ in n-gram length. We find that limiting n-grams to 10-grams in length recapitulates the network characteristics of the complete proteome. We also find that specific domains, such as the protein kinase, zinc finger C2H2, and EGF-like domains, are n-grams that frequently lie along the shortest path and are connected to the most other n-gram nodes in the network.
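
To make the idea of a length-limited domain n-gram concrete, here is a minimal sketch that lists every contiguous run of domains up to a maximum length. It is an illustration only and does not use the DANSy package, whose own n-gram extraction may differ in detail. The function name domain_ngrams and the toy architecture are assumptions.

```python
def domain_ngrams(architecture, max_n=10):
    """Return all contiguous domain n-grams of length 1..max_n.

    architecture is an ordered list of domain names (N- to C-terminus).
    N-grams are returned shortest first, then by start position.
    """
    ngrams = []
    longest = min(max_n, len(architecture))
    for n in range(1, longest + 1):
        for start in range(len(architecture) - n + 1):
            ngrams.append(tuple(architecture[start:start + n]))
    return ngrams


# Toy architecture for illustration only (not read from the reference file).
toy_protein = ["SH3", "SH2", "Protein kinase"]
for ngram in domain_ngrams(toy_protein):
    print("|".join(ngram))
```

With max_n=10, a protein with 12 domains contributes n-grams of length 1 through 10 only, so the two longest runs (11-grams and the full 12-gram) are left out.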

Associated notebooks:

PTM System Analysis

This analysis focuses on reversible post-translational modification systems (e.g. phosphorylation, methylation, acetylation) that operate under a reader-writer-eraser paradigm. We characterize broad properties of how individual components combine in domain architectures and identify general grammatical rules, such as eraser domains not requiring additional reader domains to modify their activity. Meanwhile, reader domains more frequently form domain combinations with writer domains, but rarely with eraser domains.
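
One simple way to look at such combination rules is to count how often each pair of roles co-occurs within a protein. The sketch below is an assumption-laden illustration, not the analysis in the notebooks: the role labels in TOY_ROLES, the toy architectures, and the function name role_pair_counts are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical role labels for illustration; the notebooks define the real
# domain-to-role assignments for each PTM system.
TOY_ROLES = {
    "SH2": "reader",
    "Protein kinase": "writer",
    "Tyrosine phosphatase": "eraser",
}


def role_pair_counts(architectures, roles):
    """Count the proteins in which each pair of distinct roles co-occurs."""
    counts = Counter()
    for arch in architectures:
        # Each role is counted once per protein; domains without a role are skipped.
        present = sorted({roles[d] for d in arch if d in roles})
        counts.update(combinations(present, 2))
    return counts


# Toy architectures for illustration only.
toy_architectures = [
    ["SH2", "Protein kinase"],
    ["SH2", "SH2", "Tyrosine phosphatase"],
    ["SH2", "SH3", "Protein kinase"],
    ["Tyrosine phosphatase"],
]
print(role_pair_counts(toy_architectures, TOY_ROLES))
```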

Associated notebooks:

PTM System Evolution

Additional analysis on the phosphorylation systems compares the network characteristics of domains associated with phosphotyrosine (pTyr) and phosphoserine/threonine (pSer/Thr) systems during the evolutionary period from yeast to humans. The pTyr system is evolutionarily younger and rapidly expanded during the transition to metazoans. Our analysis suggests that during this transition, species were sampling several configurations of the network before converging to a similar set of grammatical rules observed in other PTM systems.

Associated notebooks:

Fusion Gene Analysis

Here, we analyze how fusion genes may provide an avenue to explore new grammatical structures of domain architectures. We studied fusion genes reported in TCGA and CCLE and their predicted chimeric protein domain architectures. We find that gene fusions rarely generate novel domain architectures, suggesting that they largely respect the existing domain combination rules of the natural proteome. Using the CCLE CRISPR screen data, we do not observe specific domains to be highly enriched in fusions whose partner genes create dependency effects in the cell line models. However, we did observe that kinase domains were the most frequently involved in gene fusions. Thus, we further explored kinase fusions to understand whether the flexibility of kinase domains in generating domain combinations in the natural proteome reflected their tendency to form fusions.
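
As a rough illustration of what "novel" can mean here, the sketch below splits predicted fusion architectures into those already present in a reference set and those that are not. It assumes novelty means no exact match of the ordered domain list, which may be stricter or looser than the definition used in the manuscript. The function name split_novel_architectures and the toy data are hypothetical.

```python
def split_novel_architectures(fusion_archs, reference_archs):
    """Partition fusion architectures into (known, novel) lists.

    An architecture counts as known only if its ordered domain tuple
    appears exactly in the reference set (an assumption of this sketch).
    """
    reference = {tuple(arch) for arch in reference_archs}
    known, novel = [], []
    for arch in fusion_archs:
        arch = tuple(arch)
        if arch in reference:
            known.append(arch)
        else:
            novel.append(arch)
    return known, novel


# Toy data for illustration only.
toy_reference = [["SH3", "SH2", "Protein kinase"], ["Protein kinase"]]
toy_fusions = [["Protein kinase"], ["EGF-like", "Protein kinase"]]
known, novel = split_novel_architectures(toy_fusions, toy_reference)
print("known:", known)
print("novel:", novel)
```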

Associated notebooks:
