SpliCeAT: Integrated pipeline for detection and quantification of aberrant transcripts with novel splicing events

This repository contains the following Snakemake pipelines and scripts, to be run in this order:

Preparatory Step (00_get_ref) ¹
Differential splicing detection (01_ds_detection)
Generation of augmented transcriptome (02_augment_transcriptome)
Differential expression analysis (03_de_analysis)

Pipeline Structure

SpliCeAT/
├── config/
├── workflow/
│   ├── common_rules/
│   ├── envs/
│   └── modules/
│       ├── Snakefile
│       ├── 00_get_ref/
│       │   ├── workflow/
│       │   │   ├── Snakefile
│       │   │   ├── rules/
│       │   │   └── scripts/
│       │   └── logs/
│       ├── 01_ds_detection/
│       │   ├── workflow/
│       │   │   ├── Snakefile
│       │   │   ├── rules/
│       │   │   └── scripts/
│       │   └── logs/
│       ├── 02_augment_transcriptome/
│       │   ├── workflow/
│       │   │   ├── Snakefile
│       │   │   ├── rules/
│       │   │   └── scripts/
│       │   └── logs/
│       └── 03_de_analysis/
│           ├── workflow/
│           │   ├── Snakefile
│           │   ├── rules/
│           │   └── scripts/
│           └── logs/
└── results/
    ├── samples.tsv
    ├── r0_get_ref/
    ├── r1_ds_detection/
    ├── r2_augment_transcriptome/
    └── r3_de_analysis/

Each module can be run separately from its own folder, or from the master Snakefile (improvements pending) in the modules folder. Module-specific Snakefiles are designed to be run from outside the corresponding workflow directories to avoid path conflicts.

What you need before starting:

FASTQ sample files of 2 conditions (control & treatment), preprocessed ²
BAM files aligned to Ensembl references using STAR, indexed using samtools ³
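
A quick pre-flight check for the BAM requirement can be sketched in Python. This is only an illustration and not part of the pipeline: the function name `missing_bam_indexes` is hypothetical, and it assumes each index sits next to its BAM file as `<file>.bam.bai`, as described in footnote 4.

```python
from pathlib import Path


def missing_bam_indexes(bam_paths):
    """Return the BAM paths that have no '<bam>.bai' index beside them."""
    missing = []
    for bam in map(Path, bam_paths):
        # Assumed naming: sample.bam -> sample.bam.bai in the same folder.
        index = bam.with_name(bam.name + ".bai")
        if not index.is_file():
            missing.append(str(bam))
    return missing
```

Any path returned by this check can be indexed with `samtools index <file>.bam` before starting the pipeline.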

Start here:

Download repo

git clone -b restructure https://github.com/meg-hz/SpliCeAT.git

Experiment Design File

Place your design.tsv in the config directory.
Each sample must specify its paired FASTQ files.
If alignment was done before running the pipeline (STAR activate: False), an additional column (bam_file) must be present, containing the paths of the resulting alignment files. ⁴
The pipeline supports pairwise comparison, so the group column should specify two groups (e.g., control and treatment). An example experiment design file is in config/design.tsv.

sample_name	group	fq1	fq2	bam_file
CTX_104	treated	/path_to_fq/CTX_104_1.fq.gz	/path_to_fq/CTX_104_2.fq.gz	/path_to_bam/CTX_104.sortedByCoord.bam
CTX_108	treated	/path_to_fq/CTX_108_1.fq.gz	/path_to_fq/CTX_108_2.fq.gz	/path_to_bam/CTX_108.sortedByCoord.bam
CTX_120	control	/path_to_fq/CTX_120_1.fq.gz	/path_to_fq/CTX_120_2.fq.gz	/path_to_bam/CTX_120.sortedByCoord.bam
CTX_125	control	/path_to_fq/CTX_125_1.fq.gz	/path_to_fq/CTX_125_2.fq.gz	/path_to_bam/CTX_125.sortedByCoord.bam
CTX_128	treated	/path_to_fq/CTX_128_1.fq.gz	/path_to_fq/CTX_128_2.fq.gz	/path_to_bam/CTX_128.sortedByCoord.bam
CTX_147	control	/path_to_fq/CTX_147_1.fq.gz	/path_to_fq/CTX_147_2.fq.gz	/path_to_bam/CTX_147.sortedByCoord.bam
CTX_148	control	/path_to_fq/CTX_148_1.fq.gz	/path_to_fq/CTX_148_2.fq.gz	/path_to_bam/CTX_148.sortedByCoord.bam
CTX_154	treated	/path_to_fq/CTX_154_1.fq.gz	/path_to_fq/CTX_154_2.fq.gz	/path_to_bam/CTX_154.sortedByCoord.bam
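
The design file rules above can be checked before launching the pipeline. The following is a minimal sketch, assuming a tab-separated file with the columns shown in the example; `check_design` and `REQUIRED_COLUMNS` are hypothetical names and this is not the pipeline's own validation code.

```python
import csv
from collections import Counter

# Columns taken from the example design file; bam_file is only needed
# when alignment was done before running the pipeline.
REQUIRED_COLUMNS = ["sample_name", "group", "fq1", "fq2"]


def check_design(path, require_bam=False):
    """Return a list of problems found in a tab-separated design file."""
    problems = []
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        header = reader.fieldnames or []
        rows = list(reader)

    required = REQUIRED_COLUMNS + (["bam_file"] if require_bam else [])
    missing = [col for col in required if col not in header]
    if missing:
        problems.append("missing columns: " + ", ".join(missing))
        return problems  # the row checks below need these columns

    names = Counter(row["sample_name"] for row in rows)
    duplicates = sorted(name for name, n in names.items() if name and n > 1)
    if duplicates:
        problems.append("duplicate sample names: " + ", ".join(duplicates))

    # Pairwise comparison: the group column should hold two distinct labels.
    groups = sorted({row["group"] for row in rows if row["group"]})
    if len(groups) != 2:
        problems.append(f"expected exactly 2 groups, found {len(groups)}: {groups}")

    for row in rows:
        empty = [col for col in required if not (row[col] or "").strip()]
        if empty:
            problems.append(f"{row['sample_name']}: empty {', '.join(empty)}")
    return problems
```

Calling `check_design("config/design.tsv", require_bam=True)` returns an empty list when the file passes these checks, and a list of human-readable messages otherwise.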

Configuration File

An example configuration file is provided in config/config.yaml.
Each of the underlying tools can be skipped by specifying activate: False.
The absolute path of the pipeline must be specified by the user.
To run MAJIQ, you must provide the location of a valid MAJIQ license file.
By default, tool consensus checks for event-level overlap. Set event_overlap: False to obtain gene-level overlap between splicing event detection tools instead.
The absolute path of the results directory can be specified by the user; if none is provided, results are stored in the pipeline directory.
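
The difference between event-level and gene-level consensus can be illustrated with a small Python sketch. How SpliCeAT actually matches events between tools is not described here, so everything below is an assumption for illustration: the `Event` type, the closed-interval overlap test, and the `tool_a` / `tool_b` inputs are hypothetical.

```python
from typing import NamedTuple


class Event(NamedTuple):
    gene: str
    chrom: str
    start: int
    end: int


def intervals_overlap(a: Event, b: Event) -> bool:
    """True if two events share at least one base (closed intervals)."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def consensus_genes(tool_a, tool_b, event_overlap=True):
    """Genes supported by both tools, at event level or at gene level."""
    if event_overlap:
        # Event level: both tools must report overlapping coordinates.
        return sorted({a.gene for a in tool_a for b in tool_b
                       if a.gene == b.gene and intervals_overlap(a, b)})
    # Gene level: both tools report some event in the gene, anywhere.
    return sorted({a.gene for a in tool_a} & {b.gene for b in tool_b})


tool_a = [Event("GENE1", "chr1", 100, 200), Event("GENE2", "chr2", 500, 600)]
tool_b = [Event("GENE1", "chr1", 150, 250), Event("GENE2", "chr2", 900, 950)]

print(consensus_genes(tool_a, tool_b, event_overlap=True))   # ['GENE1']
print(consensus_genes(tool_a, tool_b, event_overlap=False))  # ['GENE1', 'GENE2']
```

In this toy input, GENE2 is reported by both tools but at non-overlapping positions, so it is kept only in gene-level mode. Gene-level overlap is therefore the more permissive setting.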

Run Snakemake Pipeline

The workflow is configured to use conda, which should download and configure all of the needed environments. If you are using Snakemake > 4.8.0, you can run the workflow with a combination of conda and containers, as described in "Ad-hoc combination of Conda package management with containers" in the Snakemake documentation.

Execute a Snakemake dry run with

snakemake -np

to check the parameters of the run. Once ready to run, execute

snakemake --use-conda --cores 24

Footnotes

¹ This step isn't required if you provide your own reference files. Doing so requires modifying (at your discretion) the Snakemake rules that use the corresponding references.
² MAJIQ does not tolerate ambiguous bases (N) in the BAM or FASTA files. Reads containing these bases may need to be removed by trimming and filtering with Trimmomatic or fastp before running this pipeline.
³ The required BAM files can be generated by the preparatory module by setting STAR activate: True and running the preparatory step.
⁴ The pipelines expect RNA-seq alignment (BAM) files to be named sample_Aligned.sortedByCoord.out.bam (STAR output format). The Snakemake rules can be modified (at the user's discretion) to accept alignments generated by other tools (e.g. HISAT2). Note that the corresponding index file (sample_Aligned.sortedByCoord.out.bam.bai) must be present in the same folder as the BAM file.
