EasyCM

Snakemake pipeline for cell-type marker gene discovery from scRNA-seq pseudobulk data. Per cell type: paired DESeq2 (~ subtype + sample) vs all other cells, apeglm LFC shrinkage, then fGSEA against Reactome+KEGG. Consumes pre-made pseudobulk matrices from EasyPseudobulk.

All R scripts must run inside the EasyDE micromamba environment (provides DESeq2, apeglm, fGSEA).

Pipeline Overview

Six rules, single {celltype} wildcard. Rules 02–04 run in parallel across all cell types.

01  Validate config + inputs           (once)
02  Prepare pseudobulk matrices        (per celltype)
03  DESeq2 + apeglm LFC shrinkage      (per celltype)
04  fGSEA pathway enrichment           (per celltype, optional)
05  Aggregate results + status plot    (once, after all celltypes)
06  Pipeline summary + DEG bar chart   (once, final)

For each cell type, rule 02 builds a combined count matrix in which every sample appears twice — once as interest (cells of that type) and once as other (all remaining cells). DESeq2 then tests ~ subtype + sample to isolate the cell-type effect while blocking on donor. Positive log2FoldChange = upregulated in the cell type of interest.
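The paired layout can be illustrated with a small sketch. This is not the code in `02_prepare_pseudobulk.R`; the function name `build_paired_design`, the `<sample>_<subtype>` column naming, and the toy donors, genes and counts are all assumptions made for illustration.

```python
def build_paired_design(interest, other):
    """Pair per-sample pseudobulk counts for one cell type.

    interest / other: {sample: {gene: count}} (hypothetical in-memory layout).
    Returns (counts, coldata) where each retained sample contributes two
    columns: one labelled "interest" and one labelled "other".
    """
    # Only samples present in both tables can be paired.
    samples = sorted(set(interest) & set(other))
    counts, coldata = {}, []
    for sample in samples:
        for subtype, table in (("interest", interest), ("other", other)):
            column = f"{sample}_{subtype}"
            counts[column] = dict(table[sample])
            coldata.append({"column": column, "sample": sample, "subtype": subtype})
    return counts, coldata


# Toy data: donor D3 has no cells of the type of interest, so it is dropped.
interest = {"D1": {"GCG": 120, "INS": 3}, "D2": {"GCG": 95, "INS": 1}}
other = {
    "D1": {"GCG": 4, "INS": 210},
    "D2": {"GCG": 2, "INS": 180},
    "D3": {"GCG": 1, "INS": 50},
}
counts, coldata = build_paired_design(interest, other)
for row in coldata:
    print(row["column"], row["sample"], row["subtype"])
```

The `sample` and `subtype` fields of `coldata` correspond to the two terms of the `~ subtype + sample` design: `sample` absorbs donor-level differences, and `subtype` carries the cell-type contrast.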

Quick Start

# 1. Install
micromamba env create -f installation/EasyCM_install.yml

# 2. Place inputs
#    data/counts/cell_mtx/{CellType}_persample_RNA_counts.tsv
#    data/counts/allbut_mtx/{CellType}_persample_RNA_counts.tsv
#    data/sample_metadata.csv

# 3. Edit config
cp config/config.yaml config/my_config.yaml

# 4. Dry run, then run
micromamba run -n EasyDE snakemake -n --config pipeline_config=config/my_config.yaml
micromamba run -n EasyDE snakemake --profile profiles/local \
    --config pipeline_config=config/my_config.yaml

# 5. Inspect
column -t -s, results/pipeline_summary.csv
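
Before the dry run, it can help to confirm that every cell type has a matrix in both input directories. The following is a minimal sketch that assumes the file naming shown in step 2. The helper names `celltypes_in` and `find_unpaired` are hypothetical, and this check is separate from the pipeline's own validation in rule 01.

```python
from pathlib import Path

# File name suffix taken from the input paths listed in step 2.
SUFFIX = "_persample_RNA_counts.tsv"


def celltypes_in(directory):
    """Return the set of cell-type names found in one matrix directory."""
    return {p.name[: -len(SUFFIX)] for p in Path(directory).glob(f"*{SUFFIX}")}


def find_unpaired(counts_dir="data/counts"):
    """Return cell types that have a matrix in only one of the two directories."""
    cell = celltypes_in(Path(counts_dir) / "cell_mtx")
    allbut = celltypes_in(Path(counts_dir) / "allbut_mtx")
    # Symmetric difference: names present in exactly one directory.
    return sorted(cell ^ allbut)


print("unpaired cell types:", find_unpaired() or "none")
```

Run from the project root, this prints the cell types that would lack either the interest or the other half of the paired design.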

Project Layout

EasyCM/
├── Snakefile
├── config/config.yaml                <- edit this
├── profiles/{local,slurm}/config.yaml
├── data/
│   ├── counts/{cell_mtx,allbut_mtx}/ <- pseudobulk TSVs from EasyPseudobulk
│   └── sample_metadata.csv
├── resources/gsea_files/             <- GMT + gene exclusion lists
├── workflow/scripts/
│   ├── 01_validate.R
│   ├── 02_prepare_pseudobulk.R
│   ├── 03_run_deseq.R
│   ├── 04_run_fgsea.R
│   ├── 05_aggregate_results.R
│   ├── 06_pipeline_summary.R
│   └── utils/{io,logging,filter,validation}_utils.R
├── installation/EasyCM_install.yml
└── docs/

| Edit | Where |
|---|---|
| Analysis parameters | `config/config.yaml` |
| Execution settings | `profiles/*/config.yaml` |
| Inputs | `data/counts/`, `data/sample_metadata.csv` |

Documentation

| Guide | Contents |
|---|---|
| Installation | Environment setup |
| Configuration | Config reference, input formats |
| Running | Snakemake invocation, manual step-by-step |
| Methods | Paired design, DESeq2, apeglm, fGSEA |
| Output | File tree, column definitions, status taxonomy |
| Troubleshooting | Common errors, logs |

Pipeline Status Values

Rule 05 writes celltype_summary.csv with a status per cell type:

| Status | Meaning |
|---|---|
| `success` | DESeq2 completed, marker genes found |
| `success_no_significant` | DESeq2 ran, no genes passed FDR |
| `skipped_no_samples` | Cell type absent or zero samples after filtering |
| `skipped_min_cells` | Too few samples after donor deduplication |
| `skipped_preflight` | Too few genes passed filtering |
| `skipped` | Other skip; see `error_message` |
| `error` | DESeq2 or upstream crashed |
| `not_run` | No log or output found |

Rule 05 emits status_overview.pdf (status + DEG counts heatmap). Rule 06 emits pipeline_summary.pdf (DEG bar chart, ordered by marker count).
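
For a quick look at the status table without opening the PDFs, the summary can be tallied with a few lines of Python. This is a sketch under stated assumptions: the column names `celltype` and `status` are guesses at the layout of `celltype_summary.csv` (only `error_message` is named above), and the rows below are made-up examples rather than pipeline output.

```python
import csv
import io
from collections import Counter

# Statuses that usually warrant a look at the logs (a choice made for this sketch).
NEEDS_ATTENTION = {"error", "not_run"}


def tally_statuses(handle):
    """Count cell types per status and list those needing attention.

    Assumes a CSV with `celltype` and `status` columns (hypothetical names).
    """
    rows = list(csv.DictReader(handle))
    counts = Counter(row["status"] for row in rows)
    flagged = [row["celltype"] for row in rows if row["status"] in NEEDS_ATTENTION]
    return counts, flagged


# Made-up example rows standing in for celltype_summary.csv.
example = io.StringIO(
    "celltype,status,error_message\n"
    "Alpha,success,\n"
    "Beta,success_no_significant,\n"
    "Delta,error,DESeq2 failed\n"
)
counts, flagged = tally_statuses(example)
print(dict(counts))
print(flagged)
```

To use it on real output, pass an open file handle for the summary file in place of the `io.StringIO` example and adjust the column names to match the file.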
