Deconvolution of Bulk RNA-seq DEGs Using scRNA-seq Reference

This repository contains the analysis pipeline used to deconvolve bulk-tissue RNA-seq differential expression results using a single-cell RNA-seq (scRNA-seq) reference dataset, enabling identification of cell-type–specific gene expression signatures underlying bulk transcriptomic changes.

The workflow integrates:

Bulk RNA-seq DESeq2 results
A curated scRNA-seq reference from mouse cortex
Dimensionality reduction and correlation-based analyses

Methodological Background and References

The analysis pipeline implemented in this repository is based on the reference profile–based deconvolution framework described by Marquez-Galera et al. (2022), which leverages publicly available single-cell RNA-seq datasets to interpret cell-type–specific contributions to gene expression signatures derived from bulk-tissue RNA-seq. In this approach, differentially expressed gene (DEG) lists obtained from bulk RNA-seq are projected onto a curated scRNA-seq reference, and linear dimensionality reduction together with correlation-based clustering is used to identify cell-type–specific gene modules embedded within bulk transcriptional signatures.

This strategy was originally applied to dissect sublayer- and cell-type–specific transcriptional changes in the hippocampal CA1 region under physiological and epileptic conditions (Cid et al., 2021). That work demonstrated that bulk RNA-seq signatures reflect a convolution of cell-type–specific expression programs and changes in cellular composition, and showed that reactive microglial and glial gene signatures can be unmasked by integrating bulk RNA-seq DEGs with single-cell reference data, underscoring the importance of cellular heterogeneity in the interpretation of bulk transcriptomic analyses.

The single-cell RNA-seq reference used in this pipeline is derived from the Mouse Whole Cortex and Hippocampus SMART-seq dataset generated by the Allen Institute for Brain Science and described by Yao et al. (2021), which provides a comprehensive transcriptomic taxonomy of neuronal and non-neuronal cell types across the mouse isocortex and hippocampal formation.

The present implementation adapts these published frameworks to cortical bulk RNA-seq data, following the same conceptual and analytical principles while allowing flexible subsetting of the scRNA-seq reference and dynamic visualization of cell-type–specific DEG signatures. Consistent with the original studies, this pipeline is intended for qualitative deconvolution and hypothesis generation rather than quantitative estimation of cell-type proportions.

Overview of the Analysis Pipeline

The analysis is organized into four main steps:

Identification and filtering of bulk RNA-seq DEGs
Construction of the scRNA-seq reference object
Subsetting and preprocessing of the scRNA-seq reference to match bulk tissue composition
Deconvolution of bulk DEGs using the scRNA-seq reference

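Each step is implemented as its own script (Step1.R to Step4.R). The scripts are meant to be run in order, because later steps take the objects produced by earlier ones as input.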
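The run order can be sketched as a small driver. This is only an illustration: `run_pipeline` is a hypothetical helper that is not part of the repository, and it assumes the four scripts sit in one directory and pass objects to each other through the global environment.

```r
# Script names as they appear in the repository
step_scripts <- sprintf("Step%d.R", 1:4)

# Hypothetical driver: check that every script exists, then run them in order.
run_pipeline <- function(dir = ".", scripts = step_scripts) {
  paths <- file.path(dir, scripts)
  missing <- paths[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("Missing script(s): ", paste(missing, collapse = ", "))
  }
  for (p in paths) {
    message("Running ", p)
    # local = FALSE evaluates in the global environment, so objects such as
    # ctrl_DEGs or sc_data stay available to the following step (assumption)
    source(p, local = FALSE)
  }
  invisible(paths)
}

# Usage (from the repository root):
# run_pipeline(".")
```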

Step 1: Load and Filter Bulk RNA-seq Differentially Expressed Genes (DEGs)

Purpose:
Identify statistically significant DEGs from bulk RNA-seq and prepare ranked gene lists for downstream integration with scRNA-seq data.

Key operations:

Load DESeq2 results table
Remove genes without valid gene symbols
Filter by adjusted p-value (padj < 0.1)
Rank genes by log2 fold change
Split genes by direction of regulation (downregulated, i.e. higher in control, vs upregulated, i.e. higher in the experimental condition)
Select the top 250 DEGs per direction

Inputs:

S1_Table_DESeq.csv (DESeq2 results)

Outputs:

ctrl_DEGs: top downregulated genes
c4_DEGs: top upregulated genes
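The filtering logic of Step 1 can be sketched in base R as follows. This is a minimal sketch, not the code in Step1.R: `filter_degs` is a hypothetical helper, and the column names `gene_symbol`, `log2FoldChange` and `padj` are assumptions about the layout of S1_Table_DESeq.csv (the last two follow the standard DESeq2 results columns). The toy table stands in for the real file.

```r
# Hypothetical helper: filter a DESeq2 results table and return the
# strongest down- and upregulated gene symbols.
filter_degs <- function(res, padj_cutoff = 0.1, top_n = 250) {
  # Drop rows without a usable gene symbol
  res <- res[!is.na(res$gene_symbol) & res$gene_symbol != "", ]
  # Keep significant genes; DESeq2 can report NA for padj, so exclude those too
  res <- res[!is.na(res$padj) & res$padj < padj_cutoff, ]
  # Split by direction and rank by effect size (strongest first)
  down <- res[res$log2FoldChange < 0, ]
  down <- down[order(down$log2FoldChange), ]
  up <- res[res$log2FoldChange > 0, ]
  up <- up[order(-up$log2FoldChange), ]
  list(
    ctrl_DEGs = head(down$gene_symbol, top_n),
    c4_DEGs   = head(up$gene_symbol, top_n)
  )
}

# Toy input standing in for read.csv("S1_Table_DESeq.csv")
toy <- data.frame(
  gene_symbol    = c("Gfap", "Snap25", NA, "Aif1", "Mbp", ""),
  log2FoldChange = c(2.1, -1.5, 3.0, 1.2, -0.4, -2.0),
  padj           = c(0.001, 0.02, 0.01, 0.2, 0.05, 0.03),
  stringsAsFactors = FALSE
)
degs <- filter_degs(toy, top_n = 2)
```

With the real table, the call would be `filter_degs(read.csv("S1_Table_DESeq.csv"))`, after adjusting the column names to those actually present in the file.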

Step 2: Build the scRNA-seq Reference Dataset

Purpose:
Load raw single-cell gene expression data and metadata, and construct a Seurat object for downstream analysis.

Key operations:

Load cell metadata and gene expression matrix
Convert data into Seurat-compatible format
Create a Seurat object without initial filtering
Retain all genes and cells to preserve reference completeness

Inputs:

matrix.csv (gene expression counts)
metadata.csv (cell annotations)

Outputs:

sc_data: Seurat object containing the scRNA-seq reference dataset
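A possible shape for this step is sketched below. It rests on assumptions that should be checked against the downloaded files: that matrix.csv stores cells in rows and genes in columns with the cell ID in the first column, and that the first column of metadata.csv holds the same cell IDs. `as_counts_matrix` and `build_reference` are hypothetical names, not functions from the repository.

```r
# Hypothetical helper: turn a cells-by-genes table (first column = cell ID)
# into the genes-by-cells matrix that Seurat expects.
as_counts_matrix <- function(expr_table) {
  cell_ids <- as.character(expr_table[[1]])
  counts <- t(as.matrix(expr_table[, -1, drop = FALSE]))
  colnames(counts) <- cell_ids
  counts
}

# Hypothetical wrapper: read both files and build the unfiltered Seurat object.
build_reference <- function(matrix_path, metadata_path) {
  expr <- data.table::fread(matrix_path, data.table = FALSE)
  meta <- data.table::fread(metadata_path, data.table = FALSE)
  counts <- as_counts_matrix(expr)
  # Align metadata rows with the matrix columns (assumes IDs in column 1)
  rownames(meta) <- as.character(meta[[1]])
  stopifnot(all(colnames(counts) %in% rownames(meta)))
  meta <- meta[colnames(counts), , drop = FALSE]
  # min.cells = 0 and min.features = 0 keep every gene and every cell
  Seurat::CreateSeuratObject(
    counts = counts, meta.data = meta,
    min.cells = 0, min.features = 0
  )
}

# Usage:
# sc_data <- build_reference("matrix.csv", "metadata.csv")
```

For a reference of this size, transposing a dense matrix is memory-hungry; converting to a sparse matrix before creating the object may be preferable.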

Step 3: Subset scRNA-seq Reference to Match Bulk Tissue Composition

Purpose:
Restrict the scRNA-seq reference dataset to cortical cell populations that are biologically relevant to the bulk-tissue RNA-seq experiment.

Key operations:

Inspect region and subclass metadata distributions
Subset cells based on neocortical region labels
Remove hippocampal and non-isocortical subclasses
Set cell-type subclass labels as active identities
Normalize, scale, and identify variable genes
Perform PCA, t-SNE, and UMAP
Dynamically generate cell-type–adaptive color palettes
Save dimensionality reduction plots

Inputs:

sc_data from Step 2

Outputs:

Filtered and normalized scRNA-seq reference
PCA, t-SNE, and UMAP plots (.png)
Extracted plot legends for figure assembly
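The operations above could look roughly like the sketch below. The metadata column names `region_label` and `subclass_label`, the number of principal components, and the helper names `make_palette` and `subset_and_embed` are all assumptions for illustration; the region and subclass values to keep or drop depend on the bulk experiment and are left to the caller.

```r
# Hypothetical helper: one colour per cell-type label, for however many
# labels remain after subsetting.
make_palette <- function(labels) {
  lv <- sort(unique(as.character(labels)))
  setNames(grDevices::hcl.colors(length(lv), palette = "Dark 3"), lv)
}

# Hypothetical wrapper around a standard Seurat preprocessing sequence.
subset_and_embed <- function(sc_data, keep_regions,
                             drop_subclasses = character(0), n_pcs = 30) {
  meta <- sc_data[[]]  # cell-level metadata as a data.frame
  keep <- meta$region_label %in% keep_regions &
    !(meta$subclass_label %in% drop_subclasses)
  sc <- subset(sc_data, cells = rownames(meta)[keep])
  Seurat::Idents(sc) <- "subclass_label"  # subclass labels as active identity
  sc <- Seurat::NormalizeData(sc)
  sc <- Seurat::FindVariableFeatures(sc)
  sc <- Seurat::ScaleData(sc)
  sc <- Seurat::RunPCA(sc, npcs = n_pcs)
  sc <- Seurat::RunTSNE(sc, dims = seq_len(n_pcs))
  sc <- Seurat::RunUMAP(sc, dims = seq_len(n_pcs))
  sc
}

# Usage (region names are placeholders to be replaced):
# sc <- subset_and_embed(sc_data, keep_regions = c("<region 1>", "<region 2>"))
# pal <- make_palette(Seurat::Idents(sc))
# p <- Seurat::DimPlot(sc, reduction = "umap", cols = pal)
# ggplot2::ggsave("umap.png", p)
# legend <- cowplot::get_legend(p)  # legend alone, for figure assembly
```

Building the palette from the labels that survive subsetting keeps colours and legend entries in step with whichever cell types are retained.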

Step 4: Deconvolution of Bulk DEGs Using scRNA-seq Reference

Purpose:
Interpret bulk RNA-seq DEGs by identifying cell-type–specific gene expression patterns using the scRNA-seq reference.

Key operations:

Intersect bulk DEGs with scRNA-seq gene universe
Use equally sized DEG sets for comparability
Scale scRNA-seq expression for DEG genes only
Perform PCA using DEG-driven expression
Identify genes contributing to DEG-associated variance
Compute gene–gene Pearson correlation matrices
Perform hierarchical clustering
Visualize correlation structure using heatmaps

Inputs:

ctrl_DEGs / c4_DEGs from Step 1
sc_data from Step 3

Outputs:

DEG-driven PCA plots
Correlation heatmaps (.pdf)
Cell-type–specific gene signature visualizations
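To make the sequence of operations concrete, here is a base R sketch run on a simulated expression matrix. It is a stand-in under stated assumptions, not the code in Step4.R: the function names, the rule for choosing genes (highest absolute loadings on the first components), the average-linkage clustering on 1 minus correlation, and the toy data are all illustrative choices. In a Seurat workflow the scaling and PCA would more typically go through `ScaleData` and `RunPCA` with their `features` argument.

```r
# Hypothetical helper: expr is a genes-by-cells matrix of normalized
# expression, degs is a character vector of bulk DEG symbols.
deconvolve_degs <- function(expr, degs, n_pcs = 2, top_per_pc = 10, k = 2) {
  genes <- intersect(degs, rownames(expr))            # DEGs present in the reference
  sub <- expr[genes, , drop = FALSE]
  sub <- sub[apply(sub, 1, sd) > 0, , drop = FALSE]   # constant genes cannot be scaled
  scaled <- t(scale(t(sub)))                          # z-score each gene across cells
  pca <- prcomp(t(scaled), center = FALSE)            # cells as observations
  load <- abs(pca$rotation[, seq_len(n_pcs), drop = FALSE])
  n_top <- min(top_per_pc, nrow(load))
  top <- unique(unlist(lapply(seq_len(n_pcs), function(i) {
    names(sort(load[, i], decreasing = TRUE))[seq_len(n_top)]
  })))
  cor_mat <- cor(t(scaled[top, , drop = FALSE]), method = "pearson")
  hc <- hclust(as.dist(1 - cor_mat), method = "average")
  list(genes = top, cor = cor_mat, hclust = hc, modules = cutree(hc, k = k))
}

# Optional heatmap, written to PDF with gplots
plot_module_heatmap <- function(res, file = "deg_correlation_heatmap.pdf") {
  dend <- as.dendrogram(res$hclust)
  pdf(file)
  on.exit(dev.off())
  gplots::heatmap.2(res$cor, Rowv = dend, Colv = dend, symm = TRUE, trace = "none")
}

# Simulated reference: two cell types, three marker genes each
set.seed(1)
n <- 40
type <- rep(c("A", "B"), each = n / 2)
make_gene <- function(high_in) ifelse(type == high_in, 5, 0) + rnorm(n, sd = 0.5)
expr <- rbind(
  geneA1 = make_gene("A"), geneA2 = make_gene("A"), geneA3 = make_gene("A"),
  geneB1 = make_gene("B"), geneB2 = make_gene("B"), geneB3 = make_gene("B")
)
colnames(expr) <- paste0("cell", seq_len(n))

# "geneX" is absent from the reference and is dropped by the intersection
res <- deconvolve_degs(expr, c(rownames(expr), "geneX"))
```

In this toy case the two modules returned in `res$modules` correspond to the two simulated cell types. To keep the control and experimental sets comparable, both DEG vectors can be truncated to the same length before the call, for example with `n <- min(length(ctrl_DEGs), length(c4_DEGs))`.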

Software and R Packages

This pipeline was developed and tested using:

R (≥ 4.2)
Seurat
data.table
ggplot2
cowplot
patchwork
gplots

Install required packages using:

install.packages(c("data.table", "ggplot2", "cowplot", "patchwork", "gplots"))
install.packages("Seurat")
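Before running the scripts, it can help to check which of these packages are already available. The snippet below is a small convenience sketch; `missing_pkgs` is a hypothetical helper and nothing is installed automatically.

```r
required <- c("Seurat", "data.table", "ggplot2", "cowplot", "patchwork", "gplots")

# Hypothetical helper: return the packages that cannot be loaded
missing_pkgs <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

to_install <- missing_pkgs(required)
if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
} else {
  message("All required packages are available.")
}
```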
