This repository contains an analysis pipeline that deconvolves bulk-tissue RNA-seq differential expression results against a single-cell RNA-seq (scRNA-seq) reference dataset, in order to identify the cell-type–specific gene expression signatures underlying bulk transcriptomic changes.
The workflow integrates:
- Bulk RNA-seq DESeq2 results
- A curated scRNA-seq reference from mouse cortex
- Dimensionality reduction and correlation-based analyses
The pipeline is based on the reference profile–based deconvolution framework described by Marquez-Galera et al. (2022), which uses publicly available scRNA-seq datasets to interpret cell-type–specific contributions to gene expression signatures derived from bulk-tissue RNA-seq. In this approach, lists of differentially expressed genes (DEGs) obtained from bulk RNA-seq are projected onto a curated scRNA-seq reference, and linear dimensionality reduction combined with correlation-based clustering identifies cell-type–specific gene modules embedded within the bulk transcriptional signatures.
This strategy was originally applied to dissect sublayer- and cell-type–specific transcriptional changes in the hippocampal CA1 region under physiological and epileptic conditions (Cid et al., 2021). That work demonstrated that bulk RNA-seq signatures reflect a convolution of cell-type–specific expression programs and changes in cellular composition, and showed that reactive microglial and glial gene signatures can be unmasked by integrating bulk RNA-seq DEGs with single-cell reference data, underscoring the importance of cellular heterogeneity in the interpretation of bulk transcriptomic analyses.
The single-cell RNA-seq reference used in this pipeline is derived from the Mouse Whole Cortex and Hippocampus SMART-seq dataset generated by the Allen Institute for Brain Science and described by Yao et al. (2021), which provides a comprehensive transcriptomic taxonomy of neuronal and non-neuronal cell types across the mouse isocortex and hippocampal formation.
The present implementation adapts these published frameworks to cortical bulk RNA-seq data, following the same conceptual and analytical principles while allowing flexible subsetting of the scRNA-seq reference and dynamic visualization of cell-type–specific DEG signatures. Consistent with the original studies, this pipeline is intended for qualitative deconvolution and hypothesis generation rather than quantitative estimation of cell-type proportions.
The analysis is organized into four main steps:
- Identification and filtering of bulk RNA-seq DEGs
- Construction of the scRNA-seq reference object
- Subsetting and preprocessing of the scRNA-seq reference to match bulk tissue composition
- Deconvolution of bulk DEGs using the scRNA-seq reference
Each step is implemented as a standalone, reproducible script.
Step 1: Identification and filtering of bulk RNA-seq DEGs
Purpose:
Identify statistically significant DEGs from bulk RNA-seq and prepare ranked gene lists for downstream integration with scRNA-seq data.
Key operations:
- Load DESeq2 results table
- Remove genes without valid gene symbols
- Filter by adjusted p-value (padj < 0.1)
- Rank genes by log fold change
- Split genes by direction of regulation (downregulated, i.e. higher in control, vs upregulated, i.e. higher in experimental)
- Select the top 250 DEGs per direction
Inputs:
- S1_Table_DESeq.csv (DESeq2 results)
Outputs:
- ctrl_DEGs: top downregulated genes
- c4_DEGs: top upregulated genes
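The filtering and ranking operations above can be sketched in base R as follows. This is a minimal illustration on a toy table, not the repository's script: the `symbol` column name and the toy values are assumptions, while `log2FoldChange` and `padj` follow the standard DESeq2 results columns. `top_n` is set to 2 here only to keep the example small; the pipeline uses 250.

```r
# Toy stand-in for S1_Table_DESeq.csv. The `symbol` column name is an
# assumption; `log2FoldChange` and `padj` are standard DESeq2 columns.
res <- data.frame(
  symbol         = c("Gfap", "Aif1", NA, "Snap25", "Gad1", "", "Cx3cr1"),
  log2FoldChange = c(2.1, 1.4, 3.0, -1.8, -0.9, -2.5, 0.3),
  padj           = c(0.001, 0.04, 0.01, 0.02, 0.08, 0.03, 0.60),
  stringsAsFactors = FALSE
)

top_n <- 2  # the pipeline uses 250; 2 keeps the toy example readable

# Drop rows without a usable gene symbol or adjusted p-value
res <- res[!is.na(res$symbol) & nzchar(res$symbol) & !is.na(res$padj), ]

# Keep genes passing the significance threshold
sig <- res[res$padj < 0.1, ]

# Rank by log fold change, most positive first
sig <- sig[order(sig$log2FoldChange, decreasing = TRUE), ]

# Split by direction of regulation
up   <- sig[sig$log2FoldChange > 0, ]
down <- sig[sig$log2FoldChange < 0, ]

# Take the strongest genes in each direction
c4_DEGs   <- head(up$symbol, top_n)         # most upregulated first
ctrl_DEGs <- head(rev(down$symbol), top_n)  # most downregulated first

print(c4_DEGs)
print(ctrl_DEGs)
```

Ranking once and then splitting keeps both lists ordered by effect size, so truncating to the top genes is a simple `head()` call in each direction.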
Step 2: Construction of the scRNA-seq reference object
Purpose:
Load raw single-cell gene expression data and metadata, and construct a Seurat object for downstream analysis.
Key operations:
- Load cell metadata and gene expression matrix
- Convert data into Seurat-compatible format
- Create a Seurat object without initial filtering
- Retain all genes and cells to preserve reference completeness
Inputs:
- matrix.csv (gene expression counts)
- metadata.csv (cell annotations)
Outputs:
- sc_data: Seurat object containing the scRNA-seq reference dataset
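As a rough sketch of the conversion step, the snippet below reshapes a toy expression table into the genes-by-cells layout that Seurat expects and aligns the metadata to it. The cells-by-genes input layout, the `sample_name` and `subclass_label` column names, and the toy values are assumptions for illustration; the actual files may be organized differently. The Seurat call only runs if the package is installed.

```r
# Toy stand-ins for matrix.csv and metadata.csv. The cells-by-genes layout
# and the `sample_name` / `subclass_label` column names are assumptions.
expr <- data.frame(
  sample_name = c("cell1", "cell2", "cell3"),
  Gfap        = c(0, 12, 3),
  Snap25      = c(40, 0, 7),
  check.names = FALSE
)
meta <- data.frame(
  sample_name    = c("cell3", "cell1", "cell2"),
  subclass_label = c("L2/3 IT", "L5 IT", "Astro"),
  stringsAsFactors = FALSE
)

# Seurat expects genes in rows and cells in columns, so transpose
counts <- t(as.matrix(expr[, -1]))
colnames(counts) <- expr$sample_name

# Reorder metadata rows so they match the matrix columns
rownames(meta) <- meta$sample_name
meta <- meta[colnames(counts), , drop = FALSE]
stopifnot(identical(rownames(meta), colnames(counts)))

# Build the object with no gene or cell filtering (only if Seurat is available)
if (requireNamespace("Seurat", quietly = TRUE)) {
  sc_data <- Seurat::CreateSeuratObject(
    counts       = counts,
    meta.data    = meta,
    min.cells    = 0,
    min.features = 0
  )
}
```

Checking that metadata row names match the matrix column names before building the object guards against annotations being silently attached to the wrong cells.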
Step 3: Subsetting and preprocessing of the scRNA-seq reference
Purpose:
Restrict the scRNA-seq reference dataset to cortical cell populations that are biologically relevant to the bulk-tissue RNA-seq experiment.
Key operations:
- Inspect region and subclass metadata distributions
- Subset cells based on neocortical region labels
- Remove hippocampal and non-isocortical subclasses
- Set cell-type subclass labels as active identities
- Normalize, scale, and identify variable genes
- Perform PCA, t-SNE, and UMAP
- Dynamically generate cell-type–adaptive color palettes
- Save dimensionality reduction plots
Inputs:
- sc_data from Step 2
Outputs:
- Filtered and normalized scRNA-seq reference
- PCA, t-SNE, and UMAP plots (.png)
- Extracted plot legends for figure assembly
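The subsetting logic and the adaptive color palette can be illustrated on the cell metadata alone. The sketch below uses base R on a toy metadata table; the `region_label` and `subclass_label` column names, the region codes, and the excluded subclasses are assumptions chosen for illustration and are not taken from the repository's script.

```r
# Toy cell metadata; column names and labels are assumptions
meta <- data.frame(
  region_label   = c("VISp", "VISp", "HIP", "MOp", "HIP", "SSp"),
  subclass_label = c("L2/3 IT", "Pvalb", "CA1-ProS", "Astro", "DG", "Pvalb"),
  row.names      = paste0("cell", 1:6),
  stringsAsFactors = FALSE
)

# Inspect how cells are distributed across regions and subclasses
print(table(meta$region_label))
print(table(meta$subclass_label))

# Hypothetical selection criteria for a cortical bulk experiment
cortical_regions    <- c("VISp", "MOp", "SSp")
excluded_subclasses <- c("CA1-ProS", "DG")

# Keep cells from cortical regions that are not in an excluded subclass
keep <- meta$region_label %in% cortical_regions &
  !(meta$subclass_label %in% excluded_subclasses)
meta_ctx <- meta[keep, , drop = FALSE]

# Build one color per remaining subclass, so the palette adapts to the subset
subclasses    <- sort(unique(meta_ctx$subclass_label))
subclass_cols <- setNames(
  grDevices::hcl.colors(length(subclasses), palette = "Dark 3"),
  subclasses
)
print(subclass_cols)
```

Because the palette is derived from the subclasses that survive filtering, changing the region or subclass selection does not require editing a hard-coded color vector. With a Seurat object, the retained cell names in `rownames(meta_ctx)` could presumably be passed to `subset()` through its `cells` argument, and the named color vector to the plotting functions.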
Step 4: Deconvolution of bulk DEGs using the scRNA-seq reference
Purpose:
Interpret bulk RNA-seq DEGs by identifying cell-type–specific gene expression patterns using the scRNA-seq reference.
Key operations:
- Intersect bulk DEGs with scRNA-seq gene universe
- Use equally sized DEG sets for comparability
- Scale scRNA-seq expression for DEG genes only
- Perform PCA using DEG-driven expression
- Identify genes contributing to DEG-associated variance
- Compute gene–gene Pearson correlation matrices
- Perform hierarchical clustering
- Visualize correlation structure using heatmaps
Inputs:
- ctrl_DEGs / c4_DEGs from Step 1
- sc_data from Step 3
Outputs:
- DEG-driven PCA plots
- Correlation heatmaps (.pdf)
- Cell-type–specific gene signature visualizations
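The core of this step, from gene intersection through hierarchical clustering, can be sketched with base R on simulated data. Everything below is a toy illustration under stated assumptions: the expression matrix is simulated with two hypothetical cell types, the gene names are placeholders, and average linkage on a 1 - correlation distance is one reasonable choice rather than a confirmed setting of the pipeline.

```r
set.seed(1)

# Simulated reference: 8 genes x 60 cells, two hypothetical cell types
n_cells   <- 60
cell_type <- rep(c("A", "B"), each = n_cells / 2)
genes     <- paste0("g", 1:8)
expr <- matrix(
  rnorm(length(genes) * n_cells, sd = 0.3),
  nrow = length(genes),
  dimnames = list(genes, paste0("c", seq_len(n_cells)))
)
expr[1:4, cell_type == "A"] <- expr[1:4, cell_type == "A"] + 3  # type A markers
expr[5:8, cell_type == "B"] <- expr[5:8, cell_type == "B"] + 3  # type B markers

# Placeholder DEG lists; "gX" is absent from the reference on purpose
ctrl_DEGs <- c("g1", "g2", "g3", "g4", "gX")
c4_DEGs   <- c("g5", "g6", "g7", "g8")

# Intersect bulk DEGs with the scRNA-seq gene universe
ctrl_use <- intersect(ctrl_DEGs, rownames(expr))
c4_use   <- intersect(c4_DEGs, rownames(expr))

# Use equally sized sets so neither direction dominates
n_use <- min(length(ctrl_use), length(c4_use))
deg   <- c(head(ctrl_use, n_use), head(c4_use, n_use))

# Scale each DEG across cells (scale() works on columns, hence the transpose)
scaled <- scale(t(expr[deg, ]))  # cells x genes

# PCA on DEG expression only; loadings show which genes drive each component
pca          <- prcomp(scaled, center = FALSE, scale. = FALSE)
loadings_pc1 <- pca$rotation[, 1]

# Gene-gene Pearson correlation across cells
gene_cor <- cor(scaled, method = "pearson")

# Hierarchical clustering with 1 - correlation as the distance
hc      <- hclust(as.dist(1 - gene_cor), method = "average")
modules <- cutree(hc, k = 2)
print(modules)
```

In this toy case the two clusters recover the two simulated marker sets, which is the kind of cell-type–specific module structure the correlation heatmaps are meant to reveal. The `gene_cor` matrix and the `hc` dendrogram could then be passed to base R's `heatmap()` or to `gplots::heatmap.2()` for visualization.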
This pipeline was developed and tested using:
- R (≥ 4.2)
- Seurat
- data.table
- ggplot2
- cowplot
- patchwork
- gplots
Install required packages using:
install.packages(c("data.table", "ggplot2", "cowplot", "patchwork", "gplots"))
install.packages("Seurat")