Cruz-Martin-Lab/PV-mC4_Project

Deconvolution of Bulk RNA-seq DEGs Using scRNA-seq Reference

This repository contains the analysis pipeline used to deconvolve bulk-tissue RNA-seq differential expression results using a single-cell RNA-seq (scRNA-seq) reference dataset, enabling identification of cell-type–specific gene expression signatures underlying bulk transcriptomic changes.

The workflow integrates:

  • Bulk RNA-seq DESeq2 results
  • A curated scRNA-seq reference from mouse cortex
  • Dimensionality reduction and correlation-based analyses

Methodological Background and References

The analysis pipeline implemented in this repository is based on the reference profile–based deconvolution framework described by Marquez-Galera et al. (2022), which leverages publicly available single-cell RNA-seq datasets to interpret cell-type–specific contributions to gene expression signatures derived from bulk-tissue RNA-seq. In this approach, differentially expressed gene (DEG) lists obtained from bulk RNA-seq are projected onto a curated scRNA-seq reference, and linear dimensionality reduction together with correlation-based clustering is used to identify cell-type–specific gene modules embedded within bulk transcriptional signatures.

This strategy was originally applied to dissect sublayer- and cell-type–specific transcriptional changes in the hippocampal CA1 region under physiological and epileptic conditions (Cid et al., 2021). That work demonstrated that bulk RNA-seq signatures reflect a convolution of cell-type–specific expression programs and changes in cellular composition, and showed that reactive microglial and glial gene signatures can be unmasked by integrating bulk RNA-seq DEGs with single-cell reference data, underscoring the importance of cellular heterogeneity in the interpretation of bulk transcriptomic analyses.

The single-cell RNA-seq reference used in this pipeline is derived from the Mouse Whole Cortex and Hippocampus SMART-seq dataset generated by the Allen Institute for Brain Science and described by Yao et al. (2021), which provides a comprehensive transcriptomic taxonomy of neuronal and non-neuronal cell types across the mouse isocortex and hippocampal formation.

The present implementation adapts these published frameworks to cortical bulk RNA-seq data, following the same conceptual and analytical principles while allowing flexible subsetting of the scRNA-seq reference and dynamic visualization of cell-type–specific DEG signatures. Consistent with the original studies, this pipeline is intended for qualitative deconvolution and hypothesis generation rather than quantitative estimation of cell-type proportions.


Overview of the Analysis Pipeline

The analysis is organized into four main steps:

  1. Identification and filtering of bulk RNA-seq DEGs
  2. Construction of the scRNA-seq reference object
  3. Subsetting and preprocessing of the scRNA-seq reference to match bulk tissue composition
  4. Deconvolution of bulk DEGs using the scRNA-seq reference

Each step is implemented as a standalone, reproducible script.


Step 1: Load and Filter Bulk RNA-seq Differentially Expressed Genes (DEGs)

Purpose:
Identify statistically significant DEGs from bulk RNA-seq and prepare ranked gene lists for downstream integration with scRNA-seq data.

Key operations:

  • Load DESeq2 results table
  • Remove genes without valid gene symbols
  • Filter by adjusted p-value (padj < 0.1)
  • Rank genes by log2 fold change
  • Split genes by direction of regulation (control vs experimental)
  • Select the top 250 DEGs per condition

Inputs:

  • S1_Table_DESeq.csv (DESeq2 results)

Outputs:

  • ctrl_DEGs: top 250 downregulated genes (negative log2 fold change)
  • c4_DEGs: top 250 upregulated genes (positive log2 fold change)
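
The filtering logic in Step 1 can be sketched in base R as follows. This is a minimal illustration on a toy table, not the repository's script: the column names (symbol, log2FoldChange, padj) and the helper filter_degs() are assumptions, and the real input would be read from S1_Table_DESeq.csv.

```r
# Toy DESeq2-style results table (hypothetical column names).
res <- data.frame(
  symbol         = c(paste0("Gene", 1:8), NA, ""),
  log2FoldChange = c(2.1, -1.8, 0.4, -0.2, 1.2, -2.5, 0.9, -0.7, 3.0, -3.0),
  padj           = c(0.01, 0.02, 0.50, 0.09, 0.04, 0.001, NA, 0.08, 0.01, 0.01),
  stringsAsFactors = FALSE
)

# Hypothetical helper: filter, rank, and split DEGs by direction.
filter_degs <- function(res, padj_cutoff = 0.1, top_n = 250) {
  # Drop rows without a usable gene symbol
  res <- res[!is.na(res$symbol) & nzchar(res$symbol), ]
  # Keep significant genes; rows with NA padj are dropped as well
  res <- res[!is.na(res$padj) & res$padj < padj_cutoff, ]
  # Rank by log2 fold change, most negative first
  res <- res[order(res$log2FoldChange), ]
  down <- res$symbol[res$log2FoldChange < 0]        # most negative first
  up   <- rev(res$symbol[res$log2FoldChange > 0])   # most positive first
  list(ctrl_DEGs = head(down, top_n), c4_DEGs = head(up, top_n))
}

degs <- filter_degs(res, top_n = 2)
print(degs)
```

With the default top_n = 250 the same helper would return up to 250 genes per direction; top_n = 2 is used here only to keep the toy output short.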

Step 2: Build the scRNA-seq Reference Dataset

Purpose:
Load raw single-cell gene expression data and metadata, and construct a Seurat object for downstream analysis.

Key operations:

  • Load cell metadata and gene expression matrix
  • Convert data into Seurat-compatible format
  • Create a Seurat object without initial filtering
  • Retain all genes and cells to preserve reference completeness

Inputs:

  • matrix.csv (gene expression counts)
  • metadata.csv (cell annotations)

Outputs:

  • sc_data: Seurat object containing the scRNA-seq reference dataset

Step 3: Subset scRNA-seq Reference to Match Bulk Tissue Composition

Purpose:
Restrict the scRNA-seq reference dataset to cortical cell populations that are biologically relevant to the bulk-tissue RNA-seq experiment.

Key operations:

  • Inspect region and subclass metadata distributions
  • Subset cells based on neocortical region labels
  • Remove hippocampal and non-isocortical subclasses
  • Set cell-type subclass labels as active identities
  • Normalize, scale, and identify variable genes
  • Perform PCA, t-SNE, and UMAP
  • Dynamically generate cell-type–adaptive color palettes
  • Save dimensionality reduction plots

Inputs:

  • sc_data from Step 2

Outputs:

  • Filtered and normalized scRNA-seq reference
  • PCA, t-SNE, and UMAP plots (.png)
  • Extracted plot legends for figure assembly
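
The subsetting and palette steps above can be sketched on cell metadata alone. The sketch below uses base R and a toy metadata table; the column names (region_label, subclass_label), the region and subclass values, and the helper make_palette() are illustrative assumptions rather than the repository's actual code.

```r
# Toy cell metadata (hypothetical column names and labels).
meta <- data.frame(
  cell           = paste0("cell", 1:6),
  region_label   = c("VISp", "VISp", "HIP", "SSp", "HIP", "MOp"),
  subclass_label = c("Pvalb", "L2/3 IT CTX", "CA1-ProS", "Sst", "DG", "Astro"),
  stringsAsFactors = FALSE
)

keep_regions    <- c("VISp", "SSp", "MOp")   # assumed neocortical region labels
drop_subclasses <- c("CA1-ProS", "DG")       # assumed hippocampal subclasses

# Keep cells from the chosen regions and drop unwanted subclasses
cortex_meta <- meta[meta$region_label %in% keep_regions &
                    !(meta$subclass_label %in% drop_subclasses), ]

# Hypothetical helper: one color per cell type present after subsetting,
# so the palette adapts to whichever subclasses remain.
make_palette <- function(labels) {
  types <- sort(unique(labels))
  stats::setNames(grDevices::hcl.colors(length(types), palette = "Dark 3"), types)
}

pal <- make_palette(cortex_meta$subclass_label)
print(pal)
```

In a Seurat workflow, a named vector like pal could be supplied to the cols argument of DimPlot() so that colors stay consistent across the PCA, t-SNE, and UMAP plots.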

Step 4: Deconvolution of Bulk DEGs Using scRNA-seq Reference

Purpose:
Interpret bulk RNA-seq DEGs by identifying cell-type–specific gene expression patterns using the scRNA-seq reference.

Key operations:

  • Intersect bulk DEGs with scRNA-seq gene universe
  • Use equally sized ctrl_DEGs and c4_DEGs sets so the two directions are comparable
  • Scale scRNA-seq expression for the DEGs only
  • Perform PCA on the scaled expression of the DEGs
  • Identify genes contributing to DEG-associated variance
  • Compute gene–gene Pearson correlation matrices
  • Perform hierarchical clustering
  • Visualize correlation structure using heatmaps

Inputs:

  • ctrl_DEGs / c4_DEGs from Step 1
  • Filtered and normalized sc_data from Step 3

Outputs:

  • DEG-driven PCA plots
  • Correlation heatmaps (.pdf)
  • Cell-type–specific gene signature visualizations
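
The core of Step 4 (scaling, PCA, gene–gene Pearson correlation, and hierarchical clustering) can be illustrated with base R on simulated data. This is a sketch under stated assumptions: the expression matrix is synthetic, and the 1 - r distance, average linkage, and k = 2 cut are illustrative choices that the repository does not confirm.

```r
set.seed(42)

# Simulate 60 cells of two types; Gene1-3 are high in type A, Gene4-6 in type B.
n_cells   <- 60
cell_type <- rep(c("A", "B"), each = n_cells / 2)
expr <- sapply(1:6, function(g) {
  high_in <- if (g <= 3) "A" else "B"
  rnorm(n_cells, mean = ifelse(cell_type == high_in, 3, 0), sd = 0.5)
})
colnames(expr) <- paste0("Gene", 1:6)   # cells in rows, DEGs in columns

# Z-score each gene across cells, then run PCA on the scaled matrix
scaled   <- scale(expr)
pca      <- prcomp(scaled, center = FALSE, scale. = FALSE)
loadings <- pca$rotation[, 1]           # gene contributions to PC1

# Gene-gene Pearson correlation and hierarchical clustering
gene_cor <- cor(scaled, method = "pearson")
hc       <- hclust(as.dist(1 - gene_cor), method = "average")
modules  <- cutree(hc, k = 2)           # assumed number of gene modules

print(round(loadings, 2))
print(modules)
```

Genes that are co-expressed in the same cell type fall into the same module, which is the pattern the correlation heatmaps are meant to reveal. The gene_cor matrix could then be drawn with a heatmap function such as gplots::heatmap.2().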

Software and R Packages

This pipeline was developed and tested using:

  • R (≥ 4.2)
  • Seurat
  • data.table
  • ggplot2
  • cowplot
  • patchwork
  • gplots

Install required packages using:

install.packages(c("data.table", "ggplot2", "cowplot", "patchwork", "gplots"))
install.packages("Seurat")
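
Before running the scripts, it may help to confirm that the listed packages are available. The check below is a small convenience sketch using base R only; the variable names are illustrative.

```r
# Packages listed in this README
required <- c("Seurat", "data.table", "ggplot2", "cowplot", "patchwork", "gplots")

# TRUE for each package whose namespace can be loaded
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}

# The pipeline was developed with R >= 4.2
if (getRversion() < "4.2.0") {
  message("R version is older than 4.2: ", as.character(getRversion()))
}
```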
