Dougherty-Lab/astrocyte_sn-mpra

Overview

This repository contains analysis code and custom processing scripts used for massively parallel reporter assay (MPRA) experiments, single-nucleotide mutagenesis studies, imaging analyses, and RNA structural feature prediction. It includes manuscript analysis notebooks, reusable functions, and a custom barcode counting pipeline for sequencing-based MPRA datasets.

Repository Contents

Manuscript and Data Analysis

| File | Description |
| --- | --- |
| SN_MPRA_manuscript_code.Rmd | Analysis code for the original tiling single-nucleotide MPRA (SN-MPRA) library. |
| SN_MPRA_followup_manuscript_code.Rmd | Analysis code for the follow-up single-nucleotide mutagenesis MPRA experiments. |
| imaging_stats_analysis.R | Statistical analysis scripts for imaging datasets. |
| localization_mpra_functions.R | Shared functions used by SN_MPRA_manuscript_code.Rmd and SN_MPRA_followup_manuscript_code.Rmd. |

RNA Structure / Sequence Feature Analysis

| File | Description |
| --- | --- |
| rG4.ipynb | RNA G-quadruplex (rG4) detection notebook adapted from the online rG4detector package. |
| vienna_RNA.ipynb | RNA folding / ΔG prediction notebook adapted from the ViennaRNA package. |
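
For readers unfamiliar with rG4 motifs, the sketch below scans a sequence for the commonly used canonical G-quadruplex pattern (four runs of at least three Gs separated by loops of 1 to 7 nucleotides). It is a plain regular-expression scan, not the method used by rG4detector or by rG4.ipynb; the function name `find_canonical_g4` and the example sequences are hypothetical.

```python
import re

# Canonical G-quadruplex pattern: four G-runs (>= 3 Gs each) separated by
# three loops of 1-7 nucleotides. Both U (RNA) and T (DNA) are accepted.
# ASSUMPTION: this simple motif stands in for illustration only; it is not
# the rG4detector model.
G4_PATTERN = re.compile(
    r"G{3,}[ACGTU]{1,7}G{3,}[ACGTU]{1,7}G{3,}[ACGTU]{1,7}G{3,}"
)


def find_canonical_g4(sequence):
    """Return (start, end) spans of non-overlapping canonical G4 motifs."""
    sequence = sequence.upper()
    return [match.span() for match in G4_PATTERN.finditer(sequence)]


if __name__ == "__main__":
    print(find_canonical_g4("AAGGGAGGGAGGGAGGGAA"))
    print(find_canonical_g4("AAGGGAGGGAGGGAA"))
```

A pattern scan like this only flags candidate motifs by sequence; it says nothing about whether a structure actually forms.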

Custom Barcode Counting Pipeline

These files comprise a custom pipeline for counting barcodes from sequencing reads generated in MPRA experiments.

| File | Description |
| --- | --- |
| MPRA_count.py | Main component of the custom barcode counting workflow. |
| count.py | Barcode counting utility script. |
| count.txt | Supporting configuration or reference file used in barcode counting. |
| countSetup.py | Setup/configuration script for the counting pipeline. |
| merge.py | Script for merging intermediate count files or sequencing outputs. |
| pandaseq.sh | Shell script for paired-end read assembly using PANDAseq. |
| processFQ.py | FASTQ preprocessing script for the barcode counting pipeline. |
| setup_multi.py | Setup script for the multiprocessing barcode matching workflow. |
| string_match_multi.c | C implementation for high-speed string matching. |
| string_match_multi.pyx | Cython wrapper/source for accelerated string matching. |
| string_match_multi.cpython-313-x86_64-linux-gnu.so | Compiled shared object for Python integration of the string matching module. |
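
To illustrate what barcode counting involves in general, here is a minimal sketch that tallies exact matches against a known barcode list. It is not the algorithm implemented in MPRA_count.py or the string matching module; the function name `count_barcodes`, the fixed `start` offset, and the assumption that all barcodes share one length are hypothetical simplifications.

```python
from collections import Counter


def count_barcodes(reads, known_barcodes, start=0):
    """Count exact barcode matches at a fixed offset in each read.

    ASSUMPTIONS (illustrative only): every barcode has the same length,
    the barcode sits at position `start` in each read, and only exact
    matches are counted.
    """
    known = {barcode.upper() for barcode in known_barcodes}
    lengths = {len(barcode) for barcode in known}
    if len(lengths) != 1:
        raise ValueError("expected a non-empty barcode list of uniform length")
    (length,) = lengths

    # Start every known barcode at zero so absent barcodes still appear.
    counts = Counter({barcode: 0 for barcode in known})
    unmatched = 0
    for read in reads:
        candidate = read[start:start + length].upper()
        if candidate in known:
            counts[candidate] += 1
        else:
            unmatched += 1
    return counts, unmatched


if __name__ == "__main__":
    example_reads = ["ACGTTTGA", "ACGTCCCC", "TTTTGGGG", "GGGGACGT"]
    counts, unmatched = count_barcodes(example_reads, ["ACGT", "TTTT"])
    print(dict(counts), unmatched)
```

Tracking the number of unmatched reads alongside the counts gives a quick check on how much of a sequencing run was assigned to a known barcode.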

Requirements

Software requirements depend on which components are used. Common dependencies may include:

  • R (with tidyverse, rmarkdown, and statistical packages)
  • Python 3.x
  • Jupyter Notebook
  • Cython
  • GCC / C compiler
  • PANDAseq
  • ViennaRNA
  • rG4detector and associated Python packages

Typical Workflow

  1. Preprocess sequencing reads using pandaseq.sh and processFQ.py
  2. Count barcodes using the custom counting scripts (MPRA_count.py and its supporting modules)
  3. Analyze MPRA datasets using the R Markdown notebooks
  4. Run imaging statistics using imaging_stats_analysis.R
  5. Predict RNA structural features using the provided notebooks

Notes

  • Some notebooks/scripts incorporate code adapted from external packages (rG4detector, ViennaRNA).
  • File paths and input formats may need to be updated for your local environment.
  • Compiled binaries (.so) may need to be rebuilt depending on operating system and Python version.
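
The file name string_match_multi.cpython-313-x86_64-linux-gnu.so indicates a build for CPython 3.13 on x86-64 Linux. As a quick check before running the pipeline, the sketch below tests whether the current interpreter would import a compiled extension with a given file name; the helper name `loadable_by_this_interpreter` is hypothetical. If the check fails, the module likely needs to be rebuilt from string_match_multi.pyx in your environment.

```python
from importlib.machinery import EXTENSION_SUFFIXES


def loadable_by_this_interpreter(module_name, filename):
    """Return True if `filename` is a name this interpreter would import
    as the extension module `module_name`.

    Python looks for extension modules named module_name + suffix, for each
    suffix in EXTENSION_SUFFIXES (which depend on the Python version and
    the platform).
    """
    return any(filename == module_name + suffix for suffix in EXTENSION_SUFFIXES)


if __name__ == "__main__":
    shipped = "string_match_multi.cpython-313-x86_64-linux-gnu.so"
    if loadable_by_this_interpreter("string_match_multi", shipped):
        print("The shipped binary matches this interpreter.")
    else:
        print("Rebuild needed; this interpreter accepts:", EXTENSION_SUFFIXES)
```

A matching file name is necessary but not sufficient: the binary can still fail to load if required system libraries are missing.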

Citation

If you use this repository in academic work, please cite the associated manuscript(s) and any external tools or packages you relied on.

Contact

For questions or collaboration inquiries, please open an issue or contact the repository owner.
