This repository contains analysis code and custom processing scripts used for massively parallel reporter assay (MPRA) experiments, single-nucleotide mutagenesis studies, imaging analyses, and RNA structural feature prediction. It includes manuscript analysis notebooks, reusable functions, and a custom barcode counting pipeline for sequencing-based MPRA datasets.

| File | Description |
|---|---|
| `SN_MPRA_manuscript_code.Rmd` | Analysis code for the original tiling single-nucleotide MPRA (SN-MPRA) library. |
| `SN_MPRA_followup_manuscript_code.Rmd` | Analysis code for follow-up single-nucleotide mutagenesis MPRA experiments. |
| `imaging_stats_analysis.R` | Statistical analysis scripts for imaging datasets. |
| `localization_mpra_functions.R` | Shared functions used by `SN_MPRA_manuscript_code.Rmd` and `SN_MPRA_followup_manuscript_code.Rmd`. |

| File | Description |
|---|---|
| `rG4.ipynb` | RNA G-quadruplex (rG4) detection notebook adapted from the online rG4detector package. |
| `vienna_RNA.ipynb` | RNA folding / ΔG prediction notebook adapted from the ViennaRNA package. |

These files comprise a custom pipeline for counting barcodes from sequencing reads generated in MPRA experiments.

| File | Description |
|---|---|
| `MPRA_count.py` | Main component of the custom barcode counting workflow. |
| `count.py` | Barcode counting utility script. |
| `count.txt` | Supporting configuration or reference file used in barcode counting. |
| `countSetup.py` | Setup/configuration script for counting pipeline. |
| `merge.py` | Script for merging intermediate count files or sequencing outputs. |
| `pandaseq.sh` | Shell script for paired-end read assembly using PANDAseq. |
| `processFQ.py` | FASTQ preprocessing script for barcode counting pipeline. |
| `setup_multi.py` | Setup script for multiprocessing barcode matching workflow. |
| `string_match_multi.c` | C implementation for high-speed string matching. |
| `string_match_multi.pyx` | Cython wrapper/source for accelerated string matching. |
| `string_match_multi.cpython-313-x86_64-linux-gnu.so` | Compiled shared object for Python integration of string matching module. |

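
For orientation, the core idea of barcode counting can be sketched in a few lines of plain Python. This is a minimal illustration only, not the code in `MPRA_count.py` or `count.py`: the function names, the fixed barcode position (`start`, `length`), and the toy reads are all assumptions made for the example, and the repository's own matching in `string_match_multi` may behave differently (for example, by tolerating mismatches).

```python
from collections import Counter
from io import StringIO


def read_fastq_sequences(handle):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:  # line 2 of every record holds the read sequence
            yield line.strip()


def count_barcodes(sequences, barcodes, start, length):
    """Count exact matches to known barcodes at a fixed read position.

    `start` and `length` are hypothetical parameters for this sketch;
    the real pipeline may locate barcodes differently.
    """
    known = set(barcodes)
    counts = Counter()
    for seq in sequences:
        candidate = seq[start:start + length]
        if candidate in known:
            counts[candidate] += 1
    return counts


# Toy input: four reads, three of which start with a known barcode.
fastq_text = (
    "@r1\nACGTAAAA\n+\nIIIIIIII\n"
    "@r2\nTTTTAAAA\n+\nIIIIIIII\n"
    "@r3\nACGTCCCC\n+\nIIIIIIII\n"
    "@r4\nGGGGAAAA\n+\nIIIIIIII\n"
)
counts = count_barcodes(
    read_fastq_sequences(StringIO(fastq_text)),
    barcodes=["ACGT", "TTTT"],
    start=0,
    length=4,
)
for barcode, n in sorted(counts.items()):
    print(barcode, n)
```

Reads whose candidate barcode is not in the known set (here `GGGG`) are simply skipped. Exact set lookup keeps the sketch short; it is not a substitute for the compiled matcher when working with real sequencing data.
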
Software requirements depend on which components are used. Common dependencies may include:
- R (with tidyverse, rmarkdown, and statistical packages)
- Python 3.x
- Jupyter Notebook
- Cython
- GCC / C compiler
- PANDAseq
- ViennaRNA
- rG4detector and associated Python packages
- Preprocess sequencing reads using `pandaseq.sh` and `processFQ.py`
- Count barcodes using the custom counting scripts
- Analyze MPRA datasets using the R Markdown notebooks
- Run imaging statistics using `imaging_stats_analysis.R`
- Predict RNA structural features using the provided notebooks
- Some notebooks/scripts incorporate code adapted from external packages (`rG4detector`, `ViennaRNA`).
- File paths and input formats may need to be updated for your local environment.
- Compiled binaries (`.so`) may need to be rebuilt depending on operating system and Python version.
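
One quick way to tell whether the shipped binary is usable on your machine is to compare its filename against the extension suffixes your interpreter accepts. The sketch below uses only the standard library; the helper name `is_loadable_name` is hypothetical and not part of this repository. It checks the filename only, so a matching name does not guarantee the binary was built against compatible system libraries.

```python
import importlib.machinery


def is_loadable_name(filename, module_name, suffixes=None):
    """Return True if `filename` is a name this interpreter would try
    when importing the extension module `module_name`.

    Python looks for module_name + suffix for each accepted suffix,
    so a file tagged for another version or platform is never found.
    """
    if suffixes is None:
        suffixes = importlib.machinery.EXTENSION_SUFFIXES
    return any(filename == module_name + suffix for suffix in suffixes)


shipped = "string_match_multi.cpython-313-x86_64-linux-gnu.so"
if is_loadable_name(shipped, "string_match_multi"):
    print("Filename matches this interpreter; the shipped binary may work as-is.")
else:
    print("Filename does not match this interpreter; rebuild the extension.")
```

If the check fails, the module has to be recompiled from `string_match_multi.pyx` with Cython and a C compiler, presumably via `setup_multi.py`; the exact build command depends on that script.
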
If using this repository in academic work, please cite the associated manuscript(s) and relevant external tools/packages.
For questions or collaboration inquiries, please open an issue or contact the repository owner.