ContextSV

A long-read, whole-genome structural variant (SV) caller with copy number predictions from coverage and SNP B-allele frequency. Inputs are long-read alignments (BAM), a reference genome (FASTA), a VCF of high-quality SNPs (e.g. from Clair3 or NanoCaller), and per-chromosome VCF files with SNP population frequencies (e.g. from gnomAD). Class documentation is available at https://wglab.openbioinformatics.org/ContextSV

Installation

Anaconda

First, install Anaconda.

Next, create a new environment. This installation has been tested with Python 3.10, Linux 64-bit.

conda create -n contextsv python=3.10
conda activate contextsv

ContextSV and its dependencies can then be installed using the following command:

conda install -c wglab -c conda-forge -c bioconda contextsv

# Or using mamba (faster dependency resolution), with the same channels:
mamba install -c wglab -c conda-forge -c bioconda contextsv

After installation, you should have access to the following commands in your terminal:

contextsv: the main SV caller
contextsv-cnv-plot: utility to generate CNV plots from ContextSV JSON output
contextscore: ContextScore utility for post-filtering of low-confidence SV calls

Example usage:

# SV calling example.
# --assembly-gaps (assembly gaps file) and --save-cnv (save CNV calls in JSON) are optional:
contextsv \
  --bam sample.bam \
  --ref hg38.fa \
  --outdir output/ \
  --threads 4 \
  --snp snps.vcf \
  --eth nfe \
  --pfb gnomadv4_filepaths.txt \
  --assembly-gaps hg38-gaps.bed \
  --save-cnv

# SV post-filtering example:
contextscore \
  --input input.vcf \
  --output scored.vcf \
  --sample-coverage 30 \
  --buildver hg38 \
  --threshold 0.2 \
  --annovar /path/to/annovar \
  --annovar-db /path/to/humandb


# CNV plotting example:
contextsv-cnv-plot ./output/CNVCalls.json chr3 --formats html,svg --output-dir ./CNV_Plots
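Before starting a long SV calling run, it can help to confirm that every input file is readable. The following is a minimal sketch, not part of ContextSV: the helper name find_unreadable is hypothetical, and the file names are the placeholders from the SV calling example above.

```shell
# Print every path from the argument list that is not a readable regular file.
find_unreadable() {
    for path in "$@"; do
        { [ -f "${path}" ] && [ -r "${path}" ]; } || echo "${path}"
    done
}

# Placeholder input names taken from the SV calling example.
missing="$(find_unreadable sample.bam hg38.fa snps.vcf gnomadv4_filepaths.txt)"
if [ -n "${missing}" ]; then
    echo "Unreadable input file(s):" >&2
    echo "${missing}" >&2
else
    echo "All input files found."
fi
```

The check only tests that the files exist and are readable; it does not validate their contents.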

Docker

First, install Docker. Then pull the latest image from Docker Hub, which contains the latest release and its dependencies.

docker pull genomicslab/contextsv

Example usage:

# SV calling:
docker run --rm genomicslab/contextsv --help

# SV post-filtering:
docker run --rm \
  -v /path/to/data:/mnt \
  genomicslab/contextsv \
  contextscore \
  --help

# CNV plotting:
docker run --rm \
  -v /path/to/data:/mnt \
  genomicslab/contextsv \
  contextsv-cnv-plot \
  --help
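The commands above only print help text. A full SV calling run inside the container might look like the sketch below. It assumes the image accepts contextsv options directly after the image name, as in the --help example, and that all inputs sit in one host directory mounted at /mnt; the file names are placeholders from the Anaconda example. The DATA_DIR and DRY_RUN variables are conveniences of this sketch, not ContextSV or Docker features.

```shell
# Host directory holding the input files (placeholder path).
data_dir="${DATA_DIR:-/path/to/data}"

# Assemble the command as positional parameters so each argument stays one word.
set -- docker run --rm \
    -v "${data_dir}:/mnt" \
    genomicslab/contextsv \
    --bam /mnt/sample.bam \
    --ref /mnt/hg38.fa \
    --snp /mnt/snps.vcf \
    --eth nfe \
    --pfb /mnt/gnomadv4_filepaths.txt \
    --outdir /mnt/output \
    --threads 4

# Keep a printable copy of the command.
cmd="$*"

# Print the command by default; set DRY_RUN=0 to actually run it.
if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "${cmd}"
else
    "$@"
fi
```

Paths are resolved inside the container, so the filepaths listed in the file given to --pfb would also need to point under /mnt.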

Building from source (for testing/development)

ContextSV requires HTSlib, which can be installed using Anaconda. Create an environment containing HTSlib:

conda create -n htsenv -c bioconda -c conda-forge htslib
conda activate htsenv

Then follow the instructions below to build ContextSV:

git clone https://github.com/WGLab/ContextSV
cd ContextSV
make

ContextSV can then be run:

./build/contextsv --help

Options:
  -b, --bam <bam_file>          Long-read BAM file (required)
  -r, --ref <ref_file>          Reference genome FASTA file (required)
  -s, --snp <vcf_file>          Long-read SNP VCF file (required)
  -o, --outdir <output_dir>     Output directory (required)
  -t, --threads <thread_count>  Number of threads, chromosome-level parallelization (default: 1)
  -h, --hmm <hmm_file>          HMM parameter file for copy number predictions (included in the repository)
  -e, --eth <eth_file>          Ethnicity as used in gnomAD (e.g. "asj" for Ashkenazi Jewish, "nfe" for Non-Finnish European, etc.)
  -p, --pfb <pfb_file>          File containing per-chromosome population allele frequency filepaths as described in this documentation
     --assembly-gaps <gaps_file> Assembly gaps file in BED format available from UCSC Genome Browser (https://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/gap.txt.gz for GRCh38)
     --save-cnv                 Save CNV data in JSON for downstream plotting with contextsv-cnv-plot
     --debug                    Debug mode with verbose logging
     --version                  Print version and exit
  -h, --help                    Print usage and exit

Downloading gnomAD SNP population frequencies

SNP population allele frequency information is used for copy number predictions in this tool (see PennCNV for specifics). We recommend downloading this data from the Genome Aggregation Database (gnomAD).

Download links for genome VCF files are located here (last updated April 3, 2024):

gnomAD v4.0.0 (GRCh38): https://gnomad.broadinstitute.org/downloads#4
gnomAD v2.1.1 (GRCh37): https://gnomad.broadinstitute.org/downloads#2

Script for downloading gnomAD VCFs

# Use ${HOME} rather than "~": a tilde inside double quotes is not expanded by the shell.
download_dir="${HOME}/data/gnomad/v4.0.0/"

chr_list=("1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "20" "21" "22" "X" "Y")

for chr in "${chr_list[@]}"; do
    echo "Downloading chromosome ${chr}..."
    wget "https://storage.googleapis.com/gcp-public-data--gnomad/release/4.0/vcf/genomes/gnomad.genomes.v4.0.sites.chr${chr}.vcf.bgz" -P "${download_dir}"
done
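After the download loop finishes, a quick check can confirm that a file exists for every chromosome, since an interrupted wget can leave files missing. This is a minimal sketch: the function name list_missing_chroms and the GNOMAD_DIR override are hypothetical, and the file naming follows the download script above.

```shell
# Directory the download script wrote to (override by exporting GNOMAD_DIR).
download_dir="${GNOMAD_DIR:-${HOME}/data/gnomad/v4.0.0}"

# Print the chromosomes whose VCF is missing or empty, one per line.
list_missing_chroms() {
    dir="$1"
    for chr in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y; do
        vcf="${dir}/gnomad.genomes.v4.0.sites.chr${chr}.vcf.bgz"
        # -s is true only for an existing file with non-zero size.
        [ -s "${vcf}" ] || echo "${chr}"
    done
}

missing="$(list_missing_chroms "${download_dir}")"
if [ -n "${missing}" ]; then
    # Left unquoted on purpose so the list prints on a single line.
    echo "Missing or empty gnomAD VCFs for chromosomes:" ${missing}
else
    echo "All 24 gnomAD VCFs are present in ${download_dir}"
fi
```

This only detects absent or zero-length files. A partially downloaded file would still pass, so rerunning wget with its -c (continue) option is a reasonable follow-up for any chromosome in doubt.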

Finally, create a text file that maps each chromosome to its gnomAD VCF filepath, with one chromosome=filepath entry per line. Pass this file to ContextSV with the --pfb option:

gnomadv4_filepaths.txt

1=~/data/gnomad/v4.0.0/gnomad.genomes.v4.0.sites.chr1.vcf.bgz
2=~/data/gnomad/v4.0.0/gnomad.genomes.v4.0.sites.chr2.vcf.bgz
3=~/data/gnomad/v4.0.0/gnomad.genomes.v4.0.sites.chr3.vcf.bgz
...
X=~/data/gnomad/v4.0.0/gnomad.genomes.v4.0.sites.chrX.vcf.bgz
Y=~/data/gnomad/v4.0.0/gnomad.genomes.v4.0.sites.chrY.vcf.bgz
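Rather than typing all 24 entries by hand, the file can be generated with a short loop. The sketch below assumes the VCFs keep the names used by the download script; the function name write_pfb_list and the GNOMAD_DIR and PFB_LIST overrides are hypothetical. It writes absolute paths, which avoids relying on ~ being expanded by whatever reads the file.

```shell
# Directory containing the downloaded VCFs, and the list file to create.
gnomad_dir="${GNOMAD_DIR:-${HOME}/data/gnomad/v4.0.0}"
out_file="${PFB_LIST:-gnomadv4_filepaths.txt}"

# Print one "chromosome=filepath" line per chromosome for the given directory.
write_pfb_list() {
    dir="$1"
    for chr in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y; do
        printf '%s=%s\n' "${chr}" "${dir}/gnomad.genomes.v4.0.sites.chr${chr}.vcf.bgz"
    done
}

# Write the list, replacing any previous version of the file.
write_pfb_list "${gnomad_dir}" > "${out_file}"
echo "Wrote ${out_file}"
```

The resulting file has the same layout as the example above and can be inspected with head before it is passed to --pfb.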

Revision history

For release history, see the Releases page of the GitHub repository (https://github.com/WGLab/ContextSV/releases).

Getting help

Please post questions and bug reports on the ContextSV issues page. We will respond to your questions promptly. Your comments are critical to improving the tool and will benefit other users.
