A workflow to map long reads back to an assembly.
This workflow will:
- Align reads via minimap2, pbmm2, or winnowmap.
- Optionally, filter unmapped reads and non-primary alignments.
git clone git@github.com:logsdon-lab/Snakemake-Aligner.git
cd Snakemake-Aligner

Files can be passed multiple ways in the samples section of config.yaml:
By path.
samples:
- name: "1"
  asm_fa: "1.fa"

By fofn.
samples:
- name: "1"
  asm_fofn: "1.fofn"

By directory and file regex.
samples:
- name: "1"
  asm_dir: "1/"
  asm_rgx: ".*\\.fa.gz$"

By fofn.
samples:
- name: "1"
  read_fofn: "1.fofn"

By directory and file regex.
samples:
- name: "1"
  read_dir: "1/"
  read_rgx: ".*\\.bam$"

By path.
samples:
- name: "1"
  reads: [
    "1/r1.bam"
  ]

General configuration can be filled in config.yaml:
# Aligner to use.
# Either "winnowmap", "minimap2", or "pbmm2".
aligner: "winnowmap"
# To override default aligner params.
aligner_opts: "--MD -ax map-pb"
# Output directory
output_dir: "results/align"
# Log directory
logs_dir: "logs/align"
# Benchmarks directory
benchmarks_dir: "benchmarks/align"
# Job resources. Memory in GB.
threads_aln: 8
mem_aln: 30G

Aligner default parameters:
- winnowmap: -y -a --eqx --cs -x map-pb -I8g
- pbmm2: --log-level DEBUG --preset SUBREAD --min-length 5000
- minimap2: -y -a --eqx --cs -x map-hifi -I8g
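
As a rough illustration of how these defaults might interact with aligner_opts, the selection could look like the sketch below. It is not the workflow's actual code: the DEFAULT_OPTS table and the resolve_aligner_opts helper are hypothetical names, and the sketch assumes that aligner_opts replaces the defaults entirely rather than being appended to them.

```python
# Hypothetical sketch; the names and override semantics are assumptions,
# not taken from the Snakemake-Aligner source.
DEFAULT_OPTS = {
    "winnowmap": "-y -a --eqx --cs -x map-pb -I8g",
    "pbmm2": "--log-level DEBUG --preset SUBREAD --min-length 5000",
    "minimap2": "-y -a --eqx --cs -x map-hifi -I8g",
}


def resolve_aligner_opts(config):
    """Return the option string for the configured aligner.

    Assumes a non-empty `aligner_opts` fully overrides the defaults.
    """
    aligner = config["aligner"]
    if aligner not in DEFAULT_OPTS:
        raise ValueError(
            f"Unknown aligner {aligner!r}; expected one of {sorted(DEFAULT_OPTS)}"
        )
    # Fall back to the defaults when aligner_opts is missing or empty.
    return config.get("aligner_opts") or DEFAULT_OPTS[aligner]


if __name__ == "__main__":
    print(resolve_aligner_opts({"aligner": "winnowmap"}))
    print(resolve_aligner_opts({"aligner": "winnowmap", "aligner_opts": "--MD -ax map-pb"}))
```

Under that assumption, setting aligner_opts means any default flag that is still wanted (for example -y to carry over tags) has to be repeated in the override string.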
To keep tags of BAM files:
samples:
- name: "1"
  keep_tags: ["ML", "MM"]
# Or globally across all samples:
keep_tags: ["ML", "MM"]

To keep reads greater than some length:
samples:
- name: "1"
  min_read_length: 30000
# Or globally across all samples:
min_read_length: 30000

To output CRAM files:
output_format: cram

To output BAM files with a CSI index:
use_bam_csi: true

To do a dry run of the workflow:

snakemake -np -c 1 --configfile config/config.yaml

To incorporate this into a workflow:
SAMPLE_NAMES = ["sample_1"]

CFG = {
    "samples": [
        {
            "name": sm,
            "asm_fa": f"{sm}.fa",
            "read_dir": f"reads/{sm}/",
            "read_ext": "bam",
        }
        for sm in SAMPLE_NAMES
    ],
    **config["align"]
}

module Align:
    snakefile:
        github(
            "logsdon-lab/Snakemake-Aligner",
            path="workflow/Snakefile",
            branch="main"
        )
    config: CFG

use rule * from Align as align_*

rule all:
    input:
        expand(rules.align.input, sm=SAMPLE_NAMES),

To run the dry-run workflow (a workflow with real files is a WIP):
snakemake --configfile test/config.yaml -c 1 -np
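
For reference, the three ways of passing read files in the samples section of config.yaml (reads, read_fofn, or read_dir with read_rgx) could be resolved along the lines of the sketch below. This is only an illustration under stated assumptions, not the workflow's implementation: resolve_read_files is a hypothetical helper, the precedence between keys is assumed, and the regex is assumed to be matched against file names rather than full paths.

```python
import os
import re


def resolve_read_files(sample):
    """Return the read file paths for one sample entry (hypothetical helper)."""
    # Assumed precedence: explicit paths, then fofn, then directory + regex.
    if "reads" in sample:
        return list(sample["reads"])
    if "read_fofn" in sample:
        # A fofn is a plain-text file listing one path per line.
        with open(sample["read_fofn"]) as fh:
            return [line.strip() for line in fh if line.strip()]
    if "read_dir" in sample and "read_rgx" in sample:
        pattern = re.compile(sample["read_rgx"])
        read_dir = sample["read_dir"]
        # Assumption: the regex is applied to the file name only.
        return sorted(
            os.path.join(read_dir, name)
            for name in os.listdir(read_dir)
            if pattern.search(name)
        )
    raise ValueError(
        f"Sample {sample.get('name')!r} needs reads, read_fofn, or read_dir + read_rgx"
    )


if __name__ == "__main__":
    print(resolve_read_files({"name": "1", "reads": ["1/r1.bam"]}))
```

The same pattern would apply to assemblies with asm_fa, asm_fofn, and asm_dir with asm_rgx.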