Release v3.0.0: SNP calling with GATK 4.1 includes Slurm compatibility

@ChaochihL released this 02 Jun 19:27

This release includes the following changes.

Slurm workload manager is supported for all handlers.

GATK v4.1.2 on the Slurm queueing system is supported for the following handlers:

  • Haplotype_Caller
  • Genomics_DB_Import (new handler; combines GVCF files before the Genotype_GVCFs handler runs)
  • Genotype_GVCFs
  • Create_HC_Subset (preparation steps for GATK Variant Recalibrator)
  • Variant_Recalibrator

GATK v4.1.2 on non-PBS queueing systems is supported for the following handlers:

  • Haplotype_Caller
  • Genotype_GVCFs
  • Variant_Filtering

Additional changes:

  • VCF annotation visualization has been added to assist with filtering.
  • A Jupyter Notebook template for exploring VCF files prior to the variant recalibration/filtering steps is now available in the HelperScripts directory.
  • The Realigner_Target_Creator and Indel_Realigner handlers have been separated from the main pipeline: the functionality is only available in GATK 3 or earlier, and indel realignment is still needed for other downstream tools. To run the indel realignment steps, fill out Config_Indel_Realign.
  • The main Config file has been updated to match the handler changes, and a few new variables have been added.
  • Haplotype_Caller, Genomics_DB_Import, and Genotype_GVCFs now handle parallelizing across regions using job arrays.
  • You can now re-run specific job array indices with an optional -t custom_array_indices argument on the command line, instead of re-creating your sample list for failed or aborted jobs. For example:
./sequence_handling SAM_Processing /path/to/config -t 1-5,10,12

Without the -t flag, sequence_handling runs every sample in your list by default, so the previous invocation still works:
./sequence_handling SAM_Processing /path/to/config
This works for any handler that uses job arrays.
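
To make the -t syntax concrete, here is a minimal bash sketch of how a specification such as 1-5,10,12 expands into individual array indices. The function name expand_array_indices is hypothetical and is not part of sequence_handling; the handlers may simply pass the string through to the scheduler, since both Slurm (sbatch --array) and PBS/Torque (qsub -t) accept this range-and-comma syntax directly.

```shell
#!/bin/bash
# Hypothetical helper (not part of sequence_handling): expand an index
# specification such as "1-5,10,12" into one array index per line.
expand_array_indices() {
    local spec="$1" part start end i
    local -a parts
    # Split the specification on commas.
    IFS=',' read -r -a parts <<< "$spec"
    for part in "${parts[@]}"; do
        if [[ "$part" =~ ^([0-9]+)-([0-9]+)$ ]]; then
            # A range "start-end"; force base 10 so "08" is not read as octal.
            start=$((10#${BASH_REMATCH[1]}))
            end=$((10#${BASH_REMATCH[2]}))
            if (( start > end )); then
                echo "Invalid range: $part" >&2
                return 1
            fi
            for (( i = start; i <= end; i++ )); do
                echo "$i"
            done
        elif [[ "$part" =~ ^[0-9]+$ ]]; then
            # A single index.
            echo "$((10#$part))"
        else
            echo "Invalid index: $part" >&2
            return 1
        fi
    done
}

# Prints 1 2 3 4 5 10 12, one index per line.
expand_array_indices "1-5,10,12"
```

A check like this can be useful before resubmitting, for example to confirm that the indices you pass match the line numbers of the failed samples in your sample list.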

  • Create_HC_Subset can now process very large VCF files (>1 TB)
  • Variant_Recalibrator now has additional features:
    • Can specify a recalibration "mode" to recalibrate both indels and SNPs, indels only, or SNPs only
    • Allows specification of a custom set of annotations in the config file
    • Allows specification of additional options/flags to include
    • Allows more control over setting resource datasets as known, training, or truth sets
    • Automatically indexes the raw VCF file and resource files if they are not already indexed