ResMAG is a state-of-the-art and user-friendly Snakemake workflow designed for the analysis of metagenomic data. It integrates multiple bioinformatics tools and algorithms to facilitate key steps in metagenome analysis, including quality control, assembly, binning, bin refinement, metagenome-assembled genome (MAG) reconstruction and taxonomic classification. ResMAG has a special focus on highly diverse samples, such as wastewater, and on the identification of antibiotic resistance genes.
Binning Techniques: Employ a collection of two state-of-the-art binning tools to partition metagenomic contigs into individual bins, allowing for comprehensive and accurate analysis.
MAG Reconstruction: Utilize cutting-edge algorithms to reconstruct high-quality metagenome-assembled genomes (MAGs), especially from highly diverse microbial communities, like wastewater.
Taxonomic Classification: Apply advanced taxonomic classification methods to assign taxonomic labels to reads, contigs and MAGs and identify the microbial community composition within the metagenomic samples.
Antibiotic Resistance Gene Identification: Perform in-depth analysis to detect and characterize antibiotic resistance genes within the metagenomic data, providing valuable insights into antimicrobial resistance profiles.
Performance Refinement: Continuously optimize the pipeline by incorporating the latest advancements in metagenomics research, ensuring the highest accuracy and efficiency in metagenomic data analysis.
Create a snakemake environment using mamba via:
mamba create -c conda-forge -c bioconda -n snakemake snakemake snakemake-storage-plugin-fs
For installation details, see the instructions in the Snakemake documentation.
The GTDB needs to be downloaded and decompressed. It requires about 140 GB of disk space.
- Change to the directory where the GTDB should be stored
- Download the latest or your desired version of GTDB. Please make sure this version is compatible with GTDB-Tk version 2.6.1
wget https://data.ace.uq.edu.au/public/gtdb/data/releases/latest/auxillary_files/gtdbtk_package/full_package/gtdbtk_data.tar.gz
- Decompress the downloaded archive
tar xzf gtdbtk_data.tar.gz
- After successful decompression, the archive can be removed
- Please specify the path to your decompressed GTDB in the config file (see Configuring workflow)
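Because the database is large, it can be worth checking the free space in the target directory before starting the download. The following is a minimal sketch, not part of ResMAG: the function name `has_free_space` and the variable `GTDB_DIR` are hypothetical, and the 140 GB threshold is simply the figure quoted above.

```shell
# Hypothetical helper (not part of ResMAG): succeed if the file system
# holding DIR has at least MIN_GB gigabytes available.
has_free_space() {
    hfs_dir="$1"
    hfs_min_gb="$2"
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # the fourth column of the second line is the available space.
    hfs_avail_kb=$(df -Pk "$hfs_dir" | awk 'NR == 2 { print $4 }')
    [ "$hfs_avail_kb" -ge $((hfs_min_gb * 1024 * 1024)) ]
}

# GTDB_DIR is an assumed variable; it defaults to the current directory.
if has_free_space "${GTDB_DIR:-.}" 140; then
    echo "Enough free space for the GTDB"
else
    echo "Less than 140 GB available in ${GTDB_DIR:-.}" >&2
fi
```

The check only looks at the space available at the moment it runs, so it is a quick sanity test rather than a guarantee that the download and decompression will fit.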
Please follow the steps outlined in the UniCARD repository to create a UniCARD database for resistance gene annotation, and specify the path to your database in the config file (see Configuring workflow).
Please obtain a copy of this workflow by cloning this repository.
git clone https://github.com/IKIM-Essen/metagenomics_res.git
- Edit the config/config.yaml file:
  - Specify a project name (project-name)
  - Specify filtering options for human reads (human-filtering)
  - Specify host filtering options, if you have a non-human host (host-filtering)
  - Specify options for different databases:
    - The GTDB database needs to be downloaded beforehand (see Download GTDB)
    - The UniCARD database needs to be created beforehand (see Create UniCARD database)
    - Other databases (kaiju, CheckM2, CARD, genomad) can be given as a local path or downloaded when running the pipeline
- Provide sample information in the config/pep/samples.csv file while keeping the header and the format as:
sample_name,fq1,fq2
sample1,path/to/your/fastq/sample1_R1.fastq.gz,path/to/your/fastq/sample1_R2.fastq.gz
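A malformed sample sheet is an easy mistake to make, so a quick format check before the dry run can save time. The sketch below is an illustration only and not part of ResMAG: the function name `check_samples_csv` is hypothetical, and it assumes that the header must match the line shown above exactly and that every sample row has exactly three comma-separated columns.

```shell
# Hypothetical helper (not part of ResMAG): verify the header and the
# column count of a sample sheet. Returns non-zero if a problem is found.
check_samples_csv() {
    awk -F',' '
        # The first line must be the expected header.
        NR == 1 {
            if ($0 != "sample_name,fq1,fq2") {
                print "unexpected header: " $0
                bad = 1
            }
            next
        }
        # Blank lines are ignored.
        /^$/ { next }
        # Every sample row needs exactly three columns.
        NF != 3 {
            print "line " NR ": expected 3 columns, found " NF
            bad = 1
        }
        END {
            if (NR == 0) { print "file is empty"; bad = 1 }
            exit bad + 0
        }
    ' "$1"
}
```

It could be called as `check_samples_csv config/pep/samples.csv`. The check covers the layout only; it does not test whether the listed FASTQ files exist.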
Activate the conda environment:
conda activate snakemake
Test your configuration by performing a dry run via:
snakemake --use-conda -n
Executing the workflow:
snakemake --use-conda --cores $N -k
using $N cores. It is recommended to use all available cores.
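To follow the recommendation of using all available cores, `$N` can be set from the machine's core count. This is a small sketch under assumptions: the helper name `detect_cores` is hypothetical, and it relies on either `nproc` (GNU coreutils) or `getconf _NPROCESSORS_ONLN` being available on the system.

```shell
# Hypothetical helper (not part of ResMAG): print the number of CPU cores
# available to this shell.
detect_cores() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    else
        # Fallback for systems without nproc.
        getconf _NPROCESSORS_ONLN
    fi
}

N=$(detect_cores)
echo "Using $N cores"
```

With `N` set this way, the workflow can be started as `snakemake --use-conda --cores "$N" -k`. Recent Snakemake versions should also accept `--cores all`, which has the same effect without the helper.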
A list of tools used in the pipeline:
Pull requests and feature suggestions are very welcome! Feel free to fork and submit improvements. For any other questions or feedback, please contact the project maintainer Josefa Welling (@josefawelling) at josefa.welling@uk-essen.de.
We appreciate your input and support in using and improving ResMAG.
We would like to express our gratitude towards Katharina Block, Adrian Doerr, Miriam Balzer, Alexander Thomas, Johannes Köster, Ann-Kathrin Doerr and the IKIM who have contributed to the development and testing of ResMAG. Their valuable insights and feedback have been helpful throughout the creation of the workflow.
A paper is on its way. If you use ResMAG in your work, don't forget to give credit to the authors by citing the URL of this repository.
ResMAG is licensed under the BSD 2-Clause License.