Analysis tool for metataxonomics using 16S-ITS-23S rRNA operon region.
With the advancement of long-read sequencing technologies, the field of metataxonomics has entered a new phase. Analyzing the 16S-ITS-23S rRNA operon region (~4,300 bp) for microbial (bacterial/archaeal) community profiling provides more taxonomic information than analyzing only partial 16S rRNA gene sequences using short-read sequencing, allowing for species-level analysis. MIrROR provides a curated database and analysis tool for metaxonomics using 16S-ITS-23S rRNA operon region.
MIrROR release 02 database has been published in Scientific Data.
- Built from 1,690,470 genomes (1,674,514 bacterial + 15,956 archaeal) from NCBI
- Final curated dataset: 476,579 sequences, 249,907 genomes, 29,051 species
- Archaeal genomes included for the first time
- Taxonomic reclassification using GTDB R220
# Install dependencies
conda install -c bioconda minimap2 # long-read mapper (required)
conda install -c bioconda krona rename # Krona visualization (optional)
pip install pandas matplotlib # stacked bar plots (optional)
# Clone repository
git clone https://github.com/seoldh/MIrROR.git
cd MIrROR
./MIrROR.py -h
# Download MIrROR release 02 database
mkdir DBDIR
wget -P DBDIR https://zenodo.org/records/17639192/files/MIrROR_r02.fa
wget -P DBDIR https://zenodo.org/records/17639192/files/MIrROR_r02.tsv
# After primer-aware curation (see below), build your own index:
# minimap2 -d DBDIR/MIrROR_r02.mmi MIrROR_r02_curated.fa
# Usage examples
MIrROR.py -K -d DBDIR -o result_BC01 ./sample_data/BC01.fastq
MIrROR.py -V -d DBDIR -o result_paf ./sample_data/list_PAF.txtMIrROR uses a curated 16S-ITS-23S rRNA operon database.
See MIrROR Website for more information.
| Release | Sequences | Genomes | Species | Taxonomy | Download |
|---|---|---|---|---|---|
| r01 | 97,781 | 43,653 | 9,485 | GTDB R89 | legacy |
| r02 (recommended) | 476,579 | 249,907 | 29,051 | GTDB R220 | Zenodo |
The Zenodo repository for release 02 contains three files:
MIrROR_r02.fa— full-length FASTA sequences (start here for custom curation)MIrROR_r02.mmi— pre-built minimap2 index (built from full, untrimmed sequences)MIrROR_r02.tsv— taxonomy + operon copy number table (required for MIrROR)
We recommend building your own minimap2 index from MIrROR_r02.fa after trimming sequences to match the primer set used in your experiment.
A practical workflow:
# 1. Import FASTA into QIIME2
qiime tools import --type 'FeatureData[Sequence]' \
--input-path MIrROR_r02.fa \
--output-path MIrROR_r02.qza
# 2. Extract amplicon region with your primer sequences (example: primer set #5)
qiime feature-classifier extract-reads \
--i-sequences MIrROR_r02.qza \
--p-f-primer CCTACGGGNBGCWSCAG \
--p-r-primer ACCRCCCCAGTHRAACT \
--p-n-jobs 4 \
--o-reads MIrROR_r02_trimmed.qza
# 3. Export and build minimap2 index
qiime tools export --input-path MIrROR_r02_trimmed.qza --output-path trimmed/
minimap2 -d DBDIR/MIrROR_r02.mmi trimmed/dna-sequences.fasta| Dependency | Purpose | Install |
|---|---|---|
| Minimap2 | Read mapping (required) | conda install -c bioconda minimap2 |
| KronaTools | Krona plots (-K) |
conda install -c bioconda krona rename |
| pandas | Stacked bar plots (-S) |
pip install pandas |
| matplotlib | Stacked bar plots (-S) |
pip install matplotlib |
__ __ ___ ____ ___ ____ _ _
| \/ |_ _|_ __| _ \ / _ \| _ \ __ _/ | / |
| |\/| || || '__| |_) | | | | |_) | \ \ / / | | |
| | | || || | | _ <| |_| | _ < \ V /| |_| |
|_| |_|___|_| |_| \_\\___/|_| \_\ \_/ |_(_)_|
usage: MIrROR.py [options] (-d DBDIR) INPUTFILE
Input:
INPUTFILE FASTA/FASTQ/PAF file(s) or a sample list [required]
Main options:
-d, --db_dir DBDIR directory containing MIrROR database [required]
-o, --output_dir OUTDIR
specify directory to output files (default: ./Result)
-t, --threads INT number of threads (default: 4)
Mapping options:
-x, --preset STR preset options to optimize alignment for different platforms. map-pb/map-hifi/map-ont -
CLR/HiFi/Nanopore (default: map-ont)
-M, --minibatch NUM number of query bases loaded to memory at once. K/M/G suffix accepted. (default: 500M)
Threshold options:
-m, --residuematches INT
minimum number of residue matches (default: 2500)
-b, --blocklength INT
minimum alignment block length (default: 3500)
-n, --Normalization normalize counts by 16S-23S rRNA operon copy number
Visualization options:
-K, --Krona create a Krona plot (requires KronaTools)
-S, --Stackedplot create stacked bar plots (requires pandas, matplotlib)
-V, --visualized perform all visualization tasks (same as -K -S)
Others:
-h, --help show this help message and exit
-v, --version show program's version number and exit
MIrROR accepts a single FASTA/FASTQ/PAF file or a sample list. Mapping is skipped when only PAF files are provided.
Sample list format (tab-delimited; group columns are optional):
Sample Status Smoke
BC01.fastq Healthy smoker
BC02.fastq Healthy non-smoker
BC03.paf Sick smoker
BC04.paf Sick non-smoker
Primer binding site sequences vary across species and lineages. As a result, no single primer set amplifies all taxa with equal efficiency, and certain species in your sample may be missed depending on the primer pair used. We strongly recommend evaluating candidate primer sets against the taxa most relevant to your study before finalizing your experimental design.
Use the MIrROR Primer Checker for in silico coverage assessment against MIrROR release 02.
| Set | Forward primer | Reverse primer | Notes |
|---|---|---|---|
| #4 (519F–2428R) | CAGCMGCCGCGGTAA |
CCRAMCTGTCTCACGACG |
Recommended for Bacteria + Archaea; best archaeal coverage (75.78% in silico) |
| #5 (341F–2241R) | CCTACGGGNBGCWSCAG |
ACCRCCCCAGTHRAACT |
Recommended for Bacteria; highest bacterial coverage (98.81% in silico) |
| r01 original (27F–2241R) | AGRGTTYGATYHTGGCTCAG |
ACCRCCCCAGTHRAACT |
Used in MIrROR release 01 studies; bacteria only |
Primer sets #4 and #5 are recommended based on in silico analysis in Lee et al. (2026). The r01 original set remains a valid choice for bacteria-focused studies. Regardless of primer choice, always verify coverage for the specific taxa of interest in your sample type before proceeding to wet lab work.
| Directory | File | Description |
|---|---|---|
./ |
RESULT.log |
run log |
./ReadMapping/ |
SAMPLE_minimap.paf |
minimap2 alignment |
./Classification/ |
SAMPLE.txt |
per-sample classification |
./FeatureTable/ |
OUTPUT_std.txt |
abundance table (standard) |
./FeatureTable/ |
OUTPUT_mpa.txt |
abundance table (MetaPhlAn-style) |
./FeatureTable/ |
OUTPUT_std_type2.txt |
abundance table (for Krona) |
./Visualization/ |
stacked_*.pdf |
stacked bar plots per taxonomic level |
./Visualization/krona/ |
SAMPLE.html |
interactive Krona chart |
#Name BC01 BC02
d__Bacteria;p__Bacillota_A;c__Clostridia;...;s__Faecalibacterium_prausnitzii 25 24
If you use MIrROR in your research, please cite both papers:
MIrROR tool and release 01:
Seol D, Lim JS, Sung S, Lee YH, Jeong M, Cho S, Kwak W, Kim H.
Microbial Identification Using rRNA Operon Region: Database and Tool for Metataxonomics with Long-Read Sequence.
Microbiology Spectrum, 10(2): e02017-21 (2022). https://doi.org/10.1128/spectrum.02017-21
MIrROR release 02:
Lee J, Hong J, Seol D, Lee W, Lee J, Kim G, Cho S, Kim H.
MIrROR release 02: Expanded and refined 16S-ITS-23S rRNA operon dataset.
Scientific Data, 13: 714 (2026). https://doi.org/10.1038/s41597-026-06729-y
