PanGBank-cli is a command-line interface to search, retrieve, and download pangenomes from PanGBank via the PanGBank REST API. It acts as a convenient wrapper around the API, making PanGBank data easily accessible directly from the terminal.
PanGBank is a large-scale resource that hosts collections of microbial pangenomes constructed from diverse genome sources using PPanGGOLiN.
With PanGBank-cli you can:
- Search pangenomes by taxon, genome, or collection
- Retrieve a pangenome directly by its numeric ID
- Retrieve detailed metrics for selected pangenomes
- Download pangenome files for downstream analyses
- Map an input genome to its corresponding pangenome in PanGBank and fetch it automatically
For interactive exploration, you can also browse PanGBank collections through the PanGBank web application: https://pangbank.genoscope.cns.fr/
The easiest way to install PanGBank-cli with all its dependencies (including Mash) is with conda:
conda create -n pangbank-cli pangbank-cli
conda activate pangbank-cli
Install using pip:
pip install PanGBank-cli
Warning
Installing PanGBank-cli with pip will only set up the Python dependencies. The external tool Mash (required for the match-pangenome command) is not included and must be installed separately to enable full functionality.
To install from source in a conda environment that also provides Mash:
# Create a new conda environment with Python and Mash
conda create -n pangbank-cli python=3.12 mash=2.3
# Activate the environment
conda activate pangbank-cli
# Clone the repository
git clone https://github.com/labgem/PanGBank-cli.git
cd PanGBank-cli
# Install PanGBank-cli
pip install .
To install from source in a Python virtual environment instead:
# Clone the repository
git clone https://github.com/labgem/PanGBank-cli.git
cd PanGBank-cli
# Create a virtual environment
python -m venv venv
# Activate the virtual environment
# On Linux/macOS:
source venv/bin/activate
# Install PanGBank-cli
pip install .
Warning
Installing PanGBank-cli from source with pip will only set up the Python dependencies. The external tool Mash (required for the match-pangenome command) is not included and must be installed separately to enable full functionality.
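Because match-pangenome depends on Mash, it can help to confirm that the mash executable is reachable on your PATH before running it. The snippet below is a minimal sketch and not part of PanGBank-cli; the helper name have_tool is hypothetical.

```shell
# Hypothetical helper: succeed if the named executable is found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool mash; then
    echo "mash found: match-pangenome should be usable"
else
    echo "mash not found: install it before running match-pangenome"
fi
```

The check only tests that the executable can be found; it does not verify which Mash version is installed.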
Once installed, you can access the CLI by running:
pangbank --help
This displays the list of available commands and options.
Each command has a dedicated help section. For example:
pangbank search-pangenomes --help
pangbank list-collections
Displays the list of all pangenome collections available in PanGBank, along with their descriptions and the number of pangenomes they contain.
Output is formatted as a rich table in the terminal, or as plain TSV when redirected (e.g., pangbank list-collections > collections.tsv).
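Since redirected output is plain TSV, it can be handed to standard text tools. The sketch below counts the collections in the saved file. It assumes that the first line of the TSV is a header row, which is not stated above, and the helper name count_rows is hypothetical.

```shell
# Hypothetical helper: count data rows in a TSV file.
# Assumption: the first line is a header row and is skipped.
count_rows() {
    awk 'NR > 1 { n++ } END { print n + 0 }' "$1"
}

# Only call the CLI when it is actually installed.
if command -v pangbank >/dev/null 2>&1; then
    pangbank list-collections > collections.tsv
    echo "collections: $(count_rows collections.tsv)"
fi
```

If the file turns out to have no header line, drop the NR > 1 condition so that every row is counted.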
pangbank search-pangenomes --taxon "g__Escherichia"
This command searches PanGBank for pangenomes matching the given taxon.
Results are printed to stdout as plain TSV by default (suitable for piping or redirection). Use --table-path <file> to save directly to a file (e.g., --table-path pangenomes_information.tsv), or --no-table to disable table output.
To narrow the search to a specific collection release, add --release-version <version>:
pangbank search-pangenomes --collection GTDB_refseq --release-version 2.0.0
This filter works alongside --latest-only. If both are provided, --release-version selects the release to search and --latest-only does not change the result set.
pangbank search-pangenomes --taxon "g__Chlamydia" \
--collection GTDB_refseq \
--outdir Chlamydia_pangenomes/ \
--download
This command searches for Chlamydia pangenomes in the GTDB_refseq collection, then downloads the corresponding pangenome files into Chlamydia_pangenomes/.
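The same search-and-download call can be repeated for several genera with a small loop. The sketch below uses only the options shown above; the function name fetch_genus and the DRY_RUN switch are hypothetical conveniences, not features of PanGBank-cli. By default it only prints the commands it would run.

```shell
# DRY_RUN=1 (default) prints each command; DRY_RUN=0 executes it.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: search one genus in GTDB_refseq and download
# the matching pangenomes into "<genus>_pangenomes/".
fetch_genus() {
    genus=$1
    set -- pangbank search-pangenomes \
        --taxon "g__${genus}" \
        --collection GTDB_refseq \
        --outdir "${genus}_pangenomes/" \
        --download
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

for genus in Chlamydia Escherichia; do
    fetch_genus "$genus"
done
```

Run it once with the default to review the generated commands, then set DRY_RUN=0 to perform the downloads.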
pangbank get-pangenome <id>
Use this command when you already know the numeric identifier of a pangenome and want to inspect its full metadata without running a broader search first. The command prints the pangenome information to the terminal, and you can add --download to fetch the corresponding HDF5 file into the output directory.
pangbank match-pangenome --input-genome <genome.fasta> --collection GTDB_all
Matches the input genome (FASTA format) to the most similar pangenome in the selected collection. Mash compares the genome against a precomputed sketch of the collection to identify the closest pangenome, and the command outputs detailed information about the best match.
Note
- Add the --download flag to download the corresponding pangenome file.
- The downloaded file can then be used with PPanGGOLiN's projection command to annotate the input genome. See the PPanGGOLiN documentation for details.
PanGBank pangenomes are constructed with PPanGGOLiN and its companion tools. If you use PanGBank or PanGBank-cli in your research, please cite the following references:
PPanGGOLiN: Depicting microbial diversity via a partitioned pangenome graph Gautreau G et al. (2020) PLOS Computational Biology 16(3): e1007732. doi: 10.1371/journal.pcbi.1007732
panRGP: a pangenome-based method to predict genomic islands and explore their diversity Bazin et al. (2020) Bioinformatics, Volume 36, Issue Supplement_2, Pages i651–i658 doi: 10.1093/bioinformatics/btaa792
panModule: detecting conserved modules in the variable regions of a pangenome graph Bazin et al. (2021) bioRxiv doi: 10.1101/2021.12.06.471380