Emergence of an Antigenically Drifted and Reassorted Influenza B Virus at the end of the 2024-25 Influenza Season
Elgin Akin, David A Villafuerte, Anne P. Werner, Matthew Pinsley, Amary Fall, Omar Abdullah, Julie M Norton, Richard Eric Rothman, Katherine Fenstermacher, Yu-Nong Gong, Eili Klein, Heba H Mostafa, Andrew Pekosz
BioRxiv Preprint doi: https://doi.org/10.1101/2025.07.24.666632
This repository houses all source code for the 8 segment builds and 1 genome Nextstrain build used to assess C.3 reassortment rates, along with all reassortment tanglegrams for the 2024-25 Influenza B season.
Important
Repository Code of conduct for ALL collaborators using this repo:
- DO NOT push commits to main. Please open a pull request.
- Please document any problems or issues you encounter by opening a GitHub issue.
Warning
This repository does NOT house the data needed to construct the 9 IBV C.3-enriched nextstrain builds. These data are available on a private OneDrive. Contact Elgin Akin (eakin1@jh.edu) for access.
This repository houses the snakemake pipeline for the nextstrain segment builds and genome build. It functionally represents a typical Pekosz Lab seasonal Influenza B nextstrain build, but replaces the automated data ingest pipeline with a manual one in order to accommodate sequences from GISAID. The manual ingest steps are in the 01_ingest.qmd notebook. In the future, efforts will be made to automate the cleaning of publicly sourced data.
Instructions: Click on each link to visualize an HA:{segment} tanglegram in auspice.
Warning
You must have access to the Pekosz Lab private nextstrain account. Contact Dr. Andy Pekosz (apekosz1@jh.edu) and cc Elgin Akin (eakin1@jh.edu) for access.
- Archive the current version of the repository.
  - This is important because the nextstrain pipeline will overwrite files in the data/ and source/intermediate/ directories.
  - Make a new directory in the snapshots folder, naming it according to the date of the new build in YYYYMMDD format (e.g. 20250615).
  - Copy the entire contents of your project into this new directory.
    - A script has been written to accomplish this. Execute the following, changing the date to the current date:
      python scripts/snapshot.py -s . -o snapshots/{YYYYMMDD}
  - Once you have manually confirmed that the new directory is a complete copy of the project, you can safely delete the contents of the data/, results, and source/intermediate/ directories using our safety-enhanced script:
    - Run python scripts/snapshot.py.
- Download updated Influenza B genomic data from GISAID.
  - Filter dates from 2020-10-05 to the present date.
  - Rename the sequences.fasta file to gisaid_vic_sequences.fasta
  - Rename the metadata.tsv file to gisaid_vic_metadata.tsv
  - Place these files in the source/ directory of your project directory (this repository)
- Manually curate the sequences and metadata files using the 01_ingest.qmd notebook.
- Execute the snakemake pipeline to build the 8 segment and 1 genome builds.
  - The snakemake pipeline is located in the workflow/ directory of this repository.
  - The snakemake pipeline can be executed by running the following command in your terminal:
snakemake --cores 8
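The snapshot naming and GISAID staging steps above can be sketched in Python as follows. This is a minimal illustration, not the contents of scripts/snapshot.py or the 01_ingest.qmd notebook: the function names `snapshot_dir_name` and `stage_gisaid_downloads` are hypothetical, and the sketch assumes the GISAID download arrives as sequences.fasta and metadata.tsv in a single folder.

```python
from datetime import date
from pathlib import Path
import shutil

# GISAID default download names mapped to the names given in the steps above.
GISAID_RENAMES = {
    "sequences.fasta": "gisaid_vic_sequences.fasta",
    "metadata.tsv": "gisaid_vic_metadata.tsv",
}


def snapshot_dir_name(build_date: date) -> str:
    """Return the YYYYMMDD folder name used under snapshots/."""
    return build_date.strftime("%Y%m%d")


def stage_gisaid_downloads(download_dir: Path, source_dir: Path) -> list[Path]:
    """Copy the GISAID files into source/ under their renamed file names.

    Hypothetical helper: raises if either expected download is missing so
    that a partial download is not silently staged.
    """
    source_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for old_name, new_name in GISAID_RENAMES.items():
        src = download_dir / old_name
        if not src.is_file():
            raise FileNotFoundError(f"missing GISAID download: {src}")
        dest = source_dir / new_name
        shutil.copy2(src, dest)  # copy rather than move, so the download is kept
        staged.append(dest)
    return staged
```

For example, `snapshot_dir_name(date(2025, 6, 15))` gives the folder name 20250615 used in the archive step.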
The 01_ingest.qmd notebook explains how sequences were down-selected and curated to yield high-quality genomes for the 8 segment-specific builds in augur.
The most recently curated consensus dataset can be found in the data/ directory of the most recent data snapshots HERE: Link to OneDrive
This folder holds snapshots of the downsampled data and will be updated periodically with circulating Influenza B strains deposited on GISAID. All current and historical builds will be deposited into this folder.
Critically, the data/ directory houses 8 sequence.fasta files and 8 metadata.tsv files, which contain equal numbers of genomes across all 8 segments.
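One way to check that the segment files hold equal numbers of genomes is to count FASTA header lines per file. The sketch below is illustrative only: the `{segment}.fasta` file naming and all function names are assumptions, since the actual layout of data/ is defined by the private snapshots.

```python
from pathlib import Path

SEGMENTS = ["pb2", "pb1", "pa", "ha", "np", "na", "mp", "ns"]


def count_fasta_records(path: Path) -> int:
    """Count sequences in a FASTA file by counting header lines."""
    with path.open() as handle:
        return sum(1 for line in handle if line.startswith(">"))


def segment_counts(data_dir: Path, pattern: str = "{segment}.fasta") -> dict[str, int]:
    """Return the number of sequences found for each segment.

    `pattern` is a hypothetical naming scheme; adjust it to the real file names.
    """
    return {
        seg: count_fasta_records(data_dir / pattern.format(segment=seg))
        for seg in SEGMENTS
    }


def counts_are_equal(counts: dict[str, int]) -> bool:
    """True when every segment file holds the same number of sequences."""
    return len(set(counts.values())) == 1
```

Running `counts_are_equal(segment_counts(Path("data")))` before a build would flag a segment file that lost or gained genomes during curation.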
- In your browser, click the URL bar and delete its contents.
- The formula for building a tanglegram in your browser is as follows:
https://nextstrain.org/groups/PekoszLab/akine/ibvc3/vic/**segment**:groups/PekoszLab/akine/ibvc3/vic/**segment**
- The segment must be in all lowercase. Options: pb2, pb1, pa, ha, np, na, mp, ns.
For example, a PB2:PA tanglegram can be built and visualized by entering the following URL into your browser: https://nextstrain.org/groups/PekoszLab/akine/ibvc3/vic/pb2:groups/PekoszLab/akine/ibvc3/vic/pa
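The URL formula above can also be generated programmatically. The following is a small sketch: the helper name `tanglegram_url` is hypothetical, while the dataset prefix and segment options are taken from the formula above.

```python
BASE = "https://nextstrain.org"
DATASET_PREFIX = "groups/PekoszLab/akine/ibvc3/vic"
SEGMENTS = ("pb2", "pb1", "pa", "ha", "np", "na", "mp", "ns")


def tanglegram_url(left: str, right: str) -> str:
    """Build a two-tree (tanglegram) URL for a pair of segments.

    Segment names are lowercased because the URL requires lowercase names.
    """
    left, right = left.lower(), right.lower()
    for seg in (left, right):
        if seg not in SEGMENTS:
            raise ValueError(f"unknown segment {seg!r}; options: {', '.join(SEGMENTS)}")
    return f"{BASE}/{DATASET_PREFIX}/{left}:{DATASET_PREFIX}/{right}"


if __name__ == "__main__":
    # Print the HA:{segment} tanglegram URL for every non-HA segment.
    for segment in SEGMENTS:
        if segment != "ha":
            print(tanglegram_url("ha", segment))
```

Calling `tanglegram_url("PB2", "PA")` reproduces the PB2:PA example URL given above.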
- Refactor this repository's version of fludb to accommodate authorship in the nextstrain build
- Add a numerical conversion of glycosylation for HA and NA.
- We really should consider wrapping this up in a package to centralize version control across projects...
- Custom ingest process for piping gisaid and jhh hospital sequence and metadata
- Update fludb upload script
  - upload_jhh.py: JHH location changed from "JHH" to GISAID-formatted Region / Country / Division / Location: "North America/United States/Maryland/Baltimore"
  - upload_gisaid.py enhancements:
    - Spaces in strain names are removed. The former logic was to replace " " with "_".
    - Change the default location entry from "JHH" to "North America / United States / Maryland / Baltimore" to match GISAID location formatting.
- Augur export - 'locations' renamed to 'area' - need to update lat_long.tsv config file
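The strain-name and location changes listed above amount to two small string transformations. The sketch below shows one way they might look. It is not the code in upload_jhh.py or upload_gisaid.py: the function names are hypothetical, and the spaced location string is assumed (the list above shows it both with and without spaces around the slashes).

```python
# Assumed target format, copied from the upload_gisaid.py item above.
GISAID_LOCATION = "North America / United States / Maryland / Baltimore"


def clean_strain_name(name: str) -> str:
    """Remove spaces from a strain name (the former logic replaced them with '_')."""
    return name.replace(" ", "")


def normalize_location(location: str) -> str:
    """Map the legacy "JHH" entry to the GISAID-style location string.

    Any other value is returned unchanged.
    """
    return GISAID_LOCATION if location.strip() == "JHH" else location
```

Keeping both rules as pure functions makes them easy to reuse across the two upload scripts and to unit test.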
To preview builds locally:
auspice view \
--datasetDir auspice/vic
To find and kill the local server (port 4000):
lsof -i tcp:4000
kill -9 <PID>
To upload to the private nextstrain account:
python scripts/c.3_nextstrain_upload_private_genomes.py
To force a re-run of the export step after a quick edit:
snakemake --cores 8 --forcerun export