31 changes: 28 additions & 3 deletions README.md
@@ -2,12 +2,37 @@

This repository houses all scripts, snakefiles, and configuration files for the [Pekosz Lab nextstrain builds](https://nextstrain.org/groups/PekoszLab) in [JH-CEIRR](https://www.ceirr-network.org/centers/jh-ceirr).

## What is this?

[Nextstrain](https://nextstrain.org) is an open-source platform for real-time tracking of pathogen evolution. It combines phylogenetic analysis with interactive visualization to help researchers understand how viruses spread and evolve. A "build" produces a JSON dataset that is rendered as an interactive phylogenetic tree, which can be explored in the browser at [nextstrain.org](https://nextstrain.org) or locally via [Auspice](https://docs.nextstrain.org/projects/auspice/en/stable/).

This repository implements a [Snakemake](https://snakemake.readthedocs.io/en/stable/) pipeline that:
1. Ingests influenza genome sequences and metadata from the Johns Hopkins Hospital (JHH) network and GISAID vaccine strains.
2. Types and quality-controls each sequence using [flusort](scripts/flusort) and [Nextclade](https://clades.nextstrain.org/).
3. Builds separate phylogenetic trees for all 8 genome segments of H1N1, H3N2, and B/Victoria influenza, plus 3 concatenated whole-genome builds (27 builds total).
4. Exports interactive Auspice JSONs that are uploaded to [nextstrain.org/groups/PekoszLab](https://nextstrain.org/groups/PekoszLab).
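
The build count in step 3 can be checked with a short sketch. The subtype and segment lists follow the README, but the identifier format (`h1n1_ha`, `h1n1_genome`) and the function name are hypothetical and are not taken from the pipeline's actual configuration.

```python
from itertools import product

# Subtypes and segments as described in the README.
# The lowercase identifiers are an assumption made for illustration.
SUBTYPES = ["h1n1", "h3n2", "vic"]
SEGMENTS = ["pb2", "pb1", "pa", "ha", "np", "na", "mp", "ns"]


def enumerate_builds(subtypes=SUBTYPES, segments=SEGMENTS):
    """Return hypothetical build names.

    One build per (subtype, segment) pair, plus one concatenated
    whole-genome build per subtype.
    """
    segment_builds = [f"{sub}_{seg}" for sub, seg in product(subtypes, segments)]
    genome_builds = [f"{sub}_genome" for sub in subtypes]
    return segment_builds + genome_builds


builds = enumerate_builds()
print(len(builds))  # 3 subtypes x 8 segments + 3 genome builds = 27
```

The 24 per-segment builds and the 3 concatenated builds together account for the 27 builds mentioned above.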

- Currently, builds are maintained for all 8 segments of circulating H1N1, H3N2, and B/Vic viruses detected through the Johns Hopkins Hospital (JHH) Network.

- 3 concatenated genome builds are also maintained for H1N1, H3N2, and B/Vic viruses.

- As of [2024-11-26](#history), all builds are constructed using a simplified [Snakemake](https://snakemake.readthedocs.io/en/stable/) pipeline.

## Prerequisites

Before running this pipeline, ensure you have the following installed on your machine:

| Requirement | Version | Notes |
|-------------|---------|-------|
| [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) | ≥ 23.x | Manages all pipeline dependencies |
| [git](https://git-scm.com/downloads) | any | For cloning this repository |
| Operating System | Linux or macOS | Windows users should use [WSL2](https://learn.microsoft.com/en-us/windows/wsl/install) |
| RAM | ≥ 16 GB | IQ-TREE phylogenetic inference is memory-intensive |
| Disk space | ≥ 10 GB | For sequence data, intermediate results, and output JSONs |

> [!NOTE]
> All other software dependencies (augur, iqtree2, mafft, nextclade, blast, snakemake, etc.) are installed automatically via the conda environment in Step 1.
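
Some of the prerequisites in the table can be checked before cloning. The following is a minimal sketch using only the Python standard library; it is not part of this repository, the function name `check_prerequisites` is hypothetical, and it does not verify the conda version or the amount of RAM.

```python
import platform
import shutil

MIN_FREE_DISK_GB = 10  # threshold taken from the prerequisites table


def check_prerequisites(path="."):
    """Return a dict mapping each check name to True (ok) or False."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return {
        # shutil.which returns None when the executable is not on PATH.
        "conda": shutil.which("conda") is not None,
        "git": shutil.which("git") is not None,
        # WSL2 reports itself as Linux; macOS reports Darwin.
        "os": platform.system() in ("Linux", "Darwin"),
        "disk": free_gb >= MIN_FREE_DISK_GB,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'NOT MET'}")
```

Running the script from the directory where the repository will be cloned makes the disk check apply to the right filesystem.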

# Quickstart: Getting Started with the 24 segment build for H1N1, H3N2 and B/Victoria

>[!WARNING]
@@ -117,7 +142,7 @@ nextstrain remote upload \
auspice/${YOUR_BUILD_NAME}.json
```

# 5. Building Internal Reports
## 5. Building Internal Reports

> [!NOTE]
> You can safely generate reports **before running the `snapshot_clean` rule** — the `reports/` folder will be archived automatically during the snapshot process.
@@ -140,7 +165,7 @@ quarto render scripts/render-reports.qmd --to html --output-dir ../reports/
```
The rendered report will be saved in the `reports/` folder and can be viewed in any web browser.

# 6. Create a Build Snapshot and Clean the Working Directory
## 6. Create a Build Snapshot and Clean the Working Directory

> [!WARNING]
> Before starting a new build with updated data, you **must** run this rule.
@@ -184,7 +209,7 @@ snapshots/
├── 20251111T163000.tar.gz
```
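
The archive name shown above has the shape of a compact `YYYYMMDDTHHMMSS` timestamp. As an illustration only, a name in that format could be produced as follows; the function name `snapshot_name` is hypothetical, and whether the `snapshot_clean` rule uses local time or UTC is an assumption not confirmed here.

```python
from datetime import datetime


def snapshot_name(now=None):
    """Return an archive name such as 20251111T163000.tar.gz.

    Uses the local clock when no datetime is given (an assumption;
    the pipeline's actual clock source is not documented here).
    """
    now = now or datetime.now()
    return now.strftime("%Y%m%dT%H%M%S") + ".tar.gz"


print(snapshot_name(datetime(2025, 11, 11, 16, 30, 0)))  # 20251111T163000.tar.gz
```

Because the fields run from year down to second, sorting the file names alphabetically also sorts the snapshots chronologically.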

# Peskoz Lab Nextstrain Roadmap
# Pekosz Lab Nextstrain Roadmap

- [ ] Automated [report generation](scripts/report-html-pdf.qmd) for all builds.
- [ ] Add t-SNE implementation for all builds using [pathogen-embed](https://pypi.org/project/pathogen-embed/).