Output cases per day, deme, and variant #23

@huddlej

Description

Building on the work in issue #22, output the number of cases per day, deme, and variant to support models like @marlinfiggins's Rt frequency dynamics models.

Example output looks like:

date	location	variant	sequences
2021-01-02	Alabama	other	3
2021-01-03	Alabama	other	3
2021-01-04	Alabama	other	12
2021-01-05	Alabama	other	73
2021-01-06	Alabama	other	36

See recent variant counts for the USA for a complete example.
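As a rough illustration of the aggregation step, here is a minimal Python sketch that counts cases per date, location (deme), and variant and writes them in the tab-separated format above. The function names and the input shape (one `(date, location, variant)` tuple per case) are assumptions made for this example, not part of the existing simulator output.

```python
import csv
import io
from collections import Counter


def count_cases(cases):
    """Count cases per (date, location, variant) tuple.

    `cases` is assumed to be an iterable with one tuple per simulated case.
    """
    return Counter(cases)


def write_counts_tsv(counts, handle):
    """Write counts as tab-separated rows, sorted by location, variant, date."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["date", "location", "variant", "sequences"])
    rows = sorted(counts.items(), key=lambda item: (item[0][1], item[0][2], item[0][0]))
    for (date, location, variant), n in rows:
        writer.writerow([date, location, variant, n])


# Hypothetical input: three simulated cases in one deme.
cases = [
    ("2021-01-02", "Alabama", "other"),
    ("2021-01-02", "Alabama", "other"),
    ("2021-01-03", "Alabama", "other"),
]
counts = count_cases(cases)

buffer = io.StringIO()
write_counts_tsv(counts, buffer)
print(buffer.getvalue(), end="")
```

Only combinations with at least one case appear in the output; whether zero-count days should be written explicitly is a separate decision for the downstream models.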

Possible solution

For SARS-CoV-2, "variants" are already well defined as phylogenetic lineages of interest. The closest analog in antigen would be a specific phenotype or a cluster of phenotypes in antigenic space. In @trvrb's original paper, he clustered phenotypes in 2D space as shown below in the bottom right panel:

[Image: figure from the original paper; the bottom right panel shows phenotypes clustered in 2D space]

To support this output, we may need to implement similar clustering logic that groups phenotypes into consistent lineages through time. Alternatively, we could output cases per specific phenotype, though that could generate hundreds of different "variants".
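One simple way to get labels that stay stable through time is a greedy "leader" clustering: each phenotype, processed in order of appearance, joins the nearest existing cluster if it lies within a fixed radius of that cluster's founding point, and otherwise founds a new cluster. The Python sketch below is only an illustration of that idea. It is not the clustering used in the original paper, and `assign_variants` and `radius` are hypothetical names introduced here.

```python
import math


def assign_variants(phenotypes, radius):
    """Greedy leader clustering of 2D phenotypes.

    `phenotypes` is a sequence of (x, y) points ordered by time of appearance.
    Returns a list of integer cluster labels and the list of founding points.
    Founding points never move, so a label keeps its meaning through time.
    """
    leaders = []
    labels = []
    for point in phenotypes:
        # Find the nearest existing founding point, if any.
        nearest = min(
            range(len(leaders)),
            key=lambda i: math.dist(point, leaders[i]),
            default=None,
        )
        if nearest is not None and math.dist(point, leaders[nearest]) <= radius:
            labels.append(nearest)
        else:
            # No cluster is close enough: this phenotype founds a new variant.
            leaders.append(point)
            labels.append(len(leaders) - 1)
    return labels, leaders


# Hypothetical phenotype coordinates in 2D antigenic space.
phenotypes = [(0.0, 0.0), (0.5, 0.2), (3.0, 3.0), (3.2, 2.9), (0.1, 0.1)]
labels, leaders = assign_variants(phenotypes, radius=1.0)
print(labels)
```

The choice of `radius` controls how many variants appear: a very small radius approaches one variant per phenotype, which is the second option described above.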

We might implement this output as part of the same "case counts" output mentioned in #22 or as a separate file. We might also consider whether we want to parameterize how these variants are sampled to recreate the sampling bias present in real data where not all cases can be sequenced.
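If sampling is parameterized, one possible starting point is independent thinning, where each case is sequenced with a fixed probability. The Python sketch below assumes a single `sequencing_rate` shared by all demes and dates; real sampling bias varies by location and time, so this is a simplification. The function and parameter names are hypothetical.

```python
import random


def subsample_counts(rows, sequencing_rate, seed=None):
    """Thin case counts to mimic incomplete sequencing.

    `rows` is an iterable of (date, location, variant, cases) tuples.
    Each case is kept independently with probability `sequencing_rate`.
    """
    rng = random.Random(seed)
    sampled = []
    for date, location, variant, cases in rows:
        # One Bernoulli draw per case gives a binomial number of sequences.
        sequenced = sum(rng.random() < sequencing_rate for _ in range(cases))
        sampled.append((date, location, variant, sequenced))
    return sampled


# Hypothetical case counts for one deme and variant.
rows = [
    ("2021-01-02", "Alabama", "other", 30),
    ("2021-01-03", "Alabama", "other", 120),
]
sampled = subsample_counts(rows, sequencing_rate=0.1, seed=1)
```

Passing a `seed` makes the thinning reproducible, which helps when comparing model fits across runs.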
