Description
Building on the work in issue #22, output the number of cases per day, deme, and variant to support models like @marlinfiggins's Rt frequency dynamics models.
Example output looks like:
date location variant sequences
2021-01-02 Alabama other 3
2021-01-03 Alabama other 3
2021-01-04 Alabama other 12
2021-01-05 Alabama other 73
2021-01-06 Alabama other 36
See recent variant counts for the USA for a complete example.
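As a rough illustration of the requested output, the sketch below tallies one record per case into counts per date, deme, and variant, then renders them with the same columns as the example above. The function names, the deme names, and the variant labels are all hypothetical; nothing here reflects antigen's actual code or output format.

```python
from collections import Counter


def count_cases(cases):
    """Count cases per (date, location, variant).

    `cases` is an iterable of (date, location, variant) tuples, one per case.
    This record layout is an assumption made for illustration.
    """
    return Counter(cases)


def format_counts(counts):
    """Render counts as tab-separated lines, sorted by location, variant, date."""
    lines = ["date\tlocation\tvariant\tsequences"]
    ordered = sorted(
        counts.items(),
        key=lambda item: (item[0][1], item[0][2], item[0][0]),
    )
    for (date, location, variant), n in ordered:
        lines.append(f"{date}\t{location}\t{variant}\t{n}")
    return "\n".join(lines)


# Hypothetical per-case records: (date, deme, variant label).
cases = [
    ("2021-01-02", "north", "variant_0"),
    ("2021-01-02", "north", "variant_0"),
    ("2021-01-02", "north", "variant_1"),
    ("2021-01-03", "tropics", "variant_0"),
]

counts = count_cases(cases)
print(format_counts(counts))
```

ISO-formatted dates sort correctly as plain strings, which is why the sketch does not parse them.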
Possible solution
For SARS-CoV-2, "variants" are already well defined as phylogenetic lineages of interest. The closest analog in antigen would be a specific phenotype or a cluster of phenotypes in antigenic space. In @trvrb's original paper, he clustered phenotypes in 2D space as shown below in the bottom right panel:

To support this output, we may need to implement similar clustering logic that groups phenotypes into lineages that stay consistent through time. Alternatively, we could output cases per specific phenotype (potentially generating hundreds of different "variants").
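One way the clustering option could work is sketched below, assuming phenotypes are points in 2D antigenic space and arrive in order of first appearance. It uses greedy "leader" clustering, which is a stand-in chosen for simplicity and not necessarily the method used in the original paper. Each cluster center is fixed at its first member, so a variant label never changes once assigned. The function name and the `radius` parameter are hypothetical.

```python
import math


def assign_variants(phenotypes, radius=2.0):
    """Greedy leader clustering of 2D antigenic coordinates.

    phenotypes: iterable of (x, y) tuples in order of first appearance.
    radius: maximum distance from a cluster center (assumed tuning parameter).
    Returns (labels, centers), where labels[i] is the variant index of
    phenotypes[i] and centers[k] is the founding phenotype of variant k.
    """
    centers = []
    labels = []
    for x, y in phenotypes:
        best, best_dist = None, radius
        # Find the nearest existing center within the radius, if any.
        for i, (cx, cy) in enumerate(centers):
            dist = math.hypot(x - cx, y - cy)
            if dist <= best_dist:
                best, best_dist = i, dist
        if best is None:
            # No center is close enough: this phenotype founds a new variant.
            centers.append((x, y))
            best = len(centers) - 1
        labels.append(best)
    return labels, centers


# Hypothetical phenotype coordinates.
phenotypes = [(0.0, 0.0), (0.5, 0.5), (5.0, 5.0), (5.2, 4.9), (0.1, 0.0)]
labels, centers = assign_variants(phenotypes)
print(labels)
```

A smaller `radius` moves this toward the per-phenotype output described above, while a larger one merges more phenotypes into each variant.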
We might implement this output as part of the same "case counts" output mentioned in #22 or as a separate file. We might also consider whether we want to parameterize how these variants are sampled to recreate the sampling bias present in real data where not all cases can be sequenced.
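The sampling parameter could be as simple as a single sequencing fraction. The sketch below assumes each case is sequenced independently with the same probability, which amounts to a binomial draw per count. Real sampling bias likely varies by deme and over time, so this is only a starting point; `subsample_counts` and `sequencing_fraction` are hypothetical names.

```python
import random


def subsample_counts(counts, sequencing_fraction, seed=None):
    """Thin case counts to mimic incomplete sequencing.

    counts: mapping from (date, location, variant) to number of cases.
    sequencing_fraction: assumed probability that any one case is sequenced.
    seed: optional seed so that runs are reproducible.
    """
    rng = random.Random(seed)
    sampled = {}
    for key, n in counts.items():
        # One Bernoulli trial per case, summed: a binomial draw.
        sampled[key] = sum(rng.random() < sequencing_fraction for _ in range(n))
    return sampled


# Hypothetical true case counts.
true_counts = {
    ("2021-01-02", "north", "variant_0"): 40,
    ("2021-01-02", "north", "variant_1"): 10,
}

sequenced = subsample_counts(true_counts, sequencing_fraction=0.3, seed=1)
print(sequenced)
```

Passing a per-deme mapping of fractions instead of a single value would be a natural extension if sequencing effort should differ between demes.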