Conservation tracks (PhastCons/PhyloP 100-way or here) from UCSC are in bigWig format (https://genome.ucsc.edu/goldenPath/help/bigWig.html). They are dense per-base signals, so converting bigWig to BED inflates the file size considerably: the compressed binary signal becomes one text record per base (or per run of equal values).
For sparse annotations (promoters, enhancers,…) it makes sense to provide BEDs and let statgen paint them onto SNPs. For dense bigWigs, there seems to be no reasonable BED representation that isn’t either huge (full per-base) or lossy (thresholded/binned).
Maybe it makes sense to add a helper (e.g., load_bigwig_annotations(bigwigs, reference)) that:
- queries bigWigs at the reference SNPs,
- builds a SNP × annotation matrix,
- returns an AnnotationPanel with continuous columns.
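Roughly, the three steps above could look like the following. This is only a minimal sketch, assuming a pyBigWig-style reader whose values(chrom, start, end) takes 0-based half-open coordinates, and SNP positions that are 1-based. query_tracks_at_snps and DictTrack are hypothetical names made up for illustration, and the final wrapping into an AnnotationPanel is left out because I don't want to guess its constructor.

```python
import math


def query_tracks_at_snps(tracks, snps):
    """Build a SNP x annotation matrix of continuous values.

    tracks: dict mapping annotation name -> reader object exposing
            values(chrom, start, end) with 0-based half-open coordinates
            (assumption: this mirrors pyBigWig's interface).
    snps:   sequence of (chrom, pos) tuples, pos assumed to be 1-based.
    Returns (column_names, matrix) where matrix[i][j] is the signal of
    annotation j at SNP i, or NaN if the track has no value there.
    """
    names = list(tracks)
    matrix = []
    for chrom, pos in snps:
        row = []
        for name in names:
            try:
                # A 1-based position p is the half-open interval [p - 1, p).
                value = tracks[name].values(chrom, pos - 1, pos)[0]
            except (RuntimeError, IndexError, KeyError):
                # Unknown chromosome or out-of-range position: treat as missing.
                value = math.nan
            row.append(value)
        matrix.append(row)
    return names, matrix


class DictTrack:
    """Hypothetical in-memory stand-in for a bigWig reader (illustration only)."""

    def __init__(self, signal):
        self.signal = signal  # {(chrom, 0-based position): value}

    def values(self, chrom, start, end):
        return [self.signal.get((chrom, p), math.nan) for p in range(start, end)]


tracks = {
    "phastCons": DictTrack({("chr1", 99): 0.8}),
    "phyloP": DictTrack({("chr1", 99): 2.5, ("chr1", 199): -1.0}),
}
names, matrix = query_tracks_at_snps(tracks, [("chr1", 100), ("chr1", 200)])
```

With real files, the DictTrack objects would be replaced by opened bigWig handles. Missing values are kept as NaN here rather than filled with 0, so that the caller can decide how to impute them.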
For indels, possible mappings would be: pick the leftmost reference base of the indel as its representative position or, for larger indels, summarize over the reference interval they span (e.g., mean/median/max signal in [start, end)).
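To make the two indel options concrete, here is a small sketch. It assumes the per-base signal over the reference interval [start, end) has already been fetched as a list. indel_signal and the point_max_len cutoff are hypothetical; where to draw the line between "small" and "larger" indels is an open choice.

```python
import math
import statistics


def indel_signal(values, how="mean", point_max_len=1):
    """Reduce the per-base signal over an indel's reference interval to one number.

    values:        per-base signal over [start, end), NaN where the track is empty.
    how:           "mean", "median" or "max", used for larger indels.
    point_max_len: intervals up to this length use the leftmost base only
                   (hypothetical cutoff, not an established convention).
    """
    if not values:
        return math.nan
    if len(values) <= point_max_len:
        # Small indel: the leftmost reference base is the representative position.
        return values[0]
    # Larger indel: summarize over the bases that actually carry a value.
    finite = [v for v in values if not math.isnan(v)]
    if not finite:
        return math.nan
    if how == "mean":
        return statistics.fmean(finite)
    if how == "median":
        return statistics.median(finite)
    if how == "max":
        return max(finite)
    raise ValueError(f"unknown summary: {how!r}")


span = [1.0, math.nan, 2.0, 6.0]  # signal over a 4 bp deletion, one base missing
mean_score = indel_signal(span, how="mean")
median_score = indel_signal(span, how="median")
max_score = indel_signal(span, how="max")
point_score = indel_signal([0.7])
```

Skipping NaN bases before summarizing is itself a design choice; an alternative would be to return NaN whenever any base in the interval is missing.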