This sounds great to me! I'd only maybe consider changing "The coordinates should include partial motifs" to "The coordinates could include partial motifs".
I just want to echo @egor-dolzhenko's comment. The plan of action looks great. I would be happy to "test drive" the new coordinates for TRGT once they're available.
At first glance, this all sounds reasonable to me also. I just ran a 4-way comparison of the locus definitions in the
Here are the full results of the comparison. More than half (38 out of 70) of the loci present in both the gnomAD and STRchive JSONs currently have different start/end coordinates(!) Many loci differ by only 1 or 2 bases, but a few differ by a lot. For example, our PABPN1 definitions differ by 4 repeats, with STRchive choosing the longer definition while the other sources use the shorter one (we discuss these competing definitions in the Tandem Repeat Catalogs preprint).
Large differences also exist for C9orf72, AR, FMR1, JPH3, MARCHF6, AFF2, DAB1, BEAN1, FGF14, etc. It would be great to resolve these differences as much as possible. Could we add a rule to say that existing definitions in the original Illumina ExpansionHunter catalog should take precedence (given the number of resources that rely on them)?
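For anyone who wants to repeat this kind of check, here is a rough Python sketch of a pairwise coordinate comparison. It is not the script used for the results above. The function name `compare_loci`, the `start`/`end` field names, and the toy locus IDs and coordinates are all made up for illustration, and it assumes both catalogs already use the same reference build and the same coordinate convention.

```python
def compare_loci(catalog_a: dict, catalog_b: dict) -> dict:
    """Return {locus_id: (start_delta, end_delta)} for shared loci whose coordinates differ.

    Each catalog is assumed (hypothetically) to map a locus ID to a dict
    with integer "start" and "end" keys.
    """
    diffs = {}
    # Only loci present in both catalogs can be compared.
    for locus_id in sorted(catalog_a.keys() & catalog_b.keys()):
        a, b = catalog_a[locus_id], catalog_b[locus_id]
        delta = (b["start"] - a["start"], b["end"] - a["end"])
        if delta != (0, 0):
            diffs[locus_id] = delta
    return diffs


# Toy data only: these IDs and coordinates are invented, not real loci.
toy_a = {
    "LOCUS_1": {"start": 100, "end": 160},
    "LOCUS_2": {"start": 500, "end": 530},
    "LOCUS_3": {"start": 900, "end": 960},
}
toy_b = {
    "LOCUS_1": {"start": 100, "end": 160},  # identical
    "LOCUS_2": {"start": 499, "end": 530},  # off by one base at the start
    "LOCUS_4": {"start": 10, "end": 40},    # not shared, so ignored
}

toy_diffs = compare_loci(toy_a, toy_b)
for locus_id, (d_start, d_end) in toy_diffs.items():
    print(f"{locus_id}: start {d_start:+d}, end {d_end:+d}")
```

Dividing each delta by the motif length would give the difference in repeat units rather than bases, which is probably the more useful number for the loci that differ by a lot.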
I was syncing the latest pathogenic thresholds from STRchive to gnomAD and the same question as for PABPN1 applies to the AR locus. Does STRchive suggest counting the "AGAGACTAGCCCCAGGCAGCAGCAGCAGCAGCAG" sequence when comparing to the pathogenic threshold of 38 x CAG?
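To make the two possible counts concrete, here is a minimal Python sketch. The function name `longest_pure_run` is made up for illustration, and this is not how STRchive, gnomAD, or any genotyper actually counts repeats. It compares the longest uninterrupted CAG run in the sequence quoted above with the count implied by dividing the whole span by the motif length.

```python
import re


def longest_pure_run(sequence: str, motif: str) -> int:
    """Largest number of back-to-back copies of `motif` found anywhere in `sequence`."""
    # The lookahead lets a run start at every position, so runs in any frame are seen.
    pattern = re.compile(f"(?=((?:{re.escape(motif)})+))")
    return max(
        (len(m.group(1)) // len(motif) for m in pattern.finditer(sequence)),
        default=0,
    )


# Sequence quoted in the comment above.
ar_span = "AGAGACTAGCCCCAGGCAGCAGCAGCAGCAGCAG"

pure_repeats = longest_pure_run(ar_span, "CAG")
span_repeats = len(ar_span) / len("CAG")  # what a purely coordinate-based count implies

print(f"pure CAG run: {pure_repeats} repeats")
print(f"whole span:   {span_repeats:.1f} motif lengths")
```

The gap between the two numbers is the part of the allele size that depends on which locus definition is used, so it matters when comparing against a fixed threshold.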
I'm starting a conversation about general strategies to define locus coordinates after useful discussions with @laurelhiatt @avvaruakshay @penhoorn @egor-dolzhenko @bw2. I wanted to give you all a chance to discuss this before we implement anything.
All genomic coordinates in STRchive should be 0-based (0-start, half-open). See here for more details.
Coordinates will need to be adjusted for linking with UCSC, see #149.
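As a minimal sketch of what that adjustment could look like, assuming it is the usual conversion between 0-based half-open intervals and the 1-based, fully closed positions shown in the UCSC browser position box (the function names are hypothetical, and #149 may settle on something different):

```python
def to_ucsc_position(chrom: str, start0: int, end: int) -> str:
    """Format a 0-based, half-open interval as a 1-based, closed position string."""
    # Only the start shifts: the half-open end already equals the 1-based closed end.
    return f"{chrom}:{start0 + 1}-{end}"


def from_ucsc_position(position: str) -> tuple:
    """Parse a 1-based, closed 'chrom:start-end' string back to 0-based, half-open."""
    chrom, span = position.split(":")
    start1, end = (int(x.replace(",", "")) for x in span.split("-"))
    return chrom, start1 - 1, end


# Toy interval: the first ten bases of a chromosome.
print(to_ucsc_position("chr1", 0, 10))
print(from_ucsc_position("chr1:1-10"))
```

In both conventions the interval length is the same number of bases (`end - start0` for half-open, `end - start1 + 1` for closed), which is a handy sanity check after converting.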
Defining the "core" STRchive locus coordinates:
The reasoning:
This is designed specifically to provide the coordinates of the pathogenic part of the sequence.
Defining coordinates for TRGT genotyping
Defining coordinates for ExpansionHunter genotyping
The reasoning:
TRGT performs better when flanking variable regions are included. ExpansionHunter, conversely, performs better when only pure regions are included.