Add T2T-CHM13v2.0 reference genome support (CHM13-T2T)#250
Open
ljwharbers wants to merge 4 commits into
Register CHM13-T2T (nuclear chromosomes 1-22, X, Y; no mitochondrion) as a first-class supported genome. Adds CHECKSUMS entry, extends all chrom_orders dicts, updates CLI help, README, and CHANGELOG. FTP upload of CHM13-T2T.tar.gz is required before install works end-to-end. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Matrix generation reads transcript_path from the package directory (references/chromosomes/transcripts/<genome>/), not the volume, so the per-chromosome transcript files must ship in the repo like every other supported genome. These were missing for CHM13-T2T, causing SigProfilerMatrixGeneratorFunc to fail with FileNotFoundError. Adds the 24 transcript files (chr 1-22, X, Y) so end-to-end matrix generation works for CHM13-T2T. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
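A pre-flight check along these lines could catch missing transcript files before matrix generation is attempted. This is only a sketch: the helper name `missing_transcripts` and the `{chrom}_transcripts.txt` file-name pattern are assumptions for illustration, not the package's actual API or layout.

```python
from pathlib import Path

# Nuclear chromosomes shipped for CHM13-T2T (no mitochondrion).
CHM13_CHROMS = tuple(str(i) for i in range(1, 23)) + ("X", "Y")


def missing_transcripts(transcript_dir, chroms=CHM13_CHROMS,
                        pattern="{chrom}_transcripts.txt"):
    """Return the chromosomes that have no transcript file in transcript_dir.

    The file-name pattern is a hypothetical default; adjust it to match
    the files actually present in the repository.
    """
    base = Path(transcript_dir)
    return [c for c in chroms if not (base / pattern.format(chrom=c)).is_file()]
```

An empty list means every expected per-chromosome file is present; a non-empty list names the chromosomes that would trigger the `FileNotFoundError`.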
15d42fd to 87c3d22
The previous TSB was built from chm13v2.0_maskedY_rCRS.fa, which hard-masks chrY PAR1 (positions 1-2,394,410) with Ns. This caused "reference base does not match" errors for any VCF called against the iGenomes UCSC CHM13 reference (unmasked PAR1). Rebuilt all 24 TSB files from the iGenomes CHM13 genome.fa (s3://ngi-igenomes/igenomes/Homo_sapiens/UCSC/CHM13/). Only chrY checksum changed; chr1-22 and X are identical between the two references. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
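The mismatch described above can be illustrated with toy sequences. This is a minimal sketch, not the package's validation code: the helper names `leading_n_run` and `ref_matches` are hypothetical, and the ten-base strings merely stand in for chrY with and without a hard-masked leading region.

```python
def leading_n_run(seq):
    """Length of the run of 'N' bases at the start of seq."""
    return len(seq) - len(seq.lstrip("Nn"))


def ref_matches(seq, pos, ref):
    """True if the REF allele equals the reference at 1-based position pos."""
    start = pos - 1
    return seq[start:start + len(ref)].upper() == ref.upper()


# Toy stand-ins for chrY: one unmasked, one with the first 5 bases hard-masked.
unmasked = "ACGTACGTAC"
masked = "NNNNN" + unmasked[5:]

print(leading_n_run(masked))          # 5
print(ref_matches(unmasked, 3, "G"))  # True
print(ref_matches(masked, 3, "G"))    # False: the masked reference holds 'N' here
```

A variant called against the unmasked reference carries a real base in REF, so any lookup that lands inside the masked run compares that base against `N` and fails.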
Author
Happy to share the .tar of the build for you to host if this looks good to you. Just let me know what the preferred sharing method is for you.
Closed
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adding support for CHM13. Assisted by Claude, but of course verified personally. Is it possible to host the `.tar.gz` alongside your other hosted genomes?

Summary
- Registers `CHM13-T2T` (T2T-CHM13v2.0, nuclear chromosomes 1–22, X, Y) as a first-class supported genome
- Adds a `CHECKSUMS` entry in `reference_genome_manager.py` so the standard `SigProfilerMatrixGenerator install CHM13-T2T` command works once the tarball is on the FTP server
- Adds `CHM13-T2T` to all `chrom_orders` dictionaries in `SigProfilerMatrixGeneratorFunc.py` (9 locations) and `MutationMatrixGenerator.py` to prevent `KeyError` on any code path
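The `KeyError` concern can be sketched as follows. The dictionary below is a hypothetical stand-in for one of the `chrom_orders` dictionaries, and `chromosomes_for` is an illustrative helper; the real dictionaries and their exact contents live in the package source.

```python
# Hypothetical stand-in for a chrom_orders dictionary keyed by genome name.
chrom_orders = {
    # Nuclear chromosomes only: 1-22, X, Y (no mitochondrion).
    "CHM13-T2T": [str(i) for i in range(1, 23)] + ["X", "Y"],
}


def chromosomes_for(genome):
    """Look up the chromosome order, failing with a clear message if unregistered."""
    try:
        return chrom_orders[genome]
    except KeyError:
        raise ValueError(f"Unsupported genome: {genome!r}") from None
```

Any code path that indexes such a dictionary directly with a genome name missing from it raises `KeyError`, which is why the entry has to be present in every copy.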