Motivation
Topiary currently has no integration path for mutant protein sequences beyond its existing variant-based pipeline. The abandoned isovar-for-protein-sequence branch (2016) explored this but never landed, and the codebase has since evolved significantly.
There are two fundamentally different ways to derive mutant protein sequences, and users may want to supply them from external tools or generate them inline:
- DNA-only (germline + somatic variants): predict the mutant protein sequence from a reference genome plus variant calls. Sources:
  - Generated by varcode from VCF/MAF inputs
  - Loaded from files produced by other variant annotation tools
- DNA + RNA assembly: assemble RNA reads around variant loci to capture expressed isoforms, including novel splicing. Sources:
  - Generated by isovar from BAM + variant inputs
  - Loaded from files produced by other RNA-aware tools
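To make the DNA-only route concrete, here is a toy sketch of the core idea behind it: apply a single-nucleotide variant to a reference coding sequence and translate both. It assumes a spliced CDS string that begins at the start codon and uses the standard genetic code. `translate` and `apply_snv` are hypothetical helpers, not varcode's API; a real annotator such as varcode also has to deal with indels, frameshifts, splice-site variants and transcript selection. The DNA + RNA route differs precisely in that the reference CDS is replaced by sequence assembled from reads.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate complete codons in order, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i : i + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def apply_snv(cds: str, offset: int, ref: str, alt: str) -> str:
    """Return a copy of `cds` with the base at 0-based `offset` changed from `ref` to `alt`."""
    if cds[offset].upper() != ref.upper():
        raise ValueError(f"expected {ref!r} at offset {offset}, found {cds[offset]!r}")
    return cds[:offset] + alt + cds[offset + 1 :]


reference_cds = "ATGGCTGAAAAATAA"
print(translate(reference_cds))                          # reference protein
print(translate(apply_snv(reference_cds, 4, "C", "T")))  # missense: GCT -> GTT
print(translate(apply_snv(reference_cds, 9, "A", "T")))  # nonsense: AAA -> TAA
```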
Proposed scope
- Define a common internal representation for mutant protein sequences regardless of source
- Support file-based import (e.g. CSV/FASTA with sequence + metadata) so users can bring results from any upstream tool
- Optionally integrate varcode and/or isovar as built-in generators
- Wire into the existing ranking/filtering DSL so downstream analysis works uniformly
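One possible shape for the common representation and the CSV import path, sketched under assumptions: the class name `MutantProteinSequence`, the CSV column names and the 0-based half-open mutation interval are all hypothetical, not existing Topiary code. Recording where the mutated residues sit is what would let downstream steps enumerate only mutation-overlapping peptides, whichever tool produced the sequence.

```python
import csv
import io
from collections.abc import Iterator
from dataclasses import dataclass


@dataclass(frozen=True)
class MutantProteinSequence:
    """Hypothetical source-agnostic record for one mutant protein sequence."""

    name: str
    sequence: str
    mutation_start: int  # 0-based, inclusive: first residue that differs from reference
    mutation_end: int  # 0-based, exclusive
    source: str  # e.g. "varcode", "isovar", or the name of an external tool

    def __post_init__(self) -> None:
        if not 0 <= self.mutation_start < self.mutation_end <= len(self.sequence):
            raise ValueError(f"{self.name}: invalid mutation interval")

    def mutant_peptides(self, length: int) -> Iterator[tuple[int, str]]:
        """Yield (offset, peptide) for every window of `length` residues that
        overlaps the mutated interval; windows without a mutated residue are skipped."""
        first = max(0, self.mutation_start - length + 1)
        last = min(len(self.sequence) - length, self.mutation_end - 1)
        for offset in range(first, last + 1):
            yield offset, self.sequence[offset : offset + length]


def read_mutant_sequences_csv(handle) -> list[MutantProteinSequence]:
    """Parse a CSV with (assumed) columns name, sequence, mutation_start,
    mutation_end and an optional source column."""
    return [
        MutantProteinSequence(
            name=row["name"],
            sequence=row["sequence"].strip().upper(),
            mutation_start=int(row["mutation_start"]),
            mutation_end=int(row["mutation_end"]),
            source=row.get("source") or "external",
        )
        for row in csv.DictReader(handle)
    ]


example = io.StringIO(
    "name,sequence,mutation_start,mutation_end,source\n"
    "A2V,MVEKLLGRSTQW,1,2,varcode\n"
)
for record in read_mutant_sequences_csv(example):
    print(record.source, list(record.mutant_peptides(9)))
```

Keeping the interval on the record, rather than recomputing it against a reference, means externally supplied sequences (including novel isoforms from RNA assembly, which may have no clean reference counterpart) can flow into the same downstream ranking and filtering steps.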
Context
The abandoned branch (isovar-for-protein-sequence, 7 commits) sketched a ProteinFragment/MutantProteinFragment abstraction but was abandoned mid-refactor.