Support mutant protein sequences from multiple sources (varcode, isovar, file import) #102

Description

@iskandr

Motivation

Topiary currently has no integration path for mutant protein sequences beyond its existing variant-based pipeline. The abandoned isovar-for-protein-sequence branch (2016) explored this but never landed, and the codebase has since evolved significantly.

There are two fundamentally different ways to derive mutant protein sequences, and users may want to supply them from external tools or generate them inline:

  1. DNA-only (germline + somatic variants): predict the mutant protein sequence from a reference genome plus variant calls. Sources:
    • Generated by varcode from VCF/MAF inputs
    • Loaded from files produced by other variant annotation tools
  2. DNA + RNA assembly: assemble RNA reads around variant loci to capture expressed isoforms, including novel splicing. Sources:
    • Generated by isovar from BAM + variant inputs
    • Loaded from files produced by other RNA-aware tools
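
One way to picture the four source combinations above is as two independent axes: what evidence the sequence is based on, and whether it was generated inline or imported. The sketch below is only an illustration; `Evidence`, `Origin`, and `SequenceSource` are hypothetical names and are not part of Topiary, varcode, or isovar.

```python
from dataclasses import dataclass
from enum import Enum


class Evidence(Enum):
    """What data the mutant protein sequence was derived from."""
    DNA_ONLY = "dna_only"   # reference genome + variant calls
    DNA_RNA = "dna_rna"     # RNA reads assembled around variant loci


class Origin(Enum):
    """Whether the sequence was produced inline or loaded from a file."""
    GENERATED = "generated"
    IMPORTED = "imported"


@dataclass(frozen=True)
class SequenceSource:
    """Hypothetical provenance tag attached to each mutant sequence."""
    evidence: Evidence
    origin: Origin
    tool: str  # free-text tool name, e.g. "varcode" or "isovar"


# The four combinations described in the list above.
VARCODE_GENERATED = SequenceSource(Evidence.DNA_ONLY, Origin.GENERATED, "varcode")
DNA_FILE_IMPORT = SequenceSource(Evidence.DNA_ONLY, Origin.IMPORTED, "external")
ISOVAR_GENERATED = SequenceSource(Evidence.DNA_RNA, Origin.GENERATED, "isovar")
RNA_FILE_IMPORT = SequenceSource(Evidence.DNA_RNA, Origin.IMPORTED, "external")
```

Keeping evidence and origin separate would let downstream code ask "is this RNA-supported?" without caring which tool produced the record.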

Proposed scope

  • Define a common internal representation for mutant protein sequences regardless of source
  • Support file-based import (e.g. CSV/FASTA with sequence + metadata) so users can bring results from any upstream tool
  • Optionally integrate varcode and/or isovar as built-in generators
  • Wire into the existing ranking/filtering DSL so downstream analysis works uniformly
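
As a rough sketch of the first two bullets, a common representation plus CSV import could look something like the following. Everything here is an assumption made for illustration: the `MutantProteinSequence` class, the `load_csv` function, the `name` and `sequence` column names, and the restriction to the 20 standard amino acids are hypothetical and do not reflect Topiary's actual API or any agreed file format.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Dict, Iterable, List

# Assumption: only the 20 standard amino acids are accepted.
VALID_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


@dataclass
class MutantProteinSequence:
    """Hypothetical source-agnostic record for one mutant protein sequence."""
    name: str
    sequence: str
    source: str  # e.g. "varcode", "isovar", "file:csv"
    metadata: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        invalid = set(self.sequence) - VALID_AMINO_ACIDS
        if not self.sequence or invalid:
            raise ValueError(
                f"{self.name}: empty or invalid sequence "
                f"(unexpected characters: {sorted(invalid)})"
            )


def load_csv(lines: Iterable[str], source: str) -> List[MutantProteinSequence]:
    """Read records from CSV text with 'name' and 'sequence' columns.

    Any other columns are kept as free-form string metadata.
    """
    records = []
    for row in csv.DictReader(lines):
        name = row.pop("name")
        sequence = row.pop("sequence").strip().upper()
        records.append(MutantProteinSequence(name, sequence, source, dict(row)))
    return records


# Usage: an in-memory CSV stands in for a file written by an upstream tool.
example = io.StringIO("name,sequence,gene\nmut1,mktayiakqr,TP53\n")
records = load_csv(example, source="file:csv")
```

Because every record carries the same fields regardless of where it came from, the ranking/filtering layer would only need to understand this one shape; built-in generators could emit the same records instead of going through a file.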

Context
