Fix #394: in-frame stop-codon deletion with empty 3' UTR is a Deletion (#395)

Merged
iskandr merged 2 commits into main from fix-394-stoploss-empty-3p-utr on Jun 18, 2026

@iskandr iskandr commented Jun 18, 2026

Problem

Closes #394.

An in-frame deletion that removes the stop codon of a transcript with no 3' UTR sequence crashes:

from pyensembl import ensembl_grch37
import varcode
v = varcode.Variant(contig='16', start=30128151, ref='GGGATGCCTACGTGCCCCC', alt='G', ensembl=ensembl_grch37)
v.effects()
# ValueError: If no amino acids added by StopLoss then it should be Silent

(The reporter's original start=30128150 is off-by-one against the GRCh37 reference; 30128151 is the equivalent ref-matching coordinate. Same MAPK3 region, same crash.)

Root cause

translate_in_frame_mutation sets using_three_prime_utr=True whenever the mutation runs past the reference stop codon — even when the transcript has no 3' UTR sequence to translate into (transcript.three_prime_utr_sequence == "", e.g. MAPK3-006 / ENST00000395199). predict_in_frame_coding_effect then takes the StopLoss branch with an empty aa_alt, which the StopLoss constructor rejects.

Fix

Only emit StopLoss when readthrough actually adds residues (n_aa_alt > 0); otherwise fall through to the existing n_aa_alt == 0 branch, which reports a C-terminal Deletion. With no UTR sequence we cannot predict an extended protein, so a Deletion is the honest classification — matching the suggestion in #394.

The earlier #246 fix only handled transcripts with a non-empty 3' UTR.

-    elif using_three_prime_utr:
+    elif using_three_prime_utr and n_aa_alt > 0:
         return StopLoss(...)
     elif n_aa_alt == 0:
         return Deletion(...)
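
The effect of the extra condition can be illustrated with a minimal, self-contained sketch. It does not import varcode: `Effect` and `classify_in_frame` are hypothetical stand-ins for the real effect classes and for predict_in_frame_coding_effect, and only the two branches touched by the diff above are modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Effect:
    """Hypothetical stand-in for varcode's effect classes."""
    kind: str    # "StopLoss", "Deletion", or "Other"
    aa_alt: str  # residues predicted to replace the reference residues


def classify_in_frame(using_three_prime_utr: bool, aa_alt: str) -> Effect:
    """Toy version of the branch order after the fix (assumed, simplified)."""
    n_aa_alt = len(aa_alt)
    if using_three_prime_utr and n_aa_alt > 0:
        # Readthrough into the 3' UTR actually produced new residues.
        return Effect("StopLoss", aa_alt)
    elif n_aa_alt == 0:
        # Nothing was added, so report a deletion. This now also covers
        # a lost stop codon on a transcript with an empty 3' UTR.
        return Effect("Deletion", aa_alt)
    return Effect("Other", aa_alt)


# Stop codon removed, no UTR sequence to translate: aa_alt is empty.
empty_utr = classify_in_frame(using_three_prime_utr=True, aa_alt="")
# Stop codon removed, UTR available: readthrough adds residues.
with_utr = classify_in_frame(using_three_prime_utr=True, aa_alt="RQ")
print(empty_utr.kind, with_utr.kind)
```

Before the fix, the first call would have reached the StopLoss branch with an empty aa_alt, which is the case the StopLoss constructor rejects.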

Tests

The reported MAPK3 variant conveniently exercises both sides of the fix from a single input:

  • ENST00000395199 (no 3' UTR) → Deletion p.GGT354del — the fixed crash
  • ENST00000403394 (804 nt 3' UTR) → StopLoss readthrough — proves the Deletion fallback wasn't over-broadened
  • v.effects() across all MAPK3 transcripts no longer raises
  • plus a splice-free unit test of predict_in_frame_coding_effect for the empty-UTR branch

Added to tests/test_stop_codon_classification_bugs.py. The full suite passes apart from one unrelated failure, test_filter_effects_by_transcript_expression, which already fails on main because it is sensitive to the locally installed Ensembl release.

https://claude.ai/code/session_0149VWj5Rm1rYFf9azu4ry62

An in-frame deletion that removes the stop codon of a transcript with no
3' UTR sequence (e.g. MAPK3-006 / ENST00000395199, whose
three_prime_utr_sequence is "") raised:

    ValueError: If no amino acids added by StopLoss then it should be Silent

translate_in_frame_mutation sets using_three_prime_utr=True whenever the
mutation runs past the reference stop codon, even when there is no UTR
sequence to translate into. predict_in_frame_coding_effect then took the
StopLoss branch with an empty aa_alt, which the StopLoss constructor
rejects.

Only emit StopLoss when readthrough actually adds residues (n_aa_alt > 0);
otherwise fall through to the existing n_aa_alt == 0 branch, which reports
a C-terminal Deletion. This is the honest classification: with no UTR
sequence we cannot predict an extended protein.

The earlier #246 fix only covered transcripts with a non-empty 3' UTR.

Tests: the reported MAPK3 variant exercises both sides from one input —
ENST00000395199 (no UTR) -> Deletion, ENST00000403394 (804nt UTR) ->
StopLoss readthrough — plus a splice-free unit test of the branch.

Claude-Session: https://claude.ai/code/session_0149VWj5Rm1rYFf9azu4ry62
@iskandr iskandr force-pushed the fix-394-stoploss-empty-3p-utr branch from 02538af to 6858f3e Compare June 18, 2026 18:44
The default annotator (protein_diff -> classify_from_protein_diff)
independently mislabeled the same empty-3'UTR stop deletion as a
PrematureStop ("p.G355*") on a clean in-exon variant. That's wrong: the
mutant CDS contains no stop codon at all — the protein simply runs off
the end of the available sequence. A PrematureStop requires a stop codon
to exist.

Guard the PrematureStop branch so it only fires when the mutant
transcript actually terminated at a stop codon (i.e. translation did not
consume every available codon). When translation ran off the end, fall
through to the C-terminal Deletion classification, matching the in-frame
predictor. Genuine premature stops (a real stop codon in the mutant CDS)
are unaffected.

Now both annotators agree this scenario is a Deletion, honoring the #394
decision that an empty-UTR stop deletion is a C-terminal deletion rather
than a stop-loss with no predictable readthrough.
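
A rough sketch of the guard described above, assuming a simplified model in
which the mutant sequence is scanned codon by codon. `scan_codons` and
`classify_shortened_protein` are hypothetical names, not varcode functions;
the real check lives in classify_from_protein_diff and may be implemented
differently.

```python
from typing import Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def scan_codons(seq: str) -> Tuple[int, bool]:
    """Return (residues translated, whether a stop codon ended translation)."""
    n_complete = len(seq) // 3
    for i in range(n_complete):
        if seq[3 * i:3 * i + 3] in STOP_CODONS:
            return i, True
    # Every available codon was consumed without meeting a stop codon.
    return n_complete, False


def classify_shortened_protein(mutant_seq: str, ref_protein_length: int) -> str:
    """Toy classification of a mutant protein shorter than the reference."""
    n_residues, hit_stop = scan_codons(mutant_seq)
    if n_residues >= ref_protein_length:
        return "Other"
    # A premature stop requires a stop codon to exist in the mutant sequence.
    if hit_stop:
        return "PrematureStop"
    # Translation ran off the end of the sequence: treat as C-terminal deletion.
    return "Deletion"


# Reference ATG GGC GGT TGA encodes 3 residues.
print(classify_shortened_protein("ATGTGAGGTTGA", 3))  # new stop at codon 2
print(classify_shortened_protein("ATGGGC", 3))        # stop and last codon deleted, no UTR
```

The first call finds a real stop codon and stays a PrematureStop; the second
has no stop codon anywhere in the mutant sequence and is classified as a
Deletion.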

Tests: clean in-exon deletion via the default annotator -> Deletion;
genuine premature-stop insertion (BRCA1) -> still PrematureStop.

Bump version to 6.0.1 and date the CHANGELOG entry.

Claude-Session: https://claude.ai/code/session_0149VWj5Rm1rYFf9azu4ry62
@iskandr iskandr merged commit 18f32b9 into main Jun 18, 2026
8 checks passed
@iskandr iskandr deleted the fix-394-stoploss-empty-3p-utr branch June 18, 2026 19:44

Development

Successfully merging this pull request may close these issues.

Error with StopLoss variant effect with an empty 3' UTR
