[BUG] Sanitize '|' inside CATH/InterPro annotation names #58

## Context

Sibling of #56 (semicolon sanitization). ProtSpace encodes categorical annotation values as `accession (name)|score1,score2` or `accession (name)|EVIDENCE_CODE`, joining multiple hits with `;`. The `|` separates the **label** from a trailing **score/evidence** suffix.

Frontend parser (`packages/core/src/components/data-loader/utils/conversion.ts` `parseAnnotationValue`, ~L186-220):
- Splits on the **last** pipe: `const lastPipe = trimmed.lastIndexOf('|')`.
- The suffix after it is interpreted as numeric scores, or as an evidence code matching `EVIDENCE_CODE_RE = /^(?:[A-Z]{2,5}|ECO:\d+)$/`. If it is neither, the whole string is kept as the label.
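
To make the splitting rule concrete, here is a minimal Python model of the behaviour described above. It is a sketch, not a port of the actual TypeScript: `ParsedValue`, `SCORE_RE` and the exact numeric check are assumptions, and only the `lastIndexOf('|')` split and `EVIDENCE_CODE_RE` come from the frontend code quoted in this issue.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Pattern quoted in this issue for the frontend's EVIDENCE_CODE_RE.
EVIDENCE_CODE_RE = re.compile(r"^(?:[A-Z]{2,5}|ECO:\d+)$")
# Assumption: a score is a plain decimal number such as "12", "3.5" or "3e-10".
SCORE_RE = re.compile(r"^-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?$")


class ParsedValue(NamedTuple):
    """Hypothetical result shape; the real frontend type may differ."""
    label: str
    scores: Tuple[float, ...] = ()
    evidence: Optional[str] = None


def parse_annotation_value(raw: str) -> ParsedValue:
    trimmed = raw.strip()
    last_pipe = trimmed.rfind("|")  # mirrors trimmed.lastIndexOf('|')
    if last_pipe == -1:
        return ParsedValue(trimmed)

    label = trimmed[:last_pipe].strip()
    suffix = trimmed[last_pipe + 1:].strip()

    # Numeric branch: every comma-separated part must look like a number.
    parts = [p.strip() for p in suffix.split(",")]
    if all(SCORE_RE.match(p) for p in parts):
        return ParsedValue(label, scores=tuple(float(p) for p in parts))

    # Evidence branch: the suffix must match the anchored pattern in full.
    if EVIDENCE_CODE_RE.match(suffix):
        return ParsedValue(label, evidence=suffix)

    # Fallback: neither scores nor evidence, so the whole string is the label.
    return ParsedValue(trimmed)


print(parse_annotation_value("IPR000001 (Kringle)|12.5,3e-10"))
print(parse_annotation_value("GO:0005515 (protein binding)|IPI"))
print(parse_annotation_value("IPR000001 (Kringle)"))
```
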

## Severity: lower than #56 (defensive / contract integrity)

Unlike `;`, an in-name `|` does **not** currently corrupt InterPro/CATH output, because:
1. The parser uses `lastIndexOf('|')`, so a real trailing `|score` is always found first and any in-name `|` stays in the label.
2. Names are wrapped in `(name)`, and the score is emitted **after** the closing paren — so for a value with no score, the suffix after an in-name `|` ends in `)` (e.g. `EXP)`), which fails both the numeric check and `EVIDENCE_CODE_RE` (anchored, no `)`). It falls back to "whole string is the label".

So `accession (name)|score` is robust against in-name `|` **as long as names stay inside the parentheses and the score stays outside them**. The risk is real for any value **not** in that parenthesized shape, or if that invariant ever changes.

### Genuine failure modes the backend should preclude
- A bare (non-parenthesized) `label|suffix` value whose label legitimately ends in `|` followed by 2–5 uppercase letters (or `ECO:<digits>`) → misread as an evidence code. Example: a name `GO:12345|EXP` (no parens, no score) → frontend yields `label="GO:12345"`, `evidence="EXP"` (wrong).
- A label ending in `|<number>` with no real score → the number is stripped as a score.
- Any future producer change that emits a name outside `( … )` or a score inside it.
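
The contrast between the safe parenthesized shape and the failure modes listed above can be reproduced with a small script. This is a sketch under assumptions: `split_suffix` is a hypothetical stand-in for the frontend logic, and `NUMERIC_RE` is a guess at what counts as a score suffix.

```python
import re
from typing import Optional, Tuple

EVIDENCE_CODE_RE = re.compile(r"^(?:[A-Z]{2,5}|ECO:\d+)$")  # quoted in this issue
# Assumed score shape: comma-separated plain decimals, e.g. "12.5,3".
NUMERIC_RE = re.compile(r"^-?\d+(?:\.\d+)?(?:,-?\d+(?:\.\d+)?)*$")


def split_suffix(value: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: split on the last '|' only if the tail is a score or evidence suffix."""
    text = value.strip()
    head, sep, tail = text.rpartition("|")
    if sep and (NUMERIC_RE.match(tail) or EVIDENCE_CODE_RE.match(tail)):
        return head, tail
    return text, None  # fallback: the whole string is the label


cases = [
    "IPR000001 (A|B domain)|12.5",  # safe: the real trailing score is found first
    "GO:12345 (binding|EXP)",       # safe: tail is "EXP)", which matches nothing
    "GO:12345|EXP",                 # unsafe: bare name, "EXP" is taken as evidence
    "Domain 3|4",                   # unsafe: bare name, "4" is taken as a score
]
for value in cases:
    print(f"{value!r:34} -> {split_suffix(value)}")
```

The first two cases keep the in-name `|` inside the label; the last two lose part of the name to the suffix.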

Note: the frontend export path only renders already-parsed labels; it does not re-serialize `label|score`, so there is no round-trip data corruption — the impact is mis-parsed labels/scores/evidence in legend, tooltip, sorting and filtering.

## Fix

Mirror the `;` handling: sanitize/normalize `|` out of names before assembling the value, so the `accession (name)|score` contract is unambiguous regardless of value shape.
- `src/protspace/data/annotations/retrievers/interpro_retriever.py` — `acc_with_name = f"{acc} ({name})"` (~L362) and the `|`-joined score line (~L369): strip/replace `|` in `name` (e.g. with `/` or `,`).
- `src/protspace/data/annotations/retrievers/cath_names.py` — name read verbatim (~L93): same sanitization at the source.
- Ideally sanitize `;` and `|` (and normalize) in one shared name-cleaning step.
- Add tests with names containing `|`.
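
One possible shape for the shared name-cleaning step, as a sketch only. `clean_annotation_name` and `format_hit` are hypothetical names, not existing functions in the retrievers. Using `/` as the replacement for both characters is an assumption that should be aligned with whatever #56 settled on for `;`.

```python
from typing import Optional, Sequence

# Assumption: both structural characters are replaced with "/".
_STRUCTURAL_CHARS = str.maketrans({"|": "/", ";": "/"})


def clean_annotation_name(name: str) -> str:
    """Hypothetical helper: remove ';' and '|' from a name and collapse whitespace."""
    replaced = name.translate(_STRUCTURAL_CHARS)
    return " ".join(replaced.split())


def format_hit(acc: str, name: str, scores: Optional[Sequence[float]] = None) -> str:
    """Hypothetical assembly step: 'accession (name)' plus an optional '|score1,score2'."""
    acc_with_name = f"{acc} ({clean_annotation_name(name)})"
    if scores:
        return f"{acc_with_name}|{','.join(str(s) for s in scores)}"
    return acc_with_name


def test_pipe_in_name_is_sanitized() -> None:
    # pytest-style check: the only '|' left is the label/score separator.
    value = format_hit("IPR000001", "A|B  domain", [12.5, 3.0])
    assert value == "IPR000001 (A/B domain)|12.5,3.0"
    assert value.count("|") == 1


def test_unscored_name_has_no_pipe() -> None:
    assert "|" not in format_hit("3.40.50.300", "P-loop|NTPase")
```

Cleaning the name inside the assembly function rather than at each call site means a new retriever cannot forget the step, which is the main argument for a shared helper.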

## Related

- #56 — sanitize `;` in CATH/InterPro names (sibling).
- Frontend `;` repair for existing bundles: tsenoner/protspace_web#282.

## Acceptance criteria

- Generated annotation names contain no `|` (and no `;`).
- Documented contract (`docs/annotations.md`) holds: the only structural `|` is the label↔score/evidence separator.
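
These criteria could be checked mechanically against generated output. The sketch below assumes every hit has the `accession (name)` shape with an optional `|suffix` and that accessions contain no whitespace. `HIT_RE` and `contract_violations` are hypothetical and would need adjusting to the actual contract in `docs/annotations.md`.

```python
import re
from typing import List

# Assumed hit shape: "accession (name)" with an optional "|suffix".
# The name and the suffix must both be free of '|' and ';'.
HIT_RE = re.compile(r"^\S+ \([^|;]*\)(?:\|[^|;]+)?$")


def contract_violations(value: str) -> List[str]:
    """Return the hits of a ';'-joined value that break the assumed contract."""
    hits = [hit.strip() for hit in value.split(";")]
    return [hit for hit in hits if not HIT_RE.match(hit)]


print(contract_violations("IPR000001 (Kringle)|12.5;IPR000002 (A/B domain)"))
print(contract_violations("IPR000001 (A|B domain)|12.5"))
```

A name containing `;` shows up here as two malformed fragments, because the value is split on `;` before each hit is checked.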

