Summary
In the default Swiss-Prot dataset, the CATH-Gene3D superfamily 6.20.10.10 is labeled "Laminin", but that superfamily has no name in CATH (6.20.10.10 on cathdb). Only 2.10.25.10 legitimately carries the "Laminin" name (2.10.25.10 on cathdb). The name is being incorrectly propagated onto an unnamed superfamily.
Root cause
_parse_cath_names() in src/protspace/data/annotations/retrievers/cath_names.py fills every unnamed 4-level superfamily with its parent topology (3-level) name (lines 101–105):
# Fill unnamed superfamilies with parent topology name
for code in unnamed_superfamilies:
    parent = ".".join(code.split(".")[:3])
    if parent in names:
        names[code] = names[parent]
Evidence from the CATH names file
From the official cath-names.txt (latest release):
2.10.25 3sovA02 :Laminin
2.10.25.10 3sovA02 :Laminin <- superfamily explicitly named "Laminin" (correct)
6.20.10 3s6xC01 :Laminin <- topology named "Laminin"
6.20.10.10 1lmmA01 : <- superfamily has NO name in CATH
6.20.10.20 3s6xC01 : <- also unnamed
6.20.10.30 4glxA05 : <- also unnamed
2.10.25.10 is explicitly assigned "Laminin" → correct. 6.20.10.10 has an empty name → the fallback copies the parent topology 6.20.10's name "Laminin" onto it → wrong.
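The collapse can be reproduced in isolation with a minimal sketch. The `parse_names` helper below is a simplified stand-in written for this issue, not the real `_parse_cath_names()`. It assumes each line is `code  representative  :name` and that a code with four dot-separated levels is a superfamily. The `inherit_topology` flag is hypothetical and exists only to toggle the fallback for comparison.

```python
# Simplified stand-in for the parser; NOT the real _parse_cath_names().
# Sample lines taken from the cath-names.txt excerpt above.
SAMPLE = """\
2.10.25    3sovA02    :Laminin
2.10.25.10    3sovA02    :Laminin
6.20.10    3s6xC01    :Laminin
6.20.10.10    1lmmA01    :
6.20.10.20    3s6xC01    :
6.20.10.30    4glxA05    :
"""


def parse_names(text, inherit_topology=True):
    """Map CATH codes to names; optionally apply the topology fallback."""
    names = {}
    unnamed_superfamilies = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        head, _, name = line.partition(":")
        code = head.split()[0]
        name = name.strip()
        if name:
            names[code] = name
        elif code.count(".") == 3:  # 4-level code with an empty name
            unnamed_superfamilies.append(code)

    if inherit_topology:
        # Same fallback as in the snippet quoted under "Root cause"
        for code in unnamed_superfamilies:
            parent = ".".join(code.split(".")[:3])
            if parent in names:
                names[code] = names[parent]
    return names


with_fallback = parse_names(SAMPLE)
without_fallback = parse_names(SAMPLE, inherit_topology=False)
```

With the fallback enabled, `with_fallback` maps 6.20.10.10, 6.20.10.20, and 6.20.10.30 all to "Laminin". With it disabled, those three codes are absent from `without_fallback`, while 2.10.25.10 keeps its explicitly assigned name in both cases.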
Why it matters
- The label "Laminin" is not assigned to superfamily 6.20.10.10 by CATH; showing it misrepresents the annotation.
- All sibling unnamed superfamilies under a named topology collapse to the same label: 6.20.10.10, 6.20.10.20, and 6.20.10.30 all become "Laminin", making three distinct superfamilies indistinguishable in the legend.
This is currently intentional behavior (see the module docstring and the test_unnamed_superfamily_inherits_topology test in tests/.../test_cath_names.py), but the resulting labels don't reflect CATH and silently lose information.
Suggested fixes (to discuss)
- Don't inherit: keep the superfamily code (e.g. 6.20.10.10) as the label when CATH has no name, so siblings stay distinct and no false name is shown.
- Inherit but disambiguate: e.g. Laminin (6.20.10.10), so the parent-derived name is visible but the superfamily remains identifiable.
- Code as identity, name as secondary: keep the CATH code as the identity and only use the topology name as a tooltip/secondary display.
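To make the options easier to compare, here is a rough sketch of how each could look at label-resolution time. It assumes the names dictionary contains only names that CATH assigns explicitly (that is, the fallback in `_parse_cath_names()` has been removed). The helpers `label_for` and `secondary_name` and the `mode` values are hypothetical names chosen for illustration; they do not exist in the codebase.

```python
from typing import Dict, Optional


def _parent_topology(code: str) -> Optional[str]:
    """Return the 3-level topology code for a 4-level superfamily code."""
    parts = code.split(".")
    return ".".join(parts[:3]) if len(parts) == 4 else None


def label_for(code: str, names: Dict[str, str], mode: str = "code") -> str:
    """Resolve a display label for a CATH code.

    mode="code":          unnamed superfamilies keep their code (option 1)
    mode="disambiguate":  unnamed superfamilies show "Topology (code)" (option 2)
    """
    own = names.get(code)
    if own:
        return own  # explicitly named by CATH, use as-is
    parent = _parent_topology(code)
    topology_name = names.get(parent) if parent else None
    if mode == "disambiguate" and topology_name:
        return f"{topology_name} ({code})"
    return code


def secondary_name(code: str, names: Dict[str, str]) -> Optional[str]:
    """Option 3: the code stays the identity; this is only tooltip text."""
    own = names.get(code)
    if own:
        return own
    parent = _parent_topology(code)
    return names.get(parent) if parent else None


# Explicit names only, as in the cath-names.txt excerpt
names = {"2.10.25": "Laminin", "2.10.25.10": "Laminin", "6.20.10": "Laminin"}

option_1 = label_for("6.20.10.10", names)
option_2 = label_for("6.20.10.10", names, mode="disambiguate")
option_3 = ("6.20.10.10", secondary_name("6.20.10.10", names))
```

In all three variants the explicitly named 2.10.25.10 still resolves to "Laminin", and the unnamed siblings under 6.20.10 remain distinguishable from one another.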
Affected
- File: src/protspace/data/annotations/retrievers/cath_names.py (_parse_cath_names, lines ~98–105)
- Consumers: InterProRetriever._resolve_entry_names(), TedRetriever._resolve_cath_name()
- Surfaces in the default dataset shipped to protspace.app
Related: #56 (CATH/InterPro annotation name sanitization).