Problem
The demo datasets in app/public/data/*.parquetbundle still carry the legacy pre-binned columns length_fixed and length_quantile (string bin ranges like 50-100, 200-400). These were removed in protspace v4.0.0 in favour of a single raw numeric length that the frontend bins on the fly (fixed/quantile strategies in the legend).
The current default bundle (app/public/data.parquetbundle) was already regenerated and uses length. The other selectable demo datasets were not, so they still show the stale columns — which now surface in the annotation dropdown under "Other" (prettified as "Length fixed" / "Length quantile") with no description.
Affected bundles (all under app/public/data/):
5K, 40K, 35K_ec_brenda, 7K_toxprot, 105K_homoSapiens_drosophilaMelanogaster, 127K_beta_lactamase, 573K_swissprot, beta_lactamase_ec, beta_lactamase_pn, phosphatase.
Why it can't be fixed in place
The bundles store only the bin-range strings, not the raw lengths, so a correct numeric length cannot be reconstructed by editing the parquet — the source pipeline must regenerate it.
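To make the information loss concrete, here is a minimal Python sketch. It assumes the legacy labels are plain "low-high" strings such as 50-100 with inclusive bounds; parse_bin_range and candidate_lengths are hypothetical helpers for illustration, not part of protspace.

```python
def parse_bin_range(label: str) -> tuple[int, int]:
    """Split a legacy bin label such as "50-100" into integer bounds."""
    low, high = label.split("-")
    return int(low), int(high)


def candidate_lengths(label: str) -> range:
    """Every length consistent with one bin label (bounds assumed inclusive)."""
    low, high = parse_bin_range(label)
    return range(low, high + 1)


# A protein stored as "50-100" could have any of these lengths;
# the bundle holds nothing that narrows it down further.
print(len(candidate_lengths("50-100")))  # 51
```

Any value picked from that range (midpoint, lower bound) would be a guess, and the frontend would then re-bin guessed values as if they were real.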
Proposed fix
Regenerate each demo bundle through the protspace pipeline so it emits a single numeric length (and drops length_fixed/length_quantile), mirroring the toxprot demo regeneration (protspace/scripts/generate_toxprot_demo.py, design: protspace/docs/superpowers/specs/2026-04-30-toxprot-demo-regeneration-design.md). This requires the source .h5 embeddings and the annotation fetches for each dataset.
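A possible acceptance check for each regenerated bundle, sketched in Python. It works on a list of annotation column names only; how those names are read out of a .parquetbundle is left open here, and length_column_problems and the "identifier" column in the usage lines are hypothetical names, not existing protspace code.

```python
LEGACY_COLUMNS = {"length_fixed", "length_quantile"}


def length_column_problems(columns: list[str]) -> list[str]:
    """Return human-readable problems; an empty list means the bundle is clean."""
    problems = []
    if "length" not in columns:
        problems.append("missing column: length")
    # Report any legacy pre-binned columns that survived regeneration.
    for name in sorted(LEGACY_COLUMNS.intersection(columns)):
        problems.append(f"legacy column still present: {name}")
    return problems


# A regenerated bundle should pass, a stale one should not.
print(length_column_problems(["identifier", "length"]))
print(length_column_problems(["identifier", "length_fixed", "length_quantile"]))
```

A fuller check would also confirm that length is stored with a numeric type, which needs the actual parquet schema rather than just the column names.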
Context
Surfaced while implementing the predicted-annotations / annotation-docs work (#221). The annotation-metadata registry already maps length → numeric "Sequence length"; unknown columns degrade gracefully, which is why the stale columns currently appear under "Other".
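The fallback behaviour described above can be illustrated roughly as follows. This is a Python sketch of the idea, not the frontend code: REGISTRY, describe_column, and the "Known" group name are placeholders, and only the length entry and the "Other" labels come from this issue.

```python
# Hypothetical stand-in for the annotation-metadata registry; only the
# "length" entry is taken from the issue text.
REGISTRY = {"length": "Sequence length"}


def describe_column(name: str) -> tuple[str, str]:
    """Return (group, label) for an annotation column."""
    if name in REGISTRY:
        return "Known", REGISTRY[name]
    # Unknown column: fall back to a prettified name under "Other".
    return "Other", name.replace("_", " ").capitalize()


print(describe_column("length"))
print(describe_column("length_fixed"))
```

Because the fallback never fails, the stale columns stay visible instead of raising an error, so they only disappear once the bundles themselves are regenerated.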