Regenerate demo bundles: replace stale length_fixed/length_quantile with numeric length #271

Description

@tsenoner

Problem

The demo datasets in app/public/data/*.parquetbundle still carry the legacy pre-binned columns length_fixed and length_quantile (string bin ranges like 50-100, 200-400). These were removed in protspace v4.0.0 in favour of a single raw numeric length that the frontend bins on the fly (fixed/quantile strategies in the legend).
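
To illustrate what binning a raw numeric length on the fly can look like, here is a minimal Python sketch of both strategies. It is not the frontend's implementation: the function names, the half-open bin convention, and the example edges are all assumptions made for illustration.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_edges(lengths: list[int], n_bins: int) -> list[float]:
    """Edges that split `lengths` into n_bins groups of roughly equal size."""
    cuts = quantiles(lengths, n=n_bins, method="inclusive")
    return [min(lengths), *cuts, max(lengths)]


def bin_label(length: float, edges: list[float]) -> str:
    """Label the bin [lo, hi) containing `length`; the last bin also includes its upper edge."""
    # bisect_right returns the index of the first edge strictly greater than `length`.
    i = bisect_right(edges, length) - 1
    # Clamp so that values at or beyond the outer edges land in an end bin.
    i = max(0, min(i, len(edges) - 2))
    return f"{edges[i]:g}-{edges[i + 1]:g}"


# Hypothetical raw lengths and hypothetical fixed edges.
lengths = [62, 75, 120, 180, 240, 310, 390, 95]
fixed = [50, 100, 200, 400]

print(bin_label(75, fixed))                         # 50-100
print(bin_label(75, quantile_edges(lengths, 2)))    # label depends on the data
```

Because the raw value is kept, the same column can be re-binned with different edges at any time, which is not possible with the legacy string columns.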

The current default bundle (app/public/data.parquetbundle) was already regenerated and uses length. The other selectable demo datasets were not, so they still carry the stale columns, which now surface in the annotation dropdown under "Other" (prettified as "Length fixed" / "Length quantile") with no description.

Affected bundles (all under app/public/data/):
5K, 40K, 35K_ec_brenda, 7K_toxprot, 105K_homoSapiens_drosophilaMelanogaster, 127K_beta_lactamase, 573K_swissprot, beta_lactamase_ec, beta_lactamase_pn, phosphatase.
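
A regenerated bundle could be checked with something like the sketch below. It only inspects a list of annotation column names; how those names are read out of a .parquetbundle is not covered here, and the helper name audit_columns is hypothetical.

```python
LEGACY_LENGTH_COLUMNS = {"length_fixed", "length_quantile"}


def audit_columns(columns: list[str]) -> tuple[list[str], bool]:
    """Return (legacy columns still present, True if numeric `length` is missing)."""
    present = set(columns)
    stale = sorted(present & LEGACY_LENGTH_COLUMNS)
    return stale, "length" not in present


# Hypothetical column lists for a stale and a regenerated bundle.
print(audit_columns(["identifier", "length_fixed", "length_quantile"]))
print(audit_columns(["identifier", "length"]))
```

A bundle counts as fixed when the first element is empty and the second is False.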

Why it can't be fixed in place

The bundles store only the bin-range strings, not the raw lengths, so a correct numeric length cannot be reconstructed by editing the parquet — the source pipeline must regenerate it.
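
The information loss can be shown in a few lines of Python. This is a sketch under assumptions: the protein identifiers are made up, and the parser only handles labels of the form low-high as quoted above (open-ended labels, if any exist, are not handled).

```python
def parse_bin(label: str) -> tuple[int, int]:
    """Split a legacy bin label such as '50-100' into its (low, high) bounds."""
    low, high = label.split("-")
    return int(low), int(high)


# Two hypothetical proteins with different true lengths (say 62 and 95)
# end up with the same stored string once they are pre-binned.
stored = {"protein_a": "50-100", "protein_b": "50-100"}

bounds = {pid: parse_bin(label) for pid, label in stored.items()}
print(bounds)  # both entries are (50, 100); the original 62 and 95 are gone
```

The most an in-place edit could recover is the pair of bounds per protein, which is why the raw length has to come from the source pipeline.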

Proposed fix

Regenerate each demo bundle through the protspace pipeline so it emits a single numeric length (and drops length_fixed/length_quantile), mirroring the toxprot demo regeneration (protspace/scripts/generate_toxprot_demo.py, design: protspace/docs/superpowers/specs/2026-04-30-toxprot-demo-regeneration-design.md). This requires the source .h5 embeddings and annotation fetches for each dataset.

Context

Surfaced while implementing the predicted-annotations / annotation-docs work (#221). The annotation-metadata registry already maps length → numeric "Sequence length"; unknown columns degrade gracefully, which is why the stale columns currently appear under "Other".
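
The graceful-degradation behaviour described above can be sketched as follows. This is a Python illustration only, not the actual registry code: the AnnotationMeta type, the REGISTRY contents beyond the length entry, the "Numeric" group name, and the prettifying rule are assumptions chosen to reproduce the labels quoted in this issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotationMeta:
    label: str
    group: str
    description: str = ""


# Known columns get a curated label; only `length` is taken from the issue text.
REGISTRY = {
    "length": AnnotationMeta("Sequence length", "Numeric", "Raw sequence length"),
}


def describe(column: str) -> AnnotationMeta:
    """Look up a column, falling back to a prettified name under 'Other'."""
    if column in REGISTRY:
        return REGISTRY[column]
    # Unknown column: no description, label derived from the column name.
    return AnnotationMeta(column.replace("_", " ").capitalize(), "Other")


for name in ("length", "length_fixed", "length_quantile"):
    print(name, "->", describe(name))
```

With a fallback like this, the stale columns are shown rather than rejected, which matches what the affected demo datasets currently display.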
