feat(statistics): consume projection statistics in the bundle pipeline #295
Draft
jcoludar wants to merge 7 commits into
The protspace_web half of the projection-statistics MVP (engine PR: tsenoner/protspace#61; tracking issue #219). The prep service folds engine-computed stats into the bundle, the reader accepts the optional fifth part, and the UI surfaces the new stage.

- data-loader/bundle.ts (+ @protspace/utils bundle-writer): accept 3-5 parts; branch on an empty settings slot (statistics-without-settings) rather than the raw delimiter count. The statistics part is parsed-but-ignored for now; rendering is a committed follow-up, out of scope here.
- services/protspace-prep: a best-effort `stats` step AFTER the core bundle and OUTSIDE the pipeline timeout, with its own nested timeout, a bounded non-latching `protspace stats` version probe (lock + timeout + kill, no false-latch on transient errors), and an atomic temp-bundle + os.replace re-bundle so a stats timeout/kill can never corrupt or lose the shipped bundle.
- app: a `computing_statistics` SSE stage (progress 95%, "Computing statistics…").
- openspec/changes/add-projection-statistics: proposal, design, spec, tasks.
- Invert the stale "five-part bundles are rejected" test; add round-trip and zero-byte settings-slot coverage.

Depends on the engine PR (tsenoner/protspace#61) and a stats-bearing protspace release; the prep step feature-probes the subcommand and no-ops if absent, so this is safe to merge ahead of the release (stats simply don't appear yet).

Refs #219

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
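The "bounded non-latching version probe" in this commit message can be pictured roughly as below. This is a minimal sketch under stated assumptions, not the prep service's actual code: the `StatsProbe` class, its synchronous `threading.Lock`, the default timeout, and the rule that a non-zero exit means "subcommand absent" are all hypothetical, and the probe command is supplied by the caller rather than guessed here.

```python
import subprocess
import threading
from typing import Optional, Sequence


class StatsProbe:
    """Hypothetical probe: caches a definite answer, never a transient failure."""

    def __init__(self, command: Sequence[str], timeout_seconds: float = 5.0):
        self._command = list(command)      # probe command, chosen by the caller
        self._timeout = timeout_seconds
        self._lock = threading.Lock()      # single-flight: one probe at a time
        self._supported: Optional[bool] = None

    def is_supported(self) -> bool:
        with self._lock:
            if self._supported is not None:
                return self._supported     # a definite answer was latched earlier
            try:
                # subprocess.run kills and reaps the child when the timeout expires.
                result = subprocess.run(
                    self._command, capture_output=True, timeout=self._timeout
                )
            except (OSError, subprocess.TimeoutExpired):
                # Transient spawn error or hang: report "no" for this job only,
                # leave the cache empty so a later job probes again.
                return False
            # Assumption: exit code 0 means the subcommand exists.
            self._supported = result.returncode == 0
            return self._supported
```

Keeping the cache empty after a transient failure is what lets a later job retry, at the cost of one extra short-lived subprocess per job until a definite answer arrives.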
This was referenced Jun 24, 2026
CI's `quality:ci` runs `format:check` before lint/quality and prettier flagged this openspec doc (markdown list-continuation indentation). Pure formatting — no content change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ction metadata

Phase 1B of route-projection-statistics. The engine now folds per-projection faithfulness (kNN-overlap / trustworthiness / continuity) into each projection's info_json under a `quality` object. The projection-metadata panel flattened info_json only one level, so a nested `quality` rendered as a raw JSON.stringify blob.

- Extract the metadata-row building into a pure, tested helper (projection-metadata-helpers.ts), matching the package's *-helpers pattern.
- Expand info_json.quality into discrete per-metric rows: each shows its value plus compact provenance (distance metric, k); a skipped metric (value null) renders as N/A with its marker; a flat scalar shape is tolerated.
- projection-metadata.ts now delegates to the helper (no behavior change beyond the quality expansion).

Tests: projection-metadata-helpers.test.ts (flatten, quality expansion, skip, flat-scalar). Full core suite green (1073).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…gn/spec/tasks) The design for routing each statistic to the bundle part whose existing frontend consumer matches its granularity (tsenoner's PR #61 review): faithfulness → projections_metadata.info_json.quality, per-protein cluster/silhouette → protein_annotations, aggregate validity → statistics.parquet. Includes the 4-lens fan-out review outcomes and the phased plan (Phase 1 low-risk routing; Phase 2 per-protein annotations behind a flag). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…bundle Phase 2A of route-projection-statistics. `protspace stats` now enriches the annotations parquet in place with per-protein cluster-membership + silhouette columns when given -a; the prep re-bundle step passes the annotations path so the following `bundle -a` carries those columns (and faithfulness rides in projections_metadata, both from the same stats call). Still best-effort: any stats failure leaves the core bundle untouched. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
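The "any stats failure leaves the core bundle untouched" guarantee rests on the temp-bundle + `os.replace` pattern named elsewhere in this PR. A minimal sketch of that pattern follows, assuming a hypothetical `replace_bundle_atomically` helper and a caller-supplied `write_bundle` callback standing in for the real re-bundle call:

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def replace_bundle_atomically(
    final_path: Path, write_bundle: Callable[[Path], None]
) -> bool:
    """Write a new bundle next to final_path, then swap it in with os.replace.

    Returns True if the enriched bundle replaced the original, False if the
    write failed and the original was left as it was.
    """
    # Same directory as the target, so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=final_path.parent, suffix=".tmp")
    os.close(fd)
    tmp = Path(tmp_name)
    try:
        write_bundle(tmp)             # may raise or time out mid-write
        os.replace(tmp, final_path)   # the only step that touches final_path
        return True
    except Exception:
        return False                  # best-effort: swallow and keep the old bundle
    finally:
        tmp.unlink(missing_ok=True)   # no-op after a successful replace
```

Because the shipped bundle is only ever touched by the final `os.replace`, a failure at any earlier point leaves the original bytes in place.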
…gs part Phase 2A.4 of route-projection-statistics. The stats step now writes the auto-generated cluster-membership legend styles (`stats --settings-out cluster_styles.json`) and the re-bundle folds them in (`bundle --settings ...`) so clusters are colored when selected. The --settings flag is only added when the styles file exists, so an older engine without --settings-out degrades gracefully (columns still ship, just without pre-baked colors). Still best-effort. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
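The graceful-degradation rule described here (add `--settings` only when the styles file exists) might look like the following. This is a sketch only: `bundle_args` and its parameters are hypothetical names, and `base_args` stands in for whatever the prep service already passes to `protspace bundle`; only the `-a` and `--settings` flags come from the commit message.

```python
from pathlib import Path
from typing import List


def bundle_args(base_args: List[str], annotations: Path, styles: Path) -> List[str]:
    """Build the re-bundle argv, adding --settings only if the styles file exists."""
    args = [*base_args, "-a", str(annotations)]
    if styles.is_file():
        # Newer engine: `stats --settings-out` produced the legend styles.
        args += ["--settings", str(styles)]
    # Older engine: no styles file, so the columns ship without pre-baked colors.
    return args
```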
… section Phase 2B of route-projection-statistics. The color-by dropdown already auto-discovers the computed `cluster_<proj>` / `silhouette_<proj>` annotation columns (no allowlist hides them, and content-based inference types membership as categorical and silhouette as continuous). This adds a dedicated "Statistics" section to `groupAnnotations` so the ~12 computed columns (one pair per projection) don't flood the catch-all "Other" group. Test: cluster_/silhouette_ columns land in Statistics (sorted), non-computed labels stay in Other. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
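The grouping rule can be illustrated in a few lines. The real `groupAnnotations` is TypeScript and presumably handles more groups than shown; this Python sketch (with a hypothetical `group_annotations` function) covers only the "Statistics" versus "Other" split and the sorting that the test describes.

```python
from typing import Dict, Iterable, List

# Prefixes of the computed columns named in the commit message.
COMPUTED_PREFIXES = ("cluster_", "silhouette_")


def group_annotations(columns: Iterable[str]) -> Dict[str, List[str]]:
    """Route computed cluster_/silhouette_ columns to Statistics, the rest to Other."""
    groups: Dict[str, List[str]] = {"Statistics": [], "Other": []}
    for name in columns:
        key = "Statistics" if name.startswith(COMPUTED_PREFIXES) else "Other"
        groups[key].append(name)
    groups["Statistics"].sort()  # the Statistics section is kept sorted
    return groups
```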
Summary
The protspace_web half of the projection-statistics MVP (issue #219). The engine (`protspace`) computes per-projection cluster-validity + faithfulness statistics and bakes them into the `.parquetbundle` as an optional fifth part; this PR makes the web side produce and tolerate that part. Rendering of the statistics is a deliberate follow-up and out of scope here.
What's in this PR
- Bundle reader/writer (`@protspace/core` data-loader + `@protspace/utils`): accept 3–5 parts. The reader now branches on an empty settings slot (statistics-without-settings ⇒ a zero-byte 4th part) rather than the raw delimiter count, so the optional fifth `statistics.parquet` is read without error. The statistics part is parsed-but-ignored for now; the `createParquetBundle` re-export still drops it (documented).
- Prep service (`services/protspace-prep`): a best-effort `stats` step that runs after the core bundle is produced and outside the pipeline timeout budget, so it can never cost the job or lose the bundle. It has:
  - its own nested timeout (`stats_timeout_seconds`), caught locally so it never reaches the parent handler;
  - a bounded, non-latching version probe of `protspace stats` (single-flight lock + hard timeout + kill of a hung subprocess; transient spawn errors are not latched, so a later job retries);
  - an atomic re-bundle that writes a temp `.parquetbundle` and `os.replace`s it, so a stats timeout/kill mid-write can't corrupt the already-shipped bundle.
- App: a `computing_statistics` SSE stage wired through `FastaPrepStage` and the explore runtime (progress creep stopped, 95%, "Computing statistics…").
- Tests: inverted the stale "five-part bundles are rejected" test and added 5-part round-trip + zero-byte settings-slot coverage; `services/protspace-prep/tests` exercise the success / timeout / probe-absent paths.
- Spec: `openspec/changes/add-projection-statistics/` (proposal, design, spec, tasks).
Verification
- `@protspace/core` bundle suite: 24 passed (incl. the inverted 5-part + zero-byte-slot tests).
- `protspace-prep` pipeline suite: 19 passed.
- `lint-staged && quality && docs:annotations:check && docs:build`: green.
Data-format change
Additive, backward compatible — existing 3- and 4-part bundles read and write unchanged.
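To make the 3-, 4- and 5-part cases concrete, here is a minimal sketch of the branching, written in Python for brevity even though the actual reader is TypeScript. The `DELIMITER` value, `BundleParts`, and `split_bundle` are hypothetical placeholders rather than the real bundle format or API; the point is only that the settings slot is judged by whether it is empty, not by how many delimiters were seen.

```python
from typing import NamedTuple, Optional, Tuple

# Placeholder delimiter for illustration; NOT the real bundle delimiter.
DELIMITER = b"---PART---"


class BundleParts(NamedTuple):
    core: Tuple[bytes, bytes, bytes]   # the three mandatory parquet parts
    settings: Optional[bytes]          # None when absent or zero-byte
    statistics: Optional[bytes]        # None unless a fifth part exists


def split_bundle(blob: bytes) -> BundleParts:
    parts = blob.split(DELIMITER)
    if not 3 <= len(parts) <= 5:
        raise ValueError(f"expected 3-5 parts, got {len(parts)}")
    core = (parts[0], parts[1], parts[2])
    # A zero-byte 4th part means "statistics without settings".
    settings = parts[3] if len(parts) >= 4 and parts[3] else None
    statistics = parts[4] if len(parts) == 5 else None
    return BundleParts(core, settings, statistics)
```

With this shape, existing 3- and 4-part inputs produce the same `core` and `settings` values as before, and `statistics` is simply `None`.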
Refs #219