
Add Cell Annotation AnnData serializer, schema validator, and atomic .h5ad writer#72

Draft
Copilot wants to merge 3 commits into cell_annotation from copilot/hartmannlab-15-implement-ann-data-serializer


Copilot AI commented Mar 17, 2026


This PR implements the Cell Annotation checkpoint artifact as an AnnData .h5ad, with the agreed schema persisted in uns, checksum tracking, validation, and atomic writes. It also adds focused coverage for round-trip fidelity and schema invariants, plus a checked-in fixture for serializer consumers.

  • Serializer / artifact schema

    • Adds serialize_heatmap_state(display, *, flowsom, meta) -> AnnData in ueler/viewer/plugin/cell_annotation/serialize.py
    • Persists the required Cell Annotation schema in uns, including:
      • artifact with schema hash and checksums
      • ui ordering/orientation state
      • palette
      • zscore_params
      • filters
      • row_linkage and row_linkage_basis
      • marker_sets
      • flowsom
      • checkpoint
    • Stores:
      • X as z-scored medians (float32)
      • layers["median"] as raw medians when present
      • optional obsm embeddings
  • Validation

    • Adds validate_artifact(path_or_adata) -> AnnData
    • Enforces schema presence and cross-block invariants:
      • flowsom.training_markers == marker_sets.training
      • row_linkage_basis.marker_ids == marker_sets.linkage
      • row_order / col_order are permutations of the axes
      • selected_channels_ordered references known markers
      • z-score metadata is present and aligned with var_names
      • palette values are valid hex colors
    • Verifies checksum integrity for matrix payloads and on-disk .h5ad content
  • Atomic persistence

    • Adds write_h5ad_atomic(adata, dst_path)
    • Uses the existing atomic replace helper pattern to avoid visible partial artifacts
    • Writes a stable embedded file checksum into the artifact metadata
  • API surface

    • Exports the serializer, validator, and atomic writer from ueler.viewer.plugin.cell_annotation
  • Fixtures / coverage

    • Adds tests/test_cell_annotation_serialize.py
    • Adds a round-trip fixture under tests/data/cell_annotation_roundtrip_fixture.h5ad
    • Covers schema presence, checksum mismatch detection, invalid color rejection, order validation, z-score metadata requirements, and atomic-write cleanup behavior
  • Dependency updates

    • Declares AnnData support in pyproject.toml
    • Constrains numpy / pandas version ranges to stay compatible with the existing codebase
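The Atomic persistence bullets describe writing through an atomic replace helper so that readers never see a half-written artifact. A minimal stdlib sketch of that pattern, assuming nothing about the project's actual helper: `atomic_write` is a hypothetical name, and `write_fn` stands in for a writer such as `lambda p: adata.write_h5ad(p)`. The temp file lives in the destination directory so `os.replace` is a same-filesystem rename.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, Union


def atomic_write(dst_path: Union[str, os.PathLike], write_fn: Callable[[Path], None]) -> Path:
    """Run write_fn against a hidden temp file, then atomically move it to dst_path.

    Hypothetical helper, not the project's API: write_fn stands in for a
    writer such as ``lambda p: adata.write_h5ad(p)``.
    """
    dst = Path(dst_path)
    dst.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file next to the destination so os.replace is a
    # same-filesystem rename (a cross-device move would not be atomic).
    fd, tmp_name = tempfile.mkstemp(
        prefix=f".{dst.name}.", suffix=".partial" + dst.suffix, dir=dst.parent
    )
    os.close(fd)
    tmp = Path(tmp_name)
    try:
        write_fn(tmp)
        # Force the finished bytes to disk before the rename makes them visible.
        with open(tmp, "r+b") as fh:
            os.fsync(fh.fileno())
        os.replace(tmp, dst)  # readers see either the old file or the new one
    except BaseException:
        # Any failure, including KeyboardInterrupt, removes the partial temp file.
        tmp.unlink(missing_ok=True)
        raise
    return dst
```

A crash inside `write_fn` leaves the previous artifact untouched and no temp file behind, which is the "no visible partials" property the issue asks to test. A fully durable variant on POSIX would also fsync the containing directory after the rename.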

Example usage:

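from ueler.viewer.plugin.cell_annotation import (
    serialize_heatmap_state,
    validate_artifact,
    write_h5ad_atomic,
)

# display, flowsom, and meta are the heatmap-state dicts described in the
# original issue; checkpoint_path is the destination .h5ad path.
adata = serialize_heatmap_state(display, flowsom=flowsom, meta=meta)
write_h5ad_atomic(adata, checkpoint_path)  # temp file + atomic replace
restored = validate_artifact(checkpoint_path)  # fails loudly on schema or checksum violations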
Original prompt

This section details the original issue you should resolve.

<issue_title>Sub-issue: AnnData Serializer, Schema & Validator</issue_title>
<issue_description># Serializer: AnnData artifact + schema + validator + atomic writer

Tracked by: #15
Milestone: M2 — Serialization & Manifest
Labels: area:cell-annotation, area:storage, type:feature, size:M, priority:P1

Summary

Implement serialize_heatmap_state(display, flowsom, meta) -> AnnData, persist the agreed schema (marker sets, checkpoint, linkage, z-score provenance, etc.), compute checksums, and add a robust validator. Provide an atomic .h5ad writer.

Scope

In

  • plugins/cell_annotation/serialize.py
    • serialize_heatmap_state(display: dict, *, flowsom: dict, meta: dict) -> anndata.AnnData
    • validate_artifact(path_or_adata) -> anndata.AnnData
    • write_h5ad_atomic(adata, dst_path) (uses the store’s atomic helpers)
  • Persist in uns:
    • artifact {version, schema_hash, checksums}
    • ui {orientation,row_sort,col_sort,selected_channels_ordered}
    • palette {meta_cluster_colors_present, meta_cluster_colors_all}
    • zscore_params {method, per_marker stats, clipped}
    • filters {expr, structured, source}
    • row_linkage, row_linkage_basis {marker_ids, distance}
    • marker_sets {training, display_extra, available, linkage, expanded_training, panel}
    • flowsom {training_markers, imputation/projection blocks, availability, seed/grid/params/deps, hashes}
    • checkpoint {id:uuidv7, parents:[], op, step_id?, description, created_at, producer, id_namespace}
  • Matrices:
    • X = z-scored medians (float32; rows=clusters, cols=markers)
    • layers["median"] = raw medians (float32; optional)
    • obsm embeddings (optional)
  • Chunking defaults for .h5ad writes, and sha256 checksum computation
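The scope asks for sha256 checksums over the matrix payloads and the on-disk file. For a matrix checksum to be reproducible across machines, it helps to hash a canonical byte encoding. The sketch below assumes one such encoding (a shape header followed by little-endian float32 values in row-major order) and reads files in fixed-size chunks; `matrix_sha256` and `file_sha256` are hypothetical names, and it is stdlib-only so it runs without NumPy.

```python
import hashlib
import struct
import sys
from array import array
from typing import Sequence


def matrix_sha256(rows: Sequence[Sequence[float]]) -> str:
    """sha256 of a 2-D matrix cast to little-endian float32, row-major.

    The shape is hashed first so that, e.g., 2x3 and 3x2 matrices holding
    the same values produce different digests.
    """
    n_rows = len(rows)
    n_cols = len(rows[0]) if n_rows else 0
    h = hashlib.sha256()
    h.update(struct.pack("<QQ", n_rows, n_cols))
    for row in rows:
        if len(row) != n_cols:
            raise ValueError("ragged matrix")
        buf = array("f", row)  # C float: 4-byte IEEE float32 on mainstream platforms
        if sys.byteorder == "big":
            buf.byteswap()  # normalize to little-endian before hashing
        h.update(buf.tobytes())
    return h.hexdigest()


def file_sha256(path, chunk_size: int = 1 << 20) -> str:
    """sha256 of a file's bytes, read in fixed-size chunks to bound memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

With NumPy, the same payload bytes would come from `np.ascontiguousarray(X, dtype="<f4").tobytes()`, with the shape hashed alongside.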

Out

  • No UI; manifest update happens in the Manifest sub-issue
  • No Zarr yet (optional later)

Deliverables

  • Serializer + validator module with docstrings
  • Unit-tested invariants & checksum checks
  • Example round-trip fixture saved under tests/data/

Acceptance Criteria

  • Round-trip save → load returns identical obs_names, var_names, row_order, col_order
  • uns contains all required blocks; flowsom.training_markers == marker_sets.training
  • row_linkage_basis.marker_ids == marker_sets.linkage
  • Checksums in uns.artifact.checksums match disk bytes
  • Validator fails loudly for: bad hex colors, non-permutation orders, missing z-score params when X is marked z-scored
  • Atomic write leaves no visible partials on crash simulation
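The acceptance criteria list the invariants the validator must enforce. A minimal sketch of those cross-block checks on a plain `uns` dict, collecting every violation before raising once. Assumptions, all hypothetical: the error class and function names, that palette colors live in `palette["meta_cluster_colors_all"]` as a mapping, that per-marker z-score stats live under `zscore_params["per_marker"]`, that valid colors are `#RRGGBB` or `#RRGGBBAA`, and that row/column orders are passed in as integer index lists.

```python
import re
from typing import Any, Mapping, Sequence

# Assumption: palette values are #RRGGBB or #RRGGBBAA strings.
_HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{8})")

REQUIRED_BLOCKS = (
    "artifact", "ui", "palette", "zscore_params", "filters",
    "row_linkage", "row_linkage_basis", "marker_sets", "flowsom", "checkpoint",
)


class ArtifactValidationError(ValueError):
    """Raised when uns violates the Cell Annotation schema (hypothetical name)."""


def _is_permutation(order: Sequence[int], n: int) -> bool:
    # A valid order lists each index 0..n-1 exactly once.
    return sorted(order) == list(range(n))


def check_invariants(
    uns: Mapping[str, Any],
    *,
    var_names: Sequence[str],
    n_obs: int,
    row_order: Sequence[int],
    col_order: Sequence[int],
    x_is_zscored: bool = True,
) -> None:
    """Collect every invariant violation, then raise once listing all of them."""
    missing = [k for k in REQUIRED_BLOCKS if k not in uns]
    if missing:
        raise ArtifactValidationError(f"missing uns blocks: {missing}")

    errors = []
    markers = uns["marker_sets"]
    if list(uns["flowsom"].get("training_markers", [])) != list(markers.get("training", [])):
        errors.append("flowsom.training_markers != marker_sets.training")
    if list(uns["row_linkage_basis"].get("marker_ids", [])) != list(markers.get("linkage", [])):
        errors.append("row_linkage_basis.marker_ids != marker_sets.linkage")

    if not _is_permutation(row_order, n_obs):
        errors.append("row_order is not a permutation of the obs axis")
    if not _is_permutation(col_order, len(var_names)):
        errors.append("col_order is not a permutation of the var axis")

    known = set(var_names)
    unknown = [m for m in uns["ui"].get("selected_channels_ordered", []) if m not in known]
    if unknown:
        errors.append(f"selected_channels_ordered has unknown markers: {unknown}")

    colors = uns["palette"].get("meta_cluster_colors_all", {})
    bad = [k for k, v in colors.items() if not (isinstance(v, str) and _HEX_COLOR.fullmatch(v))]
    if bad:
        errors.append(f"invalid hex colors for clusters: {bad}")

    if x_is_zscored:
        per_marker = uns["zscore_params"].get("per_marker") or {}
        absent = [m for m in var_names if m not in per_marker]
        if absent:
            errors.append(f"zscore_params.per_marker missing markers: {absent}")

    if errors:
        raise ArtifactValidationError("; ".join(errors))
```

Using `fullmatch` rather than `match` with a `$` anchor avoids accepting a color string with a trailing newline.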

Test Plan

  • Unit: schema presence & types; z-score params consistency; checksum mismatch detection
  • Integration: use dummy display and flowsom dicts → write/read → compare fidelity
  • Perf: write time under 2s @ 2k clusters × 60 markers with float32 chunks
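For the "z-score params consistency" unit test, the stored params need to be enough to recompute X from the raw medians. A minimal sketch under stated assumptions: population standard deviation per marker, constant columns mapped to 0, optional symmetric clipping, and param keys `mean`, `std`, `clip_value`, and `method="zscore"` that are hypothetical rather than the PR's confirmed schema. The check recomputes every entry and compares within a tolerance loose enough for float32 storage.

```python
import math
from statistics import fmean, pstdev
from typing import List, Optional, Sequence, Tuple


def _z(x: float, stats: dict, clip: Optional[float]) -> float:
    # Constant columns (std == 0) map to 0 rather than dividing by zero.
    v = 0.0 if stats["std"] == 0 else (x - stats["mean"]) / stats["std"]
    if clip is not None:
        v = max(-clip, min(clip, v))
    return v


def zscore_columns(
    medians: Sequence[Sequence[float]],
    markers: Sequence[str],
    clip: Optional[float] = None,
) -> Tuple[List[List[float]], dict]:
    """Z-score each marker column; return (z_matrix, zscore_params)."""
    if not medians or any(len(row) != len(markers) for row in medians):
        raise ValueError("medians must be non-empty with one column per marker")
    per_marker = {}
    for name, col in zip(markers, zip(*medians)):  # zip(*rows) yields columns
        per_marker[name] = {"mean": fmean(col), "std": pstdev(col)}
    z = [[_z(x, per_marker[name], clip) for name, x in zip(markers, row)] for row in medians]
    params = {"method": "zscore", "per_marker": per_marker,
              "clipped": clip is not None, "clip_value": clip}
    return z, params


def zscore_params_consistent(z, medians, markers, params, tol: float = 1e-5) -> bool:
    """Recompute every X entry from the stored params and compare."""
    if len(z) != len(medians):
        return False
    clip = params["clip_value"] if params.get("clipped") else None
    for z_row, m_row in zip(z, medians):
        if len(z_row) != len(markers):
            return False
        for name, zv, x in zip(markers, z_row, m_row):
            expected = _z(x, params["per_marker"][name], clip)
            if not math.isclose(zv, expected, rel_tol=tol, abs_tol=tol):
                return False
    return True
```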

Dependencies

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 2 commits March 18, 2026 00:04
Co-authored-by: yulewu <38241047+yulewu@users.noreply.github.com>
Co-authored-by: yulewu <38241047+yulewu@users.noreply.github.com>
Copilot AI changed the title from "[WIP] [HartmannLab/UELer#15] Implement AnnData serializer and validator" to "Add Cell Annotation AnnData serializer, schema validator, and atomic .h5ad writer" Mar 18, 2026
Copilot AI requested a review from yulewu March 18, 2026 00:09