High-resolution morton location channel for ragged/CSR fields #87

Description

@espg

🤖 from Claude

Motivation

Ragged/CSR fields (#48, e.g. the per-cell t-digest in the new atl03_tdigest_healpix.yaml) currently carry values per cell. A natural extension is a ragged field that also carries per-observation locations: a sketch that records where its contributions came from, not just their magnitudes.

The grid's cell_ids/morton coordinates are deliberately coarse (one value per cell at child_order), so they can't serve as a per-observation location channel. And storing per-obs (lat, lon) float pairs is wasteful and lossy-by-convention. A high-resolution morton index (order ~26/27, up to mortie's max of 29) encodes a position as a single integer to sub-meter precision and composes directly with mortie's geometric primitives for distance / containment / neighborhood queries, with no decode-to-lat/lon round-trip.
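
As a rough check on the sub-meter claim, the sketch below estimates the HEALPix cell size and index width at a few orders. It assumes a spherical Earth with a mean radius of 6371 km and treats each cell as a square of equal area, so the numbers approximate the linear scale only; they are not mortie output. The helper names (`n_cells`, `cell_side_m`, `index_bits`) are hypothetical and exist only for this illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumption: mean spherical radius


def n_cells(order: int) -> int:
    """Number of HEALPix cells at a given order: 12 * 4**order."""
    return 12 * 4**order


def cell_side_m(order: int) -> float:
    """Approximate cell edge length: square root of the equal-area cell size."""
    sphere_area = 4.0 * math.pi * EARTH_RADIUS_M**2
    return math.sqrt(sphere_area / n_cells(order))


def index_bits(order: int) -> int:
    """Bits needed to enumerate every cell at this order."""
    return (n_cells(order) - 1).bit_length()


for order in (19, 26, 27, 29):
    print(f"order {order}: ~{cell_side_m(order):.3g} m per cell, "
          f"{index_bits(order)} bits")
```

Under those assumptions, order 26 comes out near 10 cm per cell and order 29 near 1 cm, and enumerating the order-29 cells takes 62 bits, so a plain cell count fits in a 64-bit integer. This counts cells only; mortie's own integer encoding may lay out its digits differently.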

What this needs

  1. Retain the per-observation high-resolution morton. HealpixGrid.assign() now resolves points at order 29 before coarsening (PR #86, "Lift HEALPix assign reference order to 29 + order-19 t-digest / gain-bias templates", lifted the reference order 18 → 29), so the full-resolution index is computed, but it is discarded once cells_of groups observations by cell. A location-carrying ragged field needs that per-obs index retained as a column the reducer sees.
  2. A reducer that packs locations into CSR. Alongside (or within) a value sketch, emit per-observation (or per-centroid) morton indices as part of the ragged payload, written via write_ragged_to_zarr.
  3. Reader + geometric ops. A reader that exposes the location channel and operates on it via the mortie geometric primitives (espg/mortie "MCP Server" #59 and "read_plan: select mortie linestring vs shapely by grid type (HEALPix -> mortie)" #53). Exact signatures to be pinned once those land in a mortie release; they're outside this repo's scope to link here.
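
To make steps 1 and 2 concrete, here is a minimal NumPy sketch of packing retained per-observation indices into a CSR layout keyed by coarse cell. It is an illustration under assumptions, not the repo's reducer: `pack_locations_csr` is a hypothetical name, the inputs are plain integer arrays, and it does not call HealpixGrid.assign(), cells_of, or write_ragged_to_zarr, whose signatures are not given in this issue.

```python
import numpy as np


def pack_locations_csr(cell_ids, hires_index):
    """Group per-observation high-res indices by coarse cell into CSR form.

    Returns (unique_cells, offsets, data), where the locations for
    unique_cells[i] are data[offsets[i]:offsets[i + 1]].
    """
    cell_ids = np.asarray(cell_ids)
    hires_index = np.asarray(hires_index, dtype=np.int64)
    if cell_ids.shape != hires_index.shape:
        raise ValueError("cell_ids and hires_index must have the same shape")

    # Stable sort keeps observations in input order within each cell.
    order = np.argsort(cell_ids, kind="stable")
    unique_cells, counts = np.unique(cell_ids, return_counts=True)

    # offsets[0] is 0; the remaining entries are the running row lengths.
    offsets = np.zeros(unique_cells.size + 1, dtype=np.int64)
    np.cumsum(counts, out=offsets[1:])
    return unique_cells, offsets, hires_index[order]


# Toy data: five observations falling into three coarse cells.
cells = [7, 3, 7, 3, 9]
locs = [700, 300, 701, 301, 900]
unique_cells, offsets, data = pack_locations_csr(cells, locs)
print(unique_cells)  # [3 7 9]
print(offsets)       # [0 2 4 5]
print(data)          # [300 301 700 701 900]
```

The stable sort is a deliberate choice here: if a value sketch is built from the same sorted order, the location at position k lines up with the observation at position k within each cell.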

Open questions

  • Resolution: fixed (26/27/29) or configurable per field? Precision vs. index width / storage.
  • Granularity: one morton index per observation, or per t-digest centroid? (A centroid is a merged group, so its "location" would be an aggregate, i.e. a representative/centroid morton, which itself may want a mortie primitive to compute.)
  • Schema surface: a new field attribute (e.g. location: morton on a ragged field) vs. a paired companion ragged field carrying the location channel.
  • mortie primitives: which ops does the reader need (nearest-neighbor, within-radius, cell containment)? Confirm #59/#53 cover them.
  • Layout: independent ragged field vs. an extended CSR payload interleaving value + location.
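
For the layout question, the two options can be contrasted with a small NumPy sketch. Everything in it is assumed for illustration: the dtypes (float32 values, int64 locations), the field names `value` and `location`, and the toy arrays. It says nothing about how write_ragged_to_zarr actually stores a payload.

```python
import numpy as np

# Shared CSR row pointers for two cells holding 2 and 3 observations.
offsets = np.array([0, 2, 5], dtype=np.int64)
values = np.array([1.5, 2.5, 0.1, 0.2, 0.3], dtype=np.float32)
locations = np.array([10, 11, 20, 21, 22], dtype=np.int64)


def cell_paired(i):
    """Option A: companion field. Two flat arrays share one offsets array."""
    lo, hi = offsets[i], offsets[i + 1]
    return values[lo:hi], locations[lo:hi]


# Option B: one interleaved payload of (value, location) records.
record = np.dtype([("value", np.float32), ("location", np.int64)])
interleaved = np.empty(values.size, dtype=record)
interleaved["value"] = values
interleaved["location"] = locations


def cell_interleaved(i):
    """Slice one cell's records out of the interleaved payload."""
    return interleaved[offsets[i]:offsets[i + 1]]


print(cell_paired(1)[1])                # locations of cell 1
print(cell_interleaved(1)["location"])  # same locations, from records
print(record.itemsize)                  # bytes per packed record
```

With the paired layout a reader can load values without touching locations, and each array compresses on its own; the interleaved layout keeps each observation's value and location adjacent at the cost of a mixed-type record.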

References

Drafted for review, labelled plan; redirect scope/approach on the thread.
