Motivation
Ragged/CSR fields (#48; e.g. the per-cell t-digest in the new atl03_tdigest_healpix.yaml) currently carry values per cell. A natural extension is a ragged field that also carries per-observation locations: a sketch that records where its contributions came from, not just their magnitudes.
The grid's cell_ids/morton coordinates are deliberately coarse (one value per cell at child_order), so they can't serve as a per-observation location channel. And storing per-obs (lat, lon) float pairs is wasteful and lossy-by-convention. A high-resolution morton index (order ~26/27, up to mortie's max of 29) encodes a position as a single integer to sub-meter precision and composes directly with mortie's geometric primitives for distance / containment / neighborhood queries, with no decode-to-lat/lon round-trip.
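As a rough check on the sub-meter claim, the sketch below estimates cell size and the minimum index width at the candidate orders. It assumes plain HEALPix pixel counting (12 · 4^order equal-area cells) on a spherical Earth; `healpix_cell_stats` is a hypothetical helper, not part of mortie or zagg, and mortie's actual integer encoding may lay out its bits differently. The bit count is only the minimum any encoding needs to distinguish that many cells.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def healpix_cell_stats(order):
    """Return (cell count, approx. cell edge in metres, minimum index bits)."""
    nside = 2 ** order
    npix = 12 * nside * nside  # HEALPix has 12 base cells, each split 4x per order
    area_m2 = 4.0 * math.pi * EARTH_RADIUS_M ** 2 / npix  # equal-area cells
    edge_m = math.sqrt(area_m2)  # side of a square with the same area
    bits = (npix - 1).bit_length()  # bits needed to number every cell
    return npix, edge_m, bits


for order in (26, 27, 29):
    npix, edge_m, bits = healpix_cell_stats(order)
    print(f"order {order}: ~{edge_m:.3g} m cell edge, at least {bits} bits")
```

Under these assumptions every candidate order stays well below one metre per cell, and all of them need more than 32 bits, so the storage question is mostly about precision rather than index width.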
What this needs
Retain the per-observation high-resolution morton. HealpixGrid.assign() now resolves points at order 29 before coarsening (PR #86, "Lift HEALPix assign reference order to 29 + order-19 t-digest / gain-bias templates", lifted the reference order from 18 to 29), so the full-resolution index is computed, but it is discarded once cells_of groups observations by cell. A location-carrying ragged field needs that per-obs index retained as a column the reducer sees.
A reducer that packs locations into CSR. Alongside (or within) a value sketch, emit per-observation (or per-centroid) morton indices as part of the ragged payload, written via write_ragged_to_zarr.
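The two requirements above can be sketched together with NumPy. This is an illustration only: `pack_csr_with_locations` is a hypothetical function, not the zagg reducer, and it assumes the retained index is a nested-scheme integer, so coarsening is a right shift of two bits per level. mortie's own index may need a dedicated coarsening call instead. The actual write through write_ragged_to_zarr is left out because its signature is not shown here.

```python
import numpy as np


def pack_csr_with_locations(hi_ids, values, ref_order, cell_order):
    """Group observations by coarse cell, keeping each one's fine index.

    Assumes hi_ids are nested-scheme integers at ref_order, so the parent
    cell at cell_order is a right shift by 2 bits per level (an assumption
    for this sketch, not mortie's confirmed encoding).
    """
    hi_ids = np.asarray(hi_ids, dtype=np.int64)
    values = np.asarray(values)
    shift = 2 * (ref_order - cell_order)
    cell_ids = hi_ids >> shift
    # Stable sort keeps the original observation order within each cell.
    order = np.argsort(cell_ids, kind="stable")
    unique_cells, counts = np.unique(cell_ids[order], return_counts=True)
    # CSR row pointers: cell i owns the slice offsets[i]:offsets[i + 1].
    offsets = np.concatenate(([0], np.cumsum(counts)))
    return unique_cells, offsets, values[order], hi_ids[order]


# Toy input: four observations at order 2, grouped into order-1 cells.
cells, offsets, vals, locs = pack_csr_with_locations(
    hi_ids=[0b1101, 0b0110, 0b1110, 0b0111],
    values=[10.0, 20.0, 30.0, 40.0],
    ref_order=2,
    cell_order=1,
)
```

The value and location arrays share one offsets array, so a cell's slice yields aligned (value, location) pairs without any extra bookkeeping.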
Open questions
Resolution: fixed (26/27/29) or configurable per field? Precision vs. index width / storage.
Granularity: one morton index per observation, or per t-digest centroid? (A centroid is a merged group, so its "location" would be an aggregate, i.e. a representative/centroid morton, which itself may want a mortie primitive to compute.)
Schema surface: a new field attribute (e.g. location: morton on a ragged field) vs. a paired companion ragged field carrying the location channel.
mortie primitives: which ops does the reader need (nearest-neighbor, within-radius, cell containment)? Confirm #59/#53 cover them.
Layout: independent ragged field vs. an extended CSR payload interleaving value + location.
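To make the schema and layout questions concrete, here is a minimal NumPy sketch of the two options. All names and the toy index values are made up for illustration; neither layout is claimed to match what zagg's CSR writer does today. Layout A is a paired companion field sharing the value field's offsets; layout B models an interleaved payload as a structured dtype.

```python
import numpy as np

# Toy CSR payload for two cells; offsets delimit each cell's slice.
offsets = np.array([0, 2, 5])
values = np.array([1.5, 2.5, 3.0, 4.0, 5.0])
morton = np.array([100, 101, 208, 209, 215], dtype=np.int64)  # made-up indices


# Layout A: paired companion fields that share one offsets array.
def cell_slice_paired(i):
    lo, hi = offsets[i], offsets[i + 1]
    return values[lo:hi], morton[lo:hi]


# Layout B: a single interleaved payload, modelled as a structured dtype.
record = np.dtype([("value", "f8"), ("morton", "i8")])
interleaved = np.empty(values.size, dtype=record)
interleaved["value"] = values
interleaved["morton"] = morton


def cell_slice_interleaved(i):
    lo, hi = offsets[i], offsets[i + 1]
    chunk = interleaved[lo:hi]
    return chunk["value"], chunk["morton"]
```

Both layouts return the same pairs for a given cell. The difference is in access patterns: with the paired layout a value-only reader never touches the location bytes, while the interleaved layout keeps each observation's value and location adjacent at 16 bytes per record.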
References
PR #86: lift assign reference order to 29 (the enabler) + the t-digest / gain-bias order-19 test templates.
src/zagg/stats/tdigest.py, src/zagg/csr.py, src/zagg/readers/tdigest_tensor.py.
Drafted for review, labelled plan; redirect scope/approach on the thread.
🤖 from Claude