9 changes: 5 additions & 4 deletions .binder/environment.yml
@@ -8,10 +8,11 @@
 # there is a single source of truth for the notebook dependencies (pyproject.toml).
 #
 # Only PyPI/conda-forge resolvable packages appear here: no ``git+`` URLs and no
-# spherely fork. The three Binder-runnable notebooks
-# (custom_aggregations, rasterized_zarr, jupyterhub_example) read the public,
-# anonymous source.coop store or synthetic data, so the exact-S2 spherely
-# SpatialIndex backend is never on their import path.
+# spherely fork. The Binder-runnable notebooks read synthetic data, the public
+# anonymous source.coop store, or anonymous CMR-STAC granule *metadata*
+# (jupyterhub_example, shardmap_viewer) -- all of which use the default HEALPix
+# ``mortie`` backend, so the exact-S2 spherely SpatialIndex backend is never on
+# their import path.
 name: zagg-binder
 channels:
 - conda-forge
12 changes: 7 additions & 5 deletions .binder/postBuild
@@ -2,11 +2,13 @@
 # repo2docker runs this after building the conda environment.
 #
 # Install zagg from the checked-out repo together with its ``analysis`` extra
-# (the notebook runtime: jupyter/notebook, xarray, cartopy, matplotlib, ...) and
+# (the notebook runtime: jupyter/notebook, xarray, cartopy, matplotlib, ...),
 # the ``catalog`` extra (stac-geoparquet), which the anonymous CMR-STAC
-# catalog-build cell in jupyterhub_example needs. Keeping the dependency set in
+# catalog-build cells in jupyterhub_example / shardmap_viewer need, and the
+# ``viz`` extra (ipyleaflet -- a pure-Python Jupyter widget) for the
+# shardmap_viewer notebook's interactive map. Keeping the dependency set in
 # pyproject.toml's extras means the Binder image and a local
-# ``pip install "zagg[analysis,catalog]"`` resolve the same way.
+# ``pip install "zagg[analysis,catalog,viz]"`` resolve the same way.
 #
 # We deliberately do NOT install:
 # * the spherely exact-S2 fork (not on PyPI; not needed -- catalog building
@@ -23,8 +25,8 @@ set -euo pipefail
 # irrelevant for running the example notebooks).
 git fetch --tags --unshallow 2>/dev/null || git fetch --tags 2>/dev/null || true
 if git describe --tags --abbrev=0 >/dev/null 2>&1; then
-python -m pip install --no-cache-dir ".[analysis,catalog]"
+python -m pip install --no-cache-dir ".[analysis,catalog,viz]"
 else
 HATCH_VCS_PRETEND_VERSION="0.0.0+binder" \
-python -m pip install --no-cache-dir ".[analysis,catalog]"
+python -m pip install --no-cache-dir ".[analysis,catalog,viz]"
 fi
7 changes: 7 additions & 0 deletions docs/quickstart.md
@@ -61,6 +61,13 @@ This produces a JSON file (e.g., `shardmap_ATL06_2024-01-06_2024-04-07.json`) th
parent morton cells to the S3 URLs of HDF5 granules containing data for those
cells. The processing step consumes this file.

To inspect the chunking interactively -- shard outlines, granule footprints,
and a grid that appears on zoom -- use the shard-map viewer
(`pip install "zagg[viz]"`). See the
[shard-map viewer notebook](https://github.com/englacial/zagg/blob/main/notebooks/shardmap_viewer.ipynb),
which queries ATL06 granule metadata anonymously from NASA CMR-STAC (network required, no credentials) and includes manual
in-browser verification instructions.
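
The notebook describes the viewer as choosing its display projection from the map's latitude extent: EPSG:3031 when the extent lies entirely south of -60°, EPSG:3413 when it lies poleward of +60°, and Web Mercator otherwise. A minimal sketch of that rule, assuming a plain bounding-box test. `pick_display_crs` and its `polar_cutoff` parameter are hypothetical names used only for illustration; they are not part of `zagg.viz`, whose actual selection logic may differ.

```python
def pick_display_crs(lat_min: float, lat_max: float, polar_cutoff: float = 60.0) -> str:
    """Pick an EPSG code from a latitude extent (illustrative helper, not zagg API)."""
    if lat_min > lat_max:
        raise ValueError("lat_min must not exceed lat_max")
    if lat_max <= -polar_cutoff:
        # Whole extent lies south of -60 degrees: Antarctic Polar Stereographic.
        return "EPSG:3031"
    if lat_min >= polar_cutoff:
        # Whole extent lies north of +60 degrees: NSIDC Polar Stereographic North.
        return "EPSG:3413"
    # Anything reaching into the mid-latitudes stays on Web Mercator.
    return "EPSG:3857"


# Latitude ranges of the two AOIs used in the notebook.
print(pick_display_crs(-70.0, -64.0))  # Antarctic Peninsula
print(pick_display_crs(68.0, 72.0))    # Jakobshavn Glacier
```

Per the notebook, passing `crs=...` to `show_shardmap` overrides whatever the automatic choice would be.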

## Local Processing

The simplest path -- no AWS Lambda needed:
254 changes: 254 additions & 0 deletions notebooks/shardmap_viewer.ipynb
@@ -0,0 +1,254 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "cell-0",
"metadata": {},
"source": "# Shard-map viewer\n\n[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/englacial/zagg/main?urlpath=lab/tree/notebooks/shardmap_viewer.ipynb)\n\n_Runs end-to-end on [Binder](https://mybinder.org/v2/gh/englacial/zagg/main?urlpath=lab/tree/notebooks/shardmap_viewer.ipynb): it queries real ICESat-2 ATL06 granule **metadata** anonymously from NASA CMR-STAC (no Earthdata Login -- the CMR-STAC search is anonymous) and renders with `ipyleaflet`. The Binder image already provides `zagg[analysis,catalog,viz]` (incl. `stac-geoparquet` and `ipyleaflet`) via the repo's `.binder/` environment._\n\nThis notebook demonstrates the `zagg.viz` shard-map viewer (issue\n[#38](https://github.com/englacial/zagg/issues/38)) with **real ICESat-2\nATL06 granules** fetched anonymously from NASA CMR-STAC.\n\nThe viewer has two layers:\n\n- **Headless render core** (`zagg.viz.render_shardmap` and friends) -- pure\n Python, no browser/widget stack. Turns a `ShardMap` (plus an optional\n `Catalog`) into WGS84 GeoJSON `FeatureCollection` dicts: shard outlines\n and granule footprints.\n- **ipyleaflet wrapper** (`zagg.viz.show_shardmap`) -- builds an interactive\n context basemap + shard layer + toggleable granule-footprint layer.\n\n**Polar-aware projection.** `show_shardmap` picks the display CRS from the\nmap's extent: a polar AOI renders in NASA polar-stereographic (EPSG:3031\nAntarctic / EPSG:3413 Arctic) with a **GIBS** basemap -- undistorted at the\npole -- while mid-latitude AOIs stay on Web Mercator + OpenStreetMap.\n\n## Install\n\nOn Binder this is already set up by `.binder/postBuild`. Locally:\n\n```bash\npip install \"zagg[analysis,catalog,viz]\" # core + stac-geoparquet + ipyleaflet widget\n```\n\nThe `viz` extra pulls in `ipyleaflet` (the interactive map); `catalog` pulls\nin `stac-geoparquet` (the CMR-STAC catalog). **Requires an internet\nconnection** for the anonymous CMR-STAC query in section 1 -- no credentials\nare needed.\n\n## Areas of interest\n\nThe notebook queries two AOIs:\n\n1. **Antarctic Peninsula** (`[-65, -70, -55, -64]` WGS84 lon/lat) -- January\n 2020. Dense ICESat-2 coverage near the pole; O(10–40) granules in two\n weeks. Renders in EPSG:3031 (Antarctic Polar Stereographic).\n2. **Jakobshavn Glacier, West Greenland** (`[-52, 68, -45, 72]`) -- June\n 2020. Second example showing the same pipeline on an Arctic AOI\n (EPSG:3413).\n\nBoth inputs (ShardMap JSON and STAC-geoparquet Catalog) are supported from\nday one -- see section 3 for the round-trip to disk and reload."
},
{
"cell_type": "markdown",
"id": "cell-1",
"metadata": {},
"source": [
"## 1. Antarctic Peninsula — fetch real ATL06 granules from NASA CMR-STAC\n",
"\n",
"`CMRSource` speaks directly to NASA's CMR-STAC endpoint (`requests`).\n",
"No Earthdata Login or credentials are needed for anonymous granule-metadata\n",
"queries. The returned `Catalog` wraps a stac-geoparquet Arrow table (one row\n",
"per granule, both S3 and HTTPS asset hrefs preserved)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cell-2",
"metadata": {},
"outputs": [],
"source": "from zagg.catalog.sources import CMRSource, Query\n\n# Antarctic Peninsula AOI — two weeks in January 2020.\nAOI_BBOX = (-65.0, -70.0, -55.0, -64.0) # lon_min, lat_min, lon_max, lat_max\nSTART_DATE = \"2020-01-01\"\nEND_DATE = \"2020-01-15\"\n\nquery = Query(\n short_name=\"ATL06\",\n version=\"007\",\n start_date=START_DATE,\n end_date=END_DATE,\n region=AOI_BBOX,\n provider=\"NSIDC_CPRD\",\n)\n\ncatalog = CMRSource().fetch(query)\nprint(f\"Fetched {len(catalog)} granules for {query.collection} over {AOI_BBOX}\")"
},
{
"cell_type": "code",
"execution_count": null,
"id": "cell-3",
"metadata": {},
"outputs": [],
"source": [
"# Inspect the catalog schema — each row is a STAC Item with WKB geometry.\n",
"print(\"Schema columns:\", catalog.table.schema.names[:10], \"...\")\n",
"print(\"Total rows:\", catalog.table.num_rows)\n",
"\n",
"# Decode a few granule records to confirm footprints are present.\n",
"records = catalog.granule_records()\n",
"print(f\"\\n{len(records)} records with valid footprint geometry.\")\n",
"for rec in records[:3]:\n",
" print(f\" {rec['id']} | https: {(rec['https'] or 'None')[:70]}...\")"
]
},
{
"cell_type": "markdown",
"id": "cell-4",
"metadata": {},
"source": [
"## 2. Build a ShardMap on a HEALPix grid\n",
"\n",
"We use a HEALPix grid (`parent_order=6, child_order=12`) — the same\n",
"configuration as `src/zagg/configs/atl06.yaml`. The `mortie` backend is\n",
"used for HEALPix intersection and requires no extra install. If `spherely`\n",
"is installed, pass `backend='auto'` to prefer exact S2 intersections.\n",
"\n",
"`ShardMap.build` maps each grid shard (a parent-order HEALPix cell) to the\n",
"set of granules whose footprint intersects it. The manifest is self-contained\n",
"— it stores `{\"id\", \"s3\", \"https\"}` per granule, so the aggregation runner\n",
"never needs the Catalog again at run time."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cell-5",
"metadata": {},
"outputs": [],
"source": [
"from zagg.catalog.shardmap import ShardMap\n",
"from zagg.grids import HealpixGrid\n",
"\n",
"# HEALPix grid matching atl06.yaml (parent_order=6, child_order=12, fullsphere).\n",
"grid = HealpixGrid(parent_order=6, child_order=12, layout=\"fullsphere\")\n",
"\n",
"# Build: catalog footprints are intersected with the HEALPix shard cells\n",
"# that cover the AOI. Use backend='auto' to prefer spherely when available.\n",
"shardmap = ShardMap.build(catalog, grid, backend=\"mortie\")\n",
"\n",
"print(f\"ShardMap: {len(shardmap.shard_keys)} shards, \"\n",
" f\"{shardmap.metadata['total_pairs']} granule-shard pairs\")\n",
"print(f\"Build time: {shardmap.metadata['build_wall_s']:.2f}s \"\n",
" f\"backend: {shardmap.metadata['backend']}\")\n",
"print(f\"Grid signature: {shardmap.grid_signature}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cell-6",
"metadata": {},
"outputs": [],
"source": [
"# Inspect a few shard assignments.\n",
"for key, shard_granules in zip(shardmap.shard_keys[:4], shardmap.granules[:4]):\n",
" ids = [g[\"id\"] for g in shard_granules]\n",
" print(f\" shard {key:6d}: {len(ids)} granule(s) — {ids[:2]}{'...' if len(ids) > 2 else ''}\")"
]
},
{
"cell_type": "markdown",
"id": "cell-7",
"metadata": {},
"source": [
"## 3. Persist to disk and reload (round-trip)\n",
"\n",
"Both the ShardMap JSON and the STAC-geoparquet Catalog are supported as\n",
"file-path inputs to `show_shardmap`. Here we round-trip both to disk to\n",
"demonstrate the saved-file path you would use in practice (e.g. after\n",
"running `python -m zagg.catalog --config atl06.yaml ...`)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cell-8",
"metadata": {},
"outputs": [],
"source": [
"import tempfile\n",
"from pathlib import Path\n",
"\n",
"tmp = Path(tempfile.mkdtemp())\n",
"sm_path = tmp / \"shardmap_ATL06_peninsula_jan2020.json\"\n",
"cat_path = tmp / \"catalog_ATL06_peninsula_jan2020.parquet\"\n",
"\n",
"shardmap.to_json(str(sm_path))\n",
"catalog.to_geoparquet(str(cat_path))\n",
"\n",
"print(f\"ShardMap -> {sm_path} ({sm_path.stat().st_size / 1024:.1f} KB)\")\n",
"print(f\"Catalog -> {cat_path} ({cat_path.stat().st_size / 1024:.1f} KB)\")\n",
"\n",
"# Reload to verify round-trip.\n",
"from zagg.catalog.sources import Catalog\n",
"\n",
"sm_rt = ShardMap.from_json(str(sm_path))\n",
"cat_rt = Catalog.from_geoparquet(str(cat_path))\n",
"print(f\"\\nRound-trip OK: {len(sm_rt.shard_keys)} shards, {len(cat_rt)} granules\")"
]
},
{
"cell_type": "markdown",
"id": "cell-9",
"metadata": {},
"source": [
"## 4. Headless render core: GeoJSON FeatureCollections\n",
"\n",
"`render_shardmap` assembles every layer into one dict of GeoJSON\n",
"`FeatureCollection`s. Pass a `catalog` to include granule footprints. Each\n",
"value is plain, JSON-serializable GeoJSON — no widgets required."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cell-10",
"metadata": {},
"outputs": [],
"source": [
"from zagg.viz import render_shardmap\n",
"\n",
"layers = render_shardmap(shardmap, catalog)\n",
"{name: (fc and len(fc[\"features\"])) for name, fc in layers.items()}"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cell-11",
"metadata": {},
"outputs": [],
"source": [
"# One feature per shard, with the shard key and its granule count.\n",
"shards_fc = layers[\"shards\"]\n",
"feat = shards_fc[\"features\"][0]\n",
"print(\"geometry type:\", feat[\"geometry\"][\"type\"])\n",
"print(\"properties:\", feat[\"properties\"])\n",
"\n",
"# All populated shards in the Antarctic Peninsula window.\n",
"[f[\"properties\"] for f in shards_fc[\"features\"]][:5]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cell-12",
"metadata": {},
"outputs": [],
"source": [
"# One feature per granule footprint, decoded from the real Catalog.\n",
"print(f\"{len(layers['granules']['features'])} granule footprint(s)\")\n",
"[f[\"properties\"][\"id\"] for f in layers[\"granules\"][\"features\"]][:5]"
]
},
{
"cell_type": "markdown",
"id": "cell-15",
"metadata": {},
"source": "## 5. Interactive map — Antarctic Peninsula (ipyleaflet, polar-stereographic)\n\n`show_shardmap` builds an `ipyleaflet.Map` from the saved ShardMap and\nCatalog paths. Both ShardMap JSON and STAC-geoparquet Catalog are accepted\nas file paths or in-memory objects.\n\n**Projection is auto-selected from the map's extent.** This Antarctic AOI is\nentirely south of −60°, so the viewer renders in **EPSG:3031** (Antarctic\nPolar Stereographic) with a NASA **GIBS** polar basemap instead of the\ndistorted Web Mercator default. An Arctic AOI (poleward of +60°) gets\n**EPSG:3413**; mid-latitude AOIs stay on Web Mercator + OpenStreetMap. Pass\n`crs=\"EPSG:3857\"` to force Mercator, or `crs=\"EPSG:3413\"`/`\"EPSG:3031\"` to\nforce a specific pole. Vector layers stay WGS84 GeoJSON — proj4leaflet\nreprojects them client-side.\n\n**Run in JupyterLab** (Binder already has the deps via `.binder/postBuild`;\nlocally `pip install \"zagg[analysis,catalog,viz]\"`) to see the live map.\nUnder headless `nbconvert` the Map object is constructed (no error) but\ntiles won't display.\n\n### Verification checklist\n\n1. **Basemap** — the GIBS polar basemap renders, pans, zooms, and the\n continent is undistorted at the pole (no Web-Mercator stretching).\n2. **Shard outlines** — blue polygons over the Peninsula; click one for\n its `shard_key` and `n_granules` properties.\n3. **Granule footprints toggle** — layer-control (top-right); the ICESat-2\n track footprints appear/disappear."
},
{
"cell_type": "code",
"execution_count": null,
"id": "cell-16",
"metadata": {},
"outputs": [],
"source": "from zagg.viz import show_shardmap\n\n# File-path interface — same as after `python -m zagg.catalog ...`.\n# CRS auto-selected from the AOI extent: this Antarctic map renders in EPSG:3031.\nm = show_shardmap(str(sm_path), catalog=str(cat_path), zoom=5)\nprint(\"display CRS:\", m.crs[\"name\"])\nprint(\"layers:\", [getattr(layer, \"name\", type(layer).__name__) for layer in m.layers])\nm"
},
{
"cell_type": "markdown",
"id": "cell-17",
"metadata": {},
"source": [
"## 6. Second AOI — Jakobshavn Glacier, West Greenland\n",
"\n",
"A second example using an Arctic AOI to show the same pipeline working\n",
"independently on a different region. Jakobshavn Glacier is one of\n",
"Greenland's fastest-moving glaciers and a standard ICESat-2 study region."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cell-18",
"metadata": {},
"outputs": [],
"source": "# render_shardmap / show_shardmap re-imported here for a standalone re-run of\n# this section; CMRSource / Query / ShardMap / grid still come from sections 1-2.\nfrom zagg.viz import render_shardmap, show_shardmap\n\n# Second AOI: Jakobshavn Glacier, West Greenland — June 2020.\nAOI2_BBOX = (-52.0, 68.0, -45.0, 72.0)\nSTART2, END2 = \"2020-06-01\", \"2020-06-15\"\n\nquery2 = Query(\n short_name=\"ATL06\",\n version=\"007\",\n start_date=START2,\n end_date=END2,\n region=AOI2_BBOX,\n provider=\"NSIDC_CPRD\",\n)\n\ncatalog2 = CMRSource().fetch(query2)\nshardmap2 = ShardMap.build(catalog2, grid, backend=\"mortie\")\n\nprint(f\"Greenland AOI: {len(catalog2)} granules, \"\n f\"{len(shardmap2.shard_keys)} shards\")\n\n# Headless render — confirm layers are populated.\nlayers2 = render_shardmap(shardmap2, catalog2)\nprint(\"Layer feature counts:\",\n {name: (fc and len(fc[\"features\"])) for name, fc in layers2.items()})"
},
{
"cell_type": "code",
"execution_count": null,
"id": "cell-19",
"metadata": {},
"outputs": [],
"source": "# tempfile / Path are imported again so this section re-runs without section 3.\nimport tempfile\nfrom pathlib import Path\n\n# Save and display interactive map for the Greenland AOI.\ntmp2 = Path(tempfile.mkdtemp())\nsm2_path = tmp2 / \"shardmap_ATL06_greenland_jun2020.json\"\ncat2_path = tmp2 / \"catalog_ATL06_greenland_jun2020.parquet\"\n\nshardmap2.to_json(str(sm2_path))\ncatalog2.to_geoparquet(str(cat2_path))\n\n# Arctic AOI -> auto-selects EPSG:3413 (NSIDC Sea Ice Polar Stereographic North).\nm2 = show_shardmap(str(sm2_path), catalog=str(cat2_path), zoom=5)\nprint(\"display CRS:\", m2.crs[\"name\"])\nm2"
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.12.0"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
7 changes: 7 additions & 0 deletions pyproject.toml
@@ -61,6 +61,13 @@ lambda = [
catalog = [
"stac-geoparquet>=0.7.0",
]
# Interactive shard-map viewer (issue #38). Optional — kept out of core and out
# of `lambda` so the deployment layer stays lean. `pip install "zagg[viz]"`.
# ipyleaflet is a Jupyter widget; @espg approved adding it for the viewer
# (https://github.com/englacial/zagg/issues/38#issuecomment-4713639466).
viz = [
"ipyleaflet>=0.19",
]
analysis = [
"cubed>=0.24.0",
"cubed-xarray>=0.0.9",