Skip to content

bump zarr conventions; migrate type checking to pyright#199

Open
d-v-b wants to merge 9 commits into
EOPF-Explorer:mainfrom
d-v-b:chore/new-conventions-metadata
Open

bump zarr conventions; migrate type checking to pyright#199
d-v-b wants to merge 9 commits into
EOPF-Explorer:mainfrom
d-v-b:chore/new-conventions-metadata

Conversation

@d-v-b

@d-v-b d-v-b commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Summary

Updates the data model to the latest GeoZarr-relevant Zarr conventions (via the zarr-cm package) and overhauls the project's type-checking setup.

The work landed in two parts:

1. zarr-cm upgrade

  • Build all convention metadata (multiscales / spatial / proj) through zarr_cm.create_many via a single utils.build_convention_attrs helper, so zarr-cm validates each convention and emits its CMO — instead of hand-assembling zarr_conventions and the spatial:/proj: keys.
  • Track the latest zarr-cm (currently the main git dependency, which ships: a Mapping-covariant aggregate API, public JsonType/JsonValue/JsonDict exports, and a PEP 695 type JsonValue alias). The PEP 695 alias lets pydantic resolve MultiscaleGroupAttrs' ConventionMetadataObject field directly — no model_rebuild() workaround.
  • Regenerated the golden-file snapshots whose embedded convention schema_url/spec_url changed with the new convention releases (URL-only diffs).

2. mypy → pyright migration

zarr-cm's convention TypedDicts now use PEP 728 extra_items, which mypy does not support. The type checker is switched to pyright (which does):

  • pyproject.toml: [tool.mypy][tool.pyright]; mypypyright dev dependency.
  • pre-commit + CI lint job run uv run --frozen pyright.

Reaching a clean pyright run (0 errors across src/ + tests/) involved:

  • removing stale mypy # type: ignore[...] comments;
  • narrowing NotRequired TypedDict access with .get() + guards instead of making keys Required (preserving runtime validation of genuinely-optional Sentinel-1/2 members);
  • replacing casts on external/untyped data with real runtime checks (CRS.from_user_input, a _as_bbox validator, defensive get_zarr_group resolution);
  • constructing models via Model.model_validate(...) rather than **dict.

No typing.Any is used in the changed code. Runtime behavior is preserved.

Verification

  • pyright: 0 errors (src + tests).
  • ruff check + ruff format: clean.
  • Test suite green (incl. Sentinel-1/2 round-trip and golden-file snapshot tests).
  • roborev review (job 114): Pass — no issues found.

Upstream

Filed (and merged) several zarr-cm improvements surfaced by this work: aggregate-API Mapping/type exports and the PEP 695 JsonValue alias.

🤖 Generated with Claude Code

# Check if this variable already exists and is valid
if not force_overwrite and store_exists:
if utils.validate_existing_band_data(existing_dataset, var, ds):
assert existing_dataset is not None # guaranteed by store_exists
for var in ds.data_vars:
if hasattr(ds[var].data, "chunks"):
current_chunks = ds[var].chunks
assert current_chunks is not None # guaranteed by hasattr(..., "chunks")
@codecov-commenter

codecov-commenter commented Jun 19, 2026

Copy link
Copy Markdown

d-v-b and others added 6 commits June 19, 2026 15:52
Route multiscales, spatial, and proj convention metadata through a single
utils.build_convention_attrs() helper that delegates to zarr_cm.create_many,
so zarr-cm validates each convention and emits its CMO. Previously the CMOs
were hand-placed and the spatial:/proj:/multiscales keys assembled by hand,
which skipped zarr-cm validation (e.g. multiscales' layout>=1 and
derived_from=>transform rules) and duplicated key strings.

Type the helper precisely with zarr-cm's TypedDicts (SpatialAttrs,
GeoProjAttrs, MultiscalesAttrs, MultiConventionAttrs) and a CRSLike Protocol
instead of dict[str, Any]. Multiscales data is still produced by the project's
MultiscaleMeta model (it also covers the TMS encoding zarr-cm doesn't model),
but its CMO and validation now go through zarr-cm. Output is byte-identical to
before (verified against the golden-file snapshot tests).

Add tests/test_conversion/test_convention_attrs.py covering the helper,
including that zarr-cm now rejects an invalid multiscales layout.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Ignore root-level scratch files (test.py, tmp.json, cli_test.sh) and the
machine-local .claude/ session directory so they stop showing in git status.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
zarr-cm's convention TypedDicts now use PEP 728 `extra_items`, which mypy does
not support. Switch the type checker to pyright (which does) and upgrade to the
zarr-cm main git dep that also ships the supporting fixes:
- Mapping-covariant aggregate API + exported JsonType/JsonValue/JsonDict, so
  build_convention_attrs no longer needs a private `_core` import or a cast.
- PEP 695 `type JsonValue` alias, so pydantic resolves MultiscaleGroupAttrs'
  ConventionMetadataObject field directly (removed the model_rebuild workaround).

Toolchain:
- pyproject: replace [tool.mypy] with [tool.pyright]; mypy -> pyright dev dep.
- .pre-commit-config / CI lint job: run `uv run --frozen pyright`.

Type fixes across src/ and tests/ to reach a clean pyright run (0 errors):
- remove stale mypy `# type: ignore[...]` comments (pyright-unnecessary);
- narrow NotRequired TypedDict access (`.get()` + guard) instead of making keys
  Required, preserving runtime validation of optional Sentinel-1/2 members;
- replace casts on external/untyped data with runtime isinstance/validation;
- model construction via `Model.model_validate(...)` instead of `**dict`.

No `typing.Any` introduced. Runtime behavior unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
)

# calculate_default_transform sizes the grid, so width and height are populated
assert width is not None

# calculate_default_transform sizes the grid, so width and height are populated
assert width is not None
assert height is not None
CONVENTION_NAME = multiscales_cm.CMO["name"]
CONVENTION_DESCRIPTION = multiscales_cm.CMO["description"]
_CONVENTION_NAME = multiscales_cm.CMO.get("name")
assert _CONVENTION_NAME is not None
assert _CONVENTION_NAME is not None
CONVENTION_NAME = _CONVENTION_NAME
_CONVENTION_DESCRIPTION = multiscales_cm.CMO.get("description")
assert _CONVENTION_DESCRIPTION is not None
expected_uuid = multiscales_cm.CMO["uuid"]
if not any(c["uuid"] == expected_uuid for c in value):
expected_uuid = multiscales_cm.CMO.get("uuid")
assert expected_uuid is not None
scale_level_data["transform"] = multiscale_transform

# Add spatial properties
assert "spatial_shape" in overview_level # always populated by the producer above
@d-v-b d-v-b changed the title bump zarr conventions bump zarr conventions; migrate type checking to pyright Jun 21, 2026
@d-v-b d-v-b marked this pull request as draft June 21, 2026 19:11
d-v-b and others added 2 commits June 21, 2026 21:14
Three casts asserted a type derived from Any or a union without verifying it.
Replace each with an isinstance guard that raises TypeError on violation, so
the assumption is enforced at runtime instead of only asserted to the checker:

- sentinel1_reprojection: rio.write_crs() returns Any -> verify xr.Dataset.
- s2_multiscale: output_group[base_path] is Array | Group -> verify zarr.Group.
- s2_multiscale: client.compute() returns Any -> verify distributed.Future.

The remaining casts bridge types on data we already validated (model_dump /
create_many output), satisfy protocol/TypeVar binding on `self`, or widen a
TypedDict for a third-party API — a runtime check there adds no safety.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@d-v-b d-v-b marked this pull request as ready for review June 21, 2026 19:29
@d-v-b d-v-b requested a review from lhoupert June 26, 2026 15:47
@d-v-b

d-v-b commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

@lhoupert i forgot to tag you for review when I opened this 🤦 lmk if you want to review it, otherwise i am inclined to merge for speed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants