Support shapefile / GeoJSON meshes via geopandas#59

Merged
rajeeja merged 1 commit into main from rajeeja/gis-mesh-support
Jun 9, 2026

@rajeeja rajeeja commented Jun 9, 2026

Collaborator

Summary

  • New `domain.load_dataset` and updated `domain.load_grid` that branch on extension: `.shp` and `.geojson` go through `ux.Grid.from_file(..., backend="geopandas")`, HEALPix specs continue through `from_healpix`, everything else falls through to the existing `open_grid` / `open_dataset`.
  • All local tools (`inspection`, `plotting`, `capabilities`, `advanced`, `vector_calc`) now route file opens through these loaders.
  • All `remote_*` functions in `remote/compute_functions.py` inline the same dispatch, because Globus Compute serializes the function body and ships it, so module-level helpers don't survive. A NOTE at the top of the file documents the constraint and warns "change one, change all." The earlier draft's unused `_remote_load_grid` / `_remote_load_dataset` helpers were removed so that a future refactor cannot wire them in and break serialization.
  • Extension list narrowed to `.shp` / `.geojson` only; `.shx` / `.dbf` dropped (not valid entry points — geopandas auto-discovers the siblings).
  • README router: drop `(5 min)` / `(15 min)` suffixes for consistency; concrete times in step-by-step docs are kept.
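
For reviewers who want the shape of the extension dispatch at a glance, here is a minimal sketch, not the merged code. `pick_backend` and `load_grid_sketch` are hypothetical names, lowercasing the suffix is an assumption, and the HEALPix branch is left out because the spec format is not shown in this PR description. Only `ux.Grid.from_file(..., backend="geopandas")` and `ux.open_grid` are taken from the summary above.

```python
from pathlib import Path

# Extensions routed to the geopandas backend, per the summary above.
GIS_EXTENSIONS = {".shp", ".geojson"}


def pick_backend(path: str) -> str:
    """Return "geopandas" for GIS vector files and "default" for everything else."""
    # Assumption: matching is case-insensitive, so ".SHP" is treated like ".shp".
    suffix = Path(path).suffix.lower()
    return "geopandas" if suffix in GIS_EXTENSIONS else "default"


def load_grid_sketch(path: str):
    """Hypothetical stand-in for domain.load_grid (HEALPix branch omitted)."""
    # Imported lazily so pick_backend stays usable without uxarray installed.
    import uxarray as ux

    if pick_backend(path) == "geopandas":
        return ux.Grid.from_file(path, backend="geopandas")
    return ux.open_grid(path)
```

With this sketch, a `.shx` or `.dbf` path lands in the default branch rather than the geopandas one, which matches the narrowed extension list.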

Verification

  • `uv run pre-commit run --all-files`: clean.
  • `uv run pytest tests/ --ignore=tests/test_remote_agent.py`: 295 / 295 passed (+2 new shapefile/geojson cases).
  • Local fixtures from `~/uxarray/test/meshfiles`:
    • UGRID `outCSne30.ug` → 5400 faces, format `UGRID` ✓
    • SCRIP `ne30pg2/grid.nc` → 21600 faces, format `Scrip` ✓
    • Shapefile `chicago_neighborhoods.shp` → 101 faces, format `Shapefile` ✓
    • GeoJSON `sample_chicago_buildings.geojson` → 10 faces, format `GeoJSON` ✓
  • Remote NetCDF on chrysalis (regression):
    • `inspect_mesh_remote` on Polaris `oi240lr240/base_mesh.nc` → 10302 faces, `hpc:chrysalis` ✓
    • `calculate_area_remote` on the same → total_area 12.566 (4π unit sphere ✓), `hpc:chrysalis` ✓
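
The 4π sanity check on `total_area` can be reproduced with a couple of lines. This is only the arithmetic behind the check, assuming a unit sphere of radius 1; the variable names are illustrative and the tolerance is chosen to match the three decimals quoted above.

```python
import math

# Surface area of a sphere with radius 1 is 4 * pi.
unit_sphere_area = 4 * math.pi

# Value quoted in the verification notes, rounded to three decimals.
reported_total_area = 12.566

# The tolerance reflects the rounding of the quoted value, not solver accuracy.
assert math.isclose(unit_sphere_area, reported_total_area, abs_tol=1e-3)
```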

Test plan

  • Pre-commit clean
  • 295 tests pass
  • All four mesh formats load locally through both `load_grid` and the `inspect_mesh` tool wrapper
  • Remote chrysalis path unchanged (NetCDF fall-through verified)
  • Remote GIS path not tested — would require uploading a .shp to chrysalis, which is the user's responsibility; the same inline branch is exercised in unit tests and on the local NetCDF path.

…DME router

Adds GIS vector formats (.shp, .geojson) as first-class mesh inputs alongside
the existing UGRID / MPAS / SCRIP / NetCDF / HEALPix paths.

Loaders:
- domain/mesh.py: new load_dataset() companion to load_grid(). Both branch on
  extension: .shp / .geojson go through ux.Grid.from_file(..., backend="geopandas"),
  HEALPix specs continue to use ux.Grid.from_healpix(), everything else falls
  through to ux.open_grid / ux.open_dataset.
- domain/__init__.py: re-export load_dataset.

Tools — route every local file open through the new loaders so the GIS path
applies uniformly (no behaviour change for existing formats):
- tools/inspection.py, tools/plotting.py, tools/capabilities.py,
  tools/advanced.py, tools/vector_calc.py: ux.open_dataset / ux.open_grid →
  load_dataset / load_grid.

Remote — Globus Compute serializes each remote_* function body and ships it to
the worker, so closures over a module-level helper aren't reliable across SDK
versions. Each remote_* function inlines the same ~6 lines of extension
dispatch. A NOTE at the top of remote/compute_functions.py explains why and
warns "change one, change all". The earlier draft also contained unused
_remote_load_grid / _remote_load_dataset helpers that have been removed to
prevent a future maintainer from refactoring them in and breaking serialization.
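
A minimal sketch of what "inlining the dispatch" could look like for one remote function, under the constraint described above. The function name and return shape are hypothetical and are not copied from remote/compute_functions.py; the point is only that imports and the extension check live inside the function body, with no call out to a module-level helper.

```python
def remote_inspect_sketch(path: str) -> dict:
    """Hypothetical remote_* function with the extension dispatch inlined."""
    # Imports live inside the body so they travel with the serialized function.
    import os

    import uxarray as ux

    # Inline extension dispatch: the same few lines are repeated in every
    # remote_* function instead of being shared through a helper.
    ext = os.path.splitext(path)[1].lower()
    if ext in (".shp", ".geojson"):
        grid = ux.Grid.from_file(path, backend="geopandas")
    else:
        grid = ux.open_grid(path)

    return {"n_face": int(grid.n_face)}
```

The duplication is deliberate in this sketch: a shared helper would be tidier locally, but it is exactly the refactor the NOTE warns against.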

Extension list: dropped .shx / .dbf from the accepted-extension list — only
.shp is a valid entry point (geopandas picks up the siblings automatically),
and a user passing .shx or .dbf directly would have gotten a confusing
geopandas error rather than the helpful fall-through.

Tests:
- tests/test_inspect_mesh.py: +test_inspect_shapefile_mesh, +test_inspect_geojson_mesh.
- tests/test_plotting.py: 5 tests updated to patch load_dataset instead of
  the now-unused ux.open_dataset import path.
- 295 / 295 pass locally.
- Verified against real fixtures in ~/uxarray/test/meshfiles: outCSne30 ugrid
  (5400 faces), ne30pg2 scrip (21600 faces), chicago_neighborhoods.shp
  (101 faces), sample_chicago_buildings.geojson (10 faces).
- Verified remote NetCDF path unchanged: inspect_mesh_remote +
  calculate_area_remote against the Polaris oi240lr240 base_mesh on
  chrysalis return identical results to prior runs (10302 faces,
  total_area = 4π unit sphere) with execution_venue=hpc:chrysalis.

README: dropped "(5 min)", "(15 min)" suffixes from the four-row router for
consistency; concrete times in step-by-step docs are kept.
@rajeeja rajeeja merged commit ae70ced into main Jun 9, 2026
9 checks passed
@rajeeja rajeeja deleted the rajeeja/gis-mesh-support branch June 9, 2026 18:04