The goal of gdx is to integrate GDAL with xarray, especially for the multidimensional API which is still relatively underutilized.
- apply xarray indexes when relevant in Raster
- apply xarray indexes when relevant in Multidim
- define stance on representation of “default transform”, and when GCPs, RPCs, or geolocation arrays are present
- explore when we need to control driver choice (netcdf and hdf in particular)
- explore registering as an xarray backend, via
engine = "gdal"
Here’s a basic example:
from gdx import GDALBackendEntrypoint
backend = GDALBackendEntrypoint()
dsn = "/vsicurl/https://projects.pawsey.org.au/idea-sealevel-glo-phy-l4-nrt-008-046/data.marine.copernicus.eu/SEALEVEL_GLO_PHY_L4_NRT_008_046/cmems_obs-sl_glo_phy-ssh_nrt_allsat-l4-duacs-0.125deg_P1D_202506/2025/08/nrt_global_allsat_phy_l4_20250825_20250825.nc"
ds = backend.open_dataset(f'vrt://{dsn}?sd_name=vgos', chunks = {}, multidim = False)
ds1 = backend.open_dataset(dsn, multidim = True, chunks = {}) We have a Raster xarray:
ds
<xarray.Dataset> Size: 17MB
Dimensions: (x: 2880, y: 1440)
Coordinates:
* x (x) float64 23kB -180.0 -179.9 -179.8 -179.6 ... 179.6 179.8 179.9
* y (y) float64 12kB 90.0 89.88 89.75 89.62 ... -89.62 -89.75 -89.88
Data variables:
band_1 (y, x) int32 17MB dask.array<chunksize=(1440, 2880), meta=np.ndarray>
Attributes:
crs: GEOGCS["unknown",DATUM["unnamed",SPHEROID["Spheroid",63781...
geotransform: (-180.0, 0.125, 0.0, 90.0, 0.0, -0.125)and a Multidim xarray:
ds1
<xarray.Dataset> Size: 166MB
Dimensions: (latitude: 1440, nv: 2, longitude: 2880, time: 1)
Coordinates:
* latitude (latitude) float32 6kB -89.94 -89.81 -89.69 ... 89.69 89.81 89.94
* nv (nv) int32 8B 0 1
* longitude (longitude) float32 12kB -179.9 -179.8 -179.7 ... 179.8 179.9
* time (time) float32 4B 2.763e+04
Data variables:
lat_bnds (latitude, nv) float32 12kB dask.array<chunksize=(1440, 2), meta=np.ndarray>
lon_bnds (longitude, nv) float32 23kB dask.array<chunksize=(2880, 2), meta=np.ndarray>
sla (time, latitude, longitude) int32 17MB dask.array<chunksize=(1, 1440, 2880), meta=np.ndarray>
err_sla (time, latitude, longitude) int32 17MB dask.array<chunksize=(1, 1440, 2880), meta=np.ndarray>
ugosa (time, latitude, longitude) int32 17MB dask.array<chunksize=(1, 1440, 2880), meta=np.ndarray>
err_ugosa (time, latitude, longitude) int32 17MB dask.array<chunksize=(1, 1440, 2880), meta=np.ndarray>
vgosa (time, latitude, longitude) int32 17MB dask.array<chunksize=(1, 1440, 2880), meta=np.ndarray>
err_vgosa (time, latitude, longitude) int32 17MB dask.array<chunksize=(1, 1440, 2880), meta=np.ndarray>
adt (time, latitude, longitude) int32 17MB dask.array<chunksize=(1, 1440, 2880), meta=np.ndarray>
ugos (time, latitude, longitude) int32 17MB dask.array<chunksize=(1, 1440, 2880), meta=np.ndarray>
vgos (time, latitude, longitude) int32 17MB dask.array<chunksize=(1, 1440, 2880), meta=np.ndarray>
flag_ice (time, latitude, longitude) int32 17MB dask.array<chunksize=(1, 1440, 2880), meta=np.ndarray>
Attributes: (12/44)
Conventions: CF-1.6
Metadata_Conventions: Unidata Dataset Discovery v1.0
cdm_data_type: Grid
comment: Sea Surface Height measured by Altimetry...
contact: servicedesk.cmems@mercator-ocean.eu
creator_email: servicedesk.cmems@mercator-ocean.eu
... ...
summary: DUACS Near-Real-Time Level-4 sea surface...
time_coverage_duration: P1D
time_coverage_end: 2025-08-25T12:00:00Z
time_coverage_resolution: P1D
time_coverage_start: 2025-08-24T12:00:00Z
title: NRT merged all satellites Global Ocean G...There’s one variable called ‘band_1’ for the raster:
ds.band_1.isel(x = 0)
# <xarray.DataArray 'band_1' (y: 1440)> Size: 6kB
# dask.array<getitem, shape=(1440,), dtype=int32, chunksize=(1440,), chunktype=numpy.ndarray>
# Coordinates:
# x float64 8B -180.0
# * y (y) float64 12kB 90.0 89.88 89.75 89.62 ... -89.62 -89.75 -89.88
# Attributes:
# nodata: -2147483647.0
# scale: 0.0001
# offset: 0.0we can access actual values
## the raw values for now
ds.band_1.sel(x = 100, y = -50, method = "nearest").values
#> array(441, dtype=int32)
ds1.sla.isel(longitude = 0, latitude = 1000).values
#> array([2404], dtype=int32)What about a ZARR from CMEMS?
from gdx import GDALBackendEntrypoint
dsn = 'ZARR:"/vsicurl/https://s3.waw3-1.cloudferro.com/mdl-arco-time-045/arco/SEALEVEL_GLO_PHY_L4_MY_008_047/cmems_obs-sl_glo_phy-ssh_my_allsat-l4-duacs-0.125deg_P1D_202411/timeChunked.zarr"'
backend = GDALBackendEntrypoint()
#ds = backend.open_dataset(filename_or_obj, multidim = True, chunks = None)
ds = backend.open_dataset(dsn, multidim = True, chunks = {})
ds
# <xarray.Dataset> Size: 2TB
# Dimensions: (latitude: 1440, longitude: 2880, time: 11902, nv: 2)
# Coordinates:
# * latitude (latitude) float32 6kB -89.94 -89.81 -89.69 ... 89.81 89.94
# * longitude (longitude) float32 12kB -179.9 -179.8 ... 179.8 179.9
# * time (time) float32 48kB 1.571e+04 1.571e+04 ... 2.761e+04
# * nv (nv) int32 8B 0 1
# Data variables: (12/13)
# adt (time, latitude, longitude) int32 197GB dask.array<chunksize=(1, 512, 1024), meta=np.ndarray>
# err_sla (time, latitude, longitude) int32 197GB dask.array<chunksize=(1, 512, 1024), meta=np.ndarray>
# err_ugosa (time, latitude, longitude) int32 197GB dask.array<chunksize=(1, 512, 1024), meta=np.ndarray>
# err_vgosa (time, latitude, longitude) int32 197GB dask.array<chunksize=(1, 512, 1024), meta=np.ndarray>
# flag_ice (time, latitude, longitude) int32 197GB dask.array<chunksize=(1, 512, 1024), meta=np.ndarray>
# lat_bnds (latitude, nv) float32 12kB dask.array<chunksize=(1440, 2), meta=np.ndarray>
# ... ...
# sla (time, latitude, longitude) int32 197GB dask.array<chunksize=(1, 512, 1024), meta=np.ndarray>
# tpa_correction (time) int32 48kB dask.array<chunksize=(1,), meta=np.ndarray>
# ugos (time, latitude, longitude) int32 197GB dask.array<chunksize=(1, 512, 1024), meta=np.ndarray>
# ugosa (time, latitude, longitude) int32 197GB dask.array<chunksize=(1, 512, 1024), meta=np.ndarray>
# vgos (time, latitude, longitude) int32 197GB dask.array<chunksize=(1, 512, 1024), meta=np.ndarray>
# vgosa (time, latitude, longitude) int32 197GB dask.array<chunksize=(1, 512, 1024), meta=np.ndarray>
# Attributes: (12/43)
# Conventions: CF-1.6
# Metadata_Conventions: Unidata Dataset Discovery v1.0
# cdm_data_type: Grid
# comment: Sea Surface Height measured by Altimetry...
# contact: servicedesk.cmems@mercator-ocean.eu
# coordinates: lat_bnds lon_bnds
# ... ...
# summary: SSALTO/DUACS Delayed-Time Level-4 sea su...
# time_coverage_duration: P1D
# time_coverage_end: 2023-12-31T12:00:00Z
# time_coverage_resolution: P1D
# time_coverage_start: 2023-12-30T12:00:00Z
# title: DT merged all satellites Global Ocean Gr...This example is a virtualized mosaic of NetCDF in multidim VRT.
big_virtual_mdim = "/vsicurl/https://gist.githubusercontent.com/mdsumner/18c5d302d00b9a456bb73d30ac758764/raw/f26e1b2e202f759d6aace4d7deb3e04ea3c85f15/mdim.vrt"
bvm = backend.open_dataset(big_virtual_mdim, multidim = True, chunks = {})
# <xarray.Dataset> Size: 3TB
# Dimensions: (Time: 5479, st_ocean: 51, yt_ocean: 1500, xt_ocean: 3600)
# Coordinates:
# * Time (Time) float64 44kB 1.132e+04 1.132e+04 ... 1.68e+04 1.68e+04
# * st_ocean (st_ocean) float64 408B 2.5 7.5 12.5 ... 3.603e+03 4.509e+03
# * yt_ocean (yt_ocean) float64 12kB -74.95 -74.85 -74.75 ... 74.75 74.85 74.95
# * xt_ocean (xt_ocean) float64 29kB 0.05 0.15 0.25 0.35 ... 359.8 359.9 360.0
# Data variables:
# temp (Time, st_ocean, yt_ocean, xt_ocean) int16 3TB dask.array<chunksize=(5479, 51, 1500, 3600), meta=np.ndarray>
bvm.sel(xt_ocean = slice(140, 150), yt_ocean = slice(-55, -45), st_ocean = slice(8, 13)).isel(Time = -1).temp.values
# array([[[-30770, -30784, -30799, ..., -30418, -30424, -30445],
# [-30755, -30771, -30788, ..., -30418, -30425, -30446],
# [-30744, -30764, -30788, ..., -30417, -30426, -30448],
# ...,
# [-29852, -29868, -29889, ..., -29413, -29338, -29325],
# [-29835, -29851, -29883, ..., -29385, -29327, -29324],
# [-29821, -29840, -29879, ..., -29353, -29319, -29322]]],
# shape=(1, 100, 100), dtype=int16)What a joy to simply be able to use GDAL for what it is good at without intermediate layers.
from gdx import GDALBackendEntrypoint
backend = GDALBackendEntrypoint()
backend.open_dataset("/vsicurl/https://noaa-nbm-grib2-pds.s3.amazonaws.com/blend.20251031/16/core/blend.t16z.core.f260.co.grib2", multidim = False, chunks = None)<xarray.Dataset> Size: 989MB
Dimensions: (
y: 1597,
x: 2345)
Coordinates:
* x (x) float64 19kB ...
* y (y) float64 13kB ...
* crs int64 8B ...
Data variables: (12/33)
APTMP:2 m above ground:260 hour fcst (y, x) float64 30MB ...
CAPE:surface:260 hour fcst (y, x) float64 30MB ...
CWASP:surface:260 hour fcst (y, x) float64 30MB ...
DPT:2 m above ground:260 hour fcst (y, x) float64 30MB ...
DSWRF:surface:260 hour fcst (y, x) float64 30MB ...
FICEAC:surface:254-260 hour acc fcst (y, x) float64 30MB ...
... ...
WDIR:80 m above ground:260 hour fcst (y, x) float64 30MB ...
WDIR:10 m above ground:260 hour fcst (y, x) float64 30MB ...
GUST:10 m above ground:260 hour fcst (y, x) float64 30MB ...
WIND:30 m above ground:260 hour fcst (y, x) float64 30MB ...
WIND:80 m above ground:260 hour fcst (y, x) float64 30MB ...
WIND:10 m above ground:260 hour fcst (y, x) float64 30MB ...
Indexes:
┌ x RasterIndex
└ y
crs CRSIndex (crs=PROJCS["unnamed",GEOGCS["Coordinate System imported from GRIB file" ...)
There’s a lot more to do, scaling works but I turned that off to test for now. .
Template a list of netcdf files and mosaic them to VRT, then open with this xarray backend. (Note this requires GDAL>=3.12.0 ).
month = "202501"
url = [f"/vsicurl/https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/{month}/oisst-avhrr-v02r01.{month}{(day+1):02d}.nc" for day in range(31)]
gdal.Run("mdim mosaic", input = url, output = "oisst.vrt", array = "sst")
from gdx import GDALBackendEntrypoint
backend = GDALBackendEntrypoint()
backend.open_dataset("oisst.vrt", multidim = True)
# <xarray.Dataset> Size: 64MB
# Dimensions: (lat: 720, lon: 1440, time: 31, zlev: 1)
# Coordinates:
# * lat (lat) float64 6kB -89.88 -89.62 -89.38 -89.12 ... 89.38 89.62 89.88
# * lon (lon) float64 12kB 0.125 0.375 0.625 0.875 ... 359.4 359.6 359.9
# * time (time) float64 248B 1.717e+04 1.72e+04 ... 1.717e+04 1.717e+04
# * zlev (zlev) float64 8B 0.0
# Data variables:
# sst (time, zlev, lat, lon) int16 64MB ...
# Please note that the gdx project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.