diff --git a/docs/concepts/data-structures.md b/docs/concepts/data-structures.md
new file mode 100644
index 0000000..33666ba
--- /dev/null
+++ b/docs/concepts/data-structures.md
@@ -0,0 +1,151 @@
+# Data structures
+
+This page defines the data-representation concepts the rest of the guide relies
+on. Two orthogonal axes, *gridded vs. ungridded* and *structured vs.
+unstructured*, set up the vocabulary; **regridding** is the operation that
+moves data between representations; and a **datacube** is the analysis-ready
+destination.
+
+Reference material on satellite product taxonomy (swath geometry, analysis-ready
+data specifications, agency processing-level schemes) is out of scope here — see
+the glossary's
+[Satellite data products](../glossary.md#satellite-data-products) section,
+whose entries link to canonical external references.
+
+## Two spaces
+
+Most of what follows describes the relationship between two distinct spaces:
+
+- **Index space**: the integer-indexed structure of the array itself. A value
+  is addressed by its position `(i, j[, k])`. Index space has no inherent
+  units.
+- **World space** (also "physical space" or, in geospatial work, "geographic
+  space"): where each value actually sits in a [CRS](../glossary.md). Its units
+  (meters, degrees, kelvin for an isentropic vertical axis, …) are defined by
+  the CRS.
+
+**Gridded** data has both spaces, plus a mapping between them (the "grid
+geometry"). **Ungridded** data lives only in world space, with no index space
+at all: each record carries its own world-space coordinates.
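+
+A minimal plain-Python sketch of the distinction. `GriddedField` and
+`Observation` are hypothetical names, and the grid geometry here is an
+arbitrary function purely for illustration; real formats store it as an affine
+transform or as coordinate arrays.
+
+```python
+from dataclasses import dataclass
+from typing import Callable
+
+
+# Gridded (hypothetical class): values live in index space; a separate grid
+# geometry maps an index (i, j) to world coordinates (x, y).
+@dataclass
+class GriddedField:
+    values: list[list[float]]                            # values[i][j]: row i, column j
+    geometry: Callable[[int, int], tuple[float, float]]  # (i, j) -> (x, y)
+
+    def world_coords(self, i: int, j: int) -> tuple[float, float]:
+        return self.geometry(i, j)
+
+
+# Ungridded (hypothetical class): no index space; each record is self-describing.
+@dataclass
+class Observation:
+    x: float
+    y: float
+    value: float
+
+
+# Hypothetical 2 x 3 field on a 0.5-degree lon/lat grid starting at (10.0, 50.0).
+field = GriddedField(
+    values=[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
+    geometry=lambda i, j: (10.0 + 0.5 * j, 50.0 - 0.5 * i),
+)
+stations = [Observation(10.2, 49.9, 3.1), Observation(11.7, 50.3, 2.4)]
+
+print(field.world_coords(1, 2))  # (11.0, 49.5)
+```
+
+The gridded value `field.values[1][2]` has no coordinates of its own; they exist
+only through the geometry. Each `Observation`, by contrast, carries its own.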
+
+## Gridded vs. ungridded
+
+The first question to ask of any dataset: **is it on a grid at all?**
+
+- **Gridded** data lives in **both** index space and world space, with the
+  mapping between them (the "grid geometry") stored separately from the
+  values. The mapping usually takes one of three forms:
+    - an **affine transform plus a CRS**, the GeoTIFF / COG convention:
+      `(x, y) = affine * (j, i)` for column `j` and row `i`, with **no per-cell
+      coordinate arrays stored**.
+    - **1-D coordinate arrays per axis** for a regular or rectilinear grid
+      (`x[nx]`, `y[ny]`), the CF / NetCDF / Zarr convention.
+    - **2-D coordinate arrays per cell** for a curvilinear grid
+      (`x[i, j]`, `y[i, j]`).
+
+ Examples: a satellite Level-3 product on a regular lat/lon grid, a
+ climate-model output, a reanalysis dataset, a COG.
+
+- **Ungridded** data lives **only in world space** — no index space, no array
+ structure. Each value carries its own coordinates `(x[k], y[k])` in whatever
+ CRS the producer chose (geographic *or* projected). There's no row `i` and
+ column `j`, only individual observations. Examples: weather stations, ocean
+ buoys, GNSS receivers, lidar/GPS point clouds, in-situ vertical profiles,
+ aircraft tracks.
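+
+The first two grid-geometry forms can be sketched in plain Python. This assumes
+GDAL's six-element geotransform ordering (x origin, pixel width, row rotation,
+y origin, column rotation, pixel height); the `affine` package used by rasterio
+orders the same coefficients differently (`Affine.from_gdal` converts). The grid
+itself is hypothetical.
+
+```python
+# GDAL-style geotransform:
+# (x_origin, pixel_width, row_rotation, y_origin, column_rotation, pixel_height)
+GeoTransform = tuple[float, float, float, float, float, float]
+
+
+def index_to_world(gt: GeoTransform, i: float, j: float) -> tuple[float, float]:
+    """Map row i, column j to world (x, y); integer indices give cell corners."""
+    x = gt[0] + j * gt[1] + i * gt[2]
+    y = gt[3] + j * gt[4] + i * gt[5]
+    return x, y
+
+
+def world_to_index(gt: GeoTransform, x: float, y: float) -> tuple[float, float]:
+    """Invert the 2x2 linear part of the affine map; returns fractional (row, col)."""
+    det = gt[1] * gt[5] - gt[2] * gt[4]
+    dx, dy = x - gt[0], y - gt[3]
+    col = (gt[5] * dx - gt[2] * dy) / det
+    row = (-gt[4] * dx + gt[1] * dy) / det
+    return row, col
+
+
+# Hypothetical north-up grid: 30 m pixels, upper-left corner at (500000, 4200000).
+gt = (500000.0, 30.0, 0.0, 4200000.0, 0.0, -30.0)
+print(index_to_world(gt, 2, 3))                 # (500090.0, 4199940.0)
+print(world_to_index(gt, 500090.0, 4199940.0))  # (2.0, 3.0)
+
+# CF / Zarr alternative: explicit 1-D cell-center coordinates, one per column/row.
+xs = [500015.0 + 30.0 * j for j in range(4)]
+ys = [4199985.0 - 30.0 * i for i in range(3)]
+print(xs[3], ys[2])                             # 500105.0 4199925.0
+```
+
+The geotransform locates cell *corners*, so the 1-D cell-center coordinates sit
+half a pixel away: `xs[3]` is the corner x from `index_to_world(gt, 2, 3)` plus
+15 m.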
+
+![Gridded vs. ungridded data](images/gridded-vs-ungridded.svg){ width="100%" }
+
+The same physical quantity (say, surface temperature) can be represented either
+way. Ungridded observations are often the *input* to a regridding step that
+produces a gridded product (see [Regridding](#regridding-resampling) below).
+
+## Structured vs. unstructured
+
+A second, *orthogonal* axis applies to gridded data: **how is cell connectivity
+defined?** This axis says nothing about ungridded data — ungridded data has no
+grid topology at all.
+
+- **Structured grid:** cells form a regular logical array addressable by integer
+ indices `(i, j[, k])`; **connectivity is implicit** (neighbors of `(i, j)`
+  are `(i±1, j)` and `(i, j±1)`). Includes **regular** and **rectilinear**
+  grids (axis-aligned, with uniform or per-axis variable spacing) and
+  **curvilinear** grids (logically rectangular but physically warped, common in
+  ocean models).
+- **Unstructured grid (mesh):** cells (triangles, polygons, sometimes mixed) are
+ joined by an **explicit connectivity list**; nodes have variable numbers of
+ neighbors. Examples: ICON, MPAS, FVCOM, finite-element meshes. Storage and
+ access patterns are fundamentally different from structured grids.
+- **Discrete Global Grid Systems (DGGS):** a third option that doesn't fit
+ neatly into structured-vs-unstructured. A DGGS tiles the *whole* sphere with
+ a single (often equal-area) cell family and a hierarchical refinement scheme;
+ cells are addressed by a **specialized cell ID**, with connectivity and
+ refinement encoded in the ID's arithmetic — no `(i, j)` array shape, and no
+ explicit connectivity list. Examples: **HEALPix** (equal-area quadrilateral
+ cells with ring/nested indexing), **H3** (hexagonal cells with a hierarchical
+ hex ID), **S2** (quadrilateral cells on a cubed sphere), **rHEALPix**,
+ **cubed-sphere**. Standardized by the
+ [OGC DGGS abstract specification](https://www.ogc.org/standards/dggs/).
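+
+The three models differ in where neighbor information lives. In this
+plain-Python sketch the face-node table is a hypothetical three-triangle mesh
+(in the spirit of UGRID's face-node connectivity), and the DGGS part uses
+HEALPix NESTED numbering, in which the four children of pixel `p` at the next
+finer resolution are `4p` to `4p + 3`; libraries such as `healpy` provide these
+conversions.
+
+```python
+# Structured grid: neighbors follow from index arithmetic alone.
+def structured_neighbors(i, j, ny, nx):
+    candidates = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
+    return [(a, b) for a, b in candidates if 0 <= a < ny and 0 <= b < nx]
+
+
+# Unstructured mesh: neighbors must be derived from an explicit connectivity
+# list, here triangles given as triples of node IDs (hypothetical mesh).
+faces = [(0, 1, 2), (1, 3, 2), (2, 3, 4)]
+
+
+def node_neighbors(faces):
+    nbrs = {}
+    for face in faces:
+        for n in face:
+            nbrs.setdefault(n, set()).update(m for m in face if m != n)
+    return nbrs
+
+
+# DGGS (HEALPix NESTED scheme): hierarchy is pure cell-ID arithmetic.
+def healpix_npix(nside):
+    return 12 * nside * nside                # total cells on the sphere
+
+
+def healpix_nested_parent(pix):
+    return pix // 4                          # parent at nside / 2
+
+
+def healpix_nested_children(pix):
+    return [4 * pix + k for k in range(4)]   # children at nside * 2
+
+
+print(structured_neighbors(0, 0, ny=3, nx=4))  # [(1, 0), (0, 1)]
+print(sorted(node_neighbors(faces)[2]))        # [0, 1, 3, 4]
+print(healpix_npix(4), healpix_nested_parent(13), healpix_nested_children(3))
+# 192 3 [12, 13, 14, 15]
+```
+
+Node 2 has four neighbors while node 0 has two: the variable-degree
+connectivity that a structured grid cannot express implicitly.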
+
+![Grid types](../visualization/images/grid-types.svg){ width="100%" }
+
+The two axes combine: a dataset can be **gridded + structured** (most satellite
+Level-3 products), **gridded + unstructured** (an ocean-model output on a
+triangular mesh), **gridded + DGGS** (a HEALPix cosmology map; an H3 hex map),
+or **ungridded** (irrelevant to this axis — there's no grid).
+
+## Regridding / resampling
+
+**Regridding** (or **resampling**) is the operation that moves data from one
+spatial sampling to another: from ungridded or unstructured input onto a
+regular grid, or between two grids. It is the verb connecting the preceding
+nouns to the datacube that follows. The gridded-vs-ungridded diagram above,
+read right-to-left, illustrates the simplest case: scattered ungridded points
+resampled onto a regular grid.
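+
+The crudest ungridded-to-gridded step is binning: average every observation
+that falls inside a cell. A minimal sketch, assuming a north-up grid described
+by its upper-left corner `(x0, y0)` and cell size; `bin_average` is a
+hypothetical helper, and real tools add interpolation, search radii, and
+weighting that it omits.
+
+```python
+import math
+from collections import defaultdict
+
+
+def bin_average(points, x0, y0, dx, dy, nx, ny):
+    """Grid scattered (x, y, value) points by averaging the points in each cell.
+
+    Cell (i, j) covers x0 + j*dx <= x < x0 + (j+1)*dx and
+    y0 - (i+1)*dy < y <= y0 - i*dy (row 0 at the top, as in a north-up raster).
+    Cells with no observations are NaN.
+    """
+    sums = defaultdict(float)
+    counts = defaultdict(int)
+    for x, y, v in points:
+        j = math.floor((x - x0) / dx)
+        i = math.floor((y0 - y) / dy)
+        if 0 <= i < ny and 0 <= j < nx:  # drop points outside the grid
+            sums[i, j] += v
+            counts[i, j] += 1
+    return [[sums[i, j] / counts[i, j] if counts[i, j] else math.nan
+             for j in range(nx)] for i in range(ny)]
+
+
+# Hypothetical station readings as (x, y, value).
+stations = [(0.5, 9.5, 10.0), (0.7, 9.2, 14.0), (2.5, 8.5, 3.0)]
+grid = bin_average(stations, x0=0.0, y0=10.0, dx=1.0, dy=1.0, nx=3, ny=2)
+print(grid)  # [[12.0, nan, nan], [nan, nan, 3.0]]
+```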
+
+The choice of **interpolation method** matters because each method makes a
+different assumption about the quantity being resampled:
+
+| Method | Behavior | Use for |
+|---|---|---|
+| **Nearest** | Picks the closest source value; preserves values exactly; blocky. | Categorical / class data (land cover, flags). |
+| **Bilinear** (linear) | Weighted average of the four surrounding cells; smooth; blurs sharp edges. | Smooth continuous fields (temperature, reflectance). |
+| **Conservative** | Area-weighted; preserves area-integrated totals across cells. | Extensive quantities — fluxes, precipitation, mass. |
+
+A useful rule of thumb: **match the method to the quantity.** The wrong choice
+silently corrupts downstream analysis — bilinear on precipitation does not
+conserve total water; nearest on a categorical mask preserves classes but
+bilinear on the same mask produces nonsense fractional categories.
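+
+A 1-D sketch in plain Python makes the table concrete. Cells are intervals, so
+"area" weighting becomes overlap-length weighting; the values are hypothetical,
+and real tools (for example xESMF's conservative method) work with 2-D cell
+polygons on the sphere.
+
+```python
+import bisect
+
+
+def centers(edges):
+    return [(a + b) / 2 for a, b in zip(edges, edges[1:])]
+
+
+def nearest(src_edges, src_vals, x):
+    """Value of the source cell whose center is closest to x."""
+    c = centers(src_edges)
+    k = min(range(len(c)), key=lambda n: abs(c[n] - x))
+    return src_vals[k]
+
+
+def linear(src_edges, src_vals, x):
+    """Linear interpolation between source cell centers, clamped at the ends."""
+    c = centers(src_edges)
+    if x <= c[0]:
+        return src_vals[0]
+    if x >= c[-1]:
+        return src_vals[-1]
+    k = bisect.bisect_right(c, x) - 1
+    t = (x - c[k]) / (c[k + 1] - c[k])
+    return src_vals[k] + t * (src_vals[k + 1] - src_vals[k])
+
+
+def conservative(src_edges, src_vals, dst_edges):
+    """Overlap-weighted mean of source cells for each target cell."""
+    out = []
+    for lo, hi in zip(dst_edges, dst_edges[1:]):
+        acc = 0.0
+        for (a, b), v in zip(zip(src_edges, src_edges[1:]), src_vals):
+            acc += max(0.0, min(hi, b) - max(lo, a)) * v
+        out.append(acc / (hi - lo))
+    return out
+
+
+def total(edges, vals):
+    """Integrated amount: sum of cell width times value."""
+    return sum((b - a) * v for (a, b), v in zip(zip(edges, edges[1:]), vals))
+
+
+src_edges = [0.0, 1.0, 2.0, 3.0, 4.0]
+precip = [0.0, 8.0, 0.0, 4.0]   # hypothetical precipitation rate per cell
+dst_edges = [0.0, 1.5, 4.0]     # coarser, misaligned target cells
+
+cons = conservative(src_edges, precip, dst_edges)
+lin = [linear(src_edges, precip, x) for x in centers(dst_edges)]
+print(total(src_edges, precip), round(total(dst_edges, cons), 9), total(dst_edges, lin))
+# 12.0 12.0 5.5
+
+landcover = [3, 3, 7, 7]        # hypothetical class codes
+print(nearest(src_edges, landcover, 2.25), linear(src_edges, landcover, 2.25))
+# 7 6.0
+```
+
+Conservative remapping keeps the total at 12 while linear interpolation drops it
+to 5.5, and linear interpolation between land-cover classes 3 and 7 invents a
+class 6 that nearest neighbor never produces.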
+
+Common Python tooling: **`xESMF`** (xarray-friendly, supports conservative
+regridding via ESMF), **`pyresample`** (especially for swath → grid),
+**`rasterio.warp.reproject`** (GDAL-backed, the GeoTIFF/COG path),
+and **`scipy.interpolate`** for one-off cases.
+
+For empirical performance trade-offs across these tools, see Development Seed's
+[warp/resample profiling benchmark][warp-resample-profiling], which measures
+memory and time across local vs. S3 storage and NetCDF, Zarr, and GeoTIFF
+sources.
+
+[warp-resample-profiling]: https://developmentseed.org/warp-resample-profiling/
+
+## Datacube
+
+A **datacube** is a labeled, regularly gridded N-dimensional array: dimensions
+carry coordinates (e.g. `time`, `level`, `lat`, `lon`, `band`), and the data is
+addressable by those coordinates rather than only by integer index. Datacubes
+typically have 3 to 5 dimensions.
+
+A datacube is inherently a **structured, gridded** representation. It is most
+often the *product* of gridding either ungridded or unstructured-mesh data onto
+regular grids — i.e. the destination of the previous section's operation.
+Common containers include Zarr (cloud-optimized), NetCDF, and HDF5; the
+in-memory representation is typically an Xarray `Dataset` when using Python.
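+
+Label-based addressing is what separates a datacube from a bare array. A
+deliberately tiny sketch using a hypothetical `TinyCube` class; in practice
+xarray's `.sel()` provides this, along with nearest-neighbor and slice
+selection.
+
+```python
+class TinyCube:
+    """Hypothetical minimal labeled array: one coordinate list per dimension."""
+
+    def __init__(self, dims, coords, values):
+        self.dims = dims      # e.g. ("time", "lat", "lon")
+        self.coords = coords  # dimension name -> list of labels
+        self.values = values  # nested lists, indexed in `dims` order
+
+    def index_of(self, dim, label):
+        """Exact lookup: translate a coordinate label into an integer index."""
+        return self.coords[dim].index(label)
+
+    def sel(self, **labels):
+        """Select a single value by coordinate label for every dimension."""
+        v = self.values
+        for dim in self.dims:
+            v = v[self.index_of(dim, labels[dim])]
+        return v
+
+
+cube = TinyCube(
+    dims=("time", "lat", "lon"),
+    coords={
+        "time": ["2024-01-01", "2024-01-02"],
+        "lat": [50.0, 49.5],
+        "lon": [10.0, 10.5, 11.0],
+    },
+    values=[[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]],
+)
+print(cube.sel(time="2024-01-02", lat=49.5, lon=10.5))  # 11
+```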
+
+For a deeper look at how a datacube's dimensions reduce to common viewing
+shapes (maps, time series, profiles, animations), see the
+[visualization overview](../visualization/overview.md).
+
+{ width="100%" }
+
+## External references
+
+- **UGRID conventions** (unstructured-mesh in NetCDF):
+
+- **CF conventions:**
+- **xESMF** (regridding for xarray):
+- **pyresample** (geospatial resampling):
+- **Open Data Cube:**
diff --git a/docs/concepts/images/gridded-vs-ungridded.svg b/docs/concepts/images/gridded-vs-ungridded.svg
new file mode 100644
index 0000000..b0f4bc6
--- /dev/null
+++ b/docs/concepts/images/gridded-vs-ungridded.svg
@@ -0,0 +1,71 @@
+
+
diff --git a/docs/glossary.md b/docs/glossary.md
index c77a631..b42f649 100644
--- a/docs/glossary.md
+++ b/docs/glossary.md
@@ -75,7 +75,8 @@ Range request
Datacube
: A multi-dimensional array of data, for example time × level × latitude ×
- longitude × band, typically spanning 3 to 5 dimensions.
+ longitude × band, typically spanning 3 to 5 dimensions. See
+ [Data structures](concepts/data-structures.md#datacube).
Chunk
: A contiguous block of a chunked array, read and written as a unit. Chunk
@@ -102,3 +103,91 @@ STAC (SpatioTemporal Asset Catalog)
OPeNDAP
: A protocol for remote access to subsets of scientific datasets over HTTP.
+
+## Data structures
+
+Foundational concepts covered on the
+[Data structures](concepts/data-structures.md) page.
+
+Index space
+: The integer-indexed structure of an array. A value is addressed by its
+ position `(i, j[, k])`; has no inherent units. See
+ [Data structures](concepts/data-structures.md#two-spaces).
+
+World space
+: Where each value actually sits in a CRS — also "physical space" or
+ "geographic space". Has units defined by the CRS (meters, degrees, …). See
+ [Data structures](concepts/data-structures.md#two-spaces).
+
+Gridded
+: Data tied to the cells (or nodes) of a grid; a value's location is implied
+ by its index in the array. See
+ [Data structures](concepts/data-structures.md#gridded-vs-ungridded).
+
+Ungridded
+: Scattered or point observations that are not arranged on a grid; each
+ value carries its own explicit coordinates. See
+ [Data structures](concepts/data-structures.md#gridded-vs-ungridded).
+
+Structured grid
+: A grid whose cells form a regular logical array addressable by integer
+    indices, with connectivity implicit. Includes regular, rectilinear, and
+    curvilinear grids. See
+ [Data structures](concepts/data-structures.md#structured-vs-unstructured).
+
+Unstructured grid
+: A mesh whose cells are joined by an explicit connectivity list, with
+ variable numbers of neighbors per node. See
+ [Data structures](concepts/data-structures.md#structured-vs-unstructured).
+
+DGGS (Discrete Global Grid System)
+: A global tessellation of the sphere by a single cell family (often
+ equal-area), with hierarchical refinement and a specialized cell-ID
+ indexing scheme — connectivity is implicit in the ID arithmetic rather
+ than in `(i, j)` array shape or an explicit connectivity list. Examples:
+ HEALPix, H3, S2, cubed-sphere. See
+ [Data structures](concepts/data-structures.md#structured-vs-unstructured).
+
+Regridding (resampling)
+: The operation that moves data from one spatial sampling to another, for
+ example from ungridded points onto a regular grid. Method matters: nearest,
+ bilinear, and conservative each suit different quantities. See
+ [Data structures](concepts/data-structures.md#regridding-resampling).
+
+## Satellite data products
+
+Pointers to canonical external references for satellite Earth-observation
+product taxonomy. These terms are out of scope for an in-depth treatment here.
+
+Swath
+: The strip of Earth's surface observed by a sensor as the platform moves
+ along its orbit; the sensor's native acquisition geometry, indexed by
+ along-track × across-track with 2-D geolocation arrays. Typical of
+ Level-1/Level-2 products, distinct from a Level-3 product resampled onto a
+ regular map grid. See Copernicus SentiWiki —
+ [Sentinel-1 products][s1-products].
+
+Analysis-ready data (ARD)
+: Any dataset that has been preprocessed such that it fulfills the quality
+ standards required by the analysis to be performed on it. For satellite
+ Earth observation specifically, CEOS-ARD (formerly CARD4L) is the
+ community standard. See Stern et al., *Frontiers in Climate* (2021) —
+ [Pangeo Forge: Crowdsourcing Analysis-Ready, Cloud Optimized Data
+ Production][ard-frontiers].
+
+Data processing levels
+: A maturity ladder describing how far a product has been processed, from raw
+ instrument data (Level 0) to model output (Level 4). The numbers are *not*
+ portable across agencies — NASA, ESA/Copernicus, and USGS each define their
+ own scheme. See NASA Earthdata —
+ [Data Processing Levels][nasa-levels].
+
+Timeliness (NRT/STC/NTC)
+: ESA's latency axis for a product: Near Real Time (hours), Short Time
+ Critical, and Non Time Critical (best calibration accuracy). Independent of
+ processing level. See Copernicus SentiWiki — [Sentinel-3 products][s3-products].
+
+[s1-products]: https://sentiwiki.copernicus.eu/web/s1-products
+[ard-frontiers]: https://www.frontiersin.org/journals/climate/articles/10.3389/fclim.2021.782909/full
+[nasa-levels]: https://www.earthdata.nasa.gov/learn/earth-observation-data-basics/data-processing-levels
+[s3-products]: https://sentiwiki.copernicus.eu/web/sentinel-3
diff --git a/docs/visualization/images/grid-types.svg b/docs/visualization/images/grid-types.svg
new file mode 100644
index 0000000..8cf3fe2
--- /dev/null
+++ b/docs/visualization/images/grid-types.svg
@@ -0,0 +1,103 @@
+
+
diff --git a/mkdocs.yml b/mkdocs.yml
index d89f35f..2dc0d34 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -15,6 +15,7 @@ extra:
nav:
- "index.md"
- Glossary: "glossary.md"
+ - Data structures: "concepts/data-structures.md"
- Datacube Worst Practices:
- Common production gotchas:
- Tiny data chunks: "worst-practices/tiny-chunks.ipynb"