feat: Sentinel-3 OLCI L1 EFR → GeoZarr exporter#2

Draft
d-v-b wants to merge 23 commits into chore/new-conventions-metadata from feat/sentinel3-export

d-v-b (Owner) commented Jun 23, 2026

Summary

Adds a Sentinel-3 OLCI L1 EFR → GeoZarr exporter — the first Sentinel-3 product type — modeled on the existing Sentinel-2 exporter and adapted to OLCI's curvilinear swath geometry.

Stacked on EOPF-Explorer#199 (chore/new-conventions-metadata). This PR's base is that branch, so the diff here is OLCI-only. Merge after EOPF-Explorer#199.

Design: docs/superpowers/specs/2026-06-21-sentinel3-olci-export-design.md · Plan: docs/superpowers/plans/2026-06-21-sentinel3-olci-export.md

What it does

  • Native swath geometry, no reprojection. OLCI L1 has no projected CRS — it's geolocated by per-pixel 2-D latitude/longitude. The exporter preserves that (CF 2-D coordinates), and emits the GeoZarr spatial convention with no spatial:transform.
  • Real multiscale overviews. /2 pyramid via fill-aware 2×2 block averaging of the 21 uint16 radiance bands (scale_factor/_FillValue preserved); coordinate arrays are decimated (real measured positions, not interpolated). Levels are declared with the GeoZarr multiscales convention (relative transform only).
  • Measurements-first scope. measurements/ (21 bands + 2-D geolocation) becomes the multiscale group; conditions/quality/orphans are copied through faithfully.
  • Structural detection (is_sentinel3_olci_dataset) + CLI auto-detect in convert + dedicated convert-s3-olci-optimized command.
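The fill-aware 2×2 block averaging described above can be sketched in NumPy as follows. This is an illustrative sketch, not the code in reduce_swath: the function name block_mean_fill_aware, the fill value 65535, and the rounding choice are assumptions made for the example.

```python
import numpy as np


def block_mean_fill_aware(band: np.ndarray, fill_value: int, factor: int = 2) -> np.ndarray:
    """Average factor x factor blocks of an integer band, ignoring fill pixels.

    Hypothetical helper for illustration; not the exporter's implementation.
    """
    rows, cols = band.shape
    # Trim trailing rows/columns that do not fill a whole block.
    rows_trim = (rows // factor) * factor
    cols_trim = (cols // factor) * factor
    trimmed = band[:rows_trim, :cols_trim]

    # Give each block its own pair of axes: (block_row, f, block_col, f).
    blocks = trimmed.reshape(rows_trim // factor, factor, cols_trim // factor, factor)
    valid = blocks != fill_value

    # Sum and count only the valid pixels of each block.
    total = np.where(valid, blocks, 0).astype(np.float64).sum(axis=(1, 3))
    count = valid.sum(axis=(1, 3))

    # Blocks without any valid pixel keep the fill value.
    out = np.full(count.shape, fill_value, dtype=band.dtype)
    has_data = count > 0
    out[has_data] = np.rint(total[has_data] / count[has_data]).astype(band.dtype)
    return out


FILL = 65535  # assumed fill value for the example
band = np.array(
    [
        [10, 20, FILL, FILL, 7],
        [30, 40, FILL, FILL, 7],
        [FILL, 8, 1, 2, 7],
    ],
    dtype=np.uint16,
)
overview = block_mean_fill_aware(band, FILL)
print(overview.shape, overview.dtype)
```

The output stays uint16, so the existing scale_factor and _FillValue attributes remain valid for the overview. A block made up entirely of fill pixels stays fill rather than contaminating the mean.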

New code

  • src/eopf_geozarr/s3_olci_optimization/: olci_band_mapping, olci_multiscale (reduce_swath, swath_spatial_attrs), olci_converter (convert_olci_optimized, detection).
  • src/eopf_geozarr/data_api/s3_olci.py: Sentinel3OlciRoot model.
  • tests/ — band-mapping, multiscale (incl. odd-dimension reduction), data-api/detection, integration + golden-file snapshot; real-product structure-dump fixture in tests/_test_data/s3_examples/.
  • docs/notebooks/sentinel3_olci_geozarr.ipynb — end-to-end demo (open EOPF OLCI → convert → 4-quadrant multiscale visualization), executed against the real EODC sample product. Adds a notebooks dep group.
  • README.md / docs/converter.md — OLCI usage.

Verification

  • Built and reviewed task-by-task, then reviewed as a whole branch. Several real defects were caught and fixed: an overview min-dimension off-by-one; the v2→v3 encoding strip; dtype/scale-offset faithfulness via mask_and_scale=False; a shared-sanitize_array_attrs regression that affected S2 (reverted, S2 snapshot byte-unchanged); and an odd-column overview size mismatch found by running the notebook on real 4865-col data.
  • Whole suite: 299 passed / 7 skipped (excl. known-flaky S2 cli_e2e subprocess tests); pyright 0 errors; ruff clean; no typing.Any in new code; S1/S2/generic paths unaffected.

Deferred (documented follow-ups)

  • keep_scale_offset / enable_sharding / spatial_chunk / compression_level accepted but not yet wired into encoding.
  • GeoZarr-converting the tie-point geometry / 3-D meteorology; reprojection-to-grid option; SLSTR / SRAL / SYNERGY — separate specs.

🤖 Generated with Claude Code

d-v-b and others added 23 commits June 21, 2026 22:14
First Sentinel-3 exporter, scoped to OLCI L1 EFR. Grounded in a real EOPF
product introspected from the EODC sample store: 21 radiance bands on a single
~300m curvilinear swath grid geolocated by per-pixel 2D lat/lon (no projected
CRS). Decision: preserve native swath geometry with CF 2D coordinates (no
reprojection); /2 decimation pyramid; measurements-first scope; mirror the S2
package + data_api model + CLI auto-detection.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
10-task TDD plan mirroring the S2 exporter: OLCI band mapping, data_api model,
swath /2 decimation, structural detection, swath GeoZarr metadata,
convert_olci_optimized entry point, CLI auto-detect + convert-s3-olci-optimized,
real-product JSON fixture, golden-file snapshot, and verification/docs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
pyproject already pins zarr-cm>=0.4.1; update the lockfile from the git-main
dev build to the released 0.4.1.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…p type)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ion test

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add swath_spatial_attrs() function to return GeoZarr spatial: convention
data for curvilinear swath geometry with pixel registration but no
affine transform or bbox (geolocation carried by 2-D lat/lon arrays).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… DataTree omits overviews

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add structure-dump fixture (shapes shrunk to 16x16 / tie-points 16x4)
from S3A_OL_1_EFR NT product fetched via HTTPS .zmetadata, add
s3_olci_group_example conftest fixture, and extend test_s3_olci.py with
is_sentinel3_olci_dataset and Sentinel3OlciRoot round-trip tests.
Also update Sentinel3OlciMeasurementsMembers to include time_stamp and
fix quality/conditions member types to allow nested GroupSpec children.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…e time_stamp rows

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add a Supported Products section to README.md and a dedicated
Sentinel-3 OLCI L1 EFR Conversion section to docs/converter.md,
covering: auto-detection via `eopf-geozarr convert`, the dedicated
`convert-s3-olci-optimized` command with all key flags, output
layout (measurements + r2/r4/... overviews + conditions/quality
passthrough), and the v1 scope note (encoding wiring is a follow-up).

Also add `# test: skip` to the first code block in the OLCI
implementation plan doc so test_docs.py skips the non-runnable
planning snippets.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…s passthrough, fixture dims

FIX 0 — reduce_swath: genuine 2×2 fill-aware block-average for radiance
  vars (OLCI_BANDS), stride-decimate for 2-D coord arrays (lat/lon/alt);
  replaces literal [::2,::2] decimation of radiance that added no information.

FIX 1 — declare overviews as GeoZarr multiscales: build_convention_attrs
  now receives a MultiscalesAttrs layout (asset "." + "r2"/"r4"/… with
  relative scale=[2,2] transform) and writes the CMO to measurements attrs.

FIX 2 — copy measurements/orphans through (was silently dropped because
  to_dataset() discards child groups); iterate dt_input["/measurements"].children
  and _copy_subtree each one.

FIX 3 — fixture dim consistency: renamed tie-point 'columns' → 'tie_columns'
  in geometry/meteorology arrays, fixed orphan 2-D arrays from [16,16] to
  [16,4] (removed_pixels=4), fixed instrument 2-D arrays to [bands,detectors]
  at consistent sizes, fixed 3-D meteo arrays to [4,16,4], fixed
  nb_removed_pixels shape; xr.open_datatree now opens the whole tree cleanly.
  Snapshot regenerated (conditions, quality, measurements/orphans, r2 overview).

FIX 4 — wire in OLCI_BANDS: is_sentinel3_olci_dataset uses OLCI_BANDS[0],
  reduce_swath identifies radiance vars via OLCI_BANDS frozenset.

FIX 5 — remove --enable-sharding / --keep-scale-offset from README and
  docs/converter.md copy-paste example; keep them in flags table with
  "accepted but not yet wired" note.

All 31 OLCI tests pass; pyright 0 errors; ruff clean; no typing.Any.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
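To make the "." + "r2"/"r4"/… layout from FIX 1 concrete, here is a minimal sketch of how the level names and shapes could be planned. The function name plan_overview_levels, the min_dimension threshold, the 4000-row example shape, and the dictionary keys are all assumptions for illustration. They are not the MultiscalesAttrs model or the convention's actual schema.

```python
def plan_overview_levels(shape, min_dimension=256, factor=2):
    """List the native level plus /2 overviews for a 2-D swath shape.

    Hypothetical planning helper; key names are illustrative only.
    """
    rows, cols = shape
    # The native-resolution level is addressed as "." (the group itself).
    levels = [{"asset": ".", "shape": (rows, cols)}]
    cumulative = 1
    # Keep halving while the smaller dimension stays at or above the threshold.
    while min(rows // factor, cols // factor) >= min_dimension:
        rows, cols = rows // factor, cols // factor
        cumulative *= factor
        levels.append(
            {
                "asset": f"r{cumulative}",
                "shape": (rows, cols),
                # Scale relative to the previous level, not to the native grid.
                "relative_scale": [factor, factor],
            }
        )
    return levels


levels = plan_overview_levels((4000, 4865), min_dimension=1000)
for level in levels:
    print(level)
```

Each level's shape uses floor division, which matches the trimmed block averaging of the radiance bands.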
…vert tie_columns); regen snapshot

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…egen snapshot

- CLI and snapshot test now open OLCI source with mask_and_scale=False so
  radiance stays uint16 with scale_factor/_FillValue in .attrs; CF decoding
  no longer silently strips scale/offset or widens dtype to float64.
- Add _sanitize_data_vars() in olci_converter.py: strips _eopf_attrs, dtype,
  valid_min, valid_max from measurement data-var attrs before writing to
  GeoZarr store (both native and overviews); preserves _FillValue,
  scale_factor, add_offset, units, standard_name, coordinates.
- Update sanitize_array_attrs() in conversion/utils.py: always strip
  _eopf_attrs/dtype/valid_min/valid_max; only strip _FillValue when
  is_decoded_float=True (raw-integer path keeps it in attrs for downstream
  fill handling).
- Update _clear_encoding docstring: clarify converter expects raw
  (non-mask-scaled) input; only Zarr v2 codec encoding is stripped.
- Regenerate snapshot: oa01_radiance dtype=uint16, scale_factor/add_offset/
  _FillValue present, no _eopf_attrs/dtype/valid_min/valid_max.
- Add _assert_radiance_dtype_and_attrs() regression guard in snapshot test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
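For context on why the source is opened with mask_and_scale=False, the sketch below reimplements CF unpacking by hand in NumPy. It is not xarray's code, and the attribute values are invented for the example. It shows that decoding turns packed uint16 counts into floats with NaN for fill, after which the original packing is no longer carried by the array itself.

```python
import numpy as np

# Packed radiance as stored on disk (values and attrs are made up for illustration).
raw = np.array([0, 100, 65535], dtype=np.uint16)
attrs = {"scale_factor": 0.5, "add_offset": 1.0, "_FillValue": 65535}


def decode_cf_sketch(packed: np.ndarray, packing: dict) -> np.ndarray:
    """Manually apply CF unpacking: mask fill values, then scale and offset."""
    values = packed.astype(np.float64) * packing["scale_factor"] + packing["add_offset"]
    return np.where(packed == packing["_FillValue"], np.nan, values)


decoded = decode_cf_sketch(raw, attrs)
print(raw.dtype, decoded.dtype)  # the raw array is untouched; decoded is float
```

Keeping the raw integers and leaving scale_factor, add_offset, and _FillValue in the attributes lets the exporter write the same packed representation it read.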
…CI-local attr sanitizer

The OLCI work modified the shared sanitize_array_attrs in conversion/utils.py,
changing S2/S1/generic output attrs and breaking 3 S2 golden-file snapshot tests.

- Restore sanitize_array_attrs to c613e24 original: always strips _eopf_attrs +
  _FillValue; strips dtype/fill_value/valid_min/valid_max and rewrites
  digital_counts units only when is_decoded_float=True.
- Add _sanitize_olci_array_attrs in olci_converter.py: strips _eopf_attrs/dtype/
  valid_min/valid_max while preserving _FillValue (needed for raw uint16 data
  opened with mask_and_scale=False).
- Replace sanitize_array_attrs usage in _sanitize_data_vars with the new helper.
- Add unit test asserting _FillValue is preserved and stale keys are stripped.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
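The behaviour described for the OLCI-local sanitizer (drop stale keys, keep _FillValue) can be sketched as a plain dictionary filter. The function below is a hypothetical stand-in written for illustration, not the body of _sanitize_olci_array_attrs, and the attribute values are invented.

```python
# Keys the commit message says are stripped from measurement attrs.
STALE_KEYS = frozenset({"_eopf_attrs", "dtype", "valid_min", "valid_max"})


def sanitize_olci_attrs_sketch(attrs: dict) -> dict:
    """Return a copy of attrs without stale keys; everything else is preserved."""
    return {key: value for key, value in attrs.items() if key not in STALE_KEYS}


# Example attrs (illustrative values only).
source_attrs = {
    "_eopf_attrs": {"coordinates": ["latitude", "longitude"]},
    "dtype": "<u2",
    "valid_min": 0,
    "valid_max": 65534,
    "_FillValue": 65535,
    "scale_factor": 0.01,
    "add_offset": 0.0,
    "units": "mW.m-2.sr-1.nm-1",
}
cleaned = sanitize_olci_attrs_sketch(source_attrs)
print(sorted(cleaned))
```

Because the filter returns a new dictionary, the shared sanitize_array_attrs path used by S1/S2 is not involved at all.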
…mensions

reduce_swath used slice(None, None, factor) (i.e. [::factor]) for coordinate
arrays, which yields ceil(N/factor) elements when N is odd.  coarsen with
boundary="trim" yields floor(N/factor).  For real OLCI products with 4865
columns (odd), this caused a 1-element mismatch between radiance and coordinate
arrays in every overview group, making xr.open_dataset raise:
  ValueError: conflicting sizes for dimension 'columns':
    length 2433 on 'altitude' and length 2432 on 'oa01_radiance'

Fix: pre-compute dim_trim = (N // factor) * factor for each swath dimension
and change coordinate isel slices to slice(0, dim_trim, factor) so both paths
produce exactly floor(N/factor) elements for any N.

Add 4 unit tests (rows=7, cols=5) and 1 integration test (rows=10, cols=9)
that confirm consistent shapes across radiance and coordinate arrays after
reduction, and that overview groups open without conflicting-sizes errors.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
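The size mismatch and its fix can be reproduced with plain Python slicing, using the 4865-column width from the commit message. The helper names are made up for this sketch.

```python
def strided_length(n: int, factor: int) -> int:
    """Length after [::factor] decimation: ceil(n / factor)."""
    return len(range(n)[::factor])


def trimmed_strided_length(n: int, factor: int) -> int:
    """Length after slice(0, dim_trim, factor): floor(n / factor)."""
    dim_trim = (n // factor) * factor
    return len(range(n)[slice(0, dim_trim, factor)])


columns = 4865  # odd column count of the real product, per the commit message
before_fix = strided_length(columns, 2)
after_fix = trimmed_strided_length(columns, 2)
block_average_length = columns // 2  # what trimming to whole blocks yields

print(before_fix, after_fix, block_average_length)
```

For even sizes both slices agree, which is why the mismatch only appeared on real data with an odd dimension.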
Notebook walks through opening an EOPF OLCI L1 EFR product (real EODC sample,
with an offline fallback to the bundled fixture), detecting it, converting to a
multiscale GeoZarr store, and visualizing the pyramid by splitting one field of
view into four quadrants each rendered from a different overview level.

Adds a `notebooks` dependency group (jupyter, matplotlib, nbformat). Executed
end-to-end against the real product; outputs (incl. the quadrant figure) are
saved in the notebook.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>