feat: Sentinel-3 OLCI L1 EFR → GeoZarr exporter #2
Draft
d-v-b wants to merge 23 commits into
First Sentinel-3 exporter, scoped to OLCI L1 EFR. Grounded in a real EOPF product introspected from the EODC sample store: 21 radiance bands on a single ~300m curvilinear swath grid geolocated by per-pixel 2D lat/lon (no projected CRS). Decision: preserve native swath geometry with CF 2D coordinates (no reprojection); /2 decimation pyramid; measurements-first scope; mirror the S2 package + data_api model + CLI auto-detection. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
10-task TDD plan mirroring the S2 exporter: OLCI band mapping, data_api model, swath /2 decimation, structural detection, swath GeoZarr metadata, convert_olci_optimized entry point, CLI auto-detect + convert-s3-olci-optimized, real-product JSON fixture, golden-file snapshot, and verification/docs. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
pyproject already pins zarr-cm>=0.4.1; update the lockfile from the git-main dev build to the released 0.4.1. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…p type) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ion test Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add swath_spatial_attrs() function to return GeoZarr spatial: convention data for curvilinear swath geometry with pixel registration but no affine transform or bbox (geolocation carried by 2-D lat/lon arrays). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… DataTree omits overviews Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add structure-dump fixture (shapes shrunk to 16x16 / tie-points 16x4) from S3A_OL_1_EFR NT product fetched via HTTPS .zmetadata, add s3_olci_group_example conftest fixture, and extend test_s3_olci.py with is_sentinel3_olci_dataset and Sentinel3OlciRoot round-trip tests. Also update Sentinel3OlciMeasurementsMembers to include time_stamp and fix quality/conditions member types to allow nested GroupSpec children. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…e time_stamp rows Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add a Supported Products section to README.md and a dedicated Sentinel-3 OLCI L1 EFR Conversion section to docs/converter.md, covering: auto-detection via `eopf-geozarr convert`, the dedicated `convert-s3-olci-optimized` command with all key flags, output layout (measurements + r2/r4/... overviews + conditions/quality passthrough), and the v1 scope note (encoding wiring is a follow-up). Also add `# test: skip` to the first code block in the OLCI implementation plan doc so test_docs.py skips the non-runnable planning snippets. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…s passthrough, fixture dims
FIX 0 - reduce_swath: genuine 2×2 fill-aware block-average for radiance vars (OLCI_BANDS), stride-decimate for 2-D coord arrays (lat/lon/alt); replaces literal [::2,::2] decimation of radiance that added no information.
FIX 1 - declare overviews as GeoZarr multiscales: build_convention_attrs now receives a MultiscalesAttrs layout (asset "." + "r2"/"r4"/… with relative scale=[2,2] transform) and writes the CMO to measurements attrs.
FIX 2 - copy measurements/orphans through (was silently dropped because to_dataset() discards child groups); iterate dt_input["/measurements"].children and _copy_subtree each one.
FIX 3 - fixture dim consistency: renamed tie-point 'columns' → 'tie_columns' in geometry/meteorology arrays, fixed orphan 2-D arrays from [16,16] to [16,4] (removed_pixels=4), fixed instrument 2-D arrays to [bands,detectors] at consistent sizes, fixed 3-D meteo arrays to [4,16,4], fixed nb_removed_pixels shape; xr.open_datatree now opens the whole tree cleanly. Snapshot regenerated (conditions, quality, measurements/orphans, r2 overview).
FIX 4 - wire in OLCI_BANDS: is_sentinel3_olci_dataset uses OLCI_BANDS[0], reduce_swath identifies radiance vars via OLCI_BANDS frozenset.
FIX 5 - remove --enable-sharding / --keep-scale-offset from README and docs/converter.md copy-paste example; keep them in flags table with "accepted but not yet wired" note.
All 31 OLCI tests pass; pyright 0 errors; ruff clean; no typing.Any.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
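To illustrate FIX 0, here is a minimal pure-Python sketch of a fill-aware 2×2 block average next to plain stride decimation. It is not the repository's `reduce_swath`; the function names `block_average_2x2` and `stride_decimate`, the fill value 65535, and the round-half-up rule are all assumptions made for this example.

```python
FILL = 65535  # assumed uint16 fill value, for illustration only


def block_average_2x2(band, fill=FILL):
    """Average each 2x2 block while ignoring fill pixels.

    A trailing odd row or column is trimmed. A block made up entirely
    of fill pixels stays fill in the output.
    """
    rows = (len(band) // 2) * 2
    cols = (len(band[0]) // 2) * 2 if band else 0
    out = []
    for r in range(0, rows, 2):
        out_row = []
        for c in range(0, cols, 2):
            block = (band[r][c], band[r][c + 1], band[r + 1][c], band[r + 1][c + 1])
            valid = [v for v in block if v != fill]
            if valid:
                n = len(valid)
                # Integer mean with round-half-up (an assumption of this sketch).
                out_row.append((sum(valid) + n // 2) // n)
            else:
                out_row.append(fill)
        out.append(out_row)
    return out


def stride_decimate(coord):
    """Keep every second sample of a 2-D coordinate array, trimmed the same way."""
    rows = (len(coord) // 2) * 2
    cols = (len(coord[0]) // 2) * 2 if coord else 0
    return [row[0:cols:2] for row in coord[0:rows:2]]


band = [
    [10, 20, FILL, FILL, FILL, FILL, 9],
    [30, 40, FILL, 8, FILL, FILL, 9],
]
print(block_average_2x2(band))  # [[25, 8, 65535]]
print(stride_decimate([[0, 1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12, 13]]))  # [[0, 2, 4]]
```

The second block shows why fill-awareness matters: a naive mean over `FILL, FILL, FILL, 8` would produce a large spurious radiance, whereas the masked mean returns the single valid sample.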
…vert tie_columns); regen snapshot Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…egen snapshot
- CLI and snapshot test now open OLCI source with mask_and_scale=False so radiance stays uint16 with scale_factor/_FillValue in .attrs; CF decoding no longer silently strips scale/offset or widens dtype to float64.
- Add _sanitize_data_vars() in olci_converter.py: strips _eopf_attrs, dtype, valid_min, valid_max from measurement data-var attrs before writing to GeoZarr store (both native and overviews); preserves _FillValue, scale_factor, add_offset, units, standard_name, coordinates.
- Update sanitize_array_attrs() in conversion/utils.py: always strip _eopf_attrs/dtype/valid_min/valid_max; only strip _FillValue when is_decoded_float=True (raw-integer path keeps it in attrs for downstream fill handling).
- Update _clear_encoding docstring: clarify converter expects raw (non-mask-scaled) input; only Zarr v2 codec encoding is stripped.
- Regenerate snapshot: oa01_radiance dtype=uint16, scale_factor/add_offset/_FillValue present, no _eopf_attrs/dtype/valid_min/valid_max.
- Add _assert_radiance_dtype_and_attrs() regression guard in snapshot test.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…CI-local attr sanitizer
The OLCI work modified the shared sanitize_array_attrs in conversion/utils.py, changing S2/S1/generic output attrs and breaking 3 S2 golden-file snapshot tests.
- Restore sanitize_array_attrs to c613e24 original: always strips _eopf_attrs + _FillValue; strips dtype/fill_value/valid_min/valid_max and rewrites digital_counts units only when is_decoded_float=True.
- Add _sanitize_olci_array_attrs in olci_converter.py: strips _eopf_attrs/dtype/valid_min/valid_max while preserving _FillValue (needed for raw uint16 data opened with mask_and_scale=False).
- Replace sanitize_array_attrs usage in _sanitize_data_vars with the new helper.
- Add unit test asserting _FillValue is preserved and stale keys are stripped.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
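The OLCI-local behaviour described in this commit can be sketched as a small dictionary filter. This is an illustration only: the real helper is `_sanitize_olci_array_attrs`, whose body is not shown here, and the name `sanitize_olci_attrs_sketch` plus every attribute value below are hypothetical.

```python
# Keys the commit message says are stripped for OLCI measurement variables.
STALE_KEYS = frozenset({"_eopf_attrs", "dtype", "valid_min", "valid_max"})


def sanitize_olci_attrs_sketch(attrs):
    """Return a copy of attrs without the stale keys.

    _FillValue, scale_factor and add_offset are kept on purpose, because raw
    uint16 data opened with mask_and_scale=False still needs them downstream.
    """
    return {key: value for key, value in attrs.items() if key not in STALE_KEYS}


# Hypothetical attribute values, chosen only to exercise the filter.
raw_attrs = {
    "_eopf_attrs": {"example": True},
    "dtype": "<u2",
    "valid_min": 0,
    "valid_max": 65534,
    "_FillValue": 65535,
    "scale_factor": 0.01,
    "add_offset": 0.0,
    "units": "example-units",
}

clean = sanitize_olci_attrs_sketch(raw_attrs)
print(sorted(clean))  # ['_FillValue', 'add_offset', 'scale_factor', 'units']
```

Keeping this filter separate from the shared `sanitize_array_attrs` is what lets the S2/S1/generic outputs stay unchanged while OLCI retains `_FillValue`.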
…mensions
reduce_swath used slice(None, None, factor) (i.e. [::factor]) for coordinate
arrays, which yields ceil(N/factor) elements when N is odd. coarsen with
boundary="trim" yields floor(N/factor). For real OLCI products with 4865
columns (odd), this caused a 1-element mismatch between radiance and coordinate
arrays in every overview group, making xr.open_dataset raise:
ValueError: conflicting sizes for dimension 'columns':
length 2433 on 'altitude' and length 2432 on 'oa01_radiance'
Fix: pre-compute dim_trim = (N // factor) * factor for each swath dimension
and change coordinate isel slices to slice(0, dim_trim, factor) so both paths
produce exactly floor(N/factor) elements for any N.
Add 4 unit tests (rows=7, cols=5) and 1 integration test (rows=10, cols=9)
that confirm consistent shapes across radiance and coordinate arrays after
reduction, and that overview groups open without conflicting-sizes errors.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
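The off-by-one described in this commit can be reproduced with plain length arithmetic. The sketch below uses only the standard library; the helper names are made up for this example, and `trimmed_len` simply models the `floor(N/factor)` result the commit attributes to coarsening with `boundary="trim"`.

```python
def strided_len(n, factor):
    """Length of seq[::factor] for a sequence of length n, i.e. ceil(n / factor)."""
    return len(range(0, n, factor))


def trimmed_len(n, factor):
    """Number of complete blocks of size factor, i.e. floor(n / factor)."""
    return n // factor


def fixed_strided_len(n, factor):
    """Length of seq[0:dim_trim:factor] with dim_trim = (n // factor) * factor."""
    dim_trim = (n // factor) * factor
    return len(range(0, dim_trim, factor))


n_columns = 4865  # odd column count quoted in the commit message
print(strided_len(n_columns, 2))        # 2433 (coordinate arrays before the fix)
print(trimmed_len(n_columns, 2))        # 2432 (block-averaged radiance)
print(fixed_strided_len(n_columns, 2))  # 2432 (coordinate arrays after the fix)
```

For even `n` the three lengths already agree, which is why the mismatch only surfaced on a real product with an odd column count.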
Notebook walks through opening an EOPF OLCI L1 EFR product (real EODC sample, with an offline fallback to the bundled fixture), detecting it, converting to a multiscale GeoZarr store, and visualizing the pyramid by splitting one field of view into four quadrants each rendered from a different overview level. Adds a `notebooks` dependency group (jupyter, matplotlib, nbformat). Executed end-to-end against the real product; outputs (incl. the quadrant figure) are saved in the notebook. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Summary
Adds a Sentinel-3 OLCI L1 EFR → GeoZarr exporter — the first Sentinel-3 product type — modeled on the existing Sentinel-2 exporter and adapted to OLCI's curvilinear swath geometry.
Design:
`docs/superpowers/specs/2026-06-21-sentinel3-olci-export-design.md` · Plan: `docs/superpowers/plans/2026-06-21-sentinel3-olci-export.md`
What it does
- … `latitude`/`longitude`. The exporter preserves that (CF 2-D coordinates), and emits the GeoZarr `spatial` convention with no `spatial:transform`.
- `/2` pyramid via fill-aware 2×2 block averaging of the 21 `uint16` radiance bands (`scale_factor`/`_FillValue` preserved); coordinate arrays are decimated (real measured positions, not interpolated). Levels are declared with the GeoZarr `multiscales` convention (relative `transform` only).
- `measurements/` (21 bands + 2-D geolocation) becomes the multiscale group; `conditions`/`quality`/`orphans` are copied through faithfully.
- … `is_sentinel3_olci_dataset`) + CLI auto-detect in `convert` + dedicated `convert-s3-olci-optimized` command.
New code
- `src/eopf_geozarr/s3_olci_optimization/`: `olci_band_mapping`, `olci_multiscale` (`reduce_swath`, `swath_spatial_attrs`), `olci_converter` (`convert_olci_optimized`, detection).
- `src/eopf_geozarr/data_api/s3_olci.py`: `Sentinel3OlciRoot` model.
- `tests/`: band-mapping, multiscale (incl. odd-dimension reduction), data-api/detection, integration + golden-file snapshot; real-product structure-dump fixture in `tests/_test_data/s3_examples/`.
- `docs/notebooks/sentinel3_olci_geozarr.ipynb`: end-to-end demo (open EOPF OLCI → convert → 4-quadrant multiscale visualization), executed against the real EODC sample product. Adds a `notebooks` dep group.
- `README.md` / `docs/converter.md`: OLCI usage.
Verification
- … `mask_and_scale=False`; a shared-`sanitize_array_attrs` regression that affected S2: reverted, S2 snapshot byte-unchanged; odd-column overview size mismatch found by running the notebook on real 4865-col data).
- … `typing.Any` in new code; S1/S2/generic paths unaffected.
Deferred (documented follow-ups)
- `keep_scale_offset`/`enable_sharding`/`spatial_chunk`/`compression_level` accepted but not yet wired into encoding.
🤖 Generated with Claude Code