Feat/s1 rtc stac builder#180
Adds eopf_geozarr.stac.s1_rtc.build_s1_rtc_stac_item, which reads a consolidated S1 GRD RTC Zarr V3 store and returns a pystac.Item with SAR/SAT/projection extensions, WGS84 bbox derived via pyproj, and vv/vh asset sub-paths. Ascending orbit is preferred when both are present. Also adds the generate-stac-s1 CLI subcommand that prints the item as JSON. 8 unit tests cover all acceptance criteria from the plan. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Four functions in s1_ingest.py silently failed for s3:// paths:
- discover_s1tiling_acquisitions / discover_s1tiling_conditions used
pathlib.Path.glob(), which normalises s3:// to s3:/ and returns 0 matches
- ingest_s1tiling_acquisition / ingest_s1tiling_conditions coerced paths via
Path(), corrupting s3:// URIs, and called .exists() which always returns
False for S3 (existence checks now use path_exists() from fs_utils)
Adds three private helpers:
- _list_tifs(): uses s3fs.S3FileSystem.glob for s3:// prefixes
- _coerce_input_path(): preserves str for s3://, returns Path otherwise
- _rasterio_env(): rasterio.Env(AWSSession) context for s3:// paths;
strips https:// from AWS_ENDPOINT_URL since GDAL expects hostname only
Tested: rasterio.open("s3://...") confirmed working on OVH S3 with
AWS_S3_ENDPOINT or AWS_ENDPOINT_URL set; hostname-only required by GDAL.
Refs #139
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
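The `s3://` corruption described above is easy to reproduce with the standard library alone. The following is a minimal sketch, not the project's code: `coerce_input_path` and `endpoint_hostname` are illustrative stand-ins for the private helpers named in the commit message, and the bucket and endpoint names are made up.

```python
from pathlib import Path, PurePosixPath


def coerce_input_path(path):
    """Keep s3:// URIs as plain strings; wrap everything else in Path.

    Illustrative stand-in for the _coerce_input_path helper described above.
    """
    if isinstance(path, str) and path.startswith("s3://"):
        return path
    return Path(path)


def endpoint_hostname(endpoint_url: str) -> str:
    """Strip an http(s):// scheme so only the hostname is left for GDAL."""
    for scheme in ("https://", "http://"):
        if endpoint_url.startswith(scheme):
            return endpoint_url[len(scheme):].rstrip("/")
    return endpoint_url


# pathlib collapses the double slash, which is what broke the S3 lookups.
corrupted = str(PurePosixPath("s3://example-bucket/tile/vv.tif"))
preserved = coerce_input_path("s3://example-bucket/tile/vv.tif")
local = coerce_input_path("/data/tile/vv.tif")
host = endpoint_hostname("https://s3.example.test")
```

Here `corrupted` no longer starts with `s3://`, while `preserved` is returned untouched and local paths still become `Path` objects.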
…ix_set Per Emmanuel's review on #173: - vv/vh asset hrefs now point to {store}/{orbit} (the zarr group that carries multiscales), matching how S2 reflectance assets point to measurements/reflectance. TiTiler reads tile_matrix_set from that group's multiscales attributes. - create_s1_store now writes tile_matrix_set into the orbit group's multiscales so TiTiler can discover the tiling scheme without error. - Updated test_asset_hrefs to assert the new orbit-group href. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…atibility TiTiler's _validate_zarr checks ds.rio.crs on the native resolution group (r10m). Without proj:code in the resolution group's attrs, rioxarray returns None and the group is excluded → groups=[] → bounds unpack error. S2 zarr already has proj:code at the resolution group level; mirror that for S1. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…176) The deployed titiler-eopf reader resolves a group's CRS via rioxarray, which reads a CF spatial_ref/crs_wkt coordinate -- not the GeoZarr proj:code attr. S1 GRD RTC stores carried only proj:code, so rio.crs was None and every multiscale group was rejected (HTTP 500 on render/info). The S2 converter already writes a spatial_ref grid-mapping; this brings S1 in line. Also stop writing tile_matrix_set into the orbit-group multiscales: the data-model owner confirmed it is not part of the S1 GRD RTC data model, and the data_api reader already treats it as optional (MISSING default). - add _add_grid_mapping() using pyproj CRS.to_cf() (the same source rioxarray uses, so no hard-coded projection) and call it for every resolution level (both store-creation paths) and the conditions group - remove _create_tile_matrix_set() and its multiscales entry - tests: assert no tile_matrix_set, spatial_ref present + rio.crs resolves to the native EPSG, conditions group carries grid_mapping This fixes at the source what data-pipeline currently works around in ingest_v1_s1_rtc.py (_patch_cf_grid_mapping / _patch_tile_matrix_limits). Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
#178) * chore: release 0.9.0 * feat: implement scale-offset and data type casting via codecs * fix: fix dependency declaration * chore: use latest version of cast value * chore: make cast-value a project dependency * test: expand test coverage * fix: upgrade pytest to 9.0.3 (CVE-2025-71176) Fixes insecure /tmp directory handling on UNIX systems. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * - Use zarr-python's presumptive implementation of scale-offset - use scalar map for handling NaN - ensure that downsampled arrays use scalar map + cast value - improve tests across parametrization of relevant functions * feat: add store-root spatial:bbox and tighten minispec requirements (#164) * feat: add store-root spatial:bbox and tighten minispec requirements Introduces a GeoZarr "Store Root" layer in the minispec so clients can read a summary footprint without walking into child groups, and tightens the multiscale profile so `spatial:bbox` at the root and `spatial:transform` + `spatial:shape` on every layout entry are mandatory. Adds a new `geozarr.store` pydantic module (`GeoZarrStoreAttrs`, `GeoZarrMultiscaleGroupAttrs`, `GeoZarr`) enforcing the tightened profile, and updates the S2 converter to union child-group bboxes into an EPSG:4326 footprint written at the store root. Closes #156. Addresses the clear parts of #163; the array-level and non-multiscale-group parts of that issue need further clarification. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: update GeoZarr store model and optimize S2 converter for improved bbox handling * Addresses review feedback on #164: - Adds a "GeoZarr Hierarchy & Identification" subsection to the Store Root section of the minispec, codifying d-v-b's proposed rules: single root, root prefix ends with `.zarr`, suffix occurs at most once in the hierarchy, and explicit terminal-path conditions. 
- Promotes the store-root CRS to mandatory: at least one of `proj:code`, `proj:wkt2`, `proj:projjson` MUST be set; there is no implicit EPSG:4326 default. Per @vincentsarago's review. - Promotes `zarr_conventions` declaration at the store root from RECOMMENDED to required. - Converter now writes `proj:code: "EPSG:4326"` at the root alongside `spatial:bbox`. - Pydantic `GeoZarrStoreAttrs` enforces the new CRS-required rule. Cross-links the new hierarchy + spatial:extent follow-ups upstream: zarr-developers/geozarr-spec#132 (hierarchy & root identification, also addresses #124 URL parsing) and #133 (STAC-style spatial:extent). --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: uv.lock * refactor: use zarr-python 3.2.0 * chore: lockfile * fix: pin fill value based on data type instead of relying on xarray * test: update tests to consistently use nan fill value for floats * chore: bump urllib3 to 2.7.0 * chore: skip quicklook groups (#165) * chore: group dependabot updates for actions and pip (#160) * chore: group dependabot updates for actions and pip - Group all GitHub Actions bumps (minor, patch, major) into one PR - Group minor and patch pip bumps into one PR; major bumps remain ungrouped for individual review Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * chore: change dependabot schedule day to wednesday * ci: add comment_on option to security auditing action * ci: switch dependabot Python ecosystem from pip to uv uv ecosystem reads uv.lock directly, enabling Dependabot to raise PRs for lockfile-pinned versions including security patches. 
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: derive coarse spatial transforms from coordinates (#168) * fix: derive coarse spatial transforms from coordinates * refactor: improve function definitions for clarity and consistency * chore: release 0.10.0 (#162) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * fix(converter): align convert layout to S2, strip _eopf_attrs, fix _FillValue (#171) - Remove TMS layout and /0 /1 /2 numeric overview groups from the general 'convert' CLI; produce S2-style sibling r{2**level} overviews instead. 'convert-s2-optimized' continues to emit the same layout. - Drop the '--tile-width' CLI option. - Sanitize array attrs in all converters: strip stale '_eopf_attrs' and source-only '_FillValue' / 'dtype' / 'valid_min' / 'valid_max' on decoded floats; rewrite 'units: digital_counts' -> 'units: 1'. The sanitizer now returns a new dict and callers reassign 'attrs', so stale keys are actually removed (a previous '.update()' pattern left them in place). - Sanitize coord attrs too (datetime coords in /conditions/meteorology were leaking '_eopf_decode_datetime64' via '_eopf_attrs'). - Set '_FillValue' properly for float measurements so xarray's CF encoder produces the base64-NaN representation needed for 'use_zarr_fill_value_as_mask=True' round-trip (xarray#11345). Fixes #171. - Remove dead 'utils.encode_cf_fill_value' (no callers). - Tests: open each r{N} multiscale level separately; migrate 'test_multiscales_round_trip' from deprecated tms.Multiscales to zcm.Multiscales (with tuplify_json to handle extra='allow' fields). - Regenerate snapshot fixtures; verified 0 '_eopf_attrs', 'tile_matrix*', or 'digital_counts' markers across S2A/S2B/S2C. 
* test: add guardrails for converter output attrs (#171) Walks the snapshotted GroupSpec JSON fixtures and asserts: - no '_eopf_attrs' anywhere; - no 'tile_matrix*' markers (TMS removal); - no 'units: digital_counts' on float arrays (decoded scale/offset); - every float array under '/measurements/' has '_FillValue' set (required for CF NaN-mask round-trip, xarray#11345). * docs: deprecate v0 references, document r{N} overview layout (#171) - Drop 'tile_width' parameter from API examples (option removed from CLI/API). - Replace '/measurements/r10m/{0,1,2}' nested-pyramid examples with the current flat 'r{2**level}' sibling-group layout. - Remove the V0 vs V1 split in converter.md / examples.md / quickstart.md; the general 'convert' command now produces the same flat layout as 'convert-s2-optimized'. The S2-optimized section is reframed as a feature description rather than a 'V1 vs V0' comparison. - Update architecture.md multiscales snippet to the new model ('layout' / 'asset' / 'derived_from' / 'transform'). - Update faq.md inspection snippet to walk the new layout. * refactor: clean up code formatting and enhance attribute handling in geozarr conversion * fix(cli-e2e): drop deprecated --tile-width flag from tests and example * refactor: remove deprecated v0 layout, sanitize attributes, and fix _FillValue handling * test: add _FillValue masking roundtrip test (#172) Addresses @vincentsarago's PR review: assert float arrays written with the converter's _FillValue convention round-trip through xarray's use_zarr_fill_value_as_mask=True so NaN cells come back masked. * test: drive _FillValue masking test through create_geozarr_dataset (#172) Replace the standalone xarray write with a real converter run: build a small float DataTree, invoke create_geozarr_dataset, then reopen the output with use_zarr_fill_value_as_mask=True and assert masking on the nodata patch. Exercises the converter's _FillValue + encoding path end-to-end, per @vincentsarago's review. 
* fix: green CI for #172 — S2 convert flat layout + idna CVE (#177) * fix(security): bump idna 3.13 -> 3.17 to clear CVE-2026-45409 pip-audit in the Security Audit workflow blocks on idna 3.13 (CVE-2026-45409, incomplete fix of CVE-2024-3651; fixed in 3.15). Bump the transitive pin via uv lock. pip-audit now reports no known vulnerabilities. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(cli): route Sentinel-2 convert through the optimized flat layout Generic `convert` emitted GeoZarr-0.4 multiscales with native data at the group root and overviews as nested r{2**N} sub-groups. That tree cannot be opened by xr.open_datatree (an overview child's x/y conflict with the parent's inherited x/y), so the `info` and `validate` CLI commands crashed on the converter's own Sentinel-2 output. Detect Sentinel-2 inputs and delegate to the existing, tested convert_s2_optimized path, which emits flat sibling r{N}m levels (the layout PR #172 documents). Non-S2 inputs keep the generic create_geozarr_dataset path. Detection is best-effort and never aborts conversion. Regenerate the geozarr_examples structure snapshots to the flat layout. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * test: make CLI e2e assertions debuggable and fix multiscale round-trip - The CLI logs errors to stdout via structlog; assert on stdout+stderr in test_cli_e2e so a non-zero exit shows the actual error instead of an empty stderr. - test_multiscale_attrs_round_trip tuplified only one side of its equality, so flat-layout level groups carrying list-valued attrs (e.g. spatial:bbox) broke a check that is only about JSON list/tuple normalisation. Tuplify both sides; the model still round-trips losslessly. 
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Davis Vann Bennett <davis.v.bennett@gmail.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Emmanuel Mathot <emmanuel.mathot@gmail.com>
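The round-trip fix above ("tuplify both sides") can be illustrated without the project's models. This is a minimal sketch: the `tuplify_json` below is a hypothetical re-implementation written for this example, not the helper from the repository, and the attribute values are invented.

```python
def tuplify_json(obj):
    """Recursively turn lists into tuples so list/tuple differences vanish."""
    if isinstance(obj, (list, tuple)):
        return tuple(tuplify_json(item) for item in obj)
    if isinstance(obj, dict):
        return {key: tuplify_json(value) for key, value in obj.items()}
    return obj


# JSON decoding yields lists; a model dump may yield tuples for the same data.
from_json = {"spatial:bbox": [0.0, 1.0, 2.0, 3.0], "layout": [{"asset": "r10m"}]}
from_model = {"spatial:bbox": (0.0, 1.0, 2.0, 3.0), "layout": ({"asset": "r10m"},)}

naive_equal = from_json == from_model
one_sided_equal = tuplify_json(from_json) == from_json
normalised_equal = tuplify_json(from_json) == tuplify_json(from_model)
```

Normalising only one operand still compares a tuple against a list somewhere in the tree, which is why the comparison has to be applied to both sides.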
The previous S1 RTC preview rendered a single VH band as grayscale with an incorrect rescale (0,219), unsuitable for linear gamma0 RTC values. Declare a render-extension `renders.rgb` config producing the standard dual-pol false-colour composite (R=VV, G=VH, B=VV/VH ratio) with rescale 0–0.1 and tilesize 256, referencing the preferred orbit group. Downstream titiler-based services consume this to generate previews/tiles/tilejson. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A single [min, max] pair applies to all bands in titiler, so emitting three identical pairs was redundant (and forced collapse logic in the consumer). Same rendered output. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The registrar hardcoded the Sentinel-1 preview as a single VH band with an incorrect rescale (0,219), producing a near-black grayscale thumbnail unsuited to linear gamma0 RTC values. Make `add_visualization_links` and `add_thumbnail_asset` prefer a producer- declared `renders` config (STAC render extension) when present, falling back to the existing mission defaults otherwise. New `_select_render` picks the preferred render (rgb/visual/thumbnail/default) and `_render_to_query` converts it into a titiler query string (expression, rescale, bidx, tilesize, ...). Paired with EOPF-Explorer/data-model#180, which emits a `renders.rgb` dual-pol RGB composite (R=VV, G=VH, B=VV/VH ratio, rescale 0-0.1) on S1 RTC items, this yields the correct preview/tiles/tilejson without any per-mission code here. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
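As a rough illustration of the render-to-query step, the sketch below turns a `renders`-style dict into a query string. It is an assumption-laden stand-in, not the registrar's `_render_to_query`: the function name, the key handling, and the band expression `b1;b2;b1/b2` are all hypothetical, and only the parameter names (expression, rescale, bidx, tilesize) come from the description above.

```python
from urllib.parse import parse_qsl, urlencode


def render_to_query(render: dict) -> str:
    """Flatten a render config into a titiler-style query string (sketch)."""
    params = []
    if "expression" in render:
        params.append(("expression", render["expression"]))
    # Each [min, max] pair becomes one repeated rescale=min,max parameter.
    for low, high in render.get("rescale", []):
        params.append(("rescale", f"{low},{high}"))
    for band_index in render.get("bidx", []):
        params.append(("bidx", str(band_index)))
    if "tilesize" in render:
        params.append(("tilesize", str(render["tilesize"])))
    return urlencode(params)


# Hypothetical render config; a single rescale pair covers all three bands.
rgb_render = {"expression": "b1;b2;b1/b2", "rescale": [[0, 0.1]], "tilesize": 256}
query = render_to_query(rgb_render)
decoded = parse_qsl(query)
```

Emitting one `[min, max]` pair keeps the query short, in line with the earlier commit that dropped the three identical pairs.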
Tracking a known follow-up surfaced by the 32TLR S1 RTC e2e (data-pipeline #226, item F1):
Not required to land this PR; captured here so it's tracked alongside the S1 RTC builder work.
Brings in #184 (multi-frame masked-timestamp discovery) and the #173→#179 build_s1_rtc_stac_item revert from the s1-tiling line. Conflict in tests/test_s1_rtc_ingest.py resolved by keeping both newly-added discovery tests (s3:// listing + masked-stamp resolution); s1_ingest.py auto-merged (s3fs _list_tifs + the masked-stamp branch are compatible). Full test_s1_rtc_ingest.py suite passes (50).
…ender path (#246)
titiler-eopf reconstructs the store path as
s3://{bucket}/tests-output/{collection}/{item_id}.zarr and ignores the STAC asset
href, so the GeoZarr cube must be named after the item id (s1-rtc-{tile}) for new
tiles to preview. Parse the tile from the s1-rtc- prefix (was s1-grd-rtc-); the
item id (s1-rtc-{tile}) is unchanged, so filename == item id by construction.
Pairs with the data-pipeline direct-write change (store written at the tests-output
path) which replaces the #250 auto-copy. Dev-phase, no consumers — disposable test
data. Revert to s1-grd-rtc- when titiler-eopf#108 (resolve store from href) lands.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
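The naming constraint above (store filename must equal the item id) can be sketched as follows. The path template is quoted from the commit message; the function names, bucket, and collection id are hypothetical, and 31TEH is just a tile id that appears elsewhere in this thread.

```python
def reconstructed_store_uri(bucket: str, collection: str, item_id: str) -> str:
    """Path the renderer is described as rebuilding, ignoring the asset href."""
    return f"s3://{bucket}/tests-output/{collection}/{item_id}.zarr"


def tile_from_store_name(store: str, prefix: str = "s1-rtc-") -> str:
    """Extract the tile id from a store named '<prefix><tile>.zarr' (sketch)."""
    name = store.rstrip("/").rsplit("/", 1)[-1]
    if name.endswith(".zarr"):
        name = name[: -len(".zarr")]
    if not name.startswith(prefix):
        raise ValueError(f"store name {name!r} does not start with {prefix!r}")
    return name[len(prefix):]


item_id = "s1-rtc-31TEH"
uri = reconstructed_store_uri("example-bucket", "example-collection", item_id)
tile = tile_from_store_name(uri)
```

Because the store name and the item id share the same prefix, the reconstructed URI and the written store line up without consulting the asset href.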
build_s1_rtc_stac_item opened the cube with zarr.open_consolidated, which raises `ValueError: Consolidated metadata ... not found` when the store lacks root consolidated metadata. A per-tile cube grown by appending a time-slice to an existing same-orbit group ends up in exactly that state — re-consolidating an append on the S3 store is unreliable — so live STAC registration failed on the 2nd+ same-orbit acquisition for a tile (it broke the Pyrenees S1 RTC soak). The builder must not require consolidated metadata: titiler reads these stores fine without it. Fall back to a direct zarr.open_group(mode="r") read when the consolidated metadata is absent; re-raise any other ValueError. Adds a regression test: build from a 2-slice same-orbit store created with consolidate=False (fails before the fix in open_consolidated, passes after). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
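The fallback logic can be sketched independently of zarr by injecting the two openers. This is an assumption-based illustration: in the builder the two callables would correspond to `zarr.open_consolidated` and `zarr.open_group(mode="r")` as named above, while `open_store`, the message check, and the fake openers are invented for the example.

```python
def open_store(path, open_consolidated, open_plain):
    """Try the consolidated opener first; fall back only when metadata is absent."""
    try:
        return open_consolidated(path)
    except ValueError as exc:
        # Only the "consolidated metadata not found" case triggers the fallback;
        # any other ValueError is propagated unchanged.
        if "consolidated metadata" not in str(exc).lower():
            raise
        return open_plain(path)


def fake_missing_consolidated(path):
    raise ValueError("Consolidated metadata requested but not found")


def fake_other_failure(path):
    raise ValueError("unrelated decoding problem")


def fake_plain(path):
    return {"opened": path, "consolidated": False}


result = open_store("store.zarr", fake_missing_consolidated, fake_plain)
```

Matching on the error message is brittle; it is used here only because the commit message describes distinguishing this `ValueError` from others.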
…-builder # Conflicts: # .release-please-manifest.json # CHANGELOG.md # pyproject.toml # uv.lock
…ring (#192) (#193) Per-acquisition previews render via titiler `sel=time={INTEGER index}` — a positional index that breaks once a cube's time axis goes non-monotonic (a cross-run append of an earlier-dated scene; proved on 31TEH: `[06-08, 06-07]` → the 06-07 item has no preview). Keeping positional indices correct would need cube reorder + full re-registration on every append (not scalable). The deployed titiler (v0.5.0) already does label-based `da.sel(..., method=...)` via `open_datatree(decode_times=True)`; the only gap is the cube's `time` array, which is a bare int64 with no CF datetime metadata, so it can't be a datetime index. Encoding it lets previews select by exact datetime, which works even on a non-monotonic axis — no reorder, register only the new item. titiler picks a multiscale level by zoom (previews use a coarse level), so `time` must resolve there. Putting it only at the orbit-group level and inheriting fails to open while r10m stays bare int64 (AlignmentError: int64 vs datetime64 on the shared dim). So encode `time` consistently at EVERY level: - new `_create_time_coordinate_array` + `TIME_CF_ATTRS` (units/calendar/ standard_name); created at every level in create + append-group paths. - the append write resizes/writes `time` at every level (was r10m only). - absolute_orbit/relative_orbit/platform stay at r10m (not selected on). - stored dtype stays int64 ns, so register_per_acquisition's raw read is unaffected. Adds TestTimeCFDatetime incl. exact datetime `.sel` on a non-monotonic axis at the native and a coarse level. 55 passed. Refs #192 Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
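Why a label lookup survives a non-monotonic axis while a chronological position does not can be shown with plain integers. The sketch below is not the library code: it models the int64-nanosecond `time` array as a Python list, and the year and clock times are made up (the message only gives the 06-08 and 06-07 dates).

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_ns(when: datetime) -> int:
    """Encode a datetime as integer nanoseconds since the Unix epoch."""
    return ((when - EPOCH) // timedelta(microseconds=1)) * 1000


def from_ns(value: int) -> datetime:
    """Decode integer nanoseconds back to an aware datetime."""
    return EPOCH + timedelta(microseconds=value // 1000)


def index_by_label(time_axis_ns, when: datetime) -> int:
    """Find a slice by exact datetime, regardless of axis ordering."""
    return [from_ns(v) for v in time_axis_ns].index(when)


june_8 = datetime(2025, 6, 8, 5, 50, tzinfo=timezone.utc)
june_7 = datetime(2025, 6, 7, 17, 40, tzinfo=timezone.utc)

# The earlier scene was appended later, so the stored axis is out of order.
time_axis = [to_ns(june_8), to_ns(june_7)]
chronological_rank = sorted(time_axis).index(to_ns(june_7))
stored_position = index_by_label(time_axis, june_7)
```

The chronological rank and the stored position disagree for the 06-07 scene, which is the mismatch a positional `sel=time={index}` runs into; the exact-datetime lookup does not depend on ordering.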
Nudge to land this: the S1 RTC datetime-rendering work (validated end-to-end, 10/10 stable renders by exact datetime) currently pins eopf-geozarr to
…brary (#195) (#196) * fix(s1-rtc): enrich cube STAC item + fix vv/vh asset model (#195) The S1 RTC cube STAC item carried duplicate vv/vh assets with byte-identical hrefs (bug #1), exposed no asset for the descending orbit group of a dual-orbit cube (bug #2), and was missing most of the metadata the S2 L2A reference carries. Rework `build_s1_rtc_stac_item`: - Asset model: replace the two indistinguishable vv/vh assets with one `gamma0-rtc-backscatter-{asc,desc}` asset per present orbit group, exposing VV/VH as named STAC 1.1 `bands` (no more duplicate hrefs), plus a `border-mask-{orbit}` asset for the valid-data mask. Each γ⁰ asset carries data_type/nodata/unit/gsd. - Identity: add `constellation`, `instruments: [c-sar]`, `gsd` (platform stays per-acquisition — a cube can mix S1A/S1C). - Projection: add proj:bbox, and proj:shape/proj:transform (best-effort from the r10m group attrs). - Datacube extension: cube:dimensions (time as a bounded extent — not a values list, the cube grows by appending — plus x/y) + cube:variables. - Descriptive: title/description; created (earliest acquisition, stable across rebuilds) / updated (build time) via the timestamps extension. - Fold the corrected render rescale (0.0,0.2) into `_rgb_render`. Out of scope (tracked follow-ups): statistics (#157), processing:software / DEM lineage, and per-product CDSE fields are not in the store; per-acquisition item construction is moved in a follow-up commit. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(s1-rtc): move per-acquisition + coverage STAC construction into the library Consolidate the *pure* S1 RTC STAC construction in eopf_geozarr.stac.s1_rtc so item shape has a single owner (the library); registration (STAC-API upsert, S3 alternates, gateway rewrite, TiTiler links, CDSE discovery) stays in data-pipeline. 
Adds, ported from the data-pipeline scripts (deployment URLs stripped): - acquisition_id(tile_id, when) - build_s1_rtc_per_acquisition_items(store, *, orbit, collection_id) -> list[pystac.Item]: one single-datetime item per cube time slice of an orbit, reoriented to that orbit (sat:orbit_state + render expression + only that orbit's γ⁰/mask assets), per-slice platform, datacube structure dropped (a single acquisition is not a cube). Deployment-agnostic: carries the render config + datetime; the registration layer derives the cube-endpoint TiTiler links (sel=time={datetime}) from it. - Slice / pick_slice / slice_coverages: coverage-based preview-slice selection (reads border_mask at r720m). Ported the construction unit tests (pick_slice, slice_coverages, and the per-acquisition construction subset); the link/thumbnail/alternate-asset and rescale-override tests stay in data-pipeline as registration concerns. Follow-up (separate data-pipeline PR): bump the data-model pin, delete the moved functions from register_per_acquisition.py / register_v1_s1_rtc.py, and rewire them to consume this library output. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(s1-rtc): per-acquisition items carry their own orbit footprint Review follow-up. A per-acquisition item cloned the cube base, so on a dual-orbit cube every item inherited the WGS84 union bbox/geometry of both orbits and the preferred (ascending) orbit's proj:bbox — advertising a footprint wider than, and a data bbox from the wrong orbit relative to, the acquisition's own orbit. Recompute bbox/geometry/proj:bbox from the run orbit's spatial:bbox (factored the bbox->polygon construction into `_bbox_to_geometry`, shared with the cube builder), and give per-acquisition items a single-acquisition description rather than the cube's "datacube" wording. Tests: per-acq footprint = run orbit (not the union); `created` = earliest acquisition (idempotent across rebuilds); ValueError on an invalid or absent orbit. 
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(s1-rtc): omit orbit_state on dual-orbit cubes; expose datacube dimension sizes Two review observations from inspecting the live items: - sat:orbit_state is single-valued, so on a dual-orbit cube (both ascending and descending groups present) a single value mislabels half the slices. Set it (and declare the SAT extension) only for a single-orbit cube; per-acquisition items, which are single-orbit, still carry the real per-orbit value. Ensure per-acq items declare the SAT extension even when cloned from a dual-orbit base that omitted it. - cube:dimensions previously carried only `extent`, so the number of elements per dimension wasn't visible. The time axis is irregularly sampled, so list its discrete `values` (count = number of acquisitions); the regular x/y axes get a `step` (size derivable from extent; exact pixel count is also in proj:shape). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(s1-rtc): datacube time axis carries extent only (drop values enumeration) Enumerating every acquisition as cube:dimensions.time.values does not scale as the cube is appended over the mission. No STAC extension provides a scalable temporal element-count field (datacube temporal_dimension offers only values/step/extent, and step is null for S1's irregular sampling; projection proj:shape is spatial-only; raster has no dimension shape). So the time axis carries a bounded `extent` only. x/y element counts remain available via proj:shape (exact) and the datacube x/y extent+step. The number of acquisitions is obtained from the per-acquisition collection (STAC search count), which stays correct as the cube grows. 
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(s1-rtc): correct datacube variable_type, platform value, created, time values Review of the live items surfaced several metadata issues: - cube:variables used `type` (a non-standard, silently-ignored key) instead of the datacube extension's `variable_type`; border_mask is now `auxiliary`, vv/vh `data`. - per-acquisition `platform` carried the store's short code (e.g. `s1a`); normalize to the STAC convention `sentinel-1a` (mirrors the Sentinel-2 reference). - `created` (timestamps ext) was set to an acquisition time, which misuses the field (it means the item's creation instant); omit it and keep `updated` = build time. - the cube's irregular time axis now lists its discrete `values` (the acquisition instants) so the number of time steps is visible; the regular x/y axes stay extent + step (their pixel count is in proj:shape). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * refactor(s1-rtc): robust EPSG parse for datacube reference_system + clarify per-acq proj inheritance Review follow-ups (no behaviour change for the EPSG:NNNN stores in use): - derive the datacube reference_system via pyproj.CRS(...).to_epsg() instead of splitting the proj:code string, so non-"EPSG:NNNN" CRS forms don't break it. - comment why per-acquisition items recompute proj:bbox from their orbit but inherit proj:code/shape/transform (the MGRS grid is shared across orbits). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs(s1-rtc): note the dual-orbit time axis spans both orbits in the datacube A dual-orbit cube merges two per-orbit sub-cubes (disjoint time axes) onto a shared grid. Modelling orbit as a dimension would imply a sparse, mostly-empty orbit×time grid; instead orbit stays an attribute of each acquisition, conveyed via the per-orbit assets. Add a `description` on the time dimension so the merged axis isn't misread. Single-orbit cubes get no note. 
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(s1-rtc): distinct per-acquisition titles; clearer asset titles - per-acquisition items inherited the cube's "… — tile {id}" title, so every scene in the acquisitions collection was titled identically. Give each a title with its datetime + orbit so siblings are distinguishable. - simplify the border-mask asset title ("Valid-data mask") and make the zarr-store title say "Sentinel-1" for naming consistency. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
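Two of the small transformations described in this series, the bbox-to-polygon construction and the platform normalisation, are sketched below. These are illustrative re-implementations under assumptions: the function names echo but are not the library's `_bbox_to_geometry`, and the bbox values are invented; only the `s1a` to `sentinel-1a` mapping is taken from the commit message.

```python
def bbox_to_geometry(bbox):
    """Build a closed GeoJSON Polygon from [west, south, east, north]."""
    west, south, east, north = bbox
    ring = [
        [west, south],
        [east, south],
        [east, north],
        [west, north],
        [west, south],  # close the ring on the starting vertex
    ]
    return {"type": "Polygon", "coordinates": [ring]}


def normalize_platform(code: str) -> str:
    """Map a short store code such as 's1a' to the STAC-style 'sentinel-1a'."""
    code = code.strip().lower()
    if len(code) == 3 and code.startswith("s1") and code[2].isalpha():
        return f"sentinel-1{code[2]}"
    return code


orbit_bbox = [0.5, 42.3, 1.9, 43.3]  # hypothetical per-orbit WGS84 bbox
geometry = bbox_to_geometry(orbit_bbox)
platform = normalize_platform("S1A")
```

Feeding the run orbit's own bbox into the polygon helper is what keeps a per-acquisition footprint from inheriting the dual-orbit union.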
…v/vh pyramid (#197) * perf(s1-rtc): shard the conditions arrays like the vv/vh pyramid A real S1 RTC cube is 3807 objects / 3.5 GB, of which 3604 (94.7%) are the conditions/gamma_area_<relorbit> arrays: [10980,10980] float32, inner chunk 366², NO sharding_indexed -> ~900 tiny chunk objects each. They are time-invariant yet dominate the object count, which dominated the ingest's S3 upload wall-time (a live pod sat ~34 min in "Uploading store" at 9 millicores). The vv/vh/border_mask display pyramid is already sharded (one shard per time slice over the full spatial extent, inner 366²). Apply that same existing layout to the condition arrays: add shards=(h, w) to the one create_array in ingest_s1tiling_conditions. All condition arrays (gamma_area, lia, incidence_angle) share that write path and the same 2D full-resolution shape, so all collapse from ~900 chunk objects to 1 shard object (cube ~3807 -> ~210). calculate_aligned_chunk_size returns a divisor of the dimension, so (h, w) is a clean multiple of the inner chunk (Zarr v3 shard-divisibility). conditions arrays are NOT in TiTiler's render path (vv/vh/border_mask), so this does not touch the web-render layout; it only makes a client read a condition array in one ranged GET instead of ~900. Values are byte-identical. Tests: +2 (sharding codec present; 9 inner chunks -> 1 on-disk shard object + byte-identical roundtrip). 57 passed. Spec: claude-docs/specs/s1_gamma_area_sharding.md. Cross-repo Task T5 of data-pipeline/claude-docs/plans/s1_ingest_upload_perf.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_011LsWkVvRfkRzjqAMrzfmRP * docs(s1-rtc): record real-S3 sharding benchmark in the T5 spec Validated on the live OVH bucket (laptop->DE): object collapse 100->1 (prod ~900->1), PUT 1.7x faster even with batched concurrency on, divisibility valid at the production 10980² (aligned 366 divides 10980), reads byte-identical. 
Honest caveat recorded: a full-array sequential read is NOT faster sharded (same bytes, one un-parallelizable object) — the win is object-count (upload + listing) and windowed/partial cloud reads, not full-read throughput. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_011LsWkVvRfkRzjqAMrzfmRP * refactor(s1-rtc): trim the conditions-sharding comment to match vv/vh style Comment-only. The surrounding vv/vh sharding has no inline explainer; the long cloud-access rationale now lives in claude-docs/specs/s1_gamma_area_sharding.md. Keep only the non-obvious bits: why one shard, and the Zarr v3 shard-divisibility invariant. No behavior change (20 condition/shard tests green). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_011LsWkVvRfkRzjqAMrzfmRP * chore(s1-rtc): drop the claude-docs spec from this PR The data-model repo has no claude-docs/specs convention; the spec was noise for this PR's reviewers. The problem statement, real-S3 benchmark and migration note live in the data-pipeline plan + tracking issue EOPF-Explorer/data-pipeline#288 and PR #197's description. Also drop the now-dangling spec path from the code comment (the rationale stays inline). No behavior change (20 condition/shard tests green). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_011LsWkVvRfkRzjqAMrzfmRP --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
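The object counts quoted above follow from simple arithmetic, which the sketch below reproduces. The helper name is made up for this example; the shapes, the 366 inner chunk, and the 3604 / 3807 object counts are the figures from the commit message.

```python
def inner_chunks_per_array(shape, chunk):
    """Count inner chunks, requiring each chunk edge to divide its dimension."""
    count = 1
    for dim, edge in zip(shape, chunk):
        if dim % edge != 0:
            raise ValueError(f"chunk edge {edge} does not divide dimension {dim}")
        count *= dim // edge
    return count


# Unsharded: one object per inner chunk of a 10980 x 10980 condition array.
unsharded_objects = inner_chunks_per_array((10980, 10980), (366, 366))

# Sharded with shards=(h, w): one shard spans the whole array.
sharded_objects = inner_chunks_per_array((10980, 10980), (10980, 10980))

# Share of the cube's objects that were condition chunks before sharding.
condition_share = round(3604 / 3807 * 100, 1)
```

The divisibility check mirrors the invariant mentioned above: the shard shape must be a whole multiple of the inner chunk, which holds because 366 divides 10980.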
… writer) (#200)

* perf(s1-rtc): shard the conditions arrays like the vv/vh pyramid

A real S1 RTC cube is 3807 objects / 3.5 GB, of which 3604 (94.7%) are the conditions/gamma_area_<relorbit> arrays: [10980,10980] float32, inner chunk 366², NO sharding_indexed -> ~900 tiny chunk objects each. They are time-invariant yet dominate the object count, which dominated the ingest's S3 upload wall-time (a live pod sat ~34 min in "Uploading store" at 9 millicores).

The vv/vh/border_mask display pyramid is already sharded (one shard per time slice over the full spatial extent, inner 366²). Apply that same existing layout to the condition arrays: add shards=(h, w) to the one create_array in ingest_s1tiling_conditions. All condition arrays (gamma_area, lia, incidence_angle) share that write path and the same 2D full-resolution shape, so all collapse from ~900 chunk objects to 1 shard object (cube ~3807 -> ~210). calculate_aligned_chunk_size returns a divisor of the dimension, so (h, w) is a clean multiple of the inner chunk (Zarr v3 shard-divisibility).

conditions arrays are NOT in TiTiler's render path (vv/vh/border_mask), so this does not touch the web-render layout; it only makes a client read a condition array in one ranged GET instead of ~900. Values are byte-identical.

Tests: +2 (sharding codec present; 9 inner chunks -> 1 on-disk shard object + byte-identical roundtrip). 57 passed. Spec: claude-docs/specs/s1_gamma_area_sharding.md. Cross-repo Task T5 of data-pipeline/claude-docs/plans/s1_ingest_upload_perf.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011LsWkVvRfkRzjqAMrzfmRP

* docs(s1-rtc): record real-S3 sharding benchmark in the T5 spec

Validated on the live OVH bucket (laptop->DE): object collapse 100->1 (prod ~900->1), PUT 1.7x faster even with batched concurrency on, divisibility valid at the production 10980² (aligned 366 divides 10980), reads byte-identical.

Honest caveat recorded: a full-array sequential read is NOT faster sharded (same bytes, one un-parallelizable object) -- the win is object-count (upload + listing) and windowed/partial cloud reads, not full-read throughput.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011LsWkVvRfkRzjqAMrzfmRP

* refactor(s1-rtc): trim the conditions-sharding comment to match vv/vh style

Comment-only. The surrounding vv/vh sharding has no inline explainer; the long cloud-access rationale now lives in claude-docs/specs/s1_gamma_area_sharding.md. Keep only the non-obvious bits: why one shard, and the Zarr v3 shard-divisibility invariant. No behavior change (20 condition/shard tests green).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011LsWkVvRfkRzjqAMrzfmRP

* chore(s1-rtc): drop the claude-docs spec from this PR

The data-model repo has no claude-docs/specs convention; the spec was noise for this PR's reviewers. The problem statement, real-S3 benchmark and migration note live in the data-pipeline plan + tracking issue EOPF-Explorer/data-pipeline#288 and PR #197's description. Also drop the now-dangling spec path from the code comment (the rationale stays inline). No behavior change (20 condition/shard tests green).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011LsWkVvRfkRzjqAMrzfmRP

* fix(s1-rtc): heal a multiscale level missing `time` on append (robust writer)

`ingest_s1tiling_acquisition` resized `level["time"]` on every multiscale level, assuming a fresh build created `time` at each level (#192). A cube built before #192 -- or left half-built by an interrupted append -- can carry `r10m/time` yet lack it at a coarser level, so the resize raised `KeyError: 'time'` and, because the append consistency check validated only CRS + shape, the ingest was non-convergent (observed on 30TWM).

Before the per-level write loop, recreate any missing-level `time` from `r10m/time` (backfilling the existing slices so prior timestamps are preserved), or raise a clear error when the cube is inconsistent in a way a backfill cannot fix (a level's length disagrees with `r10m/time`, or `r10m` has slices but no `time`). This is the durable upstream counterpart to the data-pipeline guard (data-pipeline #294), making that orchestration-side mitigation belt-and-suspenders.

Tests: 4 new cases (heal, no-op when healthy, raise on half-built, raise on missing r10m/time); full s1_ingest suite 61 passed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01V3qS75byrUuCSHFqcWi26B

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
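The object-count arithmetic behind the sharding change can be checked with a small sketch. `stored_objects` is a hypothetical helper, not part of `eopf_geozarr` or zarr; it only models two rules stated in the commit message: a shard must be a whole multiple of the inner chunk, and each shard (or each chunk, when unsharded) is one stored object.

```python
from math import ceil, prod


def stored_objects(shape, inner_chunk, shards=None):
    """Count stored objects for one array (hypothetical helper, not a zarr API).

    Without shards, every inner chunk is its own object. With shards, each
    shard is a single object that packs a whole number of inner chunks.
    """
    if shards is None:
        return prod(ceil(dim / c) for dim, c in zip(shape, inner_chunk))
    for s, c in zip(shards, inner_chunk):
        # Shard-divisibility: the shard shape must be a multiple of the inner chunk.
        if s % c != 0:
            raise ValueError(f"shard size {s} is not a multiple of inner chunk {c}")
    return prod(ceil(dim / s) for dim, s in zip(shape, shards))


shape = (10980, 10980)  # full-resolution condition array (from the commit message)
inner = (366, 366)      # aligned inner chunk; 366 divides 10980 exactly

unsharded = stored_objects(shape, inner)              # 30 * 30 chunk objects
sharded = stored_objects(shape, inner, shards=shape)  # one shard over the full extent
print(unsharded, sharded)
```

This only counts objects. As the caveat in the commit notes, a full sequential read moves the same bytes either way.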
* fix(s1-rtc): write CF `_FillValue` on float arrays (data-model #172 parity)

The S1 RTC ingest writer (`s1_ingest.py`) was never touched by #172, which added the CF `_FillValue` attribute to the S2 and S1-GRD paths so xarray can mask NaN nodata via `use_zarr_fill_value_as_mask=True` despite xarray #11345 -- the zarr-level `fill_value` field alone is not surfaced through xarray's encoding. Consequently S1 RTC cubes carried only `grid_mapping` on their float bands, while S2 cubes carry `_FillValue` (`AAAAAAAA+H8=`) + standard_name + units, leaving S1 nodata unmaskable for generic xarray/CF consumers.

Set the CF `_FillValue` (FillValueCoder-encoded, matching S2) plus standard_name and units on the float backscatter bands (vv/vh) at every multiscale level, and `_FillValue` on the float condition arrays (gamma_area, lia). The two near-identical vv/vh creation loops -- new store vs. new orbit on an existing store -- are unified into a shared `_create_band_arrays` helper so they can't drift again (the inline path previously lacked the metadata).

Not ported (S2-specific or N/A to the RTC writer): the scale/offset + CastValue codec (S1 vv/vh are float32, unpacked), v0 deprecation (RTC is v1-only), and `sanitize_array_attrs` (s1_ingest writes controlled attrs zarr-direct and never copies source EOPF attrs).

Tests: create-store unit, end-to-end ingest across both orbit-creation paths, and float-conditions coverage.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011sBdb17MWSAn7VEPSnqKJU

* refactor(s1-rtc): address review -- round-trip test, orbit-builder dedup, typed attrs

- Add an end-to-end NaN-masking round-trip test (open with `use_zarr_fill_value_as_mask=True`, `to_masked_array()`), mirroring the S2 guarantee in `test_array_attrs.py`. Proves `_FillValue` actually masks nodata, not just that the attribute is present.
- Unify the two orbit-creation paths -- new store (`create_s1_store`) and new orbit added to an existing store (`ingest_s1tiling_acquisition`) -- into a shared `_build_orbit_group` helper. Removes ~60 lines of duplication and fixes the inline path's missing `proj:code` on level groups (guarded by a new test). Net: `s1_ingest.py` shrinks while gaining the fix.
- Model the backscatter band CF attrs as a `S1BackscatterAttrsJSON` TypedDict (PR review request), alongside the existing `Standard*CoordAttrsJSON` types.
- Document the internal `xarray.backends.zarr.FillValueCoder` import.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011sBdb17MWSAn7VEPSnqKJU

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
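The `_FillValue` string quoted in the commit, `AAAAAAAA+H8=`, is the base64 form of a little-endian float64 NaN. The sketch below reproduces it with the standard library only. The function names are illustrative, and the assumption that this matches what xarray's internal `FillValueCoder` does for float dtypes should be checked against the xarray version in use.

```python
import base64
import math
import struct


def encode_float_fill_value(value: float) -> str:
    """Pack a float as little-endian float64 and base64-encode it (illustrative name)."""
    return base64.standard_b64encode(struct.pack("<d", value)).decode("ascii")


def decode_float_fill_value(encoded: str) -> float:
    """Inverse of encode_float_fill_value."""
    return struct.unpack("<d", base64.standard_b64decode(encoded))[0]


encoded = encode_float_fill_value(float("nan"))
decoded = decode_float_fill_value(encoded)
print(encoded, math.isnan(decoded))
```

A reader that knows this encoding can recover NaN from the attribute without relying on the zarr-level `fill_value` field.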
# per-acquisition previews can only select positionally (`sel=time={index}`) — fragile once a cube's
# time axis goes non-monotonic. With these attrs `time` decodes to datetime64 and previews can render by
# `sel=time={datetime}` (order-immune). The stored dtype stays int64 nanoseconds. See data-model #192.
TIME_CF_ATTRS = {
if TYPE_CHECKING:
    from pathlib import Path

CRS = "EPSG:32631"
would be good to associate special values like this with types. I wonder if we need a NewType for CRS strings.
    from pathlib import Path

CRS = "EPSG:32631"
UTM_BBOX = [300000.0, 4900000.0, 400000.0, 5000000.0]
this pattern:
from typing import NewType
BoundingBox2D = NewType("BoundingBox2D", tuple[float, float, float, float])
def make_bounding_box(value: object) -> BoundingBox2D: ...
ensures that we never confuse a bounding box with some other collection of numbers
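A runnable version of the pattern suggested in this review thread might look like the sketch below. The validation rules, the `CRSString` type and the `make_crs` helper are assumptions added for illustration; only `BoundingBox2D` and the `make_bounding_box` signature come from the comment.

```python
from typing import NewType

BoundingBox2D = NewType("BoundingBox2D", tuple[float, float, float, float])
CRSString = NewType("CRSString", str)  # hypothetical companion type for CRS strings


def make_bounding_box(value: object) -> BoundingBox2D:
    """Validate an (xmin, ymin, xmax, ymax) sequence and brand it as BoundingBox2D."""
    if not isinstance(value, (list, tuple)) or len(value) != 4:
        raise ValueError("a bounding box needs exactly four numbers")
    if not all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in value):
        raise ValueError("bounding box entries must be numeric")
    xmin, ymin, xmax, ymax = (float(v) for v in value)
    if xmin > xmax or ymin > ymax:
        raise ValueError("bounding box minimum exceeds maximum")
    return BoundingBox2D((xmin, ymin, xmax, ymax))


def make_crs(value: object) -> CRSString:
    """Accept only 'EPSG:<digits>' strings (an assumed, deliberately narrow rule)."""
    if not isinstance(value, str) or not value.startswith("EPSG:") or not value[5:].isdigit():
        raise ValueError(f"not an EPSG code string: {value!r}")
    return CRSString(value)


CRS = make_crs("EPSG:32631")
UTM_BBOX = make_bounding_box([300000.0, 4900000.0, 400000.0, 5000000.0])
print(CRS, UTM_BBOX)
```

At runtime the branded values are an ordinary `str` and `tuple`; the distinction exists only for the type checker, so the constructor functions are the single place where validation happens.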
…ent (#202)

S1 RTC previews render the out-of-swath region as opaque black while the S2 reference renders it transparent. The cause is the *data*, not the metadata: #201 already gave vv/vh the same dtype/`fill_value`=NaN/`_FillValue`/`grid_mapping` encoding as S2, but those are inert because no NaN is ever written -- s1tiling stores `0.0` out of swath, and titiler treats `0` as valid data and paints it black.

Mask nodata to NaN at the writer:
- vv/vh: `np.where(border_mask == 0, NaN, ...)` -- border_mask is the authoritative valid-data mask (0 = no-data). `_downsample_2d` already uses `np.nanmean` for floats, so NaN propagates to every overview level.
- float conditions (gamma_area/lia): masked read off the GeoTIFF's declared nodata (border_mask is N/A for static conditions); a no-op when no nodata is declared.

Only newly re-ingested cubes get NaN; existing cubes are remediated separately.

Tests: NaN ⟺ border_mask==0 at native + overview levels, masking round-trips via `use_zarr_fill_value_as_mask`, conditions declared-nodata → NaN.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
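The masking and its propagation to overviews can be illustrated on a toy array. This is a sketch assuming NumPy is available and a simple 2x2 block mean; `mask_out_of_swath` and `downsample_2x` are illustrative stand-ins, not the writer's actual `_downsample_2d`.

```python
import warnings

import numpy as np


def mask_out_of_swath(band: np.ndarray, border_mask: np.ndarray) -> np.ndarray:
    """Replace pixels where border_mask == 0 with NaN (illustrative helper)."""
    return np.where(border_mask == 0, np.nan, band).astype(np.float32)


def downsample_2x(band: np.ndarray) -> np.ndarray:
    """2x2 block mean that ignores NaN; assumes even height and width."""
    h, w = band.shape
    blocks = band.reshape(h // 2, 2, w // 2, 2)
    with warnings.catch_warnings():
        # An all-NaN block yields NaN plus a RuntimeWarning; silence the warning.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmean(blocks, axis=(1, 3))


vv = np.array(
    [[0.0, 0.0, 0.2, 0.4],
     [0.0, 0.0, 0.2, 0.4],
     [0.0, 0.1, 0.3, 0.3],
     [0.0, 0.1, 0.3, 0.3]],
    dtype=np.float32,
)
border_mask = np.array(
    [[0, 0, 1, 1],
     [0, 0, 1, 1],
     [0, 1, 1, 1],
     [0, 1, 1, 1]],
    dtype=np.uint8,
)

native = mask_out_of_swath(vv, border_mask)
coarse = downsample_2x(native)
print(coarse)
```

A fully out-of-swath block stays NaN at the coarse level, while a partly valid block averages only its valid pixels, so the stored `0.0` values no longer darken swath edges.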
…d one (#203) S1 RTC cubes ended up with only one orbit consolidated on disk (staging: 30TWM asc✓/desc✗, 32TNN/30UVU asc✗/desc✓). Root cause: the pipeline ingests acquisitions one orbit at a time in separate pods, each of which strips all consolidated metadata (so `time` can resize), ingests its orbit, then calls `consolidate_s1_store(store, orbit_direction)` — which only re-consolidated the single orbit passed. So whichever orbit was ingested last is the only one left with on-disk consolidated metadata. Consolidate every orbit group present (iterate `root.groups()`), then the root. The minimal append-fetch already pulls all `zarr.json`, so both orbits' metadata is local — same reason the root consolidation already works. Signature unchanged (`orbit_direction` kept for logging/callers). Impact is cosmetic: the root consolidation is complete for both orbits and readers opening at the root get a synthesized child view (titiler renders the unconsolidated orbit fine). The fix matters for clients opening a single orbit group standalone (the STAC `<cube>.zarr/<orbit>` hrefs) with `consolidated=True`, which otherwise fall back to a listing. Test asserts each orbit group is consolidated *standalone* — checking via `root[orbit]` is a false-green because a consolidated root synthesizes the child's view. Existing cubes self-heal on their next re-ingest. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
The render expression emits 3 bands (vv; vh; vv/vh) but supplied a single rescale pair [0,0.2] applied to all three. The vv/vh ratio (natural range ~1-15) saturated under that stretch -> flat blue/purple wash, and low cross-pol water dropped to transparent, which made (correctly geolocated) ascending swaths look mislocated. Give each band its own pair: vv [0,0.4], vh [0,0.1], ratio [1,15]. Cosmetic preview-only change; pixel placement is unchanged. Verified live against the raster API and the S1 STAC tests. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
#205) Both the cube item and (by inheritance) its per-acquisition items now carry the STAC grid extension with grid:code = "MGRS-{tile}". This makes the tile a first-class queryable property: the acquisitions collection becomes natively tile-filterable in STAC Browser, and cube↔acquisition cross-links can filter on grid:code instead of an id-prefix LIKE. Cube-builder only — per-acquisition items inherit it via the existing build_s1_rtc_per_acquisition_items copy ({**base_dict} carries stac_extensions; the per-acq property denylist excludes grid:code). Claude-Session: https://claude.ai/code/session_019z3eVtkSNHhN9vHd8QqGcf Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
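The inheritance described above can be sketched with plain dictionaries. The helper names, the denylist contents and the schema URL version are assumptions for illustration; the real `build_s1_rtc_per_acquisition_items` is not reproduced here.

```python
import copy

# Assumed schema URL for the STAC grid extension; check the version actually pinned.
GRID_EXTENSION = "https://stac-extensions.github.io/grid/v1.1.0/schema.json"
# Hypothetical denylist: cube-level properties that per-acquisition items drop.
PER_ACQ_PROPERTY_DENYLIST = {"start_datetime", "end_datetime"}


def add_grid_code(item: dict, tile: str) -> dict:
    """Return a copy of the cube item carrying grid:code = 'MGRS-{tile}'."""
    out = copy.deepcopy(item)
    out["properties"]["grid:code"] = f"MGRS-{tile}"
    if GRID_EXTENSION not in out["stac_extensions"]:
        out["stac_extensions"].append(GRID_EXTENSION)
    return out


def per_acquisition_item(base: dict, item_id: str, datetime_iso: str) -> dict:
    """Copy the cube item, filter denylisted properties, and set a single datetime."""
    item = {**base, "id": item_id}  # stac_extensions is carried over by the copy
    item["properties"] = {
        key: value
        for key, value in base["properties"].items()
        if key not in PER_ACQ_PROPERTY_DENYLIST
    }
    item["properties"]["datetime"] = datetime_iso
    return item


cube = add_grid_code(
    {
        "id": "cube-31TEH",  # hypothetical id
        "stac_extensions": [],
        "properties": {
            "start_datetime": "2025-06-07T00:00:00Z",
            "end_datetime": "2025-06-08T00:00:00Z",
        },
    },
    "31TEH",
)
acq = per_acquisition_item(cube, "acq-31TEH-0", "2025-06-07T00:00:00Z")
print(acq["properties"]["grid:code"])
```

Because `grid:code` is not on the denylist, a tile filter on that property matches both the cube and its acquisitions without an id-prefix match.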
reopen #173
Summary
Adds S1 GRD RTC STAC item generation plus the ingest fixes needed to make those items render correctly in TiTiler.
STAC builder (`src/eopf_geozarr/stac/s1_rtc.py`)
- `build_s1_rtc_stac_item(zarr_store, collection_id) -> pystac.Item`
- Reads `proj:code`, `spatial:bbox`, `r10m/time`, `r10m/platform` from each orbit-direction group
- WGS84 bbox derived via `pyproj`; ascending orbit preferred for assets and projection metadata when both orbits are present
- … `rescale` pair) so titiler-eopf can render previews/tiles
- `vv`/`vh` asset hrefs point to the orbit-group root

CLI
- `generate-stac-s1 --store <path> [--collection <id>]` subcommand

Ingest fixes (`src/eopf_geozarr/conversion/s1_ingest.py`)
- … `s3://` URIs in S1Tiling discovery and ingest
- … `grid_mapping` and drops `tile_matrix_set` from the RTC store
- Adds `proj:code` to resolution-group attrs for TiTiler compatibility
- CF-encoded `time` coordinate at every multiscale level so per-acquisition previews render by datetime instead of a fragile positional index (details below)
- Stores nodata as `NaN`, not `0`, so titiler masks out-of-swath transparent (#202, completes the `_FillValue` encoding from #201): s1tiling writes `0.0` out of swath and titiler treats `0` as valid data → opaque black. `vv`/`vh` are now masked by `border_mask` (`0` = no-data) and float conditions (`gamma_area`/`lia`) by the GeoTIFF's declared nodata, so previews render nodata transparent like the S2 reference. `np.nanmean` downsampling carries the `NaN` to every overview level. Affects newly re-ingested cubes only.

Datetime-based per-acquisition rendering (#193, closes #192)
Per-acquisition previews render via titiler `sel=time={…}`. Previously `time` was a bare `int64` array with no CF datetime metadata, present only at `r10m`, so titiler could only select positionally (`sel=time={index}`). That index silently mis-targets, or returns a 400, once a cube's time axis goes non-monotonic, which happens on a cross-run append of an earlier-dated scene (observed on 31TEH: `[idx0=06-08, idx1=06-07]`, leaving the 06-07 item with no preview). Keeping positional indices correct would require reordering cube data and re-registering every item on every append, which does not scale.

Instead, this CF-encodes `time` (`units = "nanoseconds since 1970-01-01"`, `calendar = "proleptic_gregorian"`, `standard_name = "time"`) and replicates it as a coordinate at every multiscale level:
- … `da.sel(…, method=…)` via `open_datatree(decode_times=True)`; with the CF attrs, `time` decodes to a datetime64 index and previews can select by exact datetime, which works even on a non-monotonic axis (only `nearest`/slice need monotonicity). So no cube reorder, and registration can be incremental (only the new item).
- `time` must resolve at whatever level TiTiler renders (previews use a coarse level). Putting it only at the orbit-group level and relying on DataTree inheritance fails to open while `r10m` stays bare int64 (`AlignmentError`: int64 vs datetime64 on the shared `time` dim), hence per-level encoding.
- The stored dtype stays `int64` ns, so downstream raw readers (e.g. data-pipeline `register_per_acquisition`) are unaffected.

Implementation: new `_create_time_coordinate_array` + `TIME_CF_ATTRS`; `time` is created/written CF-encoded at every level in the create, append-group, and append-write paths (`absolute_orbit`/`relative_orbit`/`platform` stay at `r10m`, since they are not selected on). The companion data-pipeline change (per-acquisition render links emitting `sel=time={datetime}` + incremental registration) is tracked in #192.

Dependencies
- `pystac>=1.8.0`

Test plan
- `tests/test_s1_stac.py`: 11 unit tests covering item id, temporal range, WGS84 bbox, both-orbit bbox union, ascending asset preference, non-consolidated store, empty-store ValueError, asset hrefs, SAR extension fields, and the RGB render extension (incl. ascending preference)
- `tests/test_s1_rtc_ingest.py`: ingest coverage incl. `s3://` discovery, masked multi-frame timestamps, CF `grid_mapping`/CRS resolution, and absence of `tile_matrix_set`
- `tests/test_s1_rtc_ingest.py::TestTimeCFDatetime` (#193): 5 tests covering CF attrs on `time` at every level, `time` decoding to datetime64 via `open_datatree`, exact datetime `.sel` on a non-monotonic axis at the native and a coarse level, identical `time` values across levels, and `r10m/time` still raw `int64` for downstream readers
- `tests/test_s1_rtc_ingest.py` nodata→NaN (#202): `NaN ⟺ border_mask==0` at native + overview, `_FillValue` masking round-trips via `use_zarr_fill_value_as_mask`, and declared-nodata conditions → `NaN`
- `ruff` clean on new files; `mypy src/eopf_geozarr/stac/` clean

Branch sync
This branch has been merged up to date with `s1-tiling` (release 0.10.1), resolving version/dependency conflicts (`pyproject.toml`, `uv.lock`, `CHANGELOG.md`, `.release-please-manifest.json`). `aiohttp>=3.14.0` (CVE fix from main) is included; PR now reports mergeable / clean.

🤖 Generated with Claude Code
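The datetime-based selection from the "Datetime-based per-acquisition rendering" section can be illustrated without xarray. The sketch below uses only the standard library: it keeps raw `int64`-style nanosecond values, decodes them using the CF `units` string quoted in the description, and selects by exact match on a non-monotonic axis. The helper names and the year 2025 are assumptions; the description only gives the days 06-08 and 06-07.

```python
from datetime import datetime, timedelta, timezone

# CF attributes as quoted in the PR description.
TIME_CF_ATTRS = {
    "units": "nanoseconds since 1970-01-01",
    "calendar": "proleptic_gregorian",
    "standard_name": "time",
}

_unit, _, _origin = TIME_CF_ATTRS["units"].partition(" since ")
assert _unit == "nanoseconds"
EPOCH = datetime.fromisoformat(_origin).replace(tzinfo=timezone.utc)


def to_ns(moment: datetime) -> int:
    """Encode an aware datetime as integer nanoseconds since the CF origin."""
    return ((moment - EPOCH) // timedelta(microseconds=1)) * 1000


def decode_ns(value: int) -> datetime:
    """Decode stored nanoseconds (microsecond precision is enough for this sketch)."""
    return EPOCH + timedelta(microseconds=value // 1000)


def select_exact(stored_ns: list, target: datetime) -> int:
    """Return the slice index whose decoded time equals target; order does not matter."""
    matches = [i for i, value in enumerate(stored_ns) if decode_ns(value) == target]
    if len(matches) != 1:
        raise KeyError(f"expected exactly one slice at {target.isoformat()}")
    return matches[0]


june_8 = datetime(2025, 6, 8, tzinfo=timezone.utc)  # year is an assumption
june_7 = datetime(2025, 6, 7, tzinfo=timezone.utc)
stored = [to_ns(june_8), to_ns(june_7)]  # non-monotonic: the earlier scene was appended later

print(select_exact(stored, june_7), select_exact(stored, june_8))
```

The stored values never change type, which is why raw readers are unaffected; only the attributes tell a CF-aware reader how to interpret them.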