perf(s1-rtc): shard the conditions arrays (gamma_area/LIA) like the vv/vh pyramid by lhoupert · Pull Request #197 · EOPF-Explorer/data-model

lhoupert · 2026-06-19T08:36:57Z

Tracking issue: EOPF-Explorer/data-pipeline#288 · Sibling PR: EOPF-Explorer/data-pipeline#287 (T1–T4 — incremental + concurrent transfer) · Base: feat/s1-rtc-stac-builder (#180)

Stacked on #180 (feat/s1-rtc-stac-builder) — this PR merges into #180, not main.

Task T5 of the data-pipeline S1 ingest upload-bottleneck plan
(data-pipeline/claude-docs/plans/s1_ingest_upload_perf.md). The biggest absolute lever in that work.

Problem

A real S1 RTC cube (s1-rtc-31TEG, staging) is 3807 objects / 3.5 GB, of which 3604 (94.7%)
are the conditions/gamma_area_<relorbit> arrays: [10980,10980] float32, inner chunk 366², no
sharding_indexed → ~900 tiny chunk objects per array. They are time-invariant yet dominate the
object count, which dominated the ingest's S3 upload wall-time (a live pod sat ~34 min in
"Uploading store" at 9 millicores — pure object-count latency).
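The ~900 figure follows from the array and chunk shapes quoted above. A minimal sketch of that arithmetic; `chunk_objects` is a hypothetical helper name used for illustration, not a function in the codebase:

```python
import math


def chunk_objects(shape, chunk):
    """Upper bound on chunk objects for an unsharded array: one per chunk-grid cell."""
    return math.prod(math.ceil(dim / c) for dim, c in zip(shape, chunk))


per_array = chunk_objects((10980, 10980), (366, 366))  # 30 x 30 chunk grid
share = 3604 / 3807  # condition-array objects over all objects in the cube

print(per_array)       # 900
print(f"{share:.1%}")  # 94.7%
```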

The vv/vh/border_mask display pyramid is already sharded (one shard per time slice over the
full extent, inner 366²). This applies that same existing layout to the one array family left out.

Change

Add shards=(h, w) to the single condition create_array in ingest_s1tiling_conditions. All
condition arrays (gamma_area, lia, incidence_angle) share that write path and the same 2D
full-resolution shape, so all collapse from ~900 chunk objects to 1 shard object (cube
~3807 → ~210). calculate_aligned_chunk_size returns a divisor of the dimension, so (h, w) is a
clean multiple of the inner chunk (Zarr v3 shard-divisibility). Condition arrays are not in
TiTiler's render path, so the web-render layout is untouched; values are byte-identical.
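The divisibility argument can be checked without touching a store. The sketch below is an illustration only: `largest_divisor_at_most` is a hypothetical stand-in for `calculate_aligned_chunk_size` (its real signature and target size are not shown in this PR), and the 512 target is an assumed value that happens to reproduce the 366 quoted here.

```python
def largest_divisor_at_most(dim: int, target: int) -> int:
    """Hypothetical stand-in: largest divisor of `dim` that is <= `target`."""
    for candidate in range(min(dim, target), 0, -1):
        if dim % candidate == 0:
            return candidate
    return 1  # only reached when target < 1


def shard_is_valid(shard, chunk) -> bool:
    """Zarr v3 sharding: every shard dimension must be a multiple of the inner chunk."""
    return all(s % c == 0 for s, c in zip(shard, chunk))


h = w = 10980
inner = largest_divisor_at_most(h, 512)  # 512 is an assumed target, not from the source

print(inner)                                   # 366
print(shard_is_valid((h, w), (inner, inner)))  # True
print((h // inner) * (w // inner))             # 900 inner chunks inside the single shard
```

Because the inner chunk divides the dimension exactly, a shard spanning the full extent is always a whole number of inner chunks, whatever the target size turns out to be.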

Validation (real OVH S3, laptop→DE)

| Layout | S3 objects | PUT (best of 3) | Full-array GET | Byte-identical |
|---|---|---|---|---|
| unsharded | 100 | 4.43 s | 1.38 s | ✓ |
| sharded | 1 | 2.55 s (1.7×) | 1.61 s | ✓ |

Object collapse 100 → 1 (production ~900 → 1 per array; cube 3604 → ~4).
PUT 1.7× faster even with batched concurrency on; ratio grows with object count.
Divisibility valid at the production 10980² (aligned=366, 10980 % 366 == 0).
Honest caveat: a full-array read is not faster sharded (same bytes, one un-parallelizable
object) — the win is object-count (upload + listing) and windowed/partial cloud reads.
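To make the caveat concrete: a windowed read only needs the inner chunks that intersect the window, and a reader that supports partial shard reads can fetch those as byte ranges of one object rather than as separate objects. A rough sketch of the chunk count per window, assuming half-open pixel ranges; `chunks_touched` is a hypothetical helper name:

```python
def chunks_touched(row_range, col_range, chunk=366):
    """Count inner chunks intersecting a half-open pixel window [start, stop)."""

    def span(start, stop):
        # Number of chunk indices covered along one axis.
        return (stop - 1) // chunk - start // chunk + 1

    return span(*row_range) * span(*col_range)


# A 1024 x 1024 window at the array origin versus the full 10980 x 10980 array.
print(chunks_touched((0, 1024), (0, 1024)))    # 9
print(chunks_touched((0, 10980), (0, 10980)))  # 900
```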

Tests

+2 targeted (test_gamma_area_is_sharded, test_sharding_collapses_chunk_objects_to_one — 9 inner
chunks → 1 on-disk shard object + byte-identical roundtrip). 102 passed across ingest + STAC +
per-acquisition + data_api (no regression in consolidation/STAC).

Migration

Old cubes stay unsharded until re-ingested (Zarr doesn't re-chunk in place) — sequence a per-tile
re-ingest after the rebuilt image deploys. Documented in the migration note above and tracking issue EOPF-Explorer/data-pipeline#288.
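One way to tell which arrays still need a re-ingest is to look for the `sharding_indexed` codec in each array's Zarr v3 `zarr.json`. A minimal sketch, assuming the metadata has already been loaded into a dict; the two sample documents are trimmed, illustrative metadata and are not copied from a real cube:

```python
def is_sharded(array_metadata: dict) -> bool:
    """True if a Zarr v3 array metadata document lists the sharding_indexed codec."""
    for codec in array_metadata.get("codecs", []):
        # Codecs are normally objects with a "name"; tolerate a bare string too.
        name = codec if isinstance(codec, str) else codec.get("name")
        if name == "sharding_indexed":
            return True
    return False


old_layout = {"node_type": "array", "codecs": [{"name": "bytes"}, {"name": "zstd"}]}
new_layout = {
    "node_type": "array",
    "codecs": [
        {
            "name": "sharding_indexed",
            # Trimmed: a real configuration also lists the inner codecs and index codecs.
            "configuration": {"chunk_shape": [366, 366]},
        }
    ],
}

print(is_sharded(old_layout), is_sharded(new_layout))  # False True
```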

🤖 Generated with Claude Code

A real S1 RTC cube is 3807 objects / 3.5 GB, of which 3604 (94.7%) are the conditions/gamma_area_<relorbit> arrays: [10980,10980] float32, inner chunk 366², NO sharding_indexed -> ~900 tiny chunk objects each. They are time-invariant yet dominate the object count, which dominated the ingest's S3 upload wall-time (a live pod sat ~34 min in "Uploading store" at 9 millicores).

The vv/vh/border_mask display pyramid is already sharded (one shard per time slice over the full spatial extent, inner 366²). Apply that same existing layout to the condition arrays: add shards=(h, w) to the one create_array in ingest_s1tiling_conditions. All condition arrays (gamma_area, lia, incidence_angle) share that write path and the same 2D full-resolution shape, so all collapse from ~900 chunk objects to 1 shard object (cube ~3807 -> ~210). calculate_aligned_chunk_size returns a divisor of the dimension, so (h, w) is a clean multiple of the inner chunk (Zarr v3 shard-divisibility).

conditions arrays are NOT in TiTiler's render path (vv/vh/border_mask), so this does not touch the web-render layout; it only makes a client read a condition array in one ranged GET instead of ~900. Values are byte-identical.

Tests: +2 (sharding codec present; 9 inner chunks -> 1 on-disk shard object + byte-identical roundtrip). 57 passed.

Spec: claude-docs/specs/s1_gamma_area_sharding.md.
Cross-repo Task T5 of data-pipeline/claude-docs/plans/s1_ingest_upload_perf.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011LsWkVvRfkRzjqAMrzfmRP

Validated on the live OVH bucket (laptop->DE): object collapse 100->1 (prod ~900->1), PUT 1.7x faster even with batched concurrency on, divisibility valid at the production 10980² (aligned 366 divides 10980), reads byte-identical.

Honest caveat recorded: a full-array sequential read is NOT faster sharded (same bytes, one un-parallelizable object); the win is object-count (upload + listing) and windowed/partial cloud reads, not full-read throughput.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011LsWkVvRfkRzjqAMrzfmRP

… style

Comment-only. The surrounding vv/vh sharding has no inline explainer; the long cloud-access rationale now lives in claude-docs/specs/s1_gamma_area_sharding.md. Keep only the non-obvious bits: why one shard, and the Zarr v3 shard-divisibility invariant. No behavior change (20 condition/shard tests green).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011LsWkVvRfkRzjqAMrzfmRP

lhoupert · 2026-06-19T08:46:36Z

Context / problem statement: EOPF-Explorer/data-pipeline#288 (umbrella issue covering this PR + the sibling data-pipeline transfer PR #287).

lhoupert · 2026-06-19T08:58:11Z

@d-v-b do you approve?

The data-model repo has no claude-docs/specs convention; the spec was noise for this PR's reviewers. The problem statement, real-S3 benchmark and migration note live in the data-pipeline plan + tracking issue EOPF-Explorer/data-pipeline#288 and PR #197's description. Also drop the now-dangling spec path from the code comment (the rationale stays inline). No behavior change (20 condition/shard tests green).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011LsWkVvRfkRzjqAMrzfmRP

d-v-b · 2026-06-19T09:20:51Z

looks good!

lhoupert · 2026-06-19T09:33:43Z

FYI @emmanuelmathot

… writer) (#200)

* perf(s1-rtc): shard the conditions arrays like the vv/vh pyramid
* docs(s1-rtc): record real-S3 sharding benchmark in the T5 spec
* refactor(s1-rtc): trim the conditions-sharding comment to match vv/vh style
* chore(s1-rtc): drop the claude-docs spec from this PR
* fix(s1-rtc): heal a multiscale level missing `time` on append (robust writer)

`ingest_s1tiling_acquisition` resized `level["time"]` on every multiscale level, assuming a fresh build created `time` at each level (#192). A cube built before #192 -- or left half-built by an interrupted append -- can carry `r10m/time` yet lack it at a coarser level, so the resize raised `KeyError: 'time'` and, because the append consistency check validated only CRS + shape, the ingest was non-convergent (observed on 30TWM).

Before the per-level write loop, recreate any missing-level `time` from `r10m/time` (backfilling the existing slices so prior timestamps are preserved), or raise a clear error when the cube is inconsistent in a way a backfill cannot fix (a level's length disagrees with `r10m/time`, or `r10m` has slices but no `time`). This is the durable upstream counterpart to the data-pipeline guard (data-pipeline #294), making that orchestration-side mitigation belt-and-suspenders.

Tests: 4 new cases (heal, no-op when healthy, raise on half-built, raise on missing r10m/time); full s1_ingest suite 61 passed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01V3qS75byrUuCSHFqcWi26B

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

lhoupert and others added 3 commits June 19, 2026 09:19

This was referenced Jun 19, 2026

perf(s1-ingest): incremental + concurrent S3 transfer for the RTC cube EOPF-Explorer/data-pipeline#287

Merged

S1 RTC ingest is I/O-bound: ~34-min S3 "Uploading store" stall per cube EOPF-Explorer/data-pipeline#288

Open

lhoupert requested a review from emmanuelmathot June 19, 2026 09:08

d-v-b approved these changes Jun 19, 2026

lhoupert merged commit 817e2b9 into feat/s1-rtc-stac-builder Jun 19, 2026

lhoupert merged 4 commits into feat/s1-rtc-stac-builder from feat/s1-gamma-area-sharding
