Skip to content

perf/direct zarr io#92

Open
d-v-b wants to merge 69 commits into
EOPF-Explorer:mainfrom
d-v-b:perf/direct-zarr-io
Open

perf/direct zarr io#92
d-v-b wants to merge 69 commits into
EOPF-Explorer:mainfrom
d-v-b:perf/direct-zarr-io

Conversation

@d-v-b

@d-v-b d-v-b commented Dec 8, 2025

Copy link
Copy Markdown
Contributor
  • eagerly compute multiscales
  • directly copy chunk bytes and metadata documents

@codecov-commenter

codecov-commenter commented Dec 11, 2025

Copy link
Copy Markdown

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 92.85714% with 19 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/eopf_geozarr/s2_optimization/s2_multiscale.py 88.88% 7 Missing ⚠️
src/eopf_geozarr/zarrio.py 93.54% 6 Missing ⚠️
src/eopf_geozarr/cli.py 0.00% 4 Missing ⚠️
src/eopf_geozarr/s2_optimization/s2_converter.py 97.64% 2 Missing ⚠️

📢 Thoughts on this report? Let us know!

Comment thread src/eopf_geozarr/cli.py
Comment on lines +1162 to +1164
s2_parser.add_argument(
"--omit-nodes", help="The names of groups or arrays to skip.", default="", type=str
)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this argument solves #81. You would pass --omit-nodes "quality/l2a_quicklook" to omit that group

cc @emmanuelmathot

@d-v-b

d-v-b commented Dec 17, 2025

Copy link
Copy Markdown
Contributor Author

@emmanuelmathot the redundant multiscales calculation is now fixed, and the chunk sizes / sharding are now consistent with the design goal (use as few objects as possible).

On my local system, using dask for rechunking was much slower than what I am currently doing (plain assignment via zarr python).

@d-v-b

d-v-b commented Dec 17, 2025

Copy link
Copy Markdown
Contributor Author

since we are re-encoding the zarr groups here, I can also handle the NaN conversion in this branch, unless that's better in a separate branch @emmanuelmathot

@d-v-b

d-v-b commented Dec 17, 2025

Copy link
Copy Markdown
Contributor Author

with a1375b7 we have an option (defaulting to false) of allowing invalid values (nan and inf) in the output. When set to false (the default), any NaN or inf or -inf values in attributes field are replaced with string equivalents.

@emmanuelmathot

Copy link
Copy Markdown
Contributor

Just tested the last version of this PR: https://api.explorer.eopf.copernicus.eu/raster/collections/sentinel-2-l2a-staging/items/S2B_MSIL2A_20251115T091139_N0511_R050_T35SLU_20251115T111807/viewer
but multiscales are still missing at /measurements/reflectance group

@d-v-b

d-v-b commented Jan 5, 2026

Copy link
Copy Markdown
Contributor Author

i'll have a look later today!

@emmanuelmathot

Copy link
Copy Markdown
Contributor

@d-v-b it seems multiscales are working now but can you tell me wwhat is the fill value? It seems it has been changes (black part that should be transparent)
image

@emmanuelmathot

Copy link
Copy Markdown
Contributor

seems also that we are missing the fixedscaleoffset codec declaration. I dont understand how xarray resolves correctly the data values.
Example:
zarr.json

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

3 participants