Add opt-in Zarr v3 sharding via shardShape -> sharding_indexed codec by blasscoc · Pull Request #171 · TGSAI/mdio-cpp

blasscoc · 2026-06-29T19:50:29Z

MDIO v3 datasets write one storage object per chunk, which forces a single trade-off: small chunks give fine read granularity but produce many tiny objects (painful on object stores), while large chunks give few big objects but coarsen every read. Zarr v3 sharding resolves this by packing many small inner chunks into one large shard (the storage object) behind a per-shard index, so reads still fetch only the inner chunks they need.
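To put rough numbers on that trade-off, here is a minimal sketch that counts how many storage objects a grid produces. `CountGridCells` and the shapes in the note below are hypothetical illustrations, not part of mdio-cpp; the helper assumes both shapes have the same rank and positive extents.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the mdio-cpp API): number of grid cells needed to
// cover array_shape with cell_shape, rounding up on each axis so that partial
// edge cells are counted. Assumes equal rank and positive extents.
std::int64_t CountGridCells(const std::vector<std::int64_t>& array_shape,
                            const std::vector<std::int64_t>& cell_shape) {
  std::int64_t count = 1;
  for (std::size_t axis = 0; axis < array_shape.size(); ++axis) {
    // Ceiling division: (a + b - 1) / b for positive a and b.
    count *= (array_shape[axis] + cell_shape[axis] - 1) / cell_shape[axis];
  }
  return count;
}
```

For an illustrative 1024 x 1024 x 1024 array, 64 x 64 x 64 chunks without sharding give 4096 objects, while a 256 x 256 x 512 shard over the same inner chunks gives 32 objects and keeps the 64-cubed read granularity.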

This exposes that as an opt-in: a RegularChunkShape may now carry an optional shardShape alongside chunkShape. When present, the v3 codec pipeline already built for the variable is nested inside a sharding_indexed codec (inner chunk_shape = chunkShape, index_codecs = [bytes, crc32c], index_location = end), and the array-level chunk_grid is rewritten to use shardShape. chunkShape becomes the inner read chunk; shardShape becomes the storage object.
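The metadata rewrite described above can be sketched with plain string building. This is only an illustration of the resulting JSON layout, not the code in this PR: `ShapeToJson`, `WrapInShardingCodec`, and `ShardChunkGrid` are hypothetical names, the existing pipeline is assumed to be already serialized as a JSON array, and the `"endian": "little"` setting on the index `bytes` codec is an assumption that the description does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Renders a shape as a JSON array, e.g. {64, 64} -> "[64,64]".
std::string ShapeToJson(const std::vector<std::int64_t>& shape) {
  std::string out = "[";
  for (std::size_t i = 0; i < shape.size(); ++i) {
    if (i > 0) out += ",";
    out += std::to_string(shape[i]);
  }
  return out + "]";
}

// Hypothetical helper: nests an existing codec pipeline (a JSON array string)
// inside a single sharding_indexed codec whose inner chunk is chunk_shape.
std::string WrapInShardingCodec(const std::string& inner_codecs_json,
                                const std::vector<std::int64_t>& chunk_shape) {
  return R"([{"name":"sharding_indexed","configuration":{"chunk_shape":)" +
         ShapeToJson(chunk_shape) +
         R"(,"codecs":)" + inner_codecs_json +
         // Index codecs per the description; the endianness is an assumption.
         R"(,"index_codecs":[{"name":"bytes","configuration":)" +
         std::string(R"({"endian":"little"}},{"name":"crc32c"}])") +
         R"(,"index_location":"end"}}])";
}

// Hypothetical helper: the array-level chunk_grid now describes the shard.
std::string ShardChunkGrid(const std::vector<std::int64_t>& shard_shape) {
  return R"({"name":"regular","configuration":{"chunk_shape":)" +
         ShapeToJson(shard_shape) + "}}";
}
```

Calling `WrapInShardingCodec` with the variable's original pipeline and chunkShape yields the new `codecs` value, and `ShardChunkGrid` with shardShape yields the new `chunk_grid`; when no shardShape is given, neither would be called and the metadata stays as it is today.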

Fully backward compatible: with no shardShape the metadata is unchanged (today's one-chunk-per-object behavior). shardShape must be a positive integer multiple of chunkShape on every axis, validated at spec-build time.
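The divisibility rule can be sketched as a small standalone check. `CheckShardShape` is a hypothetical helper written for illustration, not the validator added in this PR, and the rank-mismatch check is an extra assumption beyond what the description states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper (not the mdio-cpp API): returns an error message when
// shard_shape is not a positive integer multiple of chunk_shape on every
// axis, and std::nullopt when the pair is valid.
std::optional<std::string> CheckShardShape(
    const std::vector<std::int64_t>& chunk_shape,
    const std::vector<std::int64_t>& shard_shape) {
  // Assumption: both shapes must describe the same number of axes.
  if (chunk_shape.size() != shard_shape.size()) {
    return "shardShape and chunkShape must have the same rank";
  }
  for (std::size_t axis = 0; axis < chunk_shape.size(); ++axis) {
    const std::int64_t chunk = chunk_shape[axis];
    const std::int64_t shard = shard_shape[axis];
    if (chunk <= 0 || shard <= 0) {
      return "axis " + std::to_string(axis) + ": extents must be positive";
    }
    // With both extents positive, a zero remainder means shard = k * chunk
    // for some integer k >= 1.
    if (shard % chunk != 0) {
      return "axis " + std::to_string(axis) +
             ": shardShape is not a multiple of chunkShape";
    }
  }
  return std::nullopt;
}
```

Under this sketch, a chunkShape of {64, 64, 64} with a shardShape of {256, 256, 512} passes, while a shardShape of {96, 64} against a chunkShape of {64, 64} is rejected on axis 0. A shardShape equal to chunkShape is accepted as the multiple-of-one case.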

blasscoc wants to merge 1 commit into main from feat/zarr3-sharding

2 participants
