diff --git a/dev-docs/state-of-zarrista.md b/dev-docs/state-of-zarrista.md new file mode 100644 index 0000000..eb77085 --- /dev/null +++ b/dev-docs/state-of-zarrista.md @@ -0,0 +1,198 @@ +# State of Zarrista + +_Prepared for the Zarr prioritization call — 2026-06-23. One week of work._ + +## TL;DR + +Zarrista is a small, Python-first Zarr library that binds the Rust **`zarrs`** +crate directly via PyO3. In ~one week it has gone from nothing to a working, +type-hinted package that can: + +- **Read** Zarr v3 arrays and groups (sync + async), with NumPy / Arrow / DLPack + zero-copy exchange, validated round-trip against zarr-python. +- **Create** arrays via a fluent `ArrayBuilder`, and **write** individual chunks. +- Talk to **local FS, object stores (S3/GCS/Azure via obstore), and Icechunk**. + +It is explicitly an **evaluation prototype** — not production-ready, and not yet +benchmarked. The open question for this call is whether to invest further. + +This document is a factual snapshot of what exists today, derived directly from +the source tree (`python/zarrista/*.pyi`, `src/`, `tests/`) at commit `f7af44f`. + +--- + +## What is zarrista, and how does it relate to zarrs-python? + +There are two ways to put `zarrs` under Python: + +| | **zarrs-python** (existing) | **zarrista** (this project) | +|---|---|---| +| Goal | Drop-in accelerator for **zarr-python**'s codec pipeline | A standalone, low-level Zarr API in the shape of zarrita.js | +| Surface | Implements zarr-python's store/codec hooks | Its own `Array` / `Group` / `ArrayBuilder` classes | +| Audience | Existing zarr-python users (transparent) | New callers wanting a thin, explicit, typed Zarr API | +| Maturity | Established | One week old, prototype | + +These are **complementary, not competing**. 
zarrs-python makes today's +zarr-python faster from the inside; zarrista explores what a clean, Rust-native +Python Zarr API could look like, and whether zarr-python could one day lean on it +for object modelling (not just codecs). Both sit on the same `zarrs` core. + +--- + +## Could zarr-python depend on this? + +**Not today, but the architecture is deliberately compatible.** Relevant facts: + +- **Pure Zarr v3, spec-faithful metadata.** `ArrayBuilder.create_metadata()` + emits standard v3 JSON; arrays written by zarrista are read back correctly by + **zarr-python**, and vice versa (verified both directions — see the notebook + and `tests/test_indexing.py`). +- **Stores are pluggable at the Rust boundary**, but there is currently **no + Python-side store protocol** — you cannot hand zarrista an arbitrary + zarr-python store. It accepts concrete stores only (filesystem, memory, + obstore, Icechunk). This is the single biggest blocker to zarr-python depending + on it, and it is not yet designed. +- **Read coverage is solid; write coverage is minimal** (single chunks only — no + multi-chunk region writes, no group creation). zarr-python would need both. + +**Realistic dependency paths**, in increasing ambition: +1. zarr-python uses zarrista's **metadata / codec** machinery as a fast helper. +2. zarr-python delegates **whole-array read/write** to zarrista behind its store + abstraction (requires the Python store protocol + region writes). +3. zarr-python adopts zarrista objects directly (largest change; furthest off). + +None of these are close yet, but nothing in the design precludes them. + +--- + +## Expected performance gains + +**Honest status: not yet benchmarked.** No numbers should be quoted in the call. +What we *can* say is where gains are structurally expected, because the work moves +out of Python and into `zarrs`: + +- **Codec pipeline in native code.** Decompression, sharding, transpose, etc. 
run + in Rust with Rayon-based parallelism (`concurrent_target` / + `chunk_concurrent_minimum` are exposed as codec options), instead of per-chunk + Python overhead. This is the same thesis that motivates zarrs-python's measured + speedups. +- **Zero-copy hand-off.** Decoded buffers cross into NumPy (buffer protocol / + `np.frombuffer`), Arrow (C Data Interface), and DLPack **without a copy**. The + Rust allocation *is* the array's backing memory. +- **No Python-level chunk orchestration** for reads — selection → chunk fetch → + decode → assemble happens once, in Rust. + +**Recommendation:** the highest-value next deliverable for *this* audience is a +small, reproducible benchmark (zarrista vs zarr-python vs zarrs-python on a +representative read) so the perf claim rests on evidence, not architecture. + +--- + +## How much of the zarrs API have we wrapped? + +Coverage by area (✅ done · ⚠️ partial · ❌ absent). Method names below are the +**actual live surface** at `f7af44f`. + +### Reading — ✅ mature +- `Array.open` / `AsyncArray.open_async` +- `arr[selection]` and `retrieve_array_subset(selection)` — NumPy-style **basic** + indexing (int, step-1 slice, ellipsis, negative indices). `step != 1`, newaxis, + boolean/fancy indexing raise `NotImplementedError`/`IndexError`. +- `retrieve_chunk(idx)`, `retrieve_encoded_chunk(idx)` (stored bytes, still encoded) +- Metadata accessors: `shape`, `ndim`, `dtype`, `attrs`, `dimension_names`, + `chunk_grid`, `filters`, `serializer`, `compressors`, `metadata` + +### Groups — ✅ read · ❌ write +- `Group.open` / `AsyncGroup.open_async` +- `array_keys`, `group_keys`, `traverse`, `child_arrays`, `child_groups`, + `child_paths` (+ array/group variants), `child(name)` / `grp[name]` +- `attrs`, `metadata`, `consolidated_metadata` (read; consolidated metadata is + **read-only** — not written) +- `store_metadata` / `erase_metadata` exist, but there is **no group creation + builder**. 
+ +### Creating arrays — ✅ via `ArrayBuilder` +- Immutable, chainable: `shape`, `chunk_grid`, `chunk_key_encoding`, `data_type`, + `dimension_names`, `attrs`, `filters`, `compressors`, `serializer`, + `subchunk_shape` (enables sharding), plus `like(array)` to clone config. +- Materialize: `create` / `create_async` (auto-writes metadata as of commit `f7af44f`) + and `create_metadata()` (metadata only; the store is not touched). + +### Writing data — ⚠️ minimal +- `store_chunk(idx, ArrayBytes)`, `store_encoded_chunk(idx, bytes)`, + `compact_chunk(idx)`, `erase_chunk(idx)`, `erase_metadata()` (sync + async). +- **Absent:** multi-chunk region writes (`store_array_subset`), array resize, + in-place attribute/metadata updates, partial encoding. + +### Codecs — ⚠️ thin but extensible +- Convenience constructors: `transpose`, `bitround` (array→array); `gzip`, `zstd`, + `blosc`, `crc32c` (bytes→bytes); sharding via `serializer`/`subchunk_shape`. +- **Any** zarrs codec is still usable via `from_config({...v3 metadata...})`; + most simply lack a typed Python constructor. + +### Data types — ✅ read path +- All v3 fixed/variable dtypes decode; result dispatches to `Tensor`, + `VariableArray`, `MaskedTensor`, or `MaskedVariableArray`. +- `MaskedTensor` / `MaskedVariableArray` carry data but **do not yet expose + `to_numpy()`**. Complex dtypes are not yet surfaced to NumPy. + +### Stores — ✅ broad +- Sync: `FilesystemStore`, `MemoryStore`. +- Async: obstore (`ObjectStore` → S3/GCS/Azure/local/HTTP), Icechunk (`Session`). +- **Absent:** Python-side custom store protocol. + +### Data exchange — ✅ multiple zero-copy interfaces +- Buffer protocol + `Tensor.to_numpy()`; Arrow C Data Interface on `Tensor` / + `VariableArray`; DLPack on `Tensor`. + +--- + +## How much *more* should we wrap? (suggested priorities) + +Ordered by leverage for the stated goals (zarr-python interop + write workloads): + +1. **Benchmark harness** — turn the perf thesis into measured numbers. 
_(Small, + high signal for funding conversations.)_ +2. **Multi-chunk region writes** (`store_array_subset`) — the obvious missing half + of the write story; today only single chunks can be written. +3. **Python store protocol** — the gateway to any real zarr-python integration. +4. **Group creation / metadata writes** — needed for end-to-end dataset authoring. +5. **`to_numpy()` for masked/variable results** — completes the read story for + string/variable and nullable data. +6. **Codec breadth** — typed constructors for the rest of the zarrs codec set + (and consolidated-metadata writing). + +Lower priority / known debt (from `src/` TODOs): selection parsing cleanup +(`src/array/selection.rs`), richer node `Path` type, `Node` as a full `#[pyclass]`, +DLPack version pin on `dlpark`. + +--- + +## Engineering signals + +- **Type-driven, "parse-don't-validate"** design at the PyO3 boundary (see + `CLAUDE.md`): inputs are parsed into already-valid typed forms; no scattered + runtime validation. +- **Full type hints** (`.pyi` stubs) for the entire surface; `py.typed` shipped. +- **Tests** round-trip against zarr-python, pyarrow, and Icechunk + (`tests/test_indexing.py`, `test_builder.py`, `test_group.py`, `test_arrow.py`, + `test_icechunk.py`, `test_exceptions.py`, `test_store_input.py`). +- **Structured exception hierarchy** under `zarrista.exceptions` (13 classes). +- **Docs site** (mkdocs-material + mkdocstrings) published with versioning (mike). + This audit added the previously-missing reference pages (builder, exceptions, + `ChunkKeyEncoding`, `FillValue`). +- **Distribution**: single abi3 wheel, Python 3.11+; CI builds wheels + runs tests. + +--- + +## Bottom line for the call + +- **Progress in one week is substantial**: a coherent, typed, sync+async Zarr + read API with zero-copy exchange, plus array creation and chunk writes, all + interoperable with zarr-python. 
+- **It is a prototype**: write support is minimal, there's no Python store + protocol, and there are **no benchmarks yet** — so the central "is it faster?" + claim is currently unproven. +- **The cheapest, most decision-relevant next step is a benchmark**; the most + strategically important is the Python store protocol if zarr-python integration + is the goal. diff --git a/docs/notebooks/zarrista-tour.ipynb b/docs/notebooks/zarrista-tour.ipynb new file mode 100644 index 0000000..d027d5b --- /dev/null +++ b/docs/notebooks/zarrista-tour.ipynb @@ -0,0 +1,724 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "6b044e43", + "metadata": {}, + "source": [ + "# A tour of zarrista (for zarr-python users)\n", + "\n", + "[**zarrista**](https://github.com/developmentseed/zarrista) is a small, low-level Zarr API for Python, powered from Rust by [**zarrs**](https://zarrs.dev/) via PyO3. It is inspired by [zarrita.js](https://zarrita.dev/): explicit, thin, and typed.\n", + "\n", + "This notebook walks through the **current** zarrista API and maps each piece onto the equivalent **zarr-python** concept. Throughout, we use zarr-python and zarrista side by side on the **same data on disk** to show they are interoperable.\n", + "\n", + "**Scope of this notebook:** the **synchronous** API on a **local filesystem** store only. zarrista also has a full `async` API (`AsyncArray` / `AsyncGroup`) and object-store / Icechunk backends, which are not covered here.\n", + "\n", + "> zarrista is an **evaluation prototype**, not production-ready. Some APIs (notably writing) are intentionally minimal." + ] + }, + { + "cell_type": "markdown", + "id": "64c8fbdd", + "metadata": {}, + "source": [ + "## Setup\n", + "\n", + "We import NumPy, zarr-python (for comparison / fixtures), and the zarrista names we'll use. Everything runs in a throwaway temporary directory." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "40a0b27a", + "metadata": { + "execution": { + "iopub.execute_input": "2026-06-23T21:53:02.405579Z", + "iopub.status.busy": "2026-06-23T21:53:02.405401Z", + "iopub.status.idle": "2026-06-23T21:53:02.606494Z", + "shell.execute_reply": "2026-06-23T21:53:02.606079Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "zarr-python: 3.2.1\n", + "working dir: /var/folders/42/5jr6891d4ds4xysz7q0rsghw0000gn/T/tmpgqk4gxa8\n" + ] + } + ], + "source": [ + "import tempfile\n", + "from pathlib import Path\n", + "\n", + "import numpy as np\n", + "import zarr # zarr-python, for comparison and to write fixtures\n", + "\n", + "from zarrista import (\n", + " Array,\n", + " ArrayBuilder,\n", + " ArrayBytes,\n", + " ChunkGrid,\n", + " DataType,\n", + " FillValue,\n", + " FilesystemStore,\n", + " Group,\n", + " MemoryStore,\n", + " codec,\n", + ")\n", + "\n", + "tmp = Path(tempfile.mkdtemp())\n", + "print(\"zarr-python:\", zarr.__version__)\n", + "print(\"working dir:\", tmp)" + ] + }, + { + "cell_type": "markdown", + "id": "f8d5c64e", + "metadata": {}, + "source": [ + "## 1. Opening an array written by zarr-python\n", + "\n", + "First we write a Zarr v3 array with **zarr-python**, exactly as a user normally would. zarrista will open the *same bytes on disk*." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "e058ecc2", + "metadata": { + "execution": { + "iopub.execute_input": "2026-06-23T21:53:02.607622Z", + "iopub.status.busy": "2026-06-23T21:53:02.607516Z", + "iopub.status.idle": "2026-06-23T21:53:02.622359Z", + "shell.execute_reply": "2026-06-23T21:53:02.622052Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "path = tmp / \"temperature.zarr\"\n", + "data = np.arange(9 * 64 * 100, dtype=\"int32\").reshape(9, 64, 100)\n", + "\n", + "z = zarr.create_array(\n", + " store=str(path),\n", + " shape=data.shape,\n", + " chunks=(3, 16, 50),\n", + " dtype=data.dtype,\n", + ")\n", + "z[:] = data\n", + "z" + ] + }, + { + "cell_type": "markdown", + "id": "386ffdc2", + "metadata": {}, + "source": [ + "Now open it with zarrista. Where zarr-python uses `zarr.open_array(store)`, zarrista uses `Array.open(store)`, and the store is an explicit `FilesystemStore`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "93bebed0", + "metadata": { + "execution": { + "iopub.execute_input": "2026-06-23T21:53:02.623449Z", + "iopub.status.busy": "2026-06-23T21:53:02.623379Z", + "iopub.status.idle": "2026-06-23T21:53:02.655158Z", + "shell.execute_reply": "2026-06-23T21:53:02.654786Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "shape: [9, 64, 100]\n", + "ndim: 3\n", + "dtype: DataType(int32 / reinterpret with the known dtype and shape\n", + "flat = np.frombuffer(result.buffer(), dtype=\"int32\").reshape(result.shape)\n", + "print(\"from buffer matches:\", np.array_equal(flat, expected))\n", + "\n", + "# DLPack carries the shape, so it comes back N-D directly\n", + "via_dlpack = np.from_dlpack(result)\n", + "print(\n", + " \"from_dlpack shape:\",\n", + " via_dlpack.shape,\n", + " \"matches:\",\n", + " np.array_equal(via_dlpack, expected),\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "65631809", + "metadata": {}, + "source": [ + "### Reading individual chunks\n", + "\n", + "Because zarrista is low-level, you can address the chunk grid directly — there is no exact zarr-python public equivalent for this. `retrieve_chunk` decodes a chunk; `retrieve_encoded_chunk` hands back the raw stored (pre-codec) bytes." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "04050567", + "metadata": { + "execution": { + "iopub.execute_input": "2026-06-23T21:53:02.677054Z", + "iopub.status.busy": "2026-06-23T21:53:02.676990Z", + "iopub.status.idle": "2026-06-23T21:53:02.679199Z", + "shell.execute_reply": "2026-06-23T21:53:02.678837Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "chunk shape: [3, 16, 50]\n", + "matches data[0:3, 0:16, 0:50]: True\n", + "encoded chunk: 5320 bytes\n" + ] + } + ], + "source": [ + "chunk = arr.retrieve_chunk([0, 0, 0]) # the chunk at grid index (0, 0, 0)\n", + "print(\"chunk shape:\", chunk.shape)\n", + "print(\n", + " \"matches data[0:3, 0:16, 0:50]:\",\n", + " np.array_equal(chunk.to_numpy(), data[0:3, 0:16, 0:50]),\n", + ")\n", + "\n", + "encoded = arr.retrieve_encoded_chunk([0, 0, 0]) # raw stored bytes\n", + "print(\"encoded chunk: %d bytes\" % len(bytes(encoded)))" + ] + }, + { + "cell_type": "markdown", + "id": "148927b9", + "metadata": {}, + "source": [ + "## 3. Inspecting metadata and codecs\n", + "\n", + "zarrista exposes the codec pipeline directly, split into the three Zarr v3 roles: **filters** (array→array), the **serializer** (array→bytes), and **compressors** (bytes→bytes)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "7e5443fb", + "metadata": { + "execution": { + "iopub.execute_input": "2026-06-23T21:53:02.680223Z", + "iopub.status.busy": "2026-06-23T21:53:02.680110Z", + "iopub.status.idle": "2026-06-23T21:53:02.685257Z", + "shell.execute_reply": "2026-06-23T21:53:02.684892Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "chunk_grid.grid_shape: [3, 4, 2]\n", + "chunk_grid.array_shape: [9, 64, 100]\n", + "filters: []\n", + "serializer: bytes\n", + "compressors: ['zstd']\n", + "\n", + "full v3 metadata:\n" + ] + }, + { + "data": { + "text/plain": [ + "{'zarr_format': 3,\n", + " 'node_type': 'array',\n", + " 'shape': [9, 64, 100],\n", + " 'data_type': 'int32',\n", + " 'chunk_grid': {'name': 'regular',\n", + " 'configuration': {'chunk_shape': [3, 16, 50]}},\n", + " 'chunk_key_encoding': {'name': 'default',\n", + " 'configuration': {'separator': '/'}},\n", + " 'fill_value': 0,\n", + " 'codecs': [{'name': 'bytes', 'configuration': {'endian': 'little'}},\n", + " {'name': 'zstd', 'configuration': {'level': 0, 'checksum': False}}]}" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "print(\"chunk_grid.grid_shape:\", arr.chunk_grid.grid_shape)\n", + "print(\"chunk_grid.array_shape:\", arr.chunk_grid.array_shape)\n", + "print(\"filters: \", [c.name for c in arr.filters])\n", + "print(\"serializer: \", arr.serializer.name)\n", + "print(\"compressors:\", [c.name for c in arr.compressors])\n", + "print(\"\\nfull v3 metadata:\")\n", + "arr.metadata" + ] + }, + { + "cell_type": "markdown", + "id": "e6d3f332", + "metadata": {}, + "source": [ + "## 4. Creating an array with `ArrayBuilder`\n", + "\n", + "Where zarr-python has `zarr.create_array(...)`, zarrista uses an **immutable, chained builder**. Every setter returns a *new* builder, so configuration is explicit and reusable. 
The required pieces — chunk grid, data type, fill value — are constructor arguments; everything else is an optional setter." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "59cdc4e0", + "metadata": { + "execution": { + "iopub.execute_input": "2026-06-23T21:53:02.686204Z", + "iopub.status.busy": "2026-06-23T21:53:02.686135Z", + "iopub.status.idle": "2026-06-23T21:53:02.691800Z", + "shell.execute_reply": "2026-06-23T21:53:02.691495Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "created: [8, 8] DataType(int32 / sharding serializer\n", + " .create_metadata()\n", + ")\n", + "print(\"serializer is:\", sharded_meta[\"codecs\"][0][\"name\"])" + ] + }, + { + "cell_type": "markdown", + "id": "89985918", + "metadata": {}, + "source": [ + "## 5. Groups\n", + "\n", + "We build a small group hierarchy with **zarr-python**, then navigate it with zarrista's `Group`." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "d853e97f", + "metadata": { + "execution": { + "iopub.execute_input": "2026-06-23T21:53:02.706729Z", + "iopub.status.busy": "2026-06-23T21:53:02.706664Z", + "iopub.status.idle": "2026-06-23T21:53:02.721660Z", + "shell.execute_reply": "2026-06-23T21:53:02.721316Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "array_keys: ['temperature']\n", + "group_keys: ['diagnostics']\n", + "traverse: ['Array', 'Group', 'Array']\n", + "g['temperature'] -> Array [4, 4]\n" + ] + } + ], + "source": [ + "gpath = tmp / \"dataset.zarr\"\n", + "root = zarr.open_group(str(gpath), mode=\"w\")\n", + "root.create_array(\"temperature\", shape=(4, 4), chunks=(2, 2), dtype=\"float32\")\n", + "nested = root.create_group(\"diagnostics\")\n", + "nested.create_array(\"pressure\", shape=(2,), chunks=(2,), dtype=\"float64\")\n", + "\n", + "g = Group.open(FilesystemStore(gpath))\n", + "print(\"array_keys:\", g.array_keys())\n", + "print(\"group_keys:\", g.group_keys())\n", + 
"print(\"traverse: \", [type(n).__name__ for n in g.traverse()])\n", + "\n", + "# Index into a group like zarr-python: g[name] -> Array or Group\n", + "child = g[\"temperature\"]\n", + "print(\"g['temperature'] ->\", type(child).__name__, child.shape)" + ] + }, + { + "cell_type": "markdown", + "id": "c408ee6a", + "metadata": {}, + "source": [ + "| concept | zarr-python | zarrista |\n", + "|---|---|---|\n", + "| open group | `zarr.open_group(store)` | `Group.open(store)` |\n", + "| child arrays | `list(g.array_keys())` | `g.array_keys()` |\n", + "| child groups | `list(g.group_keys())` | `g.group_keys()` |\n", + "| access child | `g[name]` | `g[name]` / `g.child(name)` |\n", + "| recurse | `g.members(...)` | `g.traverse()` |\n", + "\n", + "Group reading is complete; group **creation** is not yet implemented in zarrista." + ] + }, + { + "cell_type": "markdown", + "id": "8b05d5ed", + "metadata": {}, + "source": [ + "## 6. The in-memory store\n", + "\n", + "`MemoryStore` is a drop-in store for tests and scratch work — no filesystem involved." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "975527f2", + "metadata": { + "execution": { + "iopub.execute_input": "2026-06-23T21:53:02.722755Z", + "iopub.status.busy": "2026-06-23T21:53:02.722690Z", + "iopub.status.idle": "2026-06-23T21:53:02.724839Z", + "shell.execute_reply": "2026-06-23T21:53:02.724446Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "in-memory array: [2, 2] DataType(uint8 / |u1)\n" + ] + } + ], + "source": [ + "mem = ArrayBuilder(\n", + " ChunkGrid.regular([2, 2], [2, 2]),\n", + " DataType.from_string(\"uint8\"),\n", + " FillValue(b\"\\x00\"),\n", + ").create(MemoryStore(), \"/scratch\")\n", + "print(\"in-memory array:\", mem.shape, mem.dtype)" + ] + }, + { + "cell_type": "markdown", + "id": "d527411d", + "metadata": {}, + "source": [ + "## Summary\n", + "\n", + "What this notebook exercised, all interoperable with zarr-python on the same files:\n", + "\n", + "- **Read**: `Array.open`, `arr[...]` / `retrieve_array_subset`, `retrieve_chunk`, `retrieve_encoded_chunk`, with zero-copy NumPy / buffer / DLPack exchange.\n", + "- **Inspect**: `shape`, `dtype`, `dimension_names`, `chunk_grid`, `filters` / `serializer` / `compressors`, full v3 `metadata`.\n", + "- **Create + write**: `ArrayBuilder` → `create`, `store_chunk` (chunk-level), sharding via `subchunk_shape`.\n", + "- **Groups**: `Group.open`, `array_keys` / `group_keys` / `traverse`, `g[name]`.\n", + "- **Stores**: `FilesystemStore`, `MemoryStore`.\n", + "\n", + "**Not covered here** (but present or planned): the full `async` API (`AsyncArray` / `AsyncGroup`), object-store and Icechunk backends, variable-length / masked result types' `to_numpy()`, multi-chunk region writes, and group creation." 
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": ".venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}