
bench: tensorstore CPU vs damacy GPU read+decode comparison#155

Merged: nclack merged 4 commits into main from bench-tensorstore-comparison on Jun 15, 2026

Conversation

nclack (Owner) commented Jun 12, 2026

Adds bench/tensorstore_bench.py, a scenario-driven CPU read+decode benchmark
using tensorstore, so the GPU (damacy) vs CPU (tensorstore) tradeoff for zarr v3
read+decode can be measured on identical data, chunking, and patch sampling.

  • Consumes the existing Scenario JSON schema via bench/scenario.py (same as
    run.py), handling both the synthetic path (uris is None, arrays enumerated
    from uri_fmt/array_path/n_zarrs) and explicit uris.
  • Opens arrays with tensorstore's zarr3 driver; skips wrong-rank/too-small arrays
    with the same filtering as the bench so array counts line up.
  • Ports bench/main.c's xorshift64* RNG and draw order (array index, then
    per-axis start), so with a shared seed the sampled patches match damacy's
    bit-for-bit and both read the same bytes.
  • --threads runs a concurrency sweep (default 1,2,4,8,16,32) using tensorstore
    context limits plus a bounded in-flight read window, and reports samples/s
    and GB/s per thread count along with the best point. The CPU thread pool is
    the real comparison point against a single GPU decode stream.
  • --drop-cache mirrors run.py's page-cache drop for cold-read measurement;
    --compare-with <damacy results.json> prints a head-to-head line.
  • Emits a table plus a JSON summary shaped like the existing bench output.
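
The shared-seed sampling described in the RNG bullet can be sketched roughly as below. This is an illustration, not the code in bench/main.c or bench/tensorstore_bench.py: it assumes the standard xorshift64* constants (shifts 12, 25, 27 and multiplier 0x2545F4914F6CDD1D) and a plain modulo to map draws onto ranges, and the names `XorShift64Star` and `sample_patch` are hypothetical. The actual bench may map draws onto ranges differently.

```python
MASK64 = (1 << 64) - 1


class XorShift64Star:
    """Standard xorshift64* generator (assumed to match the C side)."""

    def __init__(self, seed: int) -> None:
        if seed & MASK64 == 0:
            raise ValueError("xorshift state must be nonzero")
        self.state = seed & MASK64

    def next_u64(self) -> int:
        x = self.state
        x ^= x >> 12
        x ^= (x << 25) & MASK64  # emulate uint64_t wraparound
        x ^= x >> 27
        self.state = x
        return (x * 0x2545F4914F6CDD1D) & MASK64


def sample_patch(rng, shapes, patch):
    """Draw the array index first, then one start per axis, in axis order."""
    idx = rng.next_u64() % len(shapes)
    starts = tuple(
        rng.next_u64() % (dim - p + 1) for dim, p in zip(shapes[idx], patch)
    )
    return idx, starts


# Hypothetical shapes and patch size, for illustration only.
rng = XorShift64Star(42)
shapes = [(64, 512, 512), (32, 256, 256)]
idx, starts = sample_patch(rng, shapes, patch=(16, 128, 128))
```

Because every draw advances the same state, two implementations only agree if they consume draws in the same order, which is why the draw order is ported along with the generator.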

tensorstore fuses read+decode, so only total throughput is reported (no per-stage
split). Smoke-tested on a synthetic scenario and a uris scenario on a login node;
the full cold sweep runs on a compute node.
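
For readers unfamiliar with the bounded in-flight window, here is a minimal sketch of the idea under stated assumptions. It uses a standard-library thread pool and a stand-in `read_patch` callable that returns bytes, so it runs without tensorstore. In the real script the submit step would presumably be a tensorstore read such as `arr[slices].read()`, which returns a future whose `.result()` is a NumPy array, with the byte count taken from `.nbytes`. The names `read_windowed` and `sweep`, the window size of `2 * n`, and the row keys are all hypothetical.

```python
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def read_windowed(submit, patches, window):
    """Keep at most `window` reads in flight; return total bytes read."""
    in_flight = deque()
    total = 0
    for patch in patches:
        if len(in_flight) >= window:
            # Wait for the oldest read before issuing another one.
            total += len(in_flight.popleft().result())
        in_flight.append(submit(patch))
    while in_flight:
        total += len(in_flight.popleft().result())
    return total


def sweep(read_patch, patches, thread_counts=(1, 2, 4, 8)):
    """Time one pass per thread count; return all rows and the best row."""
    rows = []
    for n in thread_counts:
        with ThreadPoolExecutor(max_workers=n) as pool:
            t0 = time.perf_counter()
            nbytes = read_windowed(
                lambda p: pool.submit(read_patch, p), patches, window=2 * n
            )
            dt = time.perf_counter() - t0
        rows.append(
            {
                "threads": n,
                "samples_per_s": len(patches) / dt,
                "GB_per_s": nbytes / dt / 1e9,
            }
        )
    return rows, max(rows, key=lambda r: r["samples_per_s"])
```

The window bounds memory held by pending results while still keeping every worker busy. Since read and decode are fused, only the total time per pass is measured.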

Closes #153.

codecov Bot commented Jun 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 57.73%. Comparing base (17239ae) to head (537d29b).
⚠️ Report is 2 commits behind head on main.

@@           Coverage Diff           @@
##             main     #155   +/-   ##
=======================================
  Coverage   57.72%   57.73%           
=======================================
  Files          64       64           
  Lines       10055    10055           
  Branches     1750     1750           
=======================================
+ Hits         5804     5805    +1     
+ Misses       3501     3500    -1     
  Partials      750      750           

| Flag | Coverage Δ |
|---|---|
| unittests | 57.73% <ø> (+<0.01%) ⬆️ |

see 2 files with indirect coverage changes

nclack added a commit that referenced this pull request Jun 13, 2026
Small follow-up rolling up the remaining working-tree changes left out
of #155.

Two independent changes, one commit each:

- **bench: per-stage in/out throughput + load**: `bench/report.py`'s stage table now reports `GB/s_out` and `load%` (stage `ms_total` / wall) alongside the existing `GB/s_in`, so each pipeline stage shows input and output throughput and its share of the wall, making it obvious which stage bounds a run.
- **chore: pixi workspace config**: `[tool.pixi.*]` workspace/environments in `pyproject.toml`, `.pixi/*` ignored in `.gitignore`, and `pixi.lock` merge attributes in `.gitattributes`.
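
As a rough reading of the new columns, the sketch below computes them for one stage. Only `load%` as stage `ms_total` over wall time comes from the description above. Dividing bytes by the stage's own time to get `GB/s_in` and `GB/s_out` is an assumption, and `stage_row` and its parameters are hypothetical names, not `bench/report.py`'s API.

```python
def stage_row(bytes_in, bytes_out, ms_total, wall_ms):
    """Per-stage throughput and share of wall time (names are illustrative)."""
    secs = ms_total / 1e3
    return {
        "GB/s_in": bytes_in / secs / 1e9,    # assumed: input bytes / stage time
        "GB/s_out": bytes_out / secs / 1e9,  # assumed: output bytes / stage time
        "load%": 100.0 * ms_total / wall_ms,  # stage ms_total / wall
    }


# A stage that expands 2 GB into 8 GB and is busy for half of a 1 s run.
row = stage_row(
    bytes_in=2_000_000_000,
    bytes_out=8_000_000_000,
    ms_total=500.0,
    wall_ms=1000.0,
)
```

Under these assumptions, the stage whose `load%` is closest to 100 is the one bounding the run.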

---------

Co-authored-by: Nathan Clack <nclack@biohub.org>
github-actions Bot added a commit that referenced this pull request Jun 13, 2026
Co-authored-by: Nathan Clack <nclack@biohub.org> 96d6be8
@nclack nclack merged commit dee3e66 into main Jun 15, 2026
6 checks passed
@nclack nclack deleted the bench-tensorstore-comparison branch June 15, 2026 04:47

Development

Successfully merging this pull request may close these issues.

bench: comparative read-throughput benchmark, damacy (GPU) vs tensorstore (CPU)
