improves dataloader performance by giovp · Pull Request #687 · scverse/spatialdata

giovp · 2024-08-21T18:48:08Z

Iteration over 20 batches, single worker.

[Timing screenshots: new implementation, main]

One annoying thing is that the `apply` method of the DataFrame, used to get the bounding box selection, is quite slow.

giovp · 2024-09-03T00:05:32Z

Quick push to try #699, where tiling is vectorized, which removes the need for `pandas.DataFrame.apply`. Quite a big speedup.
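To illustrate why replacing a row-wise apply with a vectorized computation helps, here is a minimal sketch, assuming circle regions given as center coordinates plus a radius. The function and array names are hypothetical and are not the spatialdata API; the actual change is in #699.

```python
import numpy as np


def bounding_boxes_loop(centers, radii):
    """Row-by-row version, analogous to calling DataFrame.apply per region."""
    boxes = []
    for (x, y), r in zip(centers, radii):
        boxes.append((x - r, y - r, x + r, y + r))
    return np.array(boxes)


def bounding_boxes_vectorized(centers, radii):
    """Compute all boxes at once with NumPy broadcasting."""
    r = radii[:, None]  # shape (n, 1), broadcasts over the x and y columns
    # Columns are: min_x, min_y, max_x, max_y
    return np.hstack([centers - r, centers + r])


# Hypothetical input: 500 circles of radius 32 inside a 2048x2048 image
rng = np.random.default_rng(0)
centers = rng.uniform(32, 2016, size=(500, 2))
radii = np.full(500, 32.0)

boxes = bounding_boxes_vectorized(centers, radii)
# Both versions produce the same boxes; only the vectorized one avoids the Python loop.
assert np.allclose(boxes, bounding_boxes_loop(centers, radii))
```

The vectorized version does the same arithmetic in a handful of array operations, so its cost is dominated by NumPy rather than by per-row Python calls.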

**Bugs fixed in datasets.py:**

- The `rasterize=True` path was broken: `__getitem__` always called `image.sel()` regardless of the rasterize flag, bypassing `rasterize_fn` entirely. Fixed by storing `self._rasterize` and branching in `__getitem__`.
- `ad.concat(*tables_l)` unpacked the list as positional args, failing with >1 region. Fixed to `ad.concat(tables_l)`.
- Vectorized selection pre-computation was always run, even for `rasterize=True` where it is unused. Fixed by guarding with `if not rasterize`.
- Removed stale commented-out `pandas.apply` fallback code.

**Fixes in _utils.py:**

- Removed redundant `nopython=True` from `@nb.njit` (`njit` implies `nopython=True`, and the argument caused a `RuntimeWarning`).
- Replaced invalid `nb.types.Array[nb.float64, nb.float64]` annotations with `np.ndarray`.

**Fixes in spatial_query.py:**

- Restored `BoundingBoxRequest` validation that was commented out. The validator's `__post_init__` already handles both 1-D (single box) and 2-D (multi-box) arrays.

**Benchmark (benchmark_dataloader.py):**

Synthetic 2048x2048 image, 500 circle regions (32 px radius), 3-channel.

| Phase     | main     | PR (fixed) | speedup |
|-----------|----------|------------|---------|
| init      | ~162 ms  | ~20 ms     | ~8x     |
| fetch 500 | ~618 ms  | ~118 ms    | ~5x     |
| per-tile  | ~1237 us | ~235 us    | ~5x     |

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
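The `ad.concat(*tables_l)` bug can be reproduced without anndata. The sketch below uses a hypothetical stand-in `concat` and assumes, as the fix implies, that the real function takes the collection of tables as its single positional argument with all options keyword-only.

```python
def concat(tables, *, join="inner"):
    """Hypothetical stand-in: one positional collection, keyword-only options."""
    merged = []
    for table in tables:
        merged.extend(table)
    return merged


# One "table" per region; plain lists stand in for AnnData objects here.
tables_l = [["cell_1", "cell_2"], ["cell_3"]]

# Unpacking passes each table as its own positional argument,
# which the signature rejects as soon as there is more than one region.
try:
    concat(*tables_l)
    unpack_error = None
except TypeError as err:
    unpack_error = err

# Passing the list itself works for any number of regions.
merged = concat(tables_l)
```

With a single region the unpacked call still receives exactly one positional argument, which would explain why the failure only showed up with more than one region.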

LucaMarconato · 2026-05-21T11:40:11Z

I picked this up; performance indeed improves significantly with the new vectorized bounding box approach. Thanks @giovp!

asv benchmark results:

| Change   | Before [accf496c] <main>   | After [e87d3183] <giovp/dataloader3>   |   Ratio | Benchmark (Parameter)                          |
|----------|----------------------------|----------------------------------------|---------|------------------------------------------------|
| -        | 618±20ms                   | 123±1ms                                |    0.2  | benchmark_dataloader.TimeDataloader.time_fetch |
| -        | 169±20ms                   | 18.7±0.1ms                             |    0.11 | benchmark_dataloader.TimeDataloader.time_init  |
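As a quick sanity check on the table above, the Ratio column can be reproduced as After divided by Before, and its inverse gives the speedup factor. This is only arithmetic on the mean timings reported above (error bars ignored), not a re-run of the benchmark.

```python
# (before_ms, after_ms) mean timings copied from the asv table above
timings = {
    "time_fetch": (618.0, 123.0),
    "time_init": (169.0, 18.7),
}

# asv reports ratio = after / before; speedup is the inverse
ratios = {name: after / before for name, (before, after) in timings.items()}
speedups = {name: before / after for name, (before, after) in timings.items()}

for name in timings:
    print(f"{name}: ratio={ratios[name]:.2f}, speedup={speedups[name]:.1f}x")
```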

codecov · 2026-05-21T11:42:30Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.55%. Comparing base (accf496) to head (a51cb95).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #687      +/-   ##
==========================================
+ Coverage   92.28%   92.55%   +0.26%     
==========================================
  Files          51       51              
  Lines        7804     7763      -41     
==========================================
- Hits         7202     7185      -17     
+ Misses        602      578      -24

| Files with missing lines | Coverage Δ |
|--------------------------|------------|
| src/spatialdata/_core/query/_utils.py | `93.26% <100.00%> (ø)` |
| src/spatialdata/_core/query/spatial_query.py | `95.52% <ø> (ø)` |
| src/spatialdata/dataloader/datasets.py | `90.52% <100.00%> (+0.27%)` ⬆️ |

... and 3 files with indirect coverage changes

giovp added 14 commits August 21, 2024 20:39

implement selection

40326e9

update

aa339aa

Merge branch 'main' into giovp/dataloader3

4ac406e

vectorize adjust_bounding_box_to_real_axes

92d578f

update

2bb5c35

replace append with insert

c89dcdf

add comment

5bf0b43

vectorize

a60bf6f

update to handle multiple boxes

017967b

vectorize with numba

ab774b7

Merge branch 'giovp/parallel-transform' into giovp/dataloader3

804b30a

fix corner len

38dba25

Merge branch 'giovp/parallel-transform' into giovp/dataloader3

df80902

update

b27607e

giovp added 2 commits September 2, 2024 17:16

fix validation

a934e21

Merge branch 'giovp/parallel-transform' into giovp/dataloader3

5bdd9df

This was referenced Sep 3, 2024

vectorize bounding box query #699

Merged

improve data loader performance #565

Closed

improves dataloader performance #622

Closed

giovp and others added 10 commits September 3, 2024 14:26

refactor

77f73f4

refactor

3adfea8

add test for query with multiple bounding boxes

dfdfdbf

fix typing

5c5560d

vectorize bounding box query on polygons

dd2c573

add test to cover no polygon overlap (None)

be95358

vectorize bounding box query on points and tests

fad9b1a

fix type

9b977d6

Merge branch 'giovp/parallel-transform' into giovp/dataloader3

f3f3d27

Merge branch 'main' into giovp/dataloader3

208e217

LucaMarconato and others added 3 commits May 15, 2026 23:16

Merge branch 'main' into giovp/dataloader3

df21672

add asv benchmark for dataloader performance

a51cb95

LucaMarconato merged commit 9cd4eb7 into main May 21, 2026
9 checks passed

LucaMarconato deleted the giovp/dataloader3 branch May 21, 2026 12:06

LucaMarconato merged 29 commits into main from giovp/dataloader3
