`dataset4dstem` support torch array while maintaining backward compatibility by bobleesj · Pull Request #228 · electronmicroscopy/quantem

bobleesj · 2026-05-19T05:58:24Z

What problem this PR addresses

This follows the long discussion in #222; the action plan is in #222 (comment).

tl;dr - allow Dataset4dstem to hold a torch tensor without breaking existing notebooks and code.

API

# Existing — unchanged
Dataset4dstem.from_array(numpy_arr, sampling=..., units=..., name=...)

# New — GPU-resident path
Dataset4dstem.from_tensor(torch_tensor, sampling=..., units=..., name=...)

Access

┌─────────────────────────────┬─────────────────────────────┬───────────────────────────────────────────────────┐
│      Property / Method      │  numpy-backed (from_array)  │            tensor-backed (from_tensor)            │
├─────────────────────────────┼─────────────────────────────┼───────────────────────────────────────────────────┤
│ dset.array                  │ np.ndarray                  │ None (use .tensor)                                │
├─────────────────────────────┼─────────────────────────────┼───────────────────────────────────────────────────┤
│ dset.tensor                 │ AttributeError              │ torch.Tensor                                      │
├─────────────────────────────┼─────────────────────────────┼───────────────────────────────────────────────────┤
│ dset.numpy()                │ np.ndarray (same as .array) │ np.ndarray (CPU copy via .detach().cpu().numpy()) │
├─────────────────────────────┼─────────────────────────────┼───────────────────────────────────────────────────┤
│ dset.device                 │ "cpu"                       │ "cuda:0" / "mps" / "cpu"                          │
├─────────────────────────────┼─────────────────────────────┼───────────────────────────────────────────────────┤
│ dset.to(device)             │ AttributeError              │ moves tensor, returns self                        │
├─────────────────────────────┼─────────────────────────────┼───────────────────────────────────────────────────┤
│ dset.shape / .ndim / .dtype │ from numpy                  │ from tensor                                       │
└─────────────────────────────┴─────────────────────────────┴───────────────────────────────────────────────────┘
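
The access rules in the table can be sketched with a small stand-in class. This is only an illustration under assumptions: `DualSlotDataset` is a hypothetical name, not the quantem class, and the tensor slot is duck-typed (it only relies on `.detach()`, `.cpu()`, `.numpy()`, `.to()`, `.device`, and `.shape`, which a `torch.Tensor` provides), so the sketch itself needs only NumPy.

```python
import numpy as np


class DualSlotDataset:
    """Hypothetical stand-in for the dual-slot storage described in the table."""

    def __init__(self, array=None, tensor=None, name="dataset"):
        # Exactly one of (array, tensor) must be provided.
        if (array is None) == (tensor is None):
            raise ValueError("Provide exactly one of `array` or `tensor`.")
        self._array = array
        self._tensor = tensor
        self.name = name

    @classmethod
    def from_array(cls, array, name="dataset"):
        return cls(array=np.asarray(array), name=name)

    @classmethod
    def from_tensor(cls, tensor, name="dataset"):
        return cls(tensor=tensor, name=name)

    @property
    def array(self):
        # None on a tensor-backed dataset, as in the table.
        return self._array

    @property
    def tensor(self):
        if self._tensor is None:
            raise AttributeError(f"Dataset '{self.name}' is numpy-backed; use .array.")
        return self._tensor

    def numpy(self):
        # numpy-backed: return the stored array; tensor-backed: CPU conversion.
        if self._array is not None:
            return self._array
        return self._tensor.detach().cpu().numpy()

    @property
    def device(self):
        return "cpu" if self._array is not None else str(self._tensor.device)

    def to(self, device):
        if self._tensor is None:
            raise AttributeError(
                f"Cannot .to({device!r}) on numpy-backed Dataset '{self.name}'."
            )
        self._tensor = self._tensor.to(device)
        return self

    @property
    def shape(self):
        backing = self._array if self._array is not None else self._tensor
        return tuple(backing.shape)


dset = DualSlotDataset.from_array(np.zeros((2, 3, 4, 5)))
print(dset.device, dset.shape)  # cpu (2, 3, 4, 5)
```

With torch installed, `DualSlotDataset.from_tensor(torch.zeros(2, 3, 4, 5))` would exercise the right-hand column of the table in the same way.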

May 22, 2026 update

Verification:

Widget:

Ptycho notebook:

Direct ptycho notebook:

What should the reviewer(s) do

Arthur's comment: #222 (comment)

non-breaking, just adding basic torch support
in base Dataset: dset.tensor  # is None if not set; .tensor and .array both exist, but only one of them will be set (raises AttributeError for explicitness)
Dataset4dstem: dset.from_tensor classmethod
dset.device (just returns self.tensor.device)
dset.to (just moves dset.tensor)
dset.numpy() method (will be required in the future; add it now so people get used to it)

bobleesj

@arthurmccray This is ready for review. I tried to keep the changes as minimal as possible while addressing the comments provided.

dataset3d and others are left untouched on purpose to make this PR easy to review. Happy to iterate a few times or address anything to make this PR more robust.

bobleesj · 2026-05-23T06:35:04Z

-        self._array = arr
+        super().__init__()
+        # Dual-slot storage: exactly one of (_array, _tensor) is set.
+        if array is None and tensor is None:


Some conditional checks for now: a dataset can be either numpy-backed OR torch-backed, not both at this stage.

Is there a way that an array or tensor is never initialized for a Dataset? Otherwise, I feel like this first conditional is kind of redundant since everything is instantiated with from_data or from_tensor.

agreed that some of these protections are probably unnecessary, but it's okay to leave them assuming that they will be removed once the transition is complete (maybe with a comment stating as much)

bobleesj · 2026-05-23T06:35:38Z

+        return (array if array is not None else self._tensor).ndim

    @property
    def dtype(self) -> DTypeLike:


dtype: taken from whichever backing is set, the numpy array or the torch tensor

Is torch.dtype included in NumPy's DTypeLike?
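
On the dtype question: `numpy.typing.DTypeLike` describes objects that `np.dtype()` can coerce, and as far as I know `torch.dtype` is not one of them. A minimal way to check, where `is_numpy_dtype_like` is a hypothetical helper (not part of quantem or NumPy) and the torch branch only runs if torch is installed:

```python
import numpy as np


def is_numpy_dtype_like(obj) -> bool:
    """Return True if np.dtype() can interpret obj (hypothetical helper)."""
    try:
        np.dtype(obj)
    except TypeError:
        return False
    return True


print(is_numpy_dtype_like("float32"))   # True
print(is_numpy_dtype_like(np.float32))  # True

try:
    import torch
except ImportError:
    torch = None  # torch is optional for this sketch

if torch is not None:
    # Expected to print False: np.dtype() does not understand torch dtypes.
    print(is_numpy_dtype_like(torch.float32))
```

If that holds, the return annotation of the `dtype` property may need to be widened (for example to a union that includes `torch.dtype`) rather than left as `DTypeLike`.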

bobleesj · 2026-05-23T06:35:56Z

+        return (array if array is not None else self._tensor).dtype

    @property
    def device(self) -> str:


device: "cpu" for numpy; for torch, it depends on the tensor

you can actually do .device on numpy arrays, np.arange(10).device -> "cpu". it's included to be compatible with other array packages :)
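
A short sketch of how the `device` property could lean on that attribute instead of hard-coding "cpu". `backing_device` is a hypothetical helper; it assumes that `ndarray.device` is only available on NumPy 2.0 and later, so it falls back to "cpu" when the attribute is missing.

```python
import numpy as np


def backing_device(data) -> str:
    """Return the device of an array-like as a string, defaulting to "cpu"."""
    # NumPy >= 2.0 arrays report "cpu"; torch tensors expose a torch.device,
    # which str() turns into e.g. "cuda:0". Older NumPy arrays have no
    # .device attribute, hence the fallback value.
    return str(getattr(data, "device", "cpu"))


print(backing_device(np.arange(10)))  # cpu
```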

bobleesj · 2026-05-23T06:36:15Z

+        return "cpu"

-        For NumPy-only datasets, this is always "cpu".
+    def numpy(self) -> NDArray:


Following @arthurmccray's comment: get users used to calling this when they need an explicit array type.

This looks good to me! The only thing I would add is the flags.writeable handling that Cedric found, applied to the array produced from the torch tensor, making it clear that the output is not writable. I haven't tested this, but it seems like what we want: #222 (comment)
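
One reading of that suggestion, sketched with NumPy only. The linked comment is not reproduced here, so this is an assumption about what was meant: clear NumPy's `flags.writeable` on the array that `numpy()` returns, so that accidental in-place edits fail loudly. `as_readonly` is a hypothetical helper name.

```python
import numpy as np


def as_readonly(arr: np.ndarray) -> np.ndarray:
    """Return a read-only view of arr without copying the data."""
    view = arr.view()
    view.flags.writeable = False  # writes through this view now raise ValueError
    return view


out = as_readonly(np.zeros((2, 2)))
try:
    out[0, 0] = 1.0
except ValueError as err:
    print("write rejected:", err)
```

In the tensor-backed case the same flag could be cleared on the result of `.detach().cpu().numpy()` before returning it.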

arthurmccray

Overall this looks good! A couple questions on dtypes and devices, and a few places where we should at least put comments for temporary things that will be removed once the transition is complete. Once those are addressed I think it should be good to merge

arthurmccray · 2026-05-26T16:55:37Z

-        self._array = arr
+        super().__init__()
+        # Dual-slot storage: exactly one of (_array, _tensor) is set.
+        if array is None and tensor is None:


agreed that some of these protections are probably unnecessary, but it's okay to leave them assuming that they will be removed once the transition is complete (maybe with a comment stating as much)

arthurmccray · 2026-05-26T17:01:08Z

+        return (array if array is not None else self._tensor).ndim

    @property
    def dtype(self) -> DTypeLike:


Is torch.dtype included in NumPy's DTypeLike?

arthurmccray · 2026-05-26T17:09:48Z

+        return (array if array is not None else self._tensor).dtype

    @property
    def device(self) -> str:


you can actually do .device on numpy arrays, np.arange(10).device -> "cpu". it's included to be compatible with other array packages :)

arthurmccray · 2026-05-26T17:11:41Z

+        return "cpu"

-        For NumPy-only datasets, this is always "cpu".
+    def numpy(self) -> NDArray:


This looks good to me! The only thing I would add is the flags.writeable handling that Cedric found, applied to the array produced from the torch tensor, making it clear that the output is not writable. I haven't tested this, but it seems like what we want: #222 (comment)

arthurmccray · 2026-05-26T17:26:01Z

+            raise AttributeError(
+                f"Cannot .to({device!r}) on numpy-backed Dataset '{self.name}'."
+            )
+        self._tensor = tensor.to(device)


From the config module we have a method for validating and getting canonical names for devices, which might be useful here.

from quantem.core import config
dev, _id = config.validate_device(device)
self._tensor = tensor.to(dev)

arthurmccray · 2026-05-26T17:30:30Z

+        if tensor.ndim != 4:
+            raise ValueError(
+                f"Dataset4dstem.from_tensor requires a 4D tensor "
+                f"(scan_row, scan_col, dp_row, dp_col), got shape {tuple(tensor.shape)}."
+            )


This is fine for now, but I think we should update the validators (or maybe make a new ensure_valid_tensor to match ensure_valid_array). I generally like having validators as it significantly cuts down on bloat.
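
A rough idea of what such a validator might look like. `ensure_valid_tensor` does not exist yet, so its name, signature, and behavior here are assumptions; the check is duck-typed on `.ndim` and `.shape` so the sketch runs without torch, and the example uses a `SimpleNamespace` as a stand-in for a tensor.

```python
from types import SimpleNamespace


def ensure_valid_tensor(tensor, ndim=None, name="tensor"):
    """Hypothetical validator: check tensor-like attributes and dimensionality."""
    for attr in ("shape", "ndim"):
        if not hasattr(tensor, attr):
            raise TypeError(f"{name} must expose .{attr}, got {type(tensor).__name__}.")
    if ndim is not None and tensor.ndim != ndim:
        raise ValueError(
            f"{name} must be {ndim}D, got shape {tuple(tensor.shape)}."
        )
    return tensor


# Stand-in for a 3D torch tensor passed where a 4D one is required.
bad = SimpleNamespace(shape=(8, 16, 16), ndim=3)
try:
    ensure_valid_tensor(bad, ndim=4)
except ValueError as err:
    print(err)  # tensor must be 4D, got shape (8, 16, 16).
```

With a helper like this, `from_tensor` could replace its inline `ndim` check with a single call.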

bobleesj added 6 commits May 18, 2026 22:36

dataset4d, dataset4dstem hold torch array

98ddd0d

bring original docstring back

d2dc32b

remove need for cached numpy array

6b7319d

further cleaup api docstring

7f9913f

use _array _tensor duck typing for show4dstem

9c93ae9

use row, col convention in docstring

40878d3

bobleesj mentioned this pull request May 23, 2026

MAPED data merging with Torch #204

Open

fix: tolerate missing _tensor slot on autoserialize-loaded datasets

21b684f

bobleesj commented May 23, 2026

bobleesj marked this pull request as ready for review May 23, 2026 06:40

This was referenced May 24, 2026

Dataset5dstem - stack reduction, slicing, Nion Swift, match Dataset4dstem API convention #151

Closed

Dataset5dstem torch native initial implementation (MAPED, time-series) #231

Draft

arthurmccray approved these changes May 26, 2026

3 participants
