feat(ops): indexing & slicing — slice/select/narrow + index_select (CPU + CUDA)#17
Conversation
…PU + CUDA) Implements issue #10: zero-copy view-producing ops on Tensor and a front-door index_select free function dispatched per device. - Tensor::slice / select / narrow: pure metadata mutation with shared storage. PyTorch-style clamping for slice bounds; ShapeError for step <= 0, OOR select index, and out-of-range narrow ranges. - ctorch::index_select: fresh contiguous output, int32/int64 indices, cross-device + dtype + rank-0 + non-1-D index validation in the front-door. CPU pre-resolves indices into source-axis offsets; CUDA validates in a pre-pass kernel that raises a host-readable error flag, then gathers in a separate kernel. - Parity vs np.take across f32/f64/i32/i64 sources × int32/int64 indices × 1-D / 2-D / 3-D layouts. - docs/ops.md and README roadmap updated. Closes #10
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e0d8700756
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Codex review on PR #17 flagged that launch_index_select copies `rank` entries into IndexSelectPlan's fixed-size (kMaxRank=16) arrays without any bound check, which would corrupt memory for tensors with rank > 16. The CPU backend handles arbitrary rank via std::vector-backed coords, so the guard lives only on the CUDA path (matches the rest of the CUDA op family). Adds a regression test that index_select on a 17-D CPU tensor still succeeds. Also applies clang-format updates flagged by the lint job.
|
@codex review. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 29b9e8834f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Codex review on PR #17 flagged two P2 signed-overflow UB cases: 1. src/ops/indexing_cpu.cpp + indexing_cuda.cu: `v += src_dim_size` overflows signed-64 when the user-supplied index is INT64_MIN. 2. src/tensor.cpp Tensor::select / narrow / slice: same pattern when the negative-index argument is INT64_MIN. Fix in both layers: range-check against the pre-normalised window [-size, size) (or [-size, size] for narrow's start, with a separate length check) before adding. For slice's clamping path, floor extreme negatives at -size first so the subsequent `+ size` is well-defined. Adds regression tests covering INT64_MIN inputs to select / narrow / slice / index_select.
|
@codex review. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 881e81df6d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Codex review on PR #17 flagged two P1 signed-overflow UB cases: 1. Tensor::slice computed `length = (end - start + step - 1) / step`, which overflows signed-64 when `step` is near INT64_MAX. 2. Tensor::narrow checked `adj_start + length > size`, which overflows signed-64 when user-supplied `length` is near INT64_MAX. Slice now uses a branchy ceil-div: `span == 0 ? 0 : (span - 1) / step + 1`, where `span = end - start` is in `[0, size]` after clamping. Narrow uses the subtraction form `length > size - adj_start` (`adj_start` is already clamped to `[0, size]`, so `size - adj_start` stays non-negative). Adds regression tests covering INT64_MAX `step` to slice and INT64_MAX `length` to narrow.
|
@codex review. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7a22486aaf
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Codex review on PR #17 flagged a P2 signed-overflow UB: `Tensor::slice` unconditionally computed `out->stride[d] = old_stride * step`, which overflows signed-64 for huge `step` even when the resulting axis has length <= 1 (e.g. shape {2,2}, dim=0, step=INT64_MAX → 2 * INT64_MAX). The stride is unobservable for length 0 / 1 (the axis is never iterated), so collapse it to `old_stride` in that case. For length > 1 the multiplication is bounded: clamping forces step <= span <= size, so old_stride * step <= numel, well within int64. Adds a regression test covering shape {2,2} with INT64_MAX step.
|
@codex review. |
|
Codex Review: Didn't find any major issues. 🎉 ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Summary
Implements issue #10: zero-copy view-producing ops on
Tensorand a front-doorindex_selectthat dispatches per device.Tensor::slice/select/narrow— pure metadata mutation, shared storage. PyTorch-style clamping for slice bounds;ShapeErrorforstep <= 0, OORselectindex, and out-of-rangenarrowranges.ctorch::index_select— fresh contiguous output, int32/int64 indices, cross-device + dtype + rank-0 + non-1-D index validation in the front-door.atomicMaxand writes resolved offsets to a scratch buffer; gather kernel then consumes the scratch.np.takeacrossf32 / f64 / i32 / i64sources ×int32 / int64indices × 1-D / 2-D / 3-D layouts (6 fixtures × 3 .npy files each).docs/ops.mdand README roadmap updated.Test plan
slice_test(8 tests) — basic slice, step, negative bounds, dim normalisation, clamping, empty slice, bad step, rank-0 rejectionview_alias_test(4 tests) — bidirectional aliasing, chained views, lifetimeindex_select_test(10 tests) — leading / inner / 3-D axes, int32 indices, negative indices, OOR rejection, strided sourceindex_select_dtype_test(7 tests) — dtype preservation, reject float / bool / non-1-D indices, dim OOR, rank-0 src, bfloat16indexing_parity_test(6 cases) — exact equality vsnp.takeCloses #10