Skip to content

feat(ops): indexing & slicing — slice/select/narrow + index_select (CPU + CUDA)#17

Merged
Hayden727 merged 5 commits into
masterfrom
feat/indexing-slicing-issue-10
Apr 29, 2026
Merged

feat(ops): indexing & slicing — slice/select/narrow + index_select (CPU + CUDA)#17
Hayden727 merged 5 commits into
masterfrom
feat/indexing-slicing-issue-10

Conversation

@Hayden727

Copy link
Copy Markdown
Owner

Summary

Implements issue #10: zero-copy view-producing ops on Tensor and a front-door index_select that dispatches per device.

  • Tensor::slice / select / narrow — pure metadata mutation, shared storage. PyTorch-style clamping for slice bounds; ShapeError for step <= 0, OOR select index, and out-of-range narrow ranges.
  • ctorch::index_select — fresh contiguous output, int32/int64 indices, cross-device + dtype + rank-0 + non-1-D index validation in the front-door.
    • CPU: pre-resolves indices into source-axis offsets (one bounds check per index entry, not per output element); odometer-walks the contiguous output.
    • CUDA: two-kernel pattern — validation kernel raises a host-readable error flag via atomicMax and writes resolved offsets to a scratch buffer; gather kernel then consumes the scratch.
  • Parity vs np.take across f32 / f64 / i32 / i64 sources × int32 / int64 indices × 1-D / 2-D / 3-D layouts (6 fixtures × 3 .npy files each).
  • docs/ops.md and README roadmap updated.

Test plan

  • slice_test (8 tests) — basic slice, step, negative bounds, dim normalisation, clamping, empty slice, bad step, rank-0 rejection
  • view_alias_test (4 tests) — bidirectional aliasing, chained views, lifetime
  • index_select_test (10 tests) — leading / inner / 3-D axes, int32 indices, negative indices, OOR rejection, strided source
  • index_select_dtype_test (7 tests) — dtype preservation, reject float / bool / non-1-D indices, dim OOR, rank-0 src, bfloat16
  • indexing_parity_test (6 cases) — exact equality vs np.take
  • All 233 tests pass on CPU build (was 190 baseline; +43 new tests)
  • CUDA build verification (no toolchain on this dev machine; CI will exercise)

Closes #10

…PU + CUDA)

Implements issue #10: zero-copy view-producing ops on Tensor and a
front-door index_select free function dispatched per device.

- Tensor::slice / select / narrow: pure metadata mutation with shared
  storage. PyTorch-style clamping for slice bounds; ShapeError for
  step <= 0, OOR select index, and out-of-range narrow ranges.
- ctorch::index_select: fresh contiguous output, int32/int64 indices,
  cross-device + dtype + rank-0 + non-1-D index validation in the
  front-door. CPU pre-resolves indices into source-axis offsets;
  CUDA validates in a pre-pass kernel that raises a host-readable
  error flag, then gathers in a separate kernel.
- Parity vs np.take across f32/f64/i32/i64 sources × int32/int64
  indices × 1-D / 2-D / 3-D layouts.
- docs/ops.md and README roadmap updated.

Closes #10

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e0d8700756

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/ops/indexing_cuda.cu
Codex review on PR #17 flagged that launch_index_select copies `rank`
entries into IndexSelectPlan's fixed-size (kMaxRank=16) arrays without
any bound check, which would corrupt memory for tensors with rank > 16.

The CPU backend handles arbitrary rank via std::vector-backed coords,
so the guard lives only on the CUDA path (matches the rest of the
CUDA op family). Adds a regression test that index_select on a 17-D
CPU tensor still succeeds.

Also applies clang-format updates flagged by the lint job.
@Hayden727

Copy link
Copy Markdown
Owner Author

@codex review.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 29b9e8834f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/ops/indexing_cpu.cpp Outdated
Comment thread src/tensor.cpp Outdated
Codex review on PR #17 flagged two P2 signed-overflow UB cases:

  1. src/ops/indexing_cpu.cpp + indexing_cuda.cu: `v += src_dim_size`
     overflows signed-64 when the user-supplied index is INT64_MIN.
  2. src/tensor.cpp Tensor::select / narrow / slice: same pattern
     when the negative-index argument is INT64_MIN.

Fix in both layers: range-check against the pre-normalised window
[-size, size) (or [-size, size] for narrow's start, with a separate
length check) before adding. For slice's clamping path, floor extreme
negatives at -size first so the subsequent `+ size` is well-defined.

Adds regression tests covering INT64_MIN inputs to select / narrow /
slice / index_select.
@Hayden727

Copy link
Copy Markdown
Owner Author

@codex review.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 881e81df6d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tensor.cpp Outdated
Comment thread src/tensor.cpp Outdated
Codex review on PR #17 flagged two P1 signed-overflow UB cases:

  1. Tensor::slice computed `length = (end - start + step - 1) / step`,
     which overflows signed-64 when `step` is near INT64_MAX.
  2. Tensor::narrow checked `adj_start + length > size`, which overflows
     signed-64 when user-supplied `length` is near INT64_MAX.

Slice now uses a branchy ceil-div: `span == 0 ? 0 : (span - 1) / step + 1`,
where `span = end - start` is in `[0, size]` after clamping. Narrow uses
the subtraction form `length > size - adj_start` (`adj_start` is already
clamped to `[0, size]`, so `size - adj_start` stays non-negative).

Adds regression tests covering INT64_MAX `step` to slice and INT64_MAX
`length` to narrow.
@Hayden727

Copy link
Copy Markdown
Owner Author

@codex review.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7a22486aaf

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tensor.cpp Outdated
Codex review on PR #17 flagged a P2 signed-overflow UB: `Tensor::slice`
unconditionally computed `out->stride[d] = old_stride * step`, which
overflows signed-64 for huge `step` even when the resulting axis has
length <= 1 (e.g. shape {2,2}, dim=0, step=INT64_MAX → 2 * INT64_MAX).

The stride is unobservable for length 0 / 1 (the axis is never
iterated), so collapse it to `old_stride` in that case. For length > 1
the multiplication is bounded: clamping forces step <= span <= size, so
old_stride * step <= numel, well within int64.

Adds a regression test covering shape {2,2} with INT64_MAX step.
@Hayden727

Copy link
Copy Markdown
Owner Author

@codex review.

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. 🎉

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@Hayden727 Hayden727 merged commit 36ac56b into master Apr 29, 2026
7 checks passed
@Hayden727 Hayden727 deleted the feat/indexing-slicing-issue-10 branch April 29, 2026 15:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] Indexing & slicing (CPU + CUDA)

1 participant