feat: SIMD-accelerated box operations

## Description

All core computation loops (IoU, GIoU, DIoU, box areas, NMS) are currently scalar. There is an opportunity to use SIMD intrinsics to process multiple boxes or box pairs per instruction.

## Candidate hot paths

### Box areas (`box_areas_slice`)
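Currently computes `(x2-x1) * (y2-y1)` one box at a time. With AVX2, four `f64` boxes can be processed per iteration (load four x1/y1/x2/y2 values, subtract, multiply). This needs a SoA layout or an in-register transpose, because a single AoS `[x1, y1, x2, y2]` box of `f64` already fills one 256-bit register.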
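
A minimal sketch of what this could look like with `std::arch`, assuming the coordinates are already available as four separate slices (SoA). The function names and the slice-based signature are hypothetical and are not the crate's current `box_areas_slice` API; the AVX2 path is selected at runtime and falls back to the scalar loop elsewhere.

```rust
/// Scalar reference: one box per iteration.
pub fn box_areas_scalar(x1: &[f64], y1: &[f64], x2: &[f64], y2: &[f64], out: &mut [f64]) {
    for i in 0..out.len() {
        out[i] = (x2[i] - x1[i]) * (y2[i] - y1[i]);
    }
}

/// AVX2 path: four f64 boxes per iteration, scalar loop for the tail.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn box_areas_avx2(x1: &[f64], y1: &[f64], x2: &[f64], y2: &[f64], out: &mut [f64]) {
    use std::arch::x86_64::*;
    let n = out.len();
    let mut i = 0;
    while i + 4 <= n {
        // SAFETY: the caller guarantees all slices have length n, and i + 4 <= n.
        unsafe {
            let w = _mm256_sub_pd(
                _mm256_loadu_pd(x2.as_ptr().add(i)),
                _mm256_loadu_pd(x1.as_ptr().add(i)),
            );
            let h = _mm256_sub_pd(
                _mm256_loadu_pd(y2.as_ptr().add(i)),
                _mm256_loadu_pd(y1.as_ptr().add(i)),
            );
            _mm256_storeu_pd(out.as_mut_ptr().add(i), _mm256_mul_pd(w, h));
        }
        i += 4;
    }
    // Remaining 0..=3 boxes.
    for j in i..n {
        out[j] = (x2[j] - x1[j]) * (y2[j] - y1[j]);
    }
}

/// Hypothetical entry point: checks lengths, then dispatches at runtime.
pub fn box_areas_soa(x1: &[f64], y1: &[f64], x2: &[f64], y2: &[f64], out: &mut [f64]) {
    let n = out.len();
    assert!(x1.len() == n && y1.len() == n && x2.len() == n && y2.len() == n);
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 was detected and the slice lengths were checked above.
            unsafe { box_areas_avx2(x1, y1, x2, y2, out) };
            return;
        }
    }
    box_areas_scalar(x1, y1, x2, y2, out);
}
```

Since this kernel does two subtractions and one multiply per 32 bytes loaded, it is likely memory-bound for large inputs, so the measured gain may be well below the 4x lane count.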

### IoU inner loop (`iou_distance_slice`)
The N1×N2 loop computes min/max for intersection then division for each pair. The inner loop (iterating over boxes2 for a fixed boxes1 row) can be vectorized:
- Load 4 boxes2 at once
- SIMD `min`/`max` for intersection coordinates
- SIMD subtract + multiply for the intersection area (width and height clamped at zero), then union = area1 + area2 - intersection
- SIMD divide for IoU
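
The steps above can be sketched in stable Rust as a fixed-width inner loop that LLVM has a chance to auto-vectorize. This is only an illustration: `iou_row`, `pair_iou`, `LANES`, and the SoA slice arguments are hypothetical names, and it returns plain IoU (if `iou_distance_slice` returns a distance, that would presumably be `1 - IoU`). Whether vector instructions are actually emitted should be confirmed in the generated assembly.

```rust
const LANES: usize = 4; // 4 x f64 = one 256-bit AVX2 register

/// IoU of one fixed box `b = [x1, y1, x2, y2]` against one candidate box.
#[inline(always)]
fn pair_iou(b: [f64; 4], x1: f64, y1: f64, x2: f64, y2: f64) -> f64 {
    // Intersection width/height via min/max, clamped at zero when disjoint.
    let iw = (b[2].min(x2) - b[0].max(x1)).max(0.0);
    let ih = (b[3].min(y2) - b[1].max(y1)).max(0.0);
    let inter = iw * ih;
    let union = (b[2] - b[0]) * (b[3] - b[1]) + (x2 - x1) * (y2 - y1) - inter;
    // Degenerate (zero-area) pairs get IoU 0 instead of NaN.
    if union > 0.0 { inter / union } else { 0.0 }
}

/// One row of the N1 x N2 matrix: fixed box `b` against all boxes2 (SoA layout).
pub fn iou_row(b: [f64; 4], x1: &[f64], y1: &[f64], x2: &[f64], y2: &[f64], out: &mut [f64]) {
    let n = out.len();
    assert!(x1.len() == n && y1.len() == n && x2.len() == n && y2.len() == n);
    let main = n - n % LANES;
    for start in (0..main).step_by(LANES) {
        // Fixed trip count: each step is a candidate for one vector operation.
        for lane in 0..LANES {
            let i = start + lane;
            out[i] = pair_iou(b, x1[i], y1[i], x2[i], y2[i]);
        }
    }
    // Scalar tail for the last n % LANES boxes.
    for i in main..n {
        out[i] = pair_iou(b, x1[i], y1[i], x2[i], y2[i]);
    }
}
```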

### NMS suppression check
After sorting by score, the suppression loop checks the IoU of each kept box against all remaining candidates. That comparison has the same shape as the IoU inner loop (one fixed box against many) and can be vectorized the same way.
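
A rough sketch of a suppression loop shaped for vectorization, assuming the boxes are already sorted by descending score and stored as SoA slices. The name `nms_sorted` and its signature are hypothetical. It also swaps the division for the equivalent test `inter > threshold * union` and uses a branch-free update, which is a choice made for this sketch and not necessarily what the crate does today.

```rust
/// Greedy NMS over boxes pre-sorted by descending score (SoA layout).
/// Returns the indices of the boxes that are kept.
pub fn nms_sorted(x1: &[f64], y1: &[f64], x2: &[f64], y2: &[f64], iou_threshold: f64) -> Vec<usize> {
    let n = x1.len();
    assert!(y1.len() == n && x2.len() == n && y2.len() == n);
    // Areas are computed once up front.
    let areas: Vec<f64> = (0..n).map(|i| (x2[i] - x1[i]) * (y2[i] - y1[i])).collect();
    let mut suppressed = vec![false; n];
    let mut keep = Vec::new();
    for i in 0..n {
        if suppressed[i] {
            continue;
        }
        keep.push(i);
        // Inner loop: fixed box i against all lower-scored candidates j.
        for j in (i + 1)..n {
            let iw = (x2[i].min(x2[j]) - x1[i].max(x1[j])).max(0.0);
            let ih = (y2[i].min(y2[j]) - y1[i].max(y1[j])).max(0.0);
            let inter = iw * ih;
            let union = areas[i] + areas[j] - inter;
            // Same as IoU > threshold when union > 0, without the division.
            suppressed[j] |= inter > iou_threshold * union;
        }
    }
    keep
}
```

The outer loop stays sequential because each kept box depends on earlier suppressions; only the inner comparison is a SIMD candidate.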

## Approach options

1. **Auto-vectorization hints**: restructure loops so LLVM auto-vectorizes (SoA layout instead of AoS), and allow wider instructions with `#[target_feature(enable = "...")]` or `-C target-cpu`. Lowest effort; the loop restructuring itself is portable, the target-feature annotations are per-architecture.
2. **`std::simd` (nightly)**: use Rust's portable SIMD API behind a feature flag. Clean but requires nightly.
3. **`std::arch` intrinsics**: manual SSE4.1/AVX2 behind `#[cfg(target_arch)]`, with runtime detection (`is_x86_feature_detected!`) and a scalar fallback. Maximum control on stable Rust, at the cost of `unsafe` code.
4. **`pulp` or `wide` crate**: safe SIMD wrappers that work on stable. Good middle ground.
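
To illustrate the SoA restructuring mentioned in option 1, here is a small sketch that converts AoS boxes into four contiguous coordinate vectors. `BoxesSoA`, `from_aos`, and `areas` are hypothetical names, and `[f64; 4]` as the AoS element type is an assumption about how boxes are stored.

```rust
/// Struct-of-arrays box storage: each coordinate is contiguous in memory,
/// so four consecutive boxes map onto one 256-bit load per coordinate.
pub struct BoxesSoA {
    pub x1: Vec<f64>,
    pub y1: Vec<f64>,
    pub x2: Vec<f64>,
    pub y2: Vec<f64>,
}

impl BoxesSoA {
    /// Converts AoS boxes (`[x1, y1, x2, y2]` per box) into SoA form.
    pub fn from_aos(boxes: &[[f64; 4]]) -> Self {
        let mut soa = BoxesSoA {
            x1: Vec::with_capacity(boxes.len()),
            y1: Vec::with_capacity(boxes.len()),
            x2: Vec::with_capacity(boxes.len()),
            y2: Vec::with_capacity(boxes.len()),
        };
        for b in boxes {
            soa.x1.push(b[0]);
            soa.y1.push(b[1]);
            soa.x2.push(b[2]);
            soa.y2.push(b[3]);
        }
        soa
    }

    /// Element-wise area over contiguous columns: a simple loop shape
    /// that the compiler can vectorize without shuffles.
    pub fn areas(&self) -> Vec<f64> {
        self.x1
            .iter()
            .zip(&self.y1)
            .zip(&self.x2)
            .zip(&self.y2)
            .map(|(((x1, y1), x2), y2)| (x2 - x1) * (y2 - y1))
            .collect()
    }
}
```

The conversion is an extra pass over the data, so it probably only pays off when the SoA form is reused, for example across all rows of the N1×N2 IoU loop.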

## Suggested plan

- Add a `simd` feature flag (off by default)
- Start with `box_areas_slice` as a benchmark-driven proof of concept
- Benchmark against the scalar version with criterion
- If gains are significant (>2x), extend to `iou_distance_slice` inner loop
- Keep Rayon parallelism orthogonal (SIMD within each thread)
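
One possible way to wire the `simd` feature flag, sketched with hypothetical function names: the scalar path is always compiled and serves as the reference, while the public entry point is chosen at compile time. The body of the SIMD function is a placeholder here, not a real implementation.

```rust
/// Scalar path: always compiled; the default and the benchmark baseline.
pub fn box_areas_scalar(boxes: &[[f64; 4]]) -> Vec<f64> {
    boxes.iter().map(|b| (b[2] - b[0]) * (b[3] - b[1])).collect()
}

/// Compiled only with `--features simd`.
#[cfg(feature = "simd")]
pub fn box_areas_simd(boxes: &[[f64; 4]]) -> Vec<f64> {
    // Placeholder: the vectorized kernel would go here.
    box_areas_scalar(boxes)
}

/// Public entry point when the `simd` feature is enabled.
#[cfg(feature = "simd")]
pub fn box_areas(boxes: &[[f64; 4]]) -> Vec<f64> {
    box_areas_simd(boxes)
}

/// Public entry point when the `simd` feature is off (the default).
#[cfg(not(feature = "simd"))]
pub fn box_areas(boxes: &[[f64; 4]]) -> Vec<f64> {
    box_areas_scalar(boxes)
}
```

Keeping `box_areas_scalar` public in both configurations lets a criterion benchmark and an equivalence test compare the two paths directly.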
