Remove NumPy benchmark dependency; fix benchmark honesty and where uint8 masks #1

Merged: rizukirr merged 1 commit into main from chore/remove-numpy-and-bench-honesty on Jun 11, 2026

@rizukirr (Owner) commented Jun 11, 2026

Three related pieces of work. The only behavioral change to library math is the numc_where fix (item 3 below).

1. Remove NumPy as a benchmark dependency

  • Delete the NumPy benchmark suite (bench/numpy/) and the comparison/charting tooling (bench/compare.py, bench/graph/).
  • run.sh's numpy-run, compare, and plot stages are gone — ./run.sh bench is now numc-only, emitting bench/numc/results.csv.
  • Drop "vs NumPy" framing from README, ROADMAP, and bench/README; reword kernel comments to keep the rationale without the NumPy name.
  • externals/numpy/ is kept as a study reference (and its CLAUDE/AGENTS reference pointer stays).

2. Benchmark honesty (bench/numc/bench.c)

The previous harness flattered several ops. Fixes:

  • Spread input data (was a single constant), so data-dependent branches in comparisons, max/min, clip, argmax/argmin, and where aren't perfectly predicted. Nonzero divisors avoid integer divide-by-zero.
  • where uses a 50/50 condition mask (was all-true).
  • In-place exp/log inputs are reset before each iteration. Previously each iteration fed on the previous output, so values compounded to inf/NaN within a few iterations and the benchmark timed a different code path.
  • Single statistic everywhere: report the minimum per-iteration time (matmul already did; the rest used mean).
  • New cache category: an add sweep from L1 into DRAM so throughput's cache dependence is explicit (the fixed-1M numbers are largely L3-resident).
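
A minimal sketch of the reset-and-take-the-minimum pattern described above, not the actual bench.c code. Every name here (grow_inplace, fill_spread, bench_min) is hypothetical, the kernel is a libm-free stand-in for in-place exp, and the timer assumes a POSIX clock_gettime:

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <float.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-in for an in-place kernel (not numc's API).
 * Like exp, re-applying it to its own output grows quickly. */
static void grow_inplace(float *x, size_t n) {
    for (size_t i = 0; i < n; i++) x[i] = x[i] * x[i] + 1.0f;
}

/* Fill x with spread values in [lo, hi] from a small LCG, so the
 * input is not a single constant. */
static void fill_spread(float *x, size_t n, float lo, float hi) {
    unsigned int s = 12345u;
    for (size_t i = 0; i < n; i++) {
        s = s * 1664525u + 1013904223u;
        x[i] = lo + (hi - lo) * ((float)(s >> 8) / 16777216.0f);
    }
}

static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Return the minimum per-iteration time in seconds. The working buffer
 * is restored from the pristine copy before every iteration, outside
 * the timed region, so each run sees the same input. */
static double bench_min(float *work, const float *pristine, size_t n, int iters) {
    assert(n > 0 && iters > 0);
    double best = DBL_MAX;
    for (int it = 0; it < iters; it++) {
        memcpy(work, pristine, n * sizeof *work); /* reset, untimed */
        double t0 = now_seconds();
        grow_inplace(work, n);
        double dt = now_seconds() - t0;
        if (dt < best) best = dt;
    }
    return best;
}
```

Without the memcpy, the second iteration would already operate on the first iteration's output. With it, the buffer after the last iteration holds exactly one application of the kernel to the original input.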

3. numc_where accepts a uint8 condition mask

Comparisons emit uint8, but where previously required cond to match the value dtype — so the natural where(numc_gt(...), a, b, out) pattern was rejected (and the uint8 carve-out in _check_ternary was dead code).

  • Add a uint8-cond kernel variant + dispatch table; where now accepts cond of either the value dtype or uint8 over any value dtype.
  • Tests: uint8-mask where (contiguous, comparison-driven, and strided).
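
To illustrate the comparison-mask pattern this enables, here is a rough sketch of a uint8-cond where over float32 values. The function names and the element-stride signature are assumptions for illustration only; they are not numc's actual kernels or dispatch table:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical comparison kernel: out[i] = (a[i] > b[i]) as uint8 0/1. */
static void gt_f32(const float *a, const float *b, uint8_t *out, size_t n) {
    for (size_t i = 0; i < n; i++) out[i] = (uint8_t)(a[i] > b[i]);
}

/* Hypothetical uint8-cond where kernel for float32 values.
 * Strides are in elements, so each operand may be non-contiguous.
 * Any nonzero mask byte selects a, zero selects b. */
static void where_u8_f32(const uint8_t *cond, ptrdiff_t sc,
                         const float *a, ptrdiff_t sa,
                         const float *b, ptrdiff_t sb,
                         float *out, ptrdiff_t so, size_t n) {
    assert(cond && a && b && out);
    for (ptrdiff_t i = 0; i < (ptrdiff_t)n; i++) {
        out[i * so] = cond[i * sc] ? a[i * sa] : b[i * sb];
    }
}
```

Feeding the output of gt_f32 straight into where_u8_f32 with a and b as the value operands gives an elementwise maximum, which is the kind of composition the old dtype-matching check rejected.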

Verification

  • ./run.sh test: 49/49 pass (incl. 3 new where tests).
  • Benchmark suite builds and runs clean; the cache sweep shows float32 throughput falling ~5.7× from L3-resident to DRAM-bound.
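
A cache sweep of this kind could look roughly like the following. This is a sketch under stated assumptions, not the code in bench.c: add_f32, add_throughput_gbs, and sweep are hypothetical names, the timer assumes POSIX clock_gettime, and the traffic estimate counts two reads plus one write per element:

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <float.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical elementwise add kernel (stand-in for the library's add). */
static void add_f32(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; i++) out[i] = a[i] + b[i];
}

static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Estimated GB/s for an n-element add, using the minimum per-call time.
 * Returns -1.0 if allocation fails and 0.0 if the timer could not
 * resolve the run. */
static double add_throughput_gbs(size_t n, int reps, int iters) {
    assert(n > 0 && reps > 0 && iters > 0);
    float *a = malloc(n * sizeof *a);
    float *b = malloc(n * sizeof *b);
    float *out = malloc(n * sizeof *out);
    if (!a || !b || !out) { free(a); free(b); free(out); return -1.0; }
    for (size_t i = 0; i < n; i++) {
        a[i] = (float)(i % 97);
        b[i] = (float)(i % 89);
        out[i] = 0.0f;
    }
    /* Calling through a volatile pointer keeps the compiler from
     * merging or dropping the repeated calls. */
    void (*volatile kernel)(const float *, const float *, float *, size_t) = add_f32;
    double best = DBL_MAX;
    for (int it = 0; it < iters; it++) {
        double t0 = now_seconds();
        for (int r = 0; r < reps; r++) kernel(a, b, out, n);
        double dt = (now_seconds() - t0) / reps;
        if (dt < best) best = dt;
    }
    free(a); free(b); free(out);
    if (best <= 0.0) return 0.0;
    return 3.0 * (double)n * sizeof(float) / best / 1e9;
}

/* Double n from min_n to max_n, printing "working_set_bytes,GB/s" rows.
 * Returns the number of rows printed. */
static int sweep(size_t min_n, size_t max_n) {
    assert(min_n > 0);
    int rows = 0;
    for (size_t n = min_n; n <= max_n; n *= 2) {
        printf("%zu,%.2f\n", 3 * n * sizeof(float), add_throughput_gbs(n, 16, 5));
        rows++;
    }
    return rows;
}
```

Sweeping from a few thousand elements up to tens of millions walks the working set from L1 through DRAM. Where the steps fall, and how large they are, depends on the machine's cache sizes.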

…nt8 masks

NumPy removal:
- Delete the NumPy benchmark suite (bench/numpy/) and comparison tooling
  (bench/compare.py, bench/graph/); run.sh's numpy/compare/plot stages are
  gone, so `./run.sh bench` is now numc-only, emitting bench/numc/results.csv.
- Drop "vs NumPy" framing from README, ROADMAP, and bench/README; reword
  kernel comments to keep the rationale without the NumPy name.
  externals/numpy/ stays as a study reference.

Benchmark honesty (bench/numc/bench.c):
- Spread per-element input data (was constant) so data-dependent ops
  (comparisons, max/min, clip, argmax/argmin, where) aren't flattered by
  perfect branch prediction; nonzero divisors avoid integer div-by-zero.
- where uses a 50/50 condition mask.
- Reset in-place exp/log inputs each iteration (were compounding to inf/NaN).
- Report the minimum per-iteration time consistently across all categories.
- Add an L1->DRAM cache sweep so throughput's cache dependence is explicit.

numc_where uint8 condition masks:
- Comparisons emit uint8; where now accepts a uint8 cond over any value dtype
  (the natural comparison-mask pattern) via a uint8-cond kernel variant and
  dispatch table, fixing the previously dead uint8 path in _check_ternary.
- Add tests: uint8-mask where (contiguous, comparison-driven, and strided).
@rizukirr rizukirr merged commit f2a865c into main Jun 11, 2026
9 checks passed
@rizukirr rizukirr deleted the chore/remove-numpy-and-bench-honesty branch June 11, 2026 17:21