
Add ARM SSVE FP32 vector kernels #1

Open
codp594 wants to merge 4 commits into Arm-Examples:main from codp594:main


codp594 commented Jun 23, 2026

Summary

This PR adds ARM SSVE/SME2 FP32 vector kernels under vecops_fp32, with a small timing program for running and measuring selected kernels.

Changes

  • Add FP32 real and complex vector kernels using streaming SVE/SME2 intrinsics
  • Add a simple CLI timing program for kernel selection, vector size, and iteration count
  • Add build configuration and usage documentation

Kernels

  • Complex FP32: mul, dot, power, conj_scale, conj_mul, conj_dot
  • Real FP32: mul, scale, dot, add
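A plain scalar reference is a convenient way to check the streaming kernels listed above. The sketch below is hypothetical and makes several assumptions the PR does not state: complex vectors are stored interleaved (`re0, im0, re1, im1, ...`, the layout that two-register tuple loads de-interleave), `n` counts complex elements, the `conj_*` variants conjugate the first operand, and `power` is taken to mean the element-wise squared magnitude. `conj_scale` is omitted because its exact definition is not given here. All function names are illustrative.

```c
#include <stddef.h>

/* out[i] = a[i] * b[i] for interleaved complex data (n complex elements). */
static void cmul_ref(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        float ar = a[2 * i], ai = a[2 * i + 1], br = b[2 * i], bi = b[2 * i + 1];
        out[2 * i]     = ar * br - ai * bi;
        out[2 * i + 1] = ar * bi + ai * br;
    }
}

/* out[i] = conj(a[i]) * b[i] (assumed: the first operand is conjugated). */
static void cconj_mul_ref(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        float ar = a[2 * i], ai = a[2 * i + 1], br = b[2 * i], bi = b[2 * i + 1];
        out[2 * i]     = ar * br + ai * bi;
        out[2 * i + 1] = ar * bi - ai * br;
    }
}

/* *re + i * *im = sum_i a[i] * b[i] */
static void cdot_ref(const float *a, const float *b, size_t n, float *re, float *im) {
    float sr = 0.0f, si = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float ar = a[2 * i], ai = a[2 * i + 1], br = b[2 * i], bi = b[2 * i + 1];
        sr += ar * br - ai * bi;
        si += ar * bi + ai * br;
    }
    *re = sr;
    *im = si;
}

/* *re + i * *im = sum_i conj(a[i]) * b[i] */
static void cconj_dot_ref(const float *a, const float *b, size_t n, float *re, float *im) {
    float sr = 0.0f, si = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float ar = a[2 * i], ai = a[2 * i + 1], br = b[2 * i], bi = b[2 * i + 1];
        sr += ar * br + ai * bi;
        si += ar * bi - ai * br;
    }
    *re = sr;
    *im = si;
}

/* out[i] = |a[i]|^2 (assumed meaning of "power"); out holds n real floats. */
static void cpower_ref(const float *a, float *out, size_t n) {
    for (size_t i = 0; i < n; ++i)
        out[i] = a[2 * i] * a[2 * i] + a[2 * i + 1] * a[2 * i + 1];
}

/* Real FP32 references: element-wise mul, scale by s, dot product, add. */
static void rmul_ref(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; ++i) out[i] = a[i] * b[i];
}
static void rscale_ref(const float *a, float s, float *out, size_t n) {
    for (size_t i = 0; i < n; ++i) out[i] = s * a[i];
}
static float rdot_ref(const float *a, const float *b, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) acc += a[i] * b[i];
    return acc;
}
static void radd_ref(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}
```

For example, with `a = {1, 2, 3, 4}` (that is, `1+2i, 3+4i`) and `b = {5, 6, 7, 8}`, `cdot_ref` gives `-18 + 68i` and `cconj_dot_ref` gives `70 - 8i`. Because the vector kernels may accumulate in a different order, comparisons against these references on larger inputs should use a tolerance rather than exact equality.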

Huanlun Cheng added 4 commits June 23, 2026 15:46
- Require explicit compiler selection
- Target ARMv9.2-A with SME2/SVE/SVE2
- Enable optimized strict-warning build flags
- Document supported real and complex kernels
- Explain the fixed 512-bit vector assumption
- Include build instructions and benchmark usage
- Add CLI arguments for kernel choice, vector size, and iterations
- Measure elapsed time with the ARM generic timer
- Warm up the selected kernel before timing
- Add complex float32 kernels for multiply, dot, power, and conjugate variants
- Add real float32 kernels for multiply, scale, dot, and add
- Use streaming SVE/SME2 intrinsics with tuple loads and predicates
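The fixed 512-bit vector assumption and the use of predicates can be illustrated with a scalar emulation. At a streaming vector length of 512 bits, one FP32 vector holds 512 / 32 = 16 lanes, and a two-register tuple load of interleaved complex data yields 16 complex values split into a real-part vector and an imaginary-part vector. The sketch below only emulates the control flow of a `svwhilelt_b32`-style predicated loop in plain C; it does not use the PR's intrinsics, and `SVL_BITS`, `FP32_LANES`, `predicated_iterations`, and `predicated_add_emulated` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed fixed streaming vector length, as described in the PR. */
enum { SVL_BITS = 512, FP32_LANES = SVL_BITS / 32 }; /* 16 FP32 lanes */

/* Number of iterations a predicated loop needs to cover n FP32 elements. */
static size_t predicated_iterations(size_t n) {
    return (n + FP32_LANES - 1) / FP32_LANES;
}

/* Scalar emulation of a predicated vector loop computing out = a + b.
 * Each step builds a lane mask (what svwhilelt_b32(i, n) would produce),
 * and only active lanes are loaded, computed and stored, so the tail is
 * handled without a separate scalar remainder loop. */
static void predicated_add_emulated(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; i += FP32_LANES) {
        bool active[FP32_LANES];
        for (size_t lane = 0; lane < FP32_LANES; ++lane)
            active[lane] = (i + lane) < n;
        for (size_t lane = 0; lane < FP32_LANES; ++lane)
            if (active[lane])
                out[i + lane] = a[i + lane] + b[i + lane];
    }
}
```

With `n = 20`, the loop runs twice: the first step has all 16 lanes active and the second has only 4, and nothing past `out[19]` is written. If the hardware's streaming vector length differed from 512 bits, code that hard-codes 16 lanes in this way would cover the wrong element count, which is why the assumption is documented.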