
Optimize swizzle_dyn for AArch64 with N > 16 #528

Merged: calebzulawski merged 1 commit into rust-lang:master from Kmeakin:km/aarch64-dyn-swizzle on May 30, 2026


@Kmeakin (Contributor) commented May 28, 2026

We can do swizzles for 24, 32, 48 and 64 byte vectors by stacking multiple TBL instructions.
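As a rough illustration of the stacking idea (not the code in this PR), here is a portable scalar model for the 32-byte case. It assumes the documented `TBL` behaviour: each lookup produces 16 bytes, the table can span up to four 16-byte registers, and an out-of-range index yields 0. The names `tbl` and `swizzle_dyn_32` are hypothetical and only exist for this sketch.

```rust
/// Scalar model of a single AArch64 `TBL` lookup (hypothetical helper).
/// `table` stands for 1 to 4 consecutive 16-byte registers; each of the 16
/// index bytes selects one table byte, and out-of-range indices produce 0.
fn tbl(table: &[u8], idx: [u8; 16]) -> [u8; 16] {
    assert!(!table.is_empty() && table.len() <= 64 && table.len() % 16 == 0);
    let mut out = [0u8; 16];
    for (o, &i) in out.iter_mut().zip(idx.iter()) {
        *o = table.get(i as usize).copied().unwrap_or(0);
    }
    out
}

/// Hypothetical 32-byte dynamic swizzle built from two stacked lookups:
/// both lookups see the full 32-byte table (a two-register `TBL`), and each
/// one consumes one 16-byte half of the index vector.
fn swizzle_dyn_32(table: [u8; 32], idx: [u8; 32]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for (out_half, idx_half) in out.chunks_exact_mut(16).zip(idx.chunks_exact(16)) {
        let mut idx16 = [0u8; 16];
        idx16.copy_from_slice(idx_half);
        out_half.copy_from_slice(&tbl(&table, idx16));
    }
    out
}
```

Under the same assumptions, the 48- and 64-byte cases would use a three- or four-register table and three or four lookups, one per 16-byte chunk of indices.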

See https://godbolt.org/z/PE95nrqjj for a comparison of the generated assembly.

I would do the same optimization for 32-bit Arm, but I can't get at the intrinsics for some reason: https://godbolt.org/z/54jqTEeeW

@programmerjake (Member)

To ensure it's actually tested, you'll need to uncomment the lane counts that you have special cases for, so uncomment 48 here:

@Kmeakin force-pushed the km/aarch64-dyn-swizzle branch from 8278494 to beafe83 on May 30, 2026 00:59
@calebzulawski (Member)

Looks good, thank you!

@calebzulawski merged commit be090a7 into rust-lang:master on May 30, 2026
53 checks passed
@Kmeakin deleted the km/aarch64-dyn-swizzle branch on May 30, 2026 14:17