
Remove unrolling of last iteration of loop for x86 decode #80

Merged
lemire merged 2 commits into fast-pack:master from andrewthad:remove-x86-final-loop-unrolling on Apr 30, 2026


@andrewthad
Contributor

It is not clear why this was originally done. There is no comment in the source code, but it does not seem to improve performance.

If I run the perf benchmark on my laptop and bump the iterations up to 5000 from 100, I do not see any measurable difference.

@lemire
Member

lemire commented Apr 29, 2026

@andrewthad Isn't this PR doing an extra memcpy?

@andrewthad
Contributor Author

Good catch. I've made this change in the latest commit. I had forgotten that, without the unrolled final iteration, there is no need to load the keys for iteration N+1 at the beginning of iteration N. I've rerun the perf test, and this change also has no measurable impact, but it is a nice further simplification.
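
For readers following along, here is a minimal sketch of the two loop shapes being discussed. It is not the streamvbyte source: the function names (`word_len`, `data_len_pipelined`, `data_len_simple`) are hypothetical, and the "work" per iteration is reduced to summing payload lengths from 2-bit key codes rather than doing the real SIMD decode. The first function preloads the key word for iteration N+1 during iteration N and therefore needs a peeled final iteration; the second loads each key word where it is used, so neither the preload nor the peeled iteration is needed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Payload bytes described by one 64-bit key word: 32 two-bit codes,
   where code c stands for an integer stored in c + 1 bytes.
   (Illustrative stand-in for the real per-iteration decode work.) */
static size_t word_len(uint64_t w) {
    size_t len = 0;
    for (int i = 0; i < 32; i++) {
        len += (size_t)((w >> (2 * i)) & 3) + 1;
    }
    return len;
}

/* Shape 1 (hypothetical): load the key word for iteration N+1 at the
   start of iteration N, then handle the last word outside the loop so
   the preload never reads past the end of the key array. */
size_t data_len_pipelined(const uint8_t *keys, size_t nwords) {
    if (nwords == 0) {
        return 0;
    }
    size_t total = 0;
    uint64_t next;
    memcpy(&next, keys, sizeof next);
    for (size_t i = 0; i + 1 < nwords; i++) {
        uint64_t cur = next;
        memcpy(&next, keys + 8 * (i + 1), sizeof next); /* preload N+1 */
        total += word_len(cur);
    }
    total += word_len(next); /* peeled final iteration: no further load */
    return total;
}

/* Shape 2 (hypothetical): one load per iteration, no preload and no
   peeled final iteration. */
size_t data_len_simple(const uint8_t *keys, size_t nwords) {
    size_t total = 0;
    for (size_t i = 0; i < nwords; i++) {
        uint64_t cur;
        memcpy(&cur, keys + 8 * i, sizeof cur);
        total += word_len(cur);
    }
    return total;
}
```

Both functions return the same totals for the same input. If the preload were kept after dropping the peeled iteration, each pass would issue a load it does not need, which is presumably the extra memcpy raised in review.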

@lemire
Member

lemire commented Apr 30, 2026

Verified manually. This does not seem to affect performance. Merging.

@lemire lemire merged commit 5c7251e into fast-pack:master Apr 30, 2026
30 checks passed

2 participants