perf: optimize all_zeros using fast bytes comparison by mike-hunhoff · Pull Request #3078 · mandiant/capa

mike-hunhoff · 2026-05-15T21:55:14Z

Optimization: Fast C-level check in `all_zeros`

Description

In `capa/features/extractors/helpers.py:all_zeros`, the code used a Python generator expression, `all(b == 0 for b in ...)`, to check whether a buffer consisted entirely of null bytes. This introduced significant overhead because Python bytecode is executed for each byte.

Fix

Refactored the function to use `bytez == b'\x00' * len(bytez)`. This delegates the comparison to the optimized C implementation of Python's bytes equality check.
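
A minimal sketch of the two approaches side by side. The function names `all_zeros_generator` and `all_zeros_compare` are hypothetical, chosen only for illustration; they are not the names used in capa's `helpers.py`.

```python
def all_zeros_generator(bytez: bytes) -> bool:
    # previous approach: one bytecode-level comparison per byte
    return all(b == 0 for b in bytez)


def all_zeros_compare(bytez: bytes) -> bool:
    # new approach: build a zero-filled buffer of the same length and
    # let the C implementation of bytes equality do the comparison
    return bytez == b"\x00" * len(bytez)


# both variants should agree on empty, all-zero, and non-zero input
for sample in (b"", b"\x00" * 256, b"\x00" * 255 + b"\x01"):
    assert all_zeros_generator(sample) == all_zeros_compare(sample)
```

Both variants treat an empty buffer as all zeros, so the refactor should not change behavior for that edge case.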

Trade-offs

Speed: Benchmarks showed a ~67x speedup for 256-byte buffers (the typical size used in capa for instruction and data references).
Memory: The new method creates a temporary bytes object of size `len(bytez)`. However, because capa restricts the buffer size to `MAX_BYTES_FEATURE_SIZE` (256 bytes) in all calling contexts, the memory overhead is negligible (at most 256 bytes per call) and far outweighed by the performance gain.
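
One way to reproduce a comparison like this is with the standard library's `timeit`. This is a sketch only, not the benchmark used for the PR: the buffer contents, the iteration count, and the helper name `bench` are assumptions, and the measured ratio will vary by machine and Python version.

```python
import timeit

# assumed worst case: an all-zero buffer, so every byte must be inspected
BUF = b"\x00" * 256


def bench(stmt: str, number: int = 20_000) -> float:
    # total seconds taken to execute `stmt` `number` times
    return timeit.timeit(stmt, globals={"BUF": BUF}, number=number)


generator_s = bench("all(b == 0 for b in BUF)")
compare_s = bench("BUF == b'\\x00' * len(BUF)")

print(
    f"generator: {generator_s:.4f}s  "
    f"compare: {compare_s:.4f}s  "
    f"ratio: {generator_s / compare_s:.1f}x"
)
```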

Checklist

No CHANGELOG update needed

No new tests needed

No documentation update needed

This submission includes AI-generated code and I have provided details in the description.

github-actions

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

gemini-code-assist

Code Review

This pull request optimizes the `all_zeros` helper function by replacing a generator expression with a direct bytes comparison to leverage C-level performance. A review comment identifies a correctness issue where the new implementation fails to compare correctly against other bytes-like objects such as `bytearray` or `memoryview`. The reviewer suggests using `not any(bytez)` as a more robust, type-agnostic, and efficient alternative that also benefits from short-circuiting.
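
The suggested alternative can be sketched as follows. The name `all_zeros_any` is hypothetical, and this page does not show whether this is the form that was finally committed.

```python
def all_zeros_any(bytez) -> bool:
    # iterating a bytes-like object yields ints; any() stops at the
    # first non-zero byte, and no temporary buffer is allocated
    return not any(bytez)


zeros = b"\x00" * 256

# the same check works unchanged for bytes, bytearray, and memoryview
for buf in (zeros, bytearray(zeros), memoryview(zeros)):
    assert all_zeros_any(buf)

assert not all_zeros_any(b"\x00" * 255 + b"\x01")
```

Because `any()` returns as soon as it sees a non-zero byte, buffers that start with data are rejected after inspecting a single element.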

CHANGELOG updated or no update needed, thanks! 😄

williballenthin

nice.

thank you for:

- doing the benchmark, and
- including the inline comment so there's a record of the lesson

mr-tz

We may also need to check floss for these issues.

perf: optimize all_zeros using fast bytes comparison

08e4faa

mike-hunhoff requested review from a team and williballenthin May 15, 2026 21:55

github-actions Bot previously requested changes May 15, 2026

merge upstream

cb4aad7

gemini-code-assist Bot reviewed May 15, 2026

Comment thread capa/features/extractors/helpers.py

update CHANGELOG

30dede2

fix lints and address review comments

9a59b5a

williballenthin approved these changes May 16, 2026

mr-tz approved these changes May 16, 2026

mike-hunhoff wants to merge 4 commits into master from fix/all_zeros
