perf: optimize all_zeros using fast bytes comparison #3078

Open
mike-hunhoff wants to merge 4 commits into master from fix/all_zeros

Conversation

@mike-hunhoff
Collaborator

Optimization: Fast C-level check in all_zeros

Description

In capa/features/extractors/helpers.py:all_zeros, the code used a Python generator expression all(b == 0 for b in ...) to check if a buffer was entirely composed of null bytes. This introduced significant overhead due to Python bytecode execution for each byte.

Fix

Refactored the function to use bytez == b'\x00' * len(bytez). This delegates the operation to the highly optimized C implementation of Python's bytes comparison.
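
A minimal sketch of the change described above. The names `all_zeros_generator` and `all_zeros_compare` are hypothetical labels for the before and after versions; the actual signature in capa/features/extractors/helpers.py may differ.

```python
def all_zeros_generator(bytez: bytes) -> bool:
    # Before: the generator runs one Python-level comparison per byte.
    return all(b == 0 for b in bytez)


def all_zeros_compare(bytez: bytes) -> bool:
    # After: build a zero-filled buffer of the same length and
    # compare both buffers in a single C-level operation.
    return bytez == b"\x00" * len(bytez)


print(all_zeros_compare(b"\x00" * 256))         # True
print(all_zeros_compare(b"\x00" * 255 + b"A"))  # False
```

Both versions return True for an empty buffer, so the edge-case behavior is unchanged in this sketch.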

Trade-offs

  • Speed: Benchmarks showed a ~67x speedup for 256-byte buffers (the typical size used in capa for instruction and data references).
  • Memory: The new method creates a temporary bytes object of size len(bytez). However, since capa restricts the buffer size to MAX_BYTES_FEATURE_SIZE (256 bytes) in all calling contexts, the memory overhead is negligible (at most 256 bytes per call) and far outweighed by the performance gains.
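
The speed claim can be checked with a small `timeit` harness along these lines. This is an illustrative sketch, not the benchmark used for the PR: the function names and the `benchmark` helper are hypothetical, and the measured ratio will vary by machine and Python version.

```python
import timeit

MAX_BYTES_FEATURE_SIZE = 256  # buffer size quoted in the description above


def all_zeros_generator(bytez: bytes) -> bool:
    # Original approach: per-byte comparison in a generator expression.
    return all(b == 0 for b in bytez)


def all_zeros_compare(bytez: bytes) -> bool:
    # New approach: single comparison against a temporary zero buffer.
    return bytez == b"\x00" * len(bytez)


def benchmark(number: int = 10_000):
    # An all-zero buffer is the worst case: every byte has to be examined.
    buf = b"\x00" * MAX_BYTES_FEATURE_SIZE
    slow = timeit.timeit(lambda: all_zeros_generator(buf), number=number)
    fast = timeit.timeit(lambda: all_zeros_compare(buf), number=number)
    return slow, fast


slow, fast = benchmark()
print(f"generator: {slow:.4f}s  compare: {fast:.4f}s  ratio: {slow / fast:.1f}x")
```

The temporary buffer allocated inside `all_zeros_compare` is `len(bytez)` bytes, which matches the 256-byte ceiling mentioned in the memory trade-off.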

Checklist

  • No CHANGELOG update needed
  • No new tests needed
  • No documentation update needed
  • This submission includes AI-generated code and I have provided details in the description.

@mike-hunhoff mike-hunhoff requested review from a team and williballenthin May 15, 2026 21:55

@github-actions github-actions Bot left a comment

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request optimizes the all_zeros helper function by replacing a generator expression with a direct bytes comparison to leverage C-level performance. A review comment identifies a correctness issue where the new implementation fails to compare correctly against other bytes-like objects such as bytearray or memoryview. The reviewer suggests using not any(bytez) as a more robust, type-agnostic, and efficient alternative that also benefits from short-circuiting.
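
The alternative suggested in the review could look roughly like this. The name `all_zeros_any` is hypothetical, and this sketch does not establish how its speed compares with the bytes comparison on all-zero buffers.

```python
def all_zeros_any(bytez) -> bool:
    # any() iterates the buffer's integer values and stops at the first
    # non-zero one; no temporary zero buffer is allocated.
    return not any(bytez)


# Works for several bytes-like inputs that iterate as integers.
for buf in (b"\x00" * 4, bytearray(4), memoryview(b"\x00" * 4)):
    assert all_zeros_any(buf)

# A non-zero first byte lets any() return after a single element.
assert not all_zeros_any(b"\x01" + b"\x00" * 255)
```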

Comment thread capa/features/extractors/helpers.py
@github-actions github-actions Bot dismissed their stale review May 15, 2026 21:58

CHANGELOG updated or no update needed, thanks! 😄

Collaborator

@williballenthin williballenthin left a comment

nice.

thank you for:

  1. doing the benchmark, and
  2. including the inline comment so there's a record of the lesson

Collaborator

@mr-tz mr-tz left a comment

We may also need to check floss for these issues.
