perf(scanner): cache rare-byte anchor in CompiledPattern #83

Merged
tkhquang merged 1 commit into main from perf/scanner-anchor-cache
May 22, 2026

Conversation

@tkhquang
Owner

@tkhquang tkhquang commented May 22, 2026

Summary

  • parse_aob() now caches the rarest literal byte's index on CompiledPattern::anchor so find_pattern() reads it as a single load instead of re-running the selection loop on every scan.
  • Manually constructed patterns fall back to inline selection via a sentinel default; the new CompiledPattern::compile_anchor() lets callers opt in to the cached fast path.
  • cpu_has_avx2() is hoisted out of the per-memchr-hit loop, and the AVX2 mismatch sentinel is replaced with std::optional<size_t> for clearer code.
  • Adds tests/bench_scanner.cpp, a standalone microbench that contrasts the rare-byte anchor against a first-literal-byte anchor on an 8 MiB code-like buffer. Build with -DDMK_BUILD_BENCHMARKS=ON.
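To make the caching scheme in the first two bullets concrete, here is a minimal sketch under stated assumptions. Only the anchor field, the std::numeric_limits<std::size_t>::max() sentinel and compile_anchor() come from this PR; CompiledPatternSketch, frequency_class, select_anchor and effective_anchor are hypothetical names, the parallel byte/mask layout is assumed, and the frequency table is made up for illustration rather than copied from src/scanner.cpp.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for CompiledPattern; only `anchor` mirrors the PR.
struct CompiledPatternSketch {
    static constexpr std::size_t kNoAnchor = std::numeric_limits<std::size_t>::max();

    std::vector<std::uint8_t> bytes; // literal values (ignored where mask is false)
    std::vector<bool> mask;          // true = literal byte, false = wildcard
    std::size_t anchor = kNoAnchor;  // cached index of the rarest literal, or sentinel

    void compile_anchor() noexcept;
};

// Hypothetical frequency classes (lower = rarer in typical x64 .text).
inline int frequency_class(std::uint8_t b) noexcept {
    switch (b) {
    case 0x00: case 0xCC: case 0xFF: case 0x48: case 0x8B: case 0x89:
        return 3; // very common: padding, int3, REX.W, MOV
    case 0xE8: case 0xE9: case 0x0F: case 0x83: case 0x85: case 0x4C:
        return 2; // common opcodes and prefixes
    default:
        return 0; // treated as rare in this sketch
    }
}

// Picks the rarest literal; strict '<' keeps the earliest index on ties.
inline std::size_t select_anchor(const CompiledPatternSketch& p) noexcept {
    std::size_t best = CompiledPatternSketch::kNoAnchor;
    int best_class = std::numeric_limits<int>::max();
    for (std::size_t i = 0; i < p.bytes.size(); ++i) {
        if (!p.mask[i]) continue; // a wildcard can never serve as the anchor
        const int c = frequency_class(p.bytes[i]);
        if (c < best_class) {
            best_class = c;
            best = i;
        }
    }
    return best; // sentinel for empty or all-wildcard patterns
}

// Pure function of bytes/mask, so calling it repeatedly is idempotent.
inline void CompiledPatternSketch::compile_anchor() noexcept {
    anchor = select_anchor(*this);
}

// Fast path is a single load; manually built patterns fall back to selection.
inline std::size_t effective_anchor(const CompiledPatternSketch& p) noexcept {
    return p.anchor != CompiledPatternSketch::kNoAnchor ? p.anchor : select_anchor(p);
}
```

For 48 8B 05 37 this toy table ranks 0x48 and 0x8B as common, so the anchor lands on index 2. The real ranking may well differ; the point is only that the selection cost moves from every scan to pattern construction.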

Benchmark

8 MiB synthetic buffer tuned to x64 .text byte frequencies, AVX2 verify, 200 scans per sample, 11-sample median. Both runs use the exact same find_pattern code path; only CompiledPattern::anchor differs.

| Scenario | Smart anchor | Naive anchor | Speedup |
|---|---|---|---|
| common_first_rare_buried_8 (48 8B 05 37 DE AD BE EF) | 523 us | 14184 us | 27.1x |
| common_first_rare_buried_16 | 573 us | 13848 us | 24.2x |
| all_common_first_no_match | 585 us | 16279 us | 27.8x |
| rare_first_short_no_match (37 6B C1 BA 5E 71) | 582 us | 591 us | 1.01x (noise) |
| long_mostly_wildcards | 564 us | 14030 us | 24.9x |
| verify_heavy_32B_match (32 bytes) | 569 us | 14257 us | 25.1x |

When the first literal byte is common (0x48 REX.W, 0x8B MOV, etc.), the rare-byte heuristic produces a 24x to 28x speedup. When the first byte is already rare, both strategies are within noise of each other (1.01x); the heuristic did not regress in any measured scenario.
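The Speedup column is naive time divided by smart time, each taken as the median of per-sample timings. A minimal sketch of that arithmetic, assuming a simple steady_clock harness; median_us, speedup and time_scans_us are hypothetical helpers, not the code in tests/bench_scanner.cpp.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Median of per-sample timings (microseconds per full scan).
// Takes the vector by value so sorting leaves the caller's data alone.
// Precondition: at least one sample.
inline double median_us(std::vector<double> samples) {
    std::sort(samples.begin(), samples.end());
    const std::size_t n = samples.size();
    return n % 2 == 1 ? samples[n / 2]
                      : (samples[n / 2 - 1] + samples[n / 2]) / 2.0;
}

// Speedup of the smart anchor over the naive one.
inline double speedup(double smart_us, double naive_us) {
    return naive_us / smart_us;
}

// Average microseconds per call of `scan` over one sample.
// The callable should consume its result (e.g. into a volatile sink)
// so the optimizer cannot discard the scan; this sketch leaves that to it.
template <class Fn>
double time_scans_us(Fn&& scan, int scans_per_sample) {
    const auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < scans_per_sample; ++i) scan();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(t1 - t0).count() / scans_per_sample;
}
```

With the first table row, speedup(523.0, 14184.0) is about 27.12, matching the reported 27.1x. In a setup like the one described, each of the 11 samples would call time_scans_us with 200 scans and median_us would reduce the 11 values.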

Full methodology and reproduction steps live in docs/analysis/scanner_bench_v3.x/README.md.

Test plan

  • Five new unit tests cover parse_aob caching the anchor, all-wildcard pattern encoding, compile_anchor() idempotency, the manual-construction fallback, and the empty-pattern boundary.
  • Full suite: 1088 / 1088 pass on mingw-debug.
  • Bench numbers reproduced across two runs (within 5% noise) on mingw-release with LTO.

Summary by CodeRabbit

Release Notes

  • New Features

    • Pattern scanning now uses a smart rare-byte anchor heuristic to improve scan performance by reducing false candidate matches.
    • Added new compile_anchor() method for explicit anchor selection in compiled patterns.
  • Documentation

    • Added comprehensive benchmark documentation for scanner performance testing.
    • Updated feature descriptions with anchor heuristic details.
  • Tests

    • Added microbenchmark harness for measuring scanner throughput across multiple pattern scenarios.
    • Extended unit tests for anchor behavior and prologue detection.


parse_aob() now stores the rarest literal byte's index on
CompiledPattern::anchor so find_pattern() reads it as a single load
instead of re-running the selection loop on every scan. Manually
constructed patterns fall back to inline selection via a sentinel
default. Also hoists cpu_has_avx2() out of the memchr hit loop and
replaces the AVX2 mismatch sentinel with std::optional<size_t>.

Adds tests/bench_scanner.cpp that contrasts the rare-byte anchor
against a first-literal-byte anchor on an 8 MiB code-like buffer;
on an AVX2 host the rare-byte strategy is 24x to 28x faster on
patterns whose first literal is a common opcode, and within 1%
noise when the first literal is already rare.
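The two smaller changes in this commit (an optional-returning verify and a capability check hoisted out of the hit loop) can be sketched as follows. This is a scalar stand-in under stated assumptions: verify_candidate, cpu_has_avx2_stub, scan_candidates and VerifyFn are hypothetical, no AVX2 code is shown, and treating the success payload as the candidate offset is an assumption; the PR only states that std::nullopt signals a mismatch.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Scalar stand-in for verify_pattern_avx2. Returns the candidate offset on a
// full match (assumed payload) and std::nullopt on the first mismatching
// literal, so "no match" cannot be confused with a real index.
// Precondition: offset + len does not exceed the buffer behind `data`.
inline std::optional<std::size_t> verify_candidate(const std::uint8_t* data, std::size_t offset,
                                                   const std::uint8_t* pat, const bool* literal,
                                                   std::size_t len) noexcept {
    for (std::size_t i = 0; i < len; ++i) {
        if (literal[i] && data[offset + i] != pat[i]) return std::nullopt; // wildcards always match
    }
    return offset;
}

// Hypothetical capability probe; a real one would query CPUID.
// The counter only exists to show that the probe runs once per scan.
inline int g_probe_calls = 0;
inline bool cpu_has_avx2_stub() noexcept {
    ++g_probe_calls;
    return false;
}

using VerifyFn = std::optional<std::size_t> (*)(const std::uint8_t*, std::size_t,
                                                const std::uint8_t*, const bool*,
                                                std::size_t) noexcept;

inline std::optional<std::size_t> scan_candidates(const std::uint8_t* data,
                                                  const std::size_t* candidates, std::size_t count,
                                                  const std::uint8_t* pat, const bool* literal,
                                                  std::size_t len) noexcept {
    // Hoisted: choose the verifier once, outside the per-hit loop. This sketch
    // has no AVX2 routine, so both branches resolve to the scalar verifier.
    const VerifyFn verify = cpu_has_avx2_stub() ? &verify_candidate : &verify_candidate;
    for (std::size_t c = 0; c < count; ++c) {
        if (const auto hit = verify(data, candidates[c], pat, literal, len)) return hit;
    }
    return std::nullopt; // every candidate was abandoned on mismatch
}
```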
@tkhquang tkhquang self-assigned this May 22, 2026
@coderabbitai

coderabbitai Bot commented May 22, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 331ae546-8d0e-4236-9bbe-b731299089db

📥 Commits

Reviewing files that changed from the base of the PR and between 469cb5e and 284e717.

📒 Files selected for processing (8)
  • AGENTS.md
  • README.md
  • docs/analysis/scanner_bench_v3.x/README.md
  • include/DetourModKit/scanner.hpp
  • src/scanner.cpp
  • tests/CMakeLists.txt
  • tests/bench_scanner.cpp
  • tests/test_scanner.cpp

📝 Walkthrough

The pull request implements a rare-byte anchor heuristic for pattern matching in the AOB scanner. CompiledPattern now caches a precomputed anchor byte index pointing to the rarest literal in a pattern. The scanning loop uses this anchor to drive candidate matching, reducing false hits. AVX2 verification returns std::optional instead of sentinel values. A comprehensive benchmark harness and documentation demonstrate the performance impact of anchor selection.
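How an anchor drives candidate matching can be shown with a small memchr loop: jump to each occurrence of the anchor byte, back up by the anchor index, and verify the whole pattern only there. This is a minimal sketch under stated assumptions: find_with_anchor is a hypothetical helper with an assumed byte/mask layout, it uses a scalar verify, and when the anchor is missing or a wildcard it simply returns no result instead of taking whatever fallback the real scanner uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Returns the offset of the first match, or std::nullopt.
inline std::optional<std::size_t> find_with_anchor(const std::vector<std::uint8_t>& buf,
                                                   const std::vector<std::uint8_t>& pat,
                                                   const std::vector<bool>& literal,
                                                   std::size_t anchor) {
    const std::size_t n = buf.size();
    const std::size_t m = pat.size();
    if (m == 0 || m > n || anchor >= m || !literal[anchor]) return std::nullopt;

    const std::uint8_t* base = buf.data();
    // For a full in-bounds match, the anchor byte must lie in [anchor, n - m + anchor].
    const std::uint8_t* cur = base + anchor;
    const std::uint8_t* last = base + (n - m + anchor);
    while (cur <= last) {
        const void* hit = std::memchr(cur, pat[anchor], static_cast<std::size_t>(last - cur) + 1);
        if (hit == nullptr) break;
        const auto* h = static_cast<const std::uint8_t*>(hit);
        const std::size_t start = static_cast<std::size_t>(h - base) - anchor;

        bool ok = true; // full verify at this candidate only
        for (std::size_t i = 0; i < m && ok; ++i) {
            ok = !literal[i] || base[start + i] == pat[i];
        }
        if (ok) return start;
        cur = h + 1; // false candidate: resume just past this hit
    }
    return std::nullopt;
}
```

A rarer anchor byte means fewer memchr hits and therefore fewer full verifications, which is the effect the benchmark measures; the result itself does not depend on which literal is chosen.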

Changes

Anchor Heuristic Feature and Validation

| Layer | File(s) | Summary |
|---|---|---|
| Anchor Caching API Contract | include/DetourModKit/scanner.hpp | Header defines the CompiledPattern::anchor member, initialized to the std::numeric_limits<std::size_t>::max() sentinel, and declares a noexcept compile_anchor() method with documented idempotency and thread-safety constraints. |
| Anchor Selection and Compilation Implementation | src/scanner.cpp | Implements a select_pattern_anchor() helper that ranks literal bytes by frequency class and selects the rarest non-wildcard byte index; compile_anchor() caches the anchor on the pattern; parse_aob() precomputes the anchor before returning the compiled pattern. |
| AVX2 Verification Refactoring and Scanning Loop Integration | src/scanner.cpp | Changes verify_pattern_avx2 to return std::optional<size_t>, with std::nullopt on mismatch; hoists AVX2 CPU detection to once per scan; anchor selection in find_pattern_raw() prefers the cached pattern.anchor over dynamic computation; the candidate verification loop branches on the optional return and abandons candidates on mismatch. |
| Unit Tests for Anchor Behavior and Scanner Enhancements | tests/test_scanner.cpp | New test cases verify that parse_aob() caches the rarest-literal anchor, all-wildcard patterns leave the anchor at the sentinel, compile_anchor() is idempotent for manual patterns, matching works through the sentinel-anchor fallback, and empty patterns have a well-defined anchor state; additional prologue tests recognize short JMP rel8 and indirect JMP memory encodings. |
| Benchmark Implementation and CMake Configuration | tests/bench_scanner.cpp, tests/CMakeLists.txt | Standalone microbenchmark that generates an 8 MiB synthetic buffer with weighted opcode frequencies, plants test patterns at fixed offsets, compares smart vs. naive anchor strategies with correctness validation, measures median microseconds per full scan, computes throughput and speedup, and prints a tab-separated result table; CMakeLists.txt adds a DetourModKit_bench_scanner target that does not link GoogleTest. |
| Benchmark Results and Development Documentation | docs/analysis/scanner_bench_v3.x/README.md, AGENTS.md, README.md | Benchmark documentation describes the two anchor strategies, synthetic buffer generation with seed reproducibility, build/hardware configuration (CMake preset, GCC/LTO, AVX2, sample counts), the measured timing table across scenarios, a pattern-to-anchor mapping guide, and key takeaways about anchor dominance and verify-tier behavior; AGENTS.md documents release-build flags and benchmark commands; README.md explains how the rare-byte anchor heuristic reduces false positives. |
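The benchmark row describes a seeded buffer with weighted opcode frequencies and patterns planted at fixed offsets. A minimal sketch of that idea, assuming std::mt19937 with std::discrete_distribution; make_code_like_buffer and plant are hypothetical names, and the weights are illustrative guesses rather than the table tests/bench_scanner.cpp actually uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Code-like buffer: a few byte values are drawn far more often than the rest,
// loosely imitating x64 .text. The weights below are illustrative assumptions.
inline std::vector<std::uint8_t> make_code_like_buffer(std::size_t size, std::uint32_t seed) {
    std::vector<double> weights(256, 1.0);      // baseline weight for every byte value
    weights[0x00] = 40.0; weights[0xCC] = 20.0; // padding, int3
    weights[0x48] = 30.0; weights[0x8B] = 25.0; // REX.W, MOV
    weights[0x89] = 15.0; weights[0xE8] = 10.0; // MOV, CALL rel32

    std::mt19937 rng(seed); // fixed seed -> repeatable buffer
    std::discrete_distribution<int> dist(weights.begin(), weights.end());
    std::vector<std::uint8_t> buf(size);
    for (auto& b : buf) b = static_cast<std::uint8_t>(dist(rng));
    return buf;
}

// Writes a pattern at a fixed offset so each scenario has a known answer.
// Bytes that would fall past the end of the buffer are skipped.
inline void plant(std::vector<std::uint8_t>& buf, std::size_t offset,
                  const std::vector<std::uint8_t>& pat) {
    for (std::size_t i = 0; i < pat.size() && offset + i < buf.size(); ++i) {
        buf[offset + i] = pat[i];
    }
}
```

std::mt19937's output sequence is fixed by the standard, but std::discrete_distribution's sampling algorithm is implementation-defined, so a given seed reproduces the same buffer only on the same standard library.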

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • tkhquang/DetourModKit#57: Both PRs modify the AVX2 pattern-verification path in src/scanner.cpp, introducing/adjusting verify_pattern_avx2 and the verification control flow.
  • tkhquang/DetourModKit#54: Both PRs add/extend Scanner::find_pattern rare-byte anchor validation in tests/test_scanner.cpp, directly targeting anchor-selection behavior.
  • tkhquang/DetourModKit#26: Both PRs extend documentation in AGENTS.md, with the main PR adding the Microbenchmarks section building on the earlier PR's navigation/admonition work.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately summarizes the main performance optimization: caching the rare-byte anchor index in CompiledPattern to avoid recomputation during pattern scanning. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@tkhquang tkhquang merged commit 723e1a5 into main May 22, 2026
2 checks passed
@tkhquang tkhquang deleted the perf/scanner-anchor-cache branch May 22, 2026 18:46