feat(memory): add single-fault-guard pointer-chain read primitives #87

Merged: tkhquang merged 2 commits into main from feat/memory-pointer-chain on May 29, 2026
@tkhquang tkhquang commented May 29, 2026

What

New Memory primitives for resolving and reading multi-level (Cheat-Engine-style) pointer chains under one fault guard:

  • plausible_userspace_ptr(p) -- constexpr x64 user-mode pointer check (no syscall, no memory access)
  • seh_resolve_chain(base, {offsets...}) -- resolve a chain to its final address
  • seh_read_chain<T>(base, {offsets...}) -- resolve and read a typed value
  • seh_read_chain_bytes(base, {offsets...}, out, n) -- resolve and read a raw range

The whole walk runs in one fault guard (a single __try on MSVC, VirtualQuery-guarded per link on MinGW). Intermediate links are plausibility-screened, so a faulting or implausible link aborts the walk and returns nullopt/false.
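
To make the walk concrete, here is a minimal portable sketch of the plausibility screen and the chain resolution. It is not the library's implementation: the names `plausible_ptr_sketch`, `resolve_chain_sketch`, `Node`, and `chain_example` are hypothetical, the bounds are assumed to be the usual x64 Windows user-mode range, the step order (dereference, then add the offset) is an assumed Cheat-Engine-style convention, and the fault guard is deliberately left out so the code compiles anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <initializer_list>
#include <optional>

static_assert(sizeof(void*) == 8, "sketch assumes a 64-bit target");

// Assumed bounds (typical x64 Windows user-mode range). The PR's actual
// USERSPACE_PTR_MIN / USERSPACE_PTR_MAX values are not shown in this thread.
inline constexpr std::uintptr_t kAssumedUserMin = 0x0000000000010000ULL;
inline constexpr std::uintptr_t kAssumedUserMax = 0x00007FFFFFFEFFFFULL;

// Pure range check: no syscall, no memory access.
constexpr bool plausible_ptr_sketch(std::uintptr_t p) noexcept {
    return p >= kAssumedUserMin && p <= kAssumedUserMax;
}

// Assumed step: dereference the current address, then add the next offset.
// UNGUARDED: a real implementation runs this loop inside one fault guard;
// this sketch must only be pointed at memory known to be mapped.
inline std::optional<std::uintptr_t>
resolve_chain_sketch(std::uintptr_t base,
                     std::initializer_list<std::ptrdiff_t> offsets) noexcept {
    std::uintptr_t addr = base;
    for (const std::ptrdiff_t off : offsets) {
        if (!plausible_ptr_sketch(addr)) {
            return std::nullopt;  // implausible link: abort the walk
        }
        std::uintptr_t next = 0;
        std::memcpy(&next, reinterpret_cast<const void*>(addr), sizeof(next));
        addr = next + static_cast<std::uintptr_t>(off);
    }
    return addr;  // final address; the caller still has to read it safely
}

// Hypothetical two-level structure used only by the example below.
struct Node {
    std::uintptr_t next;
    std::uint64_t value;
};

inline void chain_example() {
    static Node leaf{0, 42};
    static Node mid{reinterpret_cast<std::uintptr_t>(&leaf), 0};
    static std::uintptr_t root = reinterpret_cast<std::uintptr_t>(&mid);

    // root -> mid, then mid.next -> leaf, then add the offset of leaf.value.
    const auto hit = resolve_chain_sketch(
        reinterpret_cast<std::uintptr_t>(&root),
        {0, static_cast<std::ptrdiff_t>(offsetof(Node, value))});
    assert(hit.has_value());
    assert(*hit == reinterpret_cast<std::uintptr_t>(&leaf.value));

    // A null intermediate link fails the screen instead of being dereferenced.
    static std::uintptr_t null_root = 0;
    assert(!resolve_chain_sketch(
        reinterpret_cast<std::uintptr_t>(&null_root), {0, 8}));
}
```

The screen only rejects values that cannot be user-mode addresses; a plausible but unmapped address would still fault here, which is the case the single guard in the real primitives exists to catch.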

Why

Gating every hot-path read with is_readable costs a lock plus a possible VirtualQuery per field, and the safety it buys is a time-of-check/time-of-use illusion: the page can be unmapped or reprotected between the check and the read. These primitives read directly under one guard instead. Usage guidance is in docs/misc/hot-path-memory.md.

Benchmark

tests/bench_memory.cpp, MSVC 2022 release (/O2, -DDMK_BUILD_BENCHMARKS=ON), 200k iterations x 15 samples, median per call. One machine, illustrative; run it for your own target.
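
For readers who want to reproduce the "median per call" figure on their own target, the measurement shape can be sketched as below. This is an illustrative harness, not the code in tests/bench_memory.cpp: `median` and `median_ns_per_call` are hypothetical names, and the defaults simply mirror the 200k iterations x 15 samples stated above.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <utility>
#include <vector>

// Median of a non-empty sample set. Takes the vector by value because
// nth_element reorders it.
inline double median(std::vector<double> v) {
    assert(!v.empty());
    const std::size_t mid = v.size() / 2;
    std::nth_element(v.begin(), v.begin() + mid, v.end());
    const double upper = v[mid];
    if (v.size() % 2 == 1) {
        return upper;
    }
    // Even count: average the two middle elements.
    const double lower = *std::max_element(v.begin(), v.begin() + mid);
    return (lower + upper) / 2.0;
}

// Times `fn` in `samples` batches of `iterations` calls and returns the
// median per-call cost in nanoseconds.
template <typename Fn>
double median_ns_per_call(Fn&& fn,
                          std::size_t iterations = 200'000,
                          std::size_t samples = 15) {
    std::vector<double> per_call;
    per_call.reserve(samples);
    for (std::size_t s = 0; s < samples; ++s) {
        const auto t0 = std::chrono::steady_clock::now();
        for (std::size_t i = 0; i < iterations; ++i) {
            fn();
        }
        const auto t1 = std::chrono::steady_clock::now();
        const std::chrono::duration<double, std::nano> elapsed = t1 - t0;
        per_call.push_back(elapsed.count() / static_cast<double>(iterations));
    }
    return median(std::move(per_call));
}
```

Taking the median across samples rather than the mean keeps one preempted batch from skewing the result. The measured callable needs an observable side effect (for example a volatile load or store) so the optimizer cannot delete the loop.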

Per-call cost (warm cache where applicable):

| Operation | ns/call |
|---|---|
| direct volatile load | 3.9 |
| read_ptr_unchecked | 3.9 |
| seh_read<u64> | 7.4 |
| is_readable warm HIT | 54.5 |
| is_readable cold MISS (VirtualQuery) | 236.7 |

Pointer chain, 6 links, warm cache:

| Walk | ns/call |
|---|---|
| seh_read_chain<u64> (one guard) | 10.9 |
| gated per-link (is_readable before each) | 316.1 |

The chain primitive is ~29x faster than gating each link, and a single SEH-guarded read (7.4 ns) is within ~2x of a raw load because the MSVC __try is table-driven.

Hot-path probe model (8 reads across ~3 cache-missing objects), per probe:

| Path | mean | p99 | max |
|---|---|---|---|
| GATED (is_readable per read) | 6120 ns | 10500 ns | 152100 ns |
| DIRECT (one guard, raw reads) | 89 ns | 400 ns | 5100 ns |

Gating costs ~69x the mean and a much worse tail (p99 10500 ns vs 400 ns). At 256 such probes per frame that is 9.4% of a 16.67 ms frame gated vs 0.14% direct.
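
The frame-budget percentages follow directly from the mean column. As a worked check (the helper `frame_share_percent` and the constant names are hypothetical; the inputs are the measured means quoted above):

```cpp
// Share of one frame spent probing, in percent:
//   ns_per_probe * probes_per_frame / (frame_ms * 1e6) * 100
constexpr double frame_share_percent(double ns_per_probe,
                                     int probes_per_frame,
                                     double frame_ms) {
    return ns_per_probe * probes_per_frame / (frame_ms * 1'000'000.0) * 100.0;
}

// 6120 ns * 256 = 1,566,720 ns, about 1.57 ms of a 16.67 ms frame.
constexpr double kGatedPercent = frame_share_percent(6120.0, 256, 16.67);
// 89 ns * 256 = 22,784 ns, about 0.023 ms of a 16.67 ms frame.
constexpr double kDirectPercent = frame_share_percent(89.0, 256, 16.67);
// Ratio of the two means.
constexpr double kMeanRatio = 6120.0 / 89.0;

static_assert(kGatedPercent > 9.35 && kGatedPercent < 9.45);    // ~9.4%
static_assert(kDirectPercent > 0.13 && kDirectPercent < 0.14);  // ~0.14%
static_assert(kMeanRatio > 68.5 && kMeanRatio < 69.0);          // ~69x
```

Swapping in your own per-probe mean and probe count gives the budget share for a different target or frame rate.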

Summary by CodeRabbit

  • New Features

    • Pointer-chain resolution and guarded memory read helpers
    • User-space pointer plausibility checks and bounds constants
    • Added standalone memory microbenchmark executable
  • Documentation

    • New hot-path guide for efficient memory access patterns
    • Benchmark analysis and updated test/benchmark docs
  • Tests

    • New pointer-chain memory test suite
    • Memory performance benchmark with multi-scenario analysis


Add plausible_userspace_ptr, seh_resolve_chain, seh_read_chain<T>, and seh_read_chain_bytes for resolving and reading multi-level pointer chains under one fault guard. Include a hot-path memory usage guide, unit tests, and a standalone microbenchmark.
@tkhquang tkhquang self-assigned this May 29, 2026
coderabbitai Bot commented May 29, 2026

Walkthrough

This PR adds guarded multi-level pointer-chain primitives and an inline pointer plausibility predicate, implements SEH/MinGW-safe chain resolution and chain-read helpers, introduces tests for chain semantics and non-default-constructible typed reads, and ships hot-path documentation plus a microbenchmark measuring validation/read costs and contention.

Changes

Pointer-chain resolution and hot-path memory access

| Layer / File(s) | Summary |
|---|---|
| Pointer plausibility foundation (include/DetourModKit/memory.hpp) | Adds <initializer_list>/<span>, canonical x64 user-mode pointer bounds (USERSPACE_PTR_MIN, USERSPACE_PTR_MAX), and plausible_userspace_ptr(uintptr_t) plus expanded predicate hot-path docs. |
| Pointer-chain API declarations (include/DetourModKit/memory.hpp) | seh_read<T> now reconstructs via byte-read + std::bit_cast; declares span/initializer_list overloads for seh_resolve_chain, seh_read_chain_bytes, and seh_read_chain<T> template. Header docs updated for is_readable/is_writable/module_range_for. |
| Pointer-chain implementation (src/memory.cpp) | Implements an internal guarded pointer-chain walker (SEH on MSVC; read_ptr_unsafe + plausible_userspace_ptr on MinGW). Adds public seh_resolve_chain (optional final address) and seh_read_chain_bytes (resolve + read terminal bytes with MSVC inlined prechecks; MinGW delegates final read). |
| Pointer-chain test suite (tests/test_memory_chain.cpp) | GoogleTest coverage for plausibility predicate (runtime + static_assert), seh_resolve_chain semantics (empty/multi-level, offsets, failure), seh_read_chain typed reads (including non-default-constructible types), and seh_read_chain_bytes argument/zero-byte/base-copy behaviors. |
| Documentation and performance measurement (tests/CMakeLists.txt, tests/bench_memory.cpp, AGENTS.md, docs/misc/hot-path-memory.md, docs/tests/README.md, docs/analysis/memory_bench_v3.x/README.md) | Adds DetourModKit_bench_memory microbenchmark and benchmark orchestration; documents benchmark methodology, TSV output format, and measured scenarios; adds hot-path guidance recommending plausibility checks + single SEH-guarded reads, primitive selection, MSVC vs MinGW notes, and anti-patterns; updates test inventory and AGENTS.md entries. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • tkhquang/DetourModKit#44: Introduces Memory::read_ptr_unsafe() used by the non-SEH (MinGW) pointer-chain walk.
  • tkhquang/DetourModKit#50: Refactors/optimizes the MinGW read_ptr_unsafe/pointer-read paths that the new chain code depends on.
  • tkhquang/DetourModKit#70: Related benchmark infrastructure changes; complements the event-dispatcher benchmark with the new memory microbenchmark.
🚥 Pre-merge checks | ✅ 4 passed

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and specifically describes the main change: adding pointer-chain read primitives under single fault guard, which is the core feature of this PR. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/test_memory_chain.cpp (1)

16-170: ⚡ Quick win

Wrap this suite in a fixture.

This new file uses raw TEST(...) cases, but the repo test contract for tests/**/*.cpp requires a ::testing::Test subclass with SetUp()/TearDown(). Even if the hooks are empty today, converting these to TEST_F keeps the suite consistent and gives you a safe place to reset memory state if future cases start touching the cache.

As per coding guidelines: "tests/**/*.cpp: Use GoogleTest framework. One test file per module: tests/test_<module>.cpp mirrors src/<module>.cpp. Each suite must use a ::testing::Test subclass with SetUp()/TearDown() for temp file cleanup."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_memory_chain.cpp` around lines 16 - 170, The tests must be
converted to use a GoogleTest fixture class (e.g., declare class MemoryTest :
public ::testing::Test with SetUp() and TearDown() methods, even if empty) and
replace all TEST(...) macros in this file (MemoryPlausiblePtr,
MemorySehResolveChain, MemorySehReadChain cases) with TEST_F(MemoryTest, ...),
so each case runs under the fixture and you have a place to reset memory state
or caches later; ensure the fixture class is defined above the tests and
referenced by each TEST_F invocation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@include/DetourModKit/memory.hpp`:
- Around line 382-397: The template seh_read currently requires
std::is_default_constructible_v<T> and declares a T value, which prevents
reading trivially copyable but non-default-constructible types; change seh_read
(and the typed chained-read overloads) to only require
std::is_trivially_copyable_v<T>, allocate raw storage (e.g.,
std::array<std::byte, sizeof(T)> or std::aligned_storage_t<sizeof(T),
alignof(T)>) and call seh_read_bytes(addr, storage.data(), sizeof(T)), then
return std::bit_cast<T>(storage) on success (or std::nullopt on failure); remove
the default-constructible constraint and any direct T value declaration so the
byte-wise read works for non-default-constructible trivially copyable types.

In `@tests/bench_memory.cpp`:
- Around line 174-185: make_churn_pool can return an empty pool which causes
division-by-zero / invalid indexing in run_contention and run_probe; after
filling the pool in make_churn_pool check if pool.empty() and fail fast (e.g.,
throw a descriptive std::runtime_error or call std::exit after printing an
error) so callers never receive an empty vector; include a clear message
referencing the failed VirtualAlloc allocations so the benchmark setup error is
explicit.
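
The fail-fast shape requested for make_churn_pool can be sketched portably as follows. This is an assumption-laden illustration, not the benchmark's code: the real function uses VirtualAlloc and its signature is not shown in this thread, so `make_pool_or_throw` and its `alloc_page` callback are hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

// Collects up to `wanted` pages from `alloc_page`, skipping failed
// allocations, and throws if none succeeded so callers never index into
// (or take a modulus by the size of) an empty pool.
inline std::vector<void*> make_pool_or_throw(
    std::size_t wanted, const std::function<void*()>& alloc_page) {
    assert(static_cast<bool>(alloc_page));  // the allocator must be callable
    std::vector<void*> pool;
    pool.reserve(wanted);
    for (std::size_t i = 0; i < wanted; ++i) {
        if (void* page = alloc_page()) {
            pool.push_back(page);
        }
    }
    if (pool.empty()) {
        throw std::runtime_error(
            "make_pool_or_throw: every page allocation failed; "
            "benchmark setup cannot continue");
    }
    return pool;
}
```

A partially filled pool is still returned; only the fully empty case is treated as a setup error, since that is the one that leads to division by zero in the callers named above.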


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 994799ac-b0f9-4e1a-a7c5-08ce3bbe2408

📥 Commits

Reviewing files that changed from the base of the PR and between ddb15a0 and 2ac5ace.

📒 Files selected for processing (8)
  • AGENTS.md
  • docs/misc/hot-path-memory.md
  • docs/tests/README.md
  • include/DetourModKit/memory.hpp
  • src/memory.cpp
  • tests/CMakeLists.txt
  • tests/bench_memory.cpp
  • tests/test_memory_chain.cpp

- seh_read and both seh_read_chain<T> overloads now require only std::is_trivially_copyable_v<T>, read into raw std::array<std::byte> storage, and return via std::bit_cast, so non-default-constructible trivially-copyable types can be read.
- Build the benchmarks with the library Release LTO so a std::thread-using bench links under MinGW (works around a GCC LTO multiple-definition bug on a mixed non-LTO/LTO link).
- make_churn_pool fails fast when no pages can be allocated.
- Add a non-default-constructible read test and a memory benchmark analysis doc.
@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/bench_memory.cpp (1)

249-250: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Align DIRECT-path comments with actual benchmark behavior.

These comments say DIRECT is “one __try per probe”, but the code does raw volatile dereferences without seh_* calls. Please reword to avoid overstating the measured path.

Suggested wording update
-    // predicate and reads under one __try per probe. This measures both,
+    // predicate and reads directly (raw dereference baseline). This measures both,
@@
-    // DIRECT = read inside one per-probe __try (the SEH fast path).
+    // DIRECT = raw reads (dereference baseline; no per-read validation gate).

Also applies to: 432-433

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/bench_memory.cpp` around lines 249 - 250, Update the inline comments
that describe the DIRECT path so they accurately reflect the implementation:
change any claim like "one __try per probe" to indicate that DIRECT does raw
volatile dereferences (no seh_* calls or SEH __try/__except usage) and therefore
measures the non-SEH/unchecked access path; apply this wording change in the
comment blocks around the DIRECT-path descriptions (the comment near the
volatile dereference usage and the corresponding second comment later in the
file).
🧹 Nitpick comments (1)
include/DetourModKit/memory.hpp (1)

393-396: ⚡ Quick win

Initialize storage at declaration.

std::array<std::byte, sizeof(T)> storage; is left uninitialized here (and identically at Line 489). It is not a correctness bug today because bit_cast only runs on the success path where seh_read_bytes fully populates the buffer, but it deviates from the project's initialization rule.

Note seh_read_bytes is an opaque (non-inline) call, so the zero-init won't be elided on this benchmarked hot path. If the few-byte cost is unacceptable, keep it uninitialized but add a // comment documenting the deliberate omission and the full-write contract.

♻️ Brace-initialize the storage buffers
-            std::array<std::byte, sizeof(T)> storage;
+            std::array<std::byte, sizeof(T)> storage{};
             if (!seh_read_bytes(addr, storage.data(), sizeof(T)))

Apply the same change at Line 489:

-            std::array<std::byte, sizeof(T)> storage;
+            std::array<std::byte, sizeof(T)> storage{};
             if (!seh_read_chain_bytes(base, offsets, storage.data(), sizeof(T)))

As per coding guidelines: "Do not add uninitialized variables -- always initialize at declaration with brace syntax or assignment."

Also applies to: 489-492

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@include/DetourModKit/memory.hpp` around lines 393 - 396, The local std::array
buffer (storage) must be initialized at declaration to satisfy the project's
rule against uninitialized variables; change the two declarations of
std::array<std::byte, sizeof(T)> storage; to brace-initialize (e.g.,
std::array<std::byte, sizeof(T)> storage{};) so storage is zero-initialized
before calling seh_read_bytes and then std::bit_cast<T>(storage) is safe; if you
intentionally want to avoid the zeroing for performance, instead leave it
uninitialized but add a clear comment next to the declaration documenting the
deliberate omission and the full-write contract guaranteed by seh_read_bytes and
why that makes it safe.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 870dabb6-e0b6-42fa-bbc7-99c12c345413

📥 Commits

Reviewing files that changed from the base of the PR and between 2ac5ace and 4466296.

📒 Files selected for processing (7)
  • AGENTS.md
  • docs/analysis/memory_bench_v3.x/README.md
  • docs/misc/hot-path-memory.md
  • include/DetourModKit/memory.hpp
  • tests/CMakeLists.txt
  • tests/bench_memory.cpp
  • tests/test_memory_chain.cpp
✅ Files skipped from review due to trivial changes (2)
  • docs/misc/hot-path-memory.md
  • AGENTS.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/test_memory_chain.cpp
  • tests/CMakeLists.txt

@tkhquang tkhquang merged commit aa01500 into main May 29, 2026
2 checks passed
@tkhquang tkhquang deleted the feat/memory-pointer-chain branch May 29, 2026 18:01