From 58ff33c44b7b7fdd3786179a2a9969619a87a822 Mon Sep 17 00:00:00 2001 From: Yifeng He Date: Fri, 6 Feb 2026 15:32:49 -0800 Subject: [PATCH 1/3] feat: order-independent coverage via union-find in BKFrameMonitor MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Replace greedy first-seen-wins dedup with connected-component counting using a union-find (disjoint-set) structure. Every distinct hash is now inserted into the BK-tree and unioned with all neighbours within RADIUS. coverage_count = number of connected components, which is fully order-independent: same hashes → same count regardless of insertion order. - Add _BKTree.find_all_within() for neighbour enumeration - Add _UnionFind with path halving and union by rank - Add CoverageMonitor.coverage_count property (overridden by BKFrameMonitor) - Replace differential test with order-independence property test - Update monotonicity tests and docs --- AGENTS.md | 4 +- docs/design.md | 20 ++++++- src/gamecov/cov_base.py | 10 ++++ src/gamecov/frame_cov.py | 102 +++++++++++++++++++++++++++++---- tests/test_BK_frame_monitor.py | 56 ++++++++++++------ tests/test_monotone.py | 14 ++++- tests/test_monotone_smb.py | 9 ++- 7 files changed, 178 insertions(+), 37 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 58adb5e..54931df 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -20,7 +20,7 @@ Future metrics (e.g., audio coverage, state-graph coverage) will follow the same │ ├── cov_base.py # Abstract protocols: CoverageItem, Coverage, CoverageMonitor │ ├── frame.py # Frame dataclass (PIL Image wrapper with average-hash) │ ├── dedup.py # Deduplication algorithms (pHash, SSIM [deprecated]) -│ ├── frame_cov.py # FrameCoverage, FrameMonitor, BKFrameMonitor, BK-tree +│ ├── frame_cov.py # FrameCoverage, FrameMonitor, BKFrameMonitor, BK-tree, UnionFind │ ├── loader.py # MP4 loading: bulk, lazy (generator), last-n │ ├── writer.py # MP4 writing: imageio and OpenCV backends │ ├── stitch.py # Panorama 
stitching of unique frames @@ -59,7 +59,7 @@ See [docs/design.md](docs/design.md) for the coverage framework architecture, fr | `cov_base.py` | `CoverageItem`, `Coverage[T]`, `CoverageMonitor[T]` protocols/ABC | | `frame.py` | `Frame` dataclass (PIL Image + average-hash) | | `dedup.py` | `is_dup()`, `dedup_unique_frames()`, `dedup_unique_hashes()`, `ssim_dedup()` [deprecated] | -| `frame_cov.py` | `FrameCoverage`, `FrameMonitor`, `BKFrameMonitor`, `get_frame_cov()` | +| `frame_cov.py` | `FrameCoverage`, `FrameMonitor`, `BKFrameMonitor`, `get_frame_cov()`, `_UnionFind`, `_BKTree` | | `loader.py` | `load_mp4()`, `load_mp4_lazy()`, `load_mp4_last_n()` | | `writer.py` | `write_mp4()`, `write_mp4_cv2()` | | `stitch.py` | `stitch_images()` (panorama via AffineStitcher) | diff --git a/docs/design.md b/docs/design.md index 095eeaf..510c895 100644 --- a/docs/design.md +++ b/docs/design.md @@ -18,9 +18,10 @@ Frame coverage (`FrameCoverage`, `FrameMonitor`, `BKFrameMonitor`) is the first concrete implementation. The framework is designed to support future metrics such as audio coverage or state-graph coverage with no changes to the monitoring interface. -### Key Invariant +### Key Invariants -Coverage is **monotonically non-decreasing** — `add_cov()` can only grow `item_seen`, never shrink it. This property is verified by the `test_monotone*` test suite. +- `len(item_seen)` is **monotonically non-decreasing** — `add_cov()` can only grow the set of distinct hashes, never shrink it. This property is verified by the `test_monotone*` test suite. +- `coverage_count` (connected-component count, `BKFrameMonitor` only) is **order-independent**: the same set of hashes always produces the same count regardless of insertion order. It may transiently decrease when a bridging hash merges two clusters. ## Frame Coverage @@ -74,6 +75,21 @@ The naive `FrameMonitor` checks each new hash against all previously seen hashes 2. The BK-tree stores these integers. 
Distances are computed with `(x ^ y).bit_count()` (popcount = Hamming distance). 3. On lookup, the triangle inequality prunes branches: for a query point *x* with radius *r* at a node with distance *d*, only children with keys in [d-r, d+r] need to be visited. +### Order-Independent Coverage via Union-Find + +The greedy first-seen-wins dedup in `FrameMonitor` is **order-dependent**: processing the same recordings in different orders can yield different coverage counts (because the "is duplicate" relation is not transitive). + +`BKFrameMonitor` solves this with a union-find (disjoint-set) structure: + +1. **Every** distinct hash is inserted into the BK-tree (no greedy skip). +2. On insertion, `find_all_within(x, radius)` locates all existing neighbours. +3. The new hash is unioned with every neighbour in the union-find. +4. `coverage_count` = number of connected components = number of disjoint clusters. + +Because the Hamming-distance graph depends only on which hashes exist (not insertion order), the connected-component count is fully order-independent. + +**Trade-off**: `coverage_count` may transiently *decrease* when a new hash bridges two previously separate components. `len(item_seen)` (total distinct hashes) remains monotonically non-decreasing. + ### Performance Benchmarked on the SMB dataset with `N_MAX=500` recordings: diff --git a/src/gamecov/cov_base.py b/src/gamecov/cov_base.py index 5240871..98a4c6a 100644 --- a/src/gamecov/cov_base.py +++ b/src/gamecov/cov_base.py @@ -48,6 +48,16 @@ def is_seen(self, cov: Coverage[T]) -> bool: def add_cov(self, cov: Coverage[T]) -> None: """Add a new execution coverage record to the monitor.""" + @property + def coverage_count(self) -> int: + """Number of unique coverage items. + + The default implementation returns ``len(self.item_seen)``. + Subclasses may override to provide order-independent metrics + (e.g., connected-component count via union-find). 
+ """ + return len(self.item_seen) + def reset(self) -> None: """Reset the monitor state.""" self.path_seen.clear() diff --git a/src/gamecov/frame_cov.py b/src/gamecov/frame_cov.py index d01c775..8a2bcca 100644 --- a/src/gamecov/frame_cov.py +++ b/src/gamecov/frame_cov.py @@ -156,25 +156,90 @@ def any_within(self, x: int, r: int) -> bool: stack.append(child) return False + def find_all_within(self, x: int, r: int) -> list[int]: + """Return all values in the tree within Hamming distance r of x.""" + if self.root is None: + return [] + results: list[int] = [] + stack = [self.root] + while stack: + n = stack.pop() + d = (x ^ n.val).bit_count() + if d <= r: + results.append(n.val) + lo, hi = d - r, d + r + for dd, child in n.children.items(): + if lo <= dd <= hi: + stack.append(child) + return results + + +class _UnionFind: + """Disjoint-set (union-find) with path halving and union by rank.""" + + def __init__(self) -> None: + self._parent: dict[int, int] = {} + self._rank: dict[int, int] = {} + self._count: int = 0 + + def make_set(self, x: int) -> None: + if x not in self._parent: + self._parent[x] = x + self._rank[x] = 0 + self._count += 1 + + def find(self, x: int) -> int: + while self._parent[x] != x: + self._parent[x] = self._parent[self._parent[x]] # path halving + x = self._parent[x] + return x + + def union(self, a: int, b: int) -> None: + ra, rb = self.find(a), self.find(b) + if ra == rb: + return + if self._rank[ra] < self._rank[rb]: + ra, rb = rb, ra + self._parent[rb] = ra + if self._rank[ra] == self._rank[rb]: + self._rank[ra] += 1 + self._count -= 1 + + @property + def component_count(self) -> int: + return self._count + # > N_MAX=500 uv run pytest tests/test_monotone.py --durations=0 # 236.71s call tests/test_monotone.py::test_monotone # 186.90s call tests/test_monotone.py::test_monotone_BK class BKFrameMonitor(FrameMonitor): - """FrameMonitor implemented using BK Tree - For long videos with many frames, - this implementation speed up the process of 
checking frame coverage significantly. + """FrameMonitor backed by a BK-tree and union-find for order-independent coverage. + + Coverage is measured as the number of connected components in the + Hamming-distance neighbourhood graph (distance <= radius). Unlike the + greedy first-seen-wins approach, this metric is **order-independent**: + the same set of hashes always produces the same coverage count regardless + of insertion order. + + Note: ``coverage_count`` may transiently *decrease* when a newly inserted + hash bridges two previously separate components. ``len(item_seen)`` + (total distinct hashes) remains monotonically non-decreasing. """ def __init__(self, radius: int = RADIUS): super().__init__() self._bktree = _BKTree() self._exact_bytes: set[bytes] = set() + self._uf = _UnionFind() self.radius = radius def add_cov(self, cov: Coverage[ImageHash]) -> None: - """add coverage to the current set. - The deduplication by Hamming distance is managed by a BK-tree. + """Add coverage to the current set. + + Every distinct hash is inserted into the BK-tree and unioned with all + its neighbours within ``self.radius``. Coverage is the number of + connected components in the resulting union-find structure. 
""" self.path_seen.add(cov.path_id) for img_hash in cov.coverage: @@ -182,11 +247,28 @@ def add_cov(self, cov: Coverage[ImageHash]) -> None: np.asarray(img_hash.hash, dtype=np.uint8), bitorder="big", ).tobytes() - if hash_bytes in self._exact_bytes: # exact dup + if hash_bytes in self._exact_bytes: continue x = int.from_bytes(hash_bytes, "big") - if not self._bktree.any_within(x, self.radius): # prune most candidates - self._bktree.add(x) - self._exact_bytes.add(hash_bytes) - self.item_seen.add(img_hash) + neighbors = self._bktree.find_all_within(x, self.radius) + + self._uf.make_set(x) + for nb in neighbors: + self._uf.union(x, nb) + + self._bktree.add(x) + self._exact_bytes.add(hash_bytes) + self.item_seen.add(img_hash) + + @property + def coverage_count(self) -> int: + """Order-independent coverage: number of connected components.""" + return self._uf.component_count + + def reset(self) -> None: + """Reset all monitor state including BK-tree and union-find.""" + super().reset() + self._bktree = _BKTree() + self._exact_bytes.clear() + self._uf = _UnionFind() diff --git a/tests/test_BK_frame_monitor.py b/tests/test_BK_frame_monitor.py index 25596e4..d8167aa 100644 --- a/tests/test_BK_frame_monitor.py +++ b/tests/test_BK_frame_monitor.py @@ -1,36 +1,58 @@ -import tempfile import os +import random +import tempfile from hypothesis import given, settings, strategies as st -from gamecov import FrameCoverage, FrameMonitor, BKFrameMonitor +from gamecov import FrameCoverage, BKFrameMonitor import gamecov.generator as cg from gamecov.writer import write_mp4 @settings(deadline=None) -@given(data=st.data(), n=st.integers(min_value=1, max_value=100)) -def test_monitor_diff(data, n): - base_monitor = FrameMonitor() - bk_monitor = BKFrameMonitor() - created_files = [] +@given(data=st.data(), n=st.integers(min_value=1, max_value=30)) +def test_order_independent_coverage(data, n): + """BKFrameMonitor.coverage_count must be the same regardless of insertion order.""" + 
created_files: list[str] = [] + covs: list[FrameCoverage] = [] + for _ in range(n): frames = data.draw(cg.frames_lists) - with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as tmp_f: output_path = tmp_f.name created_files.append(output_path) write_mp4(frames, output_path) - cov = FrameCoverage(output_path) - if not base_monitor.is_seen(cov): - base_monitor.add_cov(cov) - if not bk_monitor.is_seen(cov): - bk_monitor.add_cov(cov) + covs.append(FrameCoverage(output_path)) + + # Process in original order + monitor_a = BKFrameMonitor() + for cov in covs: + if not monitor_a.is_seen(cov): + monitor_a.add_cov(cov) + + # Process in reversed order + monitor_b = BKFrameMonitor() + for cov in reversed(covs): + if not monitor_b.is_seen(cov): + monitor_b.add_cov(cov) + + # Process in a random shuffle + shuffled = list(covs) + random.shuffle(shuffled) + monitor_c = BKFrameMonitor() + for cov in shuffled: + if not monitor_c.is_seen(cov): + monitor_c.add_cov(cov) - assert len(base_monitor.item_seen) == len( - bk_monitor.item_seen - ), "Base and BK monitors should see the same number of items" + assert monitor_a.coverage_count == monitor_b.coverage_count, ( + "coverage_count should be order-independent (original vs reversed)" + ) + assert monitor_a.coverage_count == monitor_c.coverage_count, ( + "coverage_count should be order-independent (original vs shuffled)" + ) + # Total distinct hashes should also agree (set union is order-independent) + assert len(monitor_a.item_seen) == len(monitor_b.item_seen) + assert len(monitor_a.item_seen) == len(monitor_c.item_seen) - # Clean up temporary files for f in created_files: os.remove(f) diff --git a/tests/test_monotone.py b/tests/test_monotone.py index cde47c7..b2e66b5 100644 --- a/tests/test_monotone.py +++ b/tests/test_monotone.py @@ -45,8 +45,14 @@ def test_monotone(data, n): ) @given(data=st.data(), n=st.integers(min_value=1, max_value=N_MAX)) def test_monotone_BK(data, n): + """len(item_seen) (total distinct hashes) is always 
monotonic. + + Note: monitor.coverage_count (connected components) may decrease when + a bridging hash merges two clusters. That is correct semantics for + order-independent coverage and is NOT tested for monotonicity here. + """ monitor = BKFrameMonitor() - prev_cov = 0 + prev_item_count = 0 created_files = [] for _ in range(n): frames = data.draw(cg.frames_lists) @@ -59,8 +65,10 @@ def test_monotone_BK(data, n): if not monitor.is_seen(cov): monitor.add_cov(cov) - assert len(monitor.item_seen) >= prev_cov, "Coverage should not decrease" - prev_cov = len(monitor.item_seen) + assert len(monitor.item_seen) >= prev_item_count, ( + "item_seen count should not decrease" + ) + prev_item_count = len(monitor.item_seen) # Clean up temporary files for f in created_files: diff --git a/tests/test_monotone_smb.py b/tests/test_monotone_smb.py index 7a5a81b..58bfed6 100644 --- a/tests/test_monotone_smb.py +++ b/tests/test_monotone_smb.py @@ -4,6 +4,7 @@ def test_smb_monotone_BK(): + """item_seen count is monotonic; coverage_count (components) may dip on bridges.""" assets_dir = os.path.abspath("assets/smb") if not os.path.exists(assets_dir): @@ -15,7 +16,7 @@ def test_smb_monotone_BK(): mp4_files.sort() monitor = BKFrameMonitor() - prev_cov = 0 + prev_item_count = 0 for f in mp4_files: f = os.path.join(assets_dir, f) @@ -23,8 +24,10 @@ def test_smb_monotone_BK(): if not monitor.is_seen(cov): monitor.add_cov(cov) - assert len(monitor.item_seen) >= prev_cov, "Coverage should not decrease" - prev_cov = len(monitor.item_seen) + assert len(monitor.item_seen) >= prev_item_count, ( + "item_seen count should not decrease" + ) + prev_item_count = len(monitor.item_seen) def test_smb_monotone(): From 3bb930b48c23230871ec1bed38524e6105b79425 Mon Sep 17 00:00:00 2001 From: Yifeng He Date: Fri, 6 Feb 2026 15:50:10 -0800 Subject: [PATCH 2/3] doc: add justification --- docs/design.md | 114 +++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 96 insertions(+), 18 deletions(-) diff 
--git a/docs/design.md b/docs/design.md index 510c895..2be46a3 100644 --- a/docs/design.md +++ b/docs/design.md @@ -8,13 +8,13 @@ - **`CoverageItem`** — anything hashable and stringable. All coverage data points must satisfy this protocol. - **`Coverage[T]`** — an execution trace exposing: - - `.trace` — ordered list of all items encountered. - - `.coverage` — deduplicated set of unique items. - - `.path_id` — SHA1 fingerprint of the unique coverage set. + - `.trace` — ordered list of all items encountered. + - `.coverage` — deduplicated set of unique items. + - `.path_id` — SHA1 fingerprint of the unique coverage set. - **`CoverageMonitor[T]`** — accumulates coverage across sessions: - - `.add_cov(cov)` — merge new coverage into the monitor. - - `.is_seen(cov)` — check whether a path has already been recorded. - - `.item_seen` / `.path_seen` — running totals. + - `.add_cov(cov)` — merge new coverage into the monitor. + - `.is_seen(cov)` — check whether a path has already been recorded. + - `.item_seen` / `.path_seen` — running totals. Frame coverage (`FrameCoverage`, `FrameMonitor`, `BKFrameMonitor`) is the first concrete implementation. The framework is designed to support future metrics such as audio coverage or state-graph coverage with no changes to the monitoring interface. @@ -23,6 +23,85 @@ Frame coverage (`FrameCoverage`, `FrameMonitor`, `BKFrameMonitor`) is the first - `len(item_seen)` is **monotonically non-decreasing** — `add_cov()` can only grow the set of distinct hashes, never shrink it. This property is verified by the `test_monotone*` test suite. - `coverage_count` (connected-component count, `BKFrameMonitor` only) is **order-independent**: the same set of hashes always produces the same count regardless of insertion order. It may transiently decrease when a bridging hash merges two clusters. +## Why Frame Coverage Is a Valid Fuzzing Metric + +A useful fuzzing coverage metric must satisfy two properties: + +1. 
**Monotonicity** — coverage never decreases as more inputs are observed. A fuzzer can safely interpret "coverage stopped growing" as saturation. +2. **Order-independence** — the final coverage value depends only on *which* inputs were observed, not *when*. This makes coverage comparable across runs with different scheduling strategies. + +`gamecov` provides two monitors. We justify each property for both. + +### Definitions + +Let *H* = {h₁, h₂, …} be the universe of pHash values (64-bit vectors). +Define the **neighbourhood graph** G(S) for a set S ⊆ H as: + +- Vertices: S +- Edges: {(a, b) | hamming(a, b) ≤ RADIUS} + +Hamming distance is a **metric** (non-negative, symmetric, zero iff equal, and satisfies the triangle inequality). This makes G(S) a well-defined undirected graph for any S. + +### Monotonicity + +**`item_seen` (both monitors).** `add_cov` only ever *inserts* hashes into `item_seen`; it never removes them. Therefore `|item_seen|` is monotonically non-decreasing across successive `add_cov` calls. + +*Proof.* Each call iterates over `cov.coverage`. A hash is added to `item_seen` if and only if it was not already present (the exact-duplicate check short-circuits). No code path removes elements from the set. ∎ + +This is the metric used by `FrameMonitor.coverage_count` (which returns `len(item_seen)`). It is also monotonic in `BKFrameMonitor`, but `BKFrameMonitor` uses a different primary metric (see below). + +### Order-Independence (`BKFrameMonitor`) + +**`coverage_count` = number of connected components of G(item_seen).** + +*Claim.* For any fixed set S of hashes, the connected-component count cc(G(S)) is uniquely determined by S, regardless of the order in which the hashes were inserted. + +*Proof.* The graph G(S) is defined purely by the set S and the distance predicate hamming(a, b) ≤ RADIUS. Neither depends on insertion order. The number of connected components is a property of the graph, not of how it was constructed. 
+ +The implementation maintains this invariant incrementally via union-find: + +1. When a new hash x is inserted, `find_all_within(x, RADIUS)` returns **all** existing hashes within the radius (not just the first match). +2. x is unioned with every such neighbour. After the union step, any path of radius-edges connecting x to any existing component is faithfully captured. +3. Because union-find tracks *all* edges, not just first-seen ones, the resulting component structure is identical to computing cc(G(S)) from scratch. + +This is verified empirically by `test_order_independent_coverage`, which asserts identical `coverage_count` across original, reversed, and randomly shuffled insertion orders. ∎ + +**Why `FrameMonitor` is order-dependent.** The greedy first-seen-wins dedup skips a hash if *any* existing hash is within RADIUS. Because the "within RADIUS" relation is **not transitive** (a is near b, b is near c, but a may not be near c), the set of retained hashes depends on which hash was encountered first. Different orderings can yield different retained sets and therefore different counts. + +### Non-Monotonicity of `coverage_count` Is Expected + +`BKFrameMonitor.coverage_count` may *decrease* when a new hash bridges two previously separate components. For example: + +``` +Before: {A} {B} (2 components, hamming(A,B) > RADIUS) +Insert C where hamming(A,C) ≤ RADIUS and hamming(B,C) ≤ RADIUS +After: {A, B, C} (1 component) +``` + +This is correct: the new hash genuinely reduces the number of distinct visual clusters. In a fuzzing context, `coverage_count` decreasing means the fuzzer discovered that two previously-distinct regions are actually connected — this is valuable information, not a metric error. A fuzzer should track `coverage_count` (clusters explored) alongside `len(item_seen)` (total distinct observations) and use both signals. 
+ +### Cross-Run Comparability + +Because `coverage_count` depends only on the set of observed hashes: + +- Two fuzzing campaigns over the same game can be directly compared: the campaign with more connected components explored more visually distinct regions. +- Merging coverage from two campaigns is straightforward: take the union of their hash sets and recompute components. The result equals what a single campaign observing all those hashes would report. +- The metric is **idempotent**: adding a recording that contributes no new hashes changes nothing. + +These properties make `BKFrameMonitor.coverage_count` suitable as a fuzzing progress metric analogous to edge coverage in traditional software fuzzing. + +### Summary of Metric Properties + +| Property | `FrameMonitor` (`len(item_seen)`) | `BKFrameMonitor` (`coverage_count`) | +|----------|-----------------------------------|-------------------------------------| +| Monotonic | Yes | No (may decrease on bridge) | +| Order-independent | No (greedy first-seen-wins) | Yes (graph-theoretic) | +| Cross-run comparable | No | Yes | +| Mergeable | No | Yes (set union) | +| Idempotent | Yes | Yes | + +`BKFrameMonitor` is the recommended monitor for production fuzzing. `FrameMonitor` remains available for backward compatibility and as a simpler baseline. 
+ ## Frame Coverage ### Perceptual Hashing @@ -57,23 +136,23 @@ Coverage statistics (.item_seen, .path_seen) ### Loading Strategies -| Function | Behavior | Use Case | -|----------|----------|----------| -| `load_mp4()` | Decode all frames into memory | Small videos, random access needed | -| `load_mp4_lazy()` | Generator, one frame at a time | Large videos, memory-constrained | -| `load_mp4_last_n()` | Seek + decode last *n* frames | Tail sampling | +| Function | Behavior | Use Case | +| ------------------- | ------------------------------ | ---------------------------------- | +| `load_mp4()` | Decode all frames into memory | Small videos, random access needed | +| `load_mp4_lazy()` | Generator, one frame at a time | Large videos, memory-constrained | +| `load_mp4_last_n()` | Seek + decode last _n_ frames | Tail sampling | All loaders use `imageio.v3` with the PyAV plugin. ## BK-Tree Optimization -The naive `FrameMonitor` checks each new hash against all previously seen hashes — O(N*M) per session. `BKFrameMonitor` uses a [Burkhard-Keller tree](https://en.wikipedia.org/wiki/BK-tree) that indexes hashes by Hamming distance in a metric space. +The naive `FrameMonitor` checks each new hash against all previously seen hashes — O(N\*M) per session. `BKFrameMonitor` uses a [Burkhard-Keller tree](https://en.wikipedia.org/wiki/BK-tree) that indexes hashes by Hamming distance in a metric space. ### How it works 1. Each image hash is packed into an integer via `numpy.packbits`. 2. The BK-tree stores these integers. Distances are computed with `(x ^ y).bit_count()` (popcount = Hamming distance). -3. On lookup, the triangle inequality prunes branches: for a query point *x* with radius *r* at a node with distance *d*, only children with keys in [d-r, d+r] need to be visited. +3. On lookup, the triangle inequality prunes branches: for a query point _x_ with radius _r_ at a node with distance _d_, only children with keys in [d-r, d+r] need to be visited. 
### Order-Independent Coverage via Union-Find @@ -88,15 +167,15 @@ The greedy first-seen-wins dedup in `FrameMonitor` is **order-dependent**: proce Because the Hamming-distance graph depends only on which hashes exist (not insertion order), the connected-component count is fully order-independent. -**Trade-off**: `coverage_count` may transiently *decrease* when a new hash bridges two previously separate components. `len(item_seen)` (total distinct hashes) remains monotonically non-decreasing. +**Trade-off**: `coverage_count` may transiently _decrease_ when a new hash bridges two previously separate components. `len(item_seen)` (total distinct hashes) remains monotonically non-decreasing. ### Performance Benchmarked on the SMB dataset with `N_MAX=500` recordings: -| Monitor | Time | -|---------|------| -| `FrameMonitor` | ~237s | +| Monitor | Time | +| ---------------- | ----- | +| `FrameMonitor` | ~237s | | `BKFrameMonitor` | ~187s | ~21% speedup, and the gap widens as the number of accumulated hashes grows. @@ -112,7 +191,6 @@ Benchmarked on the SMB dataset with `N_MAX=500` recordings: | `RADIUS` | `5` | Hamming distance threshold for frame deduplication | | `N_MAX` | `100` | Max recordings to process in monotonicity tests | - ## Dependencies | Library | Purpose | From 8073160ffb6c582c80d1553979b20d6b66014a53 Mon Sep 17 00:00:00 2001 From: Yifeng He Date: Fri, 6 Feb 2026 15:50:49 -0800 Subject: [PATCH 3/3] doc: rename to frame_cov.md --- docs/{design.md => frame_cov.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename docs/{design.md => frame_cov.md} (100%) diff --git a/docs/design.md b/docs/frame_cov.md similarity index 100% rename from docs/design.md rename to docs/frame_cov.md
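
The order-independence argument and the bridging example in the design notes can be illustrated with a small, self-contained sketch. This is not the patch's code: `UnionFind`, `component_count`, `greedy_count`, and `hamming` are hypothetical names, a linear scan stands in for `_BKTree.find_all_within()`, union by rank is omitted for brevity, and the 2-bit hashes with radius 1 are toy values chosen so that one hash bridges the other two.

```python
from itertools import permutations


def hamming(a: int, b: int) -> int:
    # The patch uses (a ^ b).bit_count(); bin().count() also works on older Pythons.
    return bin(a ^ b).count("1")


class UnionFind:
    """Minimal disjoint-set keyed by int (illustrative, no union by rank)."""

    def __init__(self) -> None:
        self.parent: dict = {}
        self.count = 0

    def make_set(self, x: int) -> None:
        if x not in self.parent:
            self.parent[x] = x
            self.count += 1

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            # Point x at its grandparent, then jump there.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra
            self.count -= 1


def component_count(hashes, radius: int) -> int:
    """Insert every distinct hash and union it with all neighbours in radius."""
    uf = UnionFind()
    seen = []
    for x in hashes:
        if x in uf.parent:  # exact duplicate
            continue
        uf.make_set(x)
        for y in seen:  # linear scan stands in for a BK-tree range query
            if hamming(x, y) <= radius:
                uf.union(x, y)
        seen.append(x)
    return uf.count


def greedy_count(hashes, radius: int) -> int:
    """First-seen-wins dedup: keep a hash only if no kept hash is in radius."""
    kept = []
    for x in hashes:
        if all(hamming(x, y) > radius for y in kept):
            kept.append(x)
    return len(kept)


# Toy data: A and B are 2 bits apart; C is 1 bit from each, so it bridges them.
A, B, C = 0b00, 0b11, 0b01
orders = list(permutations([A, B, C]))
component_results = {component_count(order, 1) for order in orders}
greedy_results = {greedy_count(order, 1) for order in orders}
```

Under these assumptions, `component_results` holds a single value for all six insertion orders, while `greedy_results` holds two different values depending on whether the bridging hash arrives first. Inserting only `A` and `B` gives two components, and adding `C` merges them into one, which is the decrease described in the design notes.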