fix(bench): correct matrix-harness reuse measurement, thread guard, resume #189
Merged
…esume

Four issues blocked a real ollama-vs-squish matrix run:

- Reuse mismeasured. This Ollama build reports the full prompt_eval_count even when the KV prefix is reused, and Squish's reuse counters miss the prefix-slot path. Both engines now fall back to the prefill-time collapse (1 - warm/cold) against a cold-prefill reference measured per system+ctx, keeping the head-to-head apples-to-apples. (cache_probe.py, cell.py)
- Thread crash. RSSSampler/TemperatureSampler shadowed a Thread internal with self._stop; renamed to _stop_event. (memory.py, thermal.py)
- Governor false-positive. "compressed"/"swap" matched benign "free_swap=0 B" log lines; narrowed to "swapping"/"paged out". (memory.py)
- Squish metrics URL was /metrics; it is served at /v1/metrics. (systems.py)

Also adds --resume to run_matrix (per-cell JSON skip -> crash-safe) and two unit tests for the Ollama prefill-ratio fallback.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
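The prefill-time collapse named in the commit message, 1 - warm/cold, can be sketched as a small helper. This is only an illustration, not the code in cache_probe.py: the function name, the use of seconds as the unit, and the clamping to [0, 1] are all assumptions made for the example.

```python
def prefill_reuse_ratio(cold_prefill_s: float, warm_prefill_s: float) -> float:
    """Estimate prefix reuse as 1 - warm/cold (hypothetical helper).

    cold_prefill_s: prefill time with no shared cache prefix (the reference).
    warm_prefill_s: prefill time for the same prompt after priming the cache.
    """
    if cold_prefill_s <= 0:
        raise ValueError("cold prefill time must be positive")
    ratio = 1.0 - warm_prefill_s / cold_prefill_s
    # Clamping is an assumption of this sketch: timing noise can push the
    # raw value slightly below 0 (warm slower than cold) or above 1.
    return max(0.0, min(1.0, ratio))


# A warm prefill that takes a quarter of the cold time implies 75% reuse.
print(prefill_reuse_ratio(cold_prefill_s=2.0, warm_prefill_s=0.5))  # 0.75
```

Because the same formula is applied to both engines against a cold reference measured for the same system and context size, the resulting ratios stay comparable even when neither engine's own reuse counters can be trusted.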
Why
The ollama-vs-squish benchmark matrix (benchmarks/ollama_vs_squish/matrix/) had four issues that blocked a real run. All four are harness-only: no change to squish/runtime.

Fixes
- Reuse mismeasured. This Ollama build reports the full prompt_eval_count even when the KV prefix is reused, and Squish's reuse counters miss the prefix-slot path. Both engines now fall back to the prefill-time collapse 1 - warm/cold against a cold-prefill reference measured per (system, ctx), keeping the head-to-head apples-to-apples. A sentinel run-index (10_000_000) ensures the cold reference shares no cache prefix with the primed block. (cache_probe.py, cell.py)
- Thread crash. RSSSampler/TemperatureSampler shadowed a threading.Thread internal with self._stop; renamed to _stop_event. (memory.py, thermal.py)
- Governor false-positive. "compressed"/"swap" matched benign "free_swap=0 B" log lines and falsely flagged degradation; narrowed to "swapping"/"paged out". (memory.py)
- Squish metrics URL was /metrics; it is served at /v1/metrics. (systems.py)

Also
- --resume <dir> for run_matrix: cells whose <cell_id>.json already exists are loaded and skipped → crash-safe / overnight-resumable.

Test
🤖 Generated with Claude Code
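
The thread-crash fix listed under Fixes can be illustrated with a minimal sampler thread. In a number of CPython 3 releases, threading.Thread defines a private _stop() method that is called internally once the thread finishes, so assigning an Event to self._stop in a subclass can make join() fail. That is an implementation detail and may differ between versions. The class below is a hypothetical stand-in, not the harness's RSSSampler or TemperatureSampler; it only shows the renamed-attribute pattern.

```python
import threading
import time


class PeriodicSampler(threading.Thread):
    """Hypothetical sampler thread used only for illustration."""

    def __init__(self, interval_s: float = 0.01):
        super().__init__(daemon=True)
        self.interval_s = interval_s
        self.samples = []
        # Named _stop_event rather than _stop so it cannot clobber a
        # private attribute or method of threading.Thread.
        self._stop_event = threading.Event()

    def run(self):
        # Event.wait returns True once stop() has set the event,
        # which ends the sampling loop.
        while not self._stop_event.wait(self.interval_s):
            self.samples.append(time.monotonic())

    def stop(self):
        self._stop_event.set()
        self.join()


sampler = PeriodicSampler()
sampler.start()
time.sleep(0.05)
sampler.stop()
print("alive after stop:", sampler.is_alive())
```

Using Event.wait as the loop's sleep also means stop() takes effect promptly instead of waiting out a full interval.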