PR-D1 (ADR 0008 Phase D): remove ADR 0007 server-side dead code#49
Merged
FluffyAIcode merged 1 commit intoJun 2, 2026
Merged
Conversation
ADR 0008 \u00a76.4 PR-D1 originally proposed a coupled change \u2014 (a)
remove ADR-0007-vintage server dead code, (b) refactor the HTTP
shim's chat-completions handler onto SessionStore. Implementation
revealed (a) is a pure subtraction with no behavior dependence on
(b), so they're split (same pattern as PR-A3 / PR-A3b). This PR is
(a); (b) becomes PR-D2 (queued, not in this diff).
Net diff: -540 deletions, +78 insertions. ADR 0008 \u00a76.4 amended
to record the split.
Files modified \u2014 production:
inference_engine/server/engine.py
-12 lines: EngineResult.path_selection / .tokens_skipped /
.prefill_duration_seconds fields removed; SpeculativeEngine
no longer forwards them from SpeculativeRunResult.
inference_engine/server/metrics.py
-74 lines: path_selection_total, continuation_tokens_skipped_total,
verifier_prefill_duration_seconds, cache_invariant_violations_total
metrics removed from Metrics + factory; record_path_selection and
record_cache_invariant_violation methods removed.
inference_engine/server/app.py
-53 lines: _session_acceptance_rate and _emit_path_selection_metric
helpers removed; the two call sites in the streaming + non-
streaming completion paths now pass acceptance_rate=None to
record_completion. The OpenAI response loses its acceptance_rate
field as a result \u2014 acceptable on a feature-frozen deprecated
shim per ADR 0008 \u00a72.7. Migrate to gRPC for richer telemetry.
inference_engine/scheduler/session.py
-7 lines: engine_result field on Session removed. The scheduler
worker (scheduler.py) no longer stashes engine.generate()'s
result on the session \u2014 the only reader was app.py's removed
helpers.
inference_engine/scheduler/scheduler.py
\u00b13 lines (renamed assignment to del): the line that wrote
session.engine_result = result is gone.
scripts/bench_agentic/bench_long_session.py
-211/+30 lines: removed _PATH_SELECTION_METRIC /
_CONTINUATION_TOKENS_SKIPPED_METRIC / _CACHE_INVARIANT_VIOLATIONS_METRIC
constants, the labeled-line regex, the labeled-metric branch in
_parse_prom_text, the _extract_label helper (no callers after
PR-D1), the _adr_0007_summary aggregator, the adr_0007 payload
field, and the §2.10 block in render_summary.
Module docstring updated to point at PR-E1's
bench_session_long_run.py for the replacement bench.
Files modified \u2014 tests:
tests/inference_engine/server/test_metrics.py
-107 lines: 4 entries dropped from test_build_registers_all_documented_metrics
expected set; 9 tests removed from the 'ADR 0007 \u00a72.10 \u2014
path_selection observability' section.
tests/inference_engine/server/test_app_metrics_and_auth.py
-98 lines: 4 ADR-0007-specific tests removed
(test_metrics_path_selection_metrics_present_on_idle_metrics_scrape,
test_metrics_path_selection_recorded_after_completion,
test_session_acceptance_rate_returns_none_when_result_missing_rate,
test_emit_path_selection_metric_noop_when_path_unset).
Local verification (Linux VM, py3.12):
PYTHONPATH=.:sdks/python pytest <Linux CI gate set>
682 passed (was 695 - 13 ADR-0007-specific tests),
TOTAL 1660 stmts 100.00 % coverage (was 1694 - 34 dead stmts).
Per ADR 0008 \u00a79: this PR is pure deletion / cleanup on the Linux-
runnable surface. Zero MLX runtime code touched. \u00a79 carve-out
applies; no Mac M4 report needed.
Next PR after merge:
PR-D2 (\u00a76.4 amended, queued): HTTP-shim refactor onto SessionStore
proper. Each /v1/chat/completions request becomes a single-
shot session; PooledVerifier retires; Deprecation / Sunset
headers added per \u00a72.7.
Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
cursor Bot
pushed a commit
that referenced
this pull request
Jun 1, 2026
Stacks on PR-D1 (#49). When this PR merges, PR-D1 lands along with it. Per ADR 0008 \u00a76.5, PR-E1 ships: 1. tests/integration/ \u2014 new test directory with pytest.mark.integration marker auto-applied by tests/integration/conftest.py. Every test in this directory is opt-in via 'pytest -m integration'; a bare pytest invocation skips them. 2. tests/integration/test_inv3_session_determinism_gate.py \u2014 the INV-3 byte-exact GA gate (ADR 0008 \u00a77 G3). Drives two independent SinkWindowVerifier instances (real Qwen3-0.6B weights) through identical history fed via different chunkings, asserts the resulting greedy token streams are byte-identical. Three tests: test_one_call_vs_two_calls_yield_byte_identical_tokens Minimal gate: 1\u00d710 vs 2\u00d75 chunking on a 10-token history, 12 tokens of greedy generation. test_chunking_invariance_across_three_splits Stronger version: 1\u00d720 / 3\u00d7medium / 10\u00d7small chunkings on a 20-token history, 8 tokens of greedy generation. Catches chunk-boundary bugs the 1-vs-2 case might miss (e.g., a bug that only triggers when a chunk crosses a sink+window trim boundary). test_repeated_runs_with_same_history_byte_identical Sanity: same workload run twice on the same verifier produces the same output. Greedy decoding has no legitimate source of nondeterminism. 3. pytest.ini \u2014 minimal new file registering the 'integration' marker so it doesn't trigger PytestUnknownMarkWarning. Tests opt in via 'pytest -m integration'. Replaces the deleted tests/core/test_determinism_gate.py (PR-A3 removed it together with verifier.path_select; per ADR 0008 \u00a76.6 PR-E1's replacement is in tests/integration/, not tests/core/, because integration is where Mac-M4-only GA gates belong per \u00a79). NOT in this PR (deferred): scripts/bench_agentic/bench_session_long_run.py \u00a76.5 also mentions a Mac M4 long-session bench using the gRPC SDK. Splitting that out as PR-E1b (or rolled into PR-E2's CI workflow PR) so PR-E1's diff stays focused on the GA gate. PR-E2: GitHub Actions self-hosted Mac M4 runner workflow that invokes 'pytest -m integration' on every PR labelled 'needs-mac-m4'. Until that lands, the gate is run manually via scripts/review_pr_e1_on_mac.sh. Mac M4 reviewer aid: scripts/review_pr_e1_on_mac.sh Runs 'pytest -m integration tests/integration/' and produces one JSON artifact (pr-e1-mac-integration-tests-<unix>.json) under results/platform-tests/. Same coverage-free pattern as review_pr_b3_on_mac.sh + commit 9d1a250 hotfixes already folded in. Local verification (Linux VM, py3.12): Linux CI gate (existing 8 paths): 682 passed, 100% coverage, UNCHANGED. tests/integration/ is not in the Linux paths. Integration suite collection: 3 tests collected from tests/integration/, marker auto-applied via conftest. Bare 'pytest' from repo root: would still pick up tests/integration/ if discovered, but the existing project convention is explicit test paths in CI; the marker is the safety net. Per ADR 0008 \u00a79: this PR ships the test suite that IS the GA gate. Linux CI gate does not exercise the integration tests (HF-cache- bound, real-model-numerics-dependent), so PR-E1's true validation happens on Mac M4. Reviewer pushes scripts/review_pr_e1_on_mac.sh's JSON output to the PR branch as evidence. Next PR after merge: PR-E2: GitHub Actions self-hosted Mac M4 runner workflow. PR-E1b (or rolled into PR-E2): bench_session_long_run.py. Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
This was referenced Jun 1, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
ADR 0008 §6.4 PR-D1 originally proposed two coupled changes:
SessionStore.Implementation revealed (1) is a pure subtraction with no behavior dependence on (2), so they're split — same pattern as PR-A3 / PR-A3b. This PR is (1). PR-D2 (queued, not in this diff) is (2).
ADR 0008 §6.4 amended in this PR to record the split.
Net diff: −540 / +78
Files (production)
inference_engine/server/engine.pyEngineResult.path_selection / .tokens_skipped / .prefill_duration_secondsfields; SpeculativeEngine no longer forwards them.inference_engine/server/metrics.pypath_selection_total,continuation_tokens_skipped_total,verifier_prefill_duration_seconds,cache_invariant_violations_total); 2record_*methods.inference_engine/server/app.py_session_acceptance_rate+_emit_path_selection_metrichelpers; the two call sites passacceptance_rate=Nonenow. The OpenAI response loses itsacceptance_ratefield as a side effect — acceptable on a feature-frozen deprecated shim per §2.7.inference_engine/scheduler/session.pyengine_resultfield onSession.inference_engine/scheduler/scheduler.pysession.engine_result = resultassignment withdel result.scripts/bench_agentic/bench_long_session.py_adr_0007_summary+ render-summary block. Module docstring points at PR-E1'sbench_session_long_run.pyfor the replacement bench.Files (tests)
tests/inference_engine/server/test_metrics.pytest_build_registers_all_documented_metrics's expected set; 9 tests in the "ADR 0007 §2.10 — path_selection observability" section.tests/inference_engine/server/test_app_metrics_and_auth.pytest_metrics_path_selection_metrics_present_on_idle_metrics_scrape,test_metrics_path_selection_recorded_after_completion,test_session_acceptance_rate_returns_none_when_result_missing_rate,test_emit_path_selection_metric_noop_when_path_unset.Behavior changes visible to clients
/metricsno longer exposes 4 ADR-0007-vintage metrics. They were unreachable anyway after PR-A3 removed thepath_selectmachinery from the verifier.acceptance_ratefield. Migrate to the gRPCGeneratestream +GetSessionInfofor richer telemetry.SessionStoremigration.Linux verification
Per ADR 0008 §9
Pure deletion / cleanup on the Linux-runnable surface. Zero MLX runtime code touched. §9 last-paragraph carve-out invoked:
Linux-only path. No Mac M4 report needed.Reviewer checklist
grep -r "path_selection\|continuation_tokens_skipped\|verifier_prefill_duration\|cache_invariant_violations_total\|record_path_selection\|record_cache_invariant_violation\|_emit_path_selection_metric\|_session_acceptance_rate\|engine_result" inference_engine/ tests/ scripts/returns only docstring / comment hits, no live code.Next PR
PR-D2 (queued, not blocking v0.3 GA): the HTTP-shim refactor proper.
/v1/chat/completionsbecomes a single-shot session underSessionStore;PooledVerifierretires;Deprecation/Sunsetheaders added. Linux-only path; §9 carve-out continues to apply.After PR-D1 merges, PR-E1 (the integration suite + Mac M4 INV-3 determinism gate) can also start in parallel — they touch disjoint files.