PR-D1 (ADR 0008 Phase D): remove ADR 0007 server-side dead code by FluffyAIcode · Pull Request #49 · FluffyAIcode/Kakeya-LLM-Inference-engine

FluffyAIcode · 2026-06-01T15:47:07Z

What

ADR 0008 §6.4 PR-D1 originally proposed two coupled changes:

Remove ADR-0007-vintage server dead code.
Refactor the HTTP shim onto SessionStore.

Implementation revealed (1) is a pure subtraction with no behavior dependence on (2), so they're split — same pattern as PR-A3 / PR-A3b. This PR is (1). PR-D2 (queued, not in this diff) is (2).

ADR 0008 §6.4 amended in this PR to record the split.

Net diff: −540 / +78

9 files changed, 78 insertions(+), 540 deletions(-)

Files (production)

File	Δ	What was removed
`inference_engine/server/engine.py`	−12	`EngineResult.path_selection / .tokens_skipped / .prefill_duration_seconds` fields; SpeculativeEngine no longer forwards them.
`inference_engine/server/metrics.py`	−74	4 metrics (`path_selection_total`, `continuation_tokens_skipped_total`, `verifier_prefill_duration_seconds`, `cache_invariant_violations_total`); 2 `record_*` methods.
`inference_engine/server/app.py`	−53	`_session_acceptance_rate` + `_emit_path_selection_metric` helpers; the two call sites pass `acceptance_rate=None` now. The OpenAI response loses its `acceptance_rate` field as a side effect — acceptable on a feature-frozen deprecated shim per §2.7.
`inference_engine/scheduler/session.py`	−7	`engine_result` field on `Session`.
`inference_engine/scheduler/scheduler.py`	±3	Replaced `session.engine_result = result` assignment with `del result`.
`scripts/bench_agentic/bench_long_session.py`	−211/+30	All path-selection scrape logic + `_adr_0007_summary` + render-summary block. Module docstring points at PR-E1's `bench_session_long_run.py` for the replacement bench.

Files (tests)

File	Δ	What was removed
`tests/inference_engine/server/test_metrics.py`	−107	4 entries from `test_build_registers_all_documented_metrics`'s expected set; 9 tests in the "ADR 0007 §2.10 — path_selection observability" section.
`tests/inference_engine/server/test_app_metrics_and_auth.py`	−98	4 tests: `test_metrics_path_selection_metrics_present_on_idle_metrics_scrape`, `test_metrics_path_selection_recorded_after_completion`, `test_session_acceptance_rate_returns_none_when_result_missing_rate`, `test_emit_path_selection_metric_noop_when_path_unset`.

Behavior changes visible to clients

/metrics no longer exposes 4 ADR-0007-vintage metrics. They were unreachable anyway after PR-A3 removed the path_select machinery from the verifier.
OpenAI chat-completions response no longer includes the acceptance_rate field. Migrate to the gRPC Generate stream + GetSessionInfo for richer telemetry.
The deprecated HTTP shim still works exactly as before in every other respect; PR-D2 is what does the real SessionStore migration.

Linux verification

PYTHONPATH=.:sdks/python pytest <Linux CI gate set>
682 passed (was 695 - 13 ADR-0007-specific tests removed)
TOTAL  1660 stmts  100.00 % coverage (was 1694 - 34 dead stmts removed)

Per ADR 0008 §9

Pure deletion / cleanup on the Linux-runnable surface. Zero MLX runtime code touched. §9 last-paragraph carve-out invoked: Linux-only path. No Mac M4 report needed.

Reviewer checklist

Every ADR-0007 metric is gone: grep -r "path_selection\|continuation_tokens_skipped\|verifier_prefill_duration\|cache_invariant_violations_total\|record_path_selection\|record_cache_invariant_violation\|_emit_path_selection_metric\|_session_acceptance_rate\|engine_result" inference_engine/ tests/ scripts/ returns only docstring / comment hits, no live code.
§6.4 amendment correctly records the PR-D1 / PR-D2 split.
No new test depends on the removed fields: 13 tests deleted, no replacements added (that's PR-D2's job).
Existing 461-test integration suite still passes (now 682 after the deletions; the suite has grown since PR-D1's original spec was written, all post-A2 / post-B1-B5 additions are intact).

Next PR

PR-D2 (queued, not blocking v0.3 GA): the HTTP-shim refactor proper. /v1/chat/completions becomes a single-shot session under SessionStore; PooledVerifier retires; Deprecation / Sunset headers added. Linux-only path; §9 carve-out continues to apply.

After PR-D1 merges, PR-E1 (the integration suite + Mac M4 INV-3 determinism gate) can also start in parallel — they touch disjoint files.

ADR 0008 \u00a76.4 PR-D1 originally proposed a coupled change \u2014 (a) remove ADR-0007-vintage server dead code, (b) refactor the HTTP shim's chat-completions handler onto SessionStore. Implementation revealed (a) is a pure subtraction with no behavior dependence on (b), so they're split (same pattern as PR-A3 / PR-A3b). This PR is (a); (b) becomes PR-D2 (queued, not in this diff). Net diff: -540 deletions, +78 insertions. ADR 0008 \u00a76.4 amended to record the split. Files modified \u2014 production: inference_engine/server/engine.py -12 lines: EngineResult.path_selection / .tokens_skipped / .prefill_duration_seconds fields removed; SpeculativeEngine no longer forwards them from SpeculativeRunResult. inference_engine/server/metrics.py -74 lines: path_selection_total, continuation_tokens_skipped_total, verifier_prefill_duration_seconds, cache_invariant_violations_total metrics removed from Metrics + factory; record_path_selection and record_cache_invariant_violation methods removed. inference_engine/server/app.py -53 lines: _session_acceptance_rate and _emit_path_selection_metric helpers removed; the two call sites in the streaming + non- streaming completion paths now pass acceptance_rate=None to record_completion. The OpenAI response loses its acceptance_rate field as a result \u2014 acceptable on a feature-frozen deprecated shim per ADR 0008 \u00a72.7. Migrate to gRPC for richer telemetry. inference_engine/scheduler/session.py -7 lines: engine_result field on Session removed. The scheduler worker (scheduler.py) no longer stashes engine.generate()'s result on the session \u2014 the only reader was app.py's removed helpers. inference_engine/scheduler/scheduler.py \u00b13 lines (renamed assignment to del): the line that wrote session.engine_result = result is gone. scripts/bench_agentic/bench_long_session.py -211/+30 lines: removed _PATH_SELECTION_METRIC / _CONTINUATION_TOKENS_SKIPPED_METRIC / _CACHE_INVARIANT_VIOLATIONS_METRIC constants, the labeled-line regex, the labeled-metric branch in _parse_prom_text, the _extract_label helper (no callers after PR-D1), the _adr_0007_summary aggregator, the adr_0007 payload field, and the §2.10 block in render_summary. Module docstring updated to point at PR-E1's bench_session_long_run.py for the replacement bench. Files modified \u2014 tests: tests/inference_engine/server/test_metrics.py -107 lines: 4 entries dropped from test_build_registers_all_documented_metrics expected set; 9 tests removed from the 'ADR 0007 \u00a72.10 \u2014 path_selection observability' section. tests/inference_engine/server/test_app_metrics_and_auth.py -98 lines: 4 ADR-0007-specific tests removed (test_metrics_path_selection_metrics_present_on_idle_metrics_scrape, test_metrics_path_selection_recorded_after_completion, test_session_acceptance_rate_returns_none_when_result_missing_rate, test_emit_path_selection_metric_noop_when_path_unset). Local verification (Linux VM, py3.12): PYTHONPATH=.:sdks/python pytest <Linux CI gate set> 682 passed (was 695 - 13 ADR-0007-specific tests), TOTAL 1660 stmts 100.00 % coverage (was 1694 - 34 dead stmts). Per ADR 0008 \u00a79: this PR is pure deletion / cleanup on the Linux- runnable surface. Zero MLX runtime code touched. \u00a79 carve-out applies; no Mac M4 report needed. Next PR after merge: PR-D2 (\u00a76.4 amended, queued): HTTP-shim refactor onto SessionStore proper. Each /v1/chat/completions request becomes a single- shot session; PooledVerifier retires; Deprecation / Sunset headers added per \u00a72.7. Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>

Stacks on PR-D1 (#49). When this PR merges, PR-D1 lands along with it. Per ADR 0008 \u00a76.5, PR-E1 ships: 1. tests/integration/ \u2014 new test directory with pytest.mark.integration marker auto-applied by tests/integration/conftest.py. Every test in this directory is opt-in via 'pytest -m integration'; a bare pytest invocation skips them. 2. tests/integration/test_inv3_session_determinism_gate.py \u2014 the INV-3 byte-exact GA gate (ADR 0008 \u00a77 G3). Drives two independent SinkWindowVerifier instances (real Qwen3-0.6B weights) through identical history fed via different chunkings, asserts the resulting greedy token streams are byte-identical. Three tests: test_one_call_vs_two_calls_yield_byte_identical_tokens Minimal gate: 1\u00d710 vs 2\u00d75 chunking on a 10-token history, 12 tokens of greedy generation. test_chunking_invariance_across_three_splits Stronger version: 1\u00d720 / 3\u00d7medium / 10\u00d7small chunkings on a 20-token history, 8 tokens of greedy generation. Catches chunk-boundary bugs the 1-vs-2 case might miss (e.g., a bug that only triggers when a chunk crosses a sink+window trim boundary). test_repeated_runs_with_same_history_byte_identical Sanity: same workload run twice on the same verifier produces the same output. Greedy decoding has no legitimate source of nondeterminism. 3. pytest.ini \u2014 minimal new file registering the 'integration' marker so it doesn't trigger PytestUnknownMarkWarning. Tests opt in via 'pytest -m integration'. Replaces the deleted tests/core/test_determinism_gate.py (PR-A3 removed it together with verifier.path_select; per ADR 0008 \u00a76.6 PR-E1's replacement is in tests/integration/, not tests/core/, because integration is where Mac-M4-only GA gates belong per \u00a79). NOT in this PR (deferred): scripts/bench_agentic/bench_session_long_run.py \u00a76.5 also mentions a Mac M4 long-session bench using the gRPC SDK. Splitting that out as PR-E1b (or rolled into PR-E2's CI workflow PR) so PR-E1's diff stays focused on the GA gate. PR-E2: GitHub Actions self-hosted Mac M4 runner workflow that invokes 'pytest -m integration' on every PR labelled 'needs-mac-m4'. Until that lands, the gate is run manually via scripts/review_pr_e1_on_mac.sh. Mac M4 reviewer aid: scripts/review_pr_e1_on_mac.sh Runs 'pytest -m integration tests/integration/' and produces one JSON artifact (pr-e1-mac-integration-tests-<unix>.json) under results/platform-tests/. Same coverage-free pattern as review_pr_b3_on_mac.sh + commit 9d1a250 hotfixes already folded in. Local verification (Linux VM, py3.12): Linux CI gate (existing 8 paths): 682 passed, 100% coverage, UNCHANGED. tests/integration/ is not in the Linux paths. Integration suite collection: 3 tests collected from tests/integration/, marker auto-applied via conftest. Bare 'pytest' from repo root: would still pick up tests/integration/ if discovered, but the existing project convention is explicit test paths in CI; the marker is the safety net. Per ADR 0008 \u00a79: this PR ships the test suite that IS the GA gate. Linux CI gate does not exercise the integration tests (HF-cache- bound, real-model-numerics-dependent), so PR-E1's true validation happens on Mac M4. Reviewer pushes scripts/review_pr_e1_on_mac.sh's JSON output to the PR branch as evidence. Next PR after merge: PR-E2: GitHub Actions self-hosted Mac M4 runner workflow. PR-E1b (or rolled into PR-E2): bench_session_long_run.py. Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>

FluffyAIcode mentioned this pull request Jun 1, 2026

PR-E1 (ADR 0008 §6.5): integration suite + INV-3 byte-exact GA gate #50

Merged

This was referenced Jun 1, 2026

PR-E1b (ADR 0008 §6.5): gRPC long-session bench + server CLI #51

Merged

PR-E1c: fix kv_live_bytes reporting path #52

Merged

FluffyAIcode marked this pull request as ready for review June 2, 2026 04:02

FluffyAIcode merged commit d12af68 into main Jun 2, 2026
6 checks passed

FluffyAIcode deleted the AgentMemory/v030-pr-d1-remove-adr-0007-server-deadcode-8e7f branch June 2, 2026 04:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PR-D1 (ADR 0008 Phase D): remove ADR 0007 server-side dead code#49

PR-D1 (ADR 0008 Phase D): remove ADR 0007 server-side dead code#49
FluffyAIcode merged 1 commit into
mainfrom
AgentMemory/v030-pr-d1-remove-adr-0007-server-deadcode-8e7f

FluffyAIcode commented Jun 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

FluffyAIcode commented Jun 1, 2026

What

Net diff: −540 / +78

Files (production)

Files (tests)

Behavior changes visible to clients

Linux verification

Per ADR 0008 §9

Reviewer checklist

Next PR

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants