Skip to content

PR-D1 (ADR 0008 Phase D): remove ADR 0007 server-side dead code#49

Merged
FluffyAIcode merged 1 commit into
mainfrom
AgentMemory/v030-pr-d1-remove-adr-0007-server-deadcode-8e7f
Jun 2, 2026
Merged

PR-D1 (ADR 0008 Phase D): remove ADR 0007 server-side dead code#49
FluffyAIcode merged 1 commit into
mainfrom
AgentMemory/v030-pr-d1-remove-adr-0007-server-deadcode-8e7f

Conversation

@FluffyAIcode
Copy link
Copy Markdown
Owner

What

ADR 0008 §6.4 PR-D1 originally proposed two coupled changes:

  1. Remove ADR-0007-vintage server dead code.
  2. Refactor the HTTP shim onto SessionStore.

Implementation revealed (1) is a pure subtraction with no behavior dependence on (2), so they're split — same pattern as PR-A3 / PR-A3b. This PR is (1). PR-D2 (queued, not in this diff) is (2).

ADR 0008 §6.4 amended in this PR to record the split.

Net diff: −540 / +78

9 files changed, 78 insertions(+), 540 deletions(-)

Files (production)

File Δ What was removed
inference_engine/server/engine.py −12 EngineResult.path_selection / .tokens_skipped / .prefill_duration_seconds fields; SpeculativeEngine no longer forwards them.
inference_engine/server/metrics.py −74 4 metrics (path_selection_total, continuation_tokens_skipped_total, verifier_prefill_duration_seconds, cache_invariant_violations_total); 2 record_* methods.
inference_engine/server/app.py −53 _session_acceptance_rate + _emit_path_selection_metric helpers; the two call sites pass acceptance_rate=None now. The OpenAI response loses its acceptance_rate field as a side effect — acceptable on a feature-frozen deprecated shim per §2.7.
inference_engine/scheduler/session.py −7 engine_result field on Session.
inference_engine/scheduler/scheduler.py ±3 Replaced session.engine_result = result assignment with del result.
scripts/bench_agentic/bench_long_session.py −211/+30 All path-selection scrape logic + _adr_0007_summary + render-summary block. Module docstring points at PR-E1's bench_session_long_run.py for the replacement bench.

Files (tests)

File Δ What was removed
tests/inference_engine/server/test_metrics.py −107 4 entries from test_build_registers_all_documented_metrics's expected set; 9 tests in the "ADR 0007 §2.10 — path_selection observability" section.
tests/inference_engine/server/test_app_metrics_and_auth.py −98 4 tests: test_metrics_path_selection_metrics_present_on_idle_metrics_scrape, test_metrics_path_selection_recorded_after_completion, test_session_acceptance_rate_returns_none_when_result_missing_rate, test_emit_path_selection_metric_noop_when_path_unset.

Behavior changes visible to clients

  • /metrics no longer exposes 4 ADR-0007-vintage metrics. They were unreachable anyway after PR-A3 removed the path_select machinery from the verifier.
  • OpenAI chat-completions response no longer includes the acceptance_rate field. Migrate to the gRPC Generate stream + GetSessionInfo for richer telemetry.
  • The deprecated HTTP shim still works exactly as before in every other respect; PR-D2 is what does the real SessionStore migration.

Linux verification

PYTHONPATH=.:sdks/python pytest <Linux CI gate set>
682 passed (was 695 - 13 ADR-0007-specific tests removed)
TOTAL  1660 stmts  100.00 % coverage (was 1694 - 34 dead stmts removed)

Per ADR 0008 §9

Pure deletion / cleanup on the Linux-runnable surface. Zero MLX runtime code touched. §9 last-paragraph carve-out invoked: Linux-only path. No Mac M4 report needed.

Reviewer checklist

  • Every ADR-0007 metric is gone: grep -r "path_selection\|continuation_tokens_skipped\|verifier_prefill_duration\|cache_invariant_violations_total\|record_path_selection\|record_cache_invariant_violation\|_emit_path_selection_metric\|_session_acceptance_rate\|engine_result" inference_engine/ tests/ scripts/ returns only docstring / comment hits, no live code.
  • §6.4 amendment correctly records the PR-D1 / PR-D2 split.
  • No new test depends on the removed fields: 13 tests deleted, no replacements added (that's PR-D2's job).
  • Existing 461-test integration suite still passes (now 682 after the deletions; the suite has grown since PR-D1's original spec was written, all post-A2 / post-B1-B5 additions are intact).

Next PR

PR-D2 (queued, not blocking v0.3 GA): the HTTP-shim refactor proper. /v1/chat/completions becomes a single-shot session under SessionStore; PooledVerifier retires; Deprecation / Sunset headers added. Linux-only path; §9 carve-out continues to apply.

After PR-D1 merges, PR-E1 (the integration suite + Mac M4 INV-3 determinism gate) can also start in parallel — they touch disjoint files.

Open in Web Open in Cursor 

ADR 0008 \u00a76.4 PR-D1 originally proposed a coupled change \u2014 (a)
remove ADR-0007-vintage server dead code, (b) refactor the HTTP
shim's chat-completions handler onto SessionStore. Implementation
revealed (a) is a pure subtraction with no behavior dependence on
(b), so they're split (same pattern as PR-A3 / PR-A3b). This PR is
(a); (b) becomes PR-D2 (queued, not in this diff).

Net diff: -540 deletions, +78 insertions. ADR 0008 \u00a76.4 amended
to record the split.

Files modified \u2014 production:

  inference_engine/server/engine.py
    -12 lines: EngineResult.path_selection / .tokens_skipped /
    .prefill_duration_seconds fields removed; SpeculativeEngine
    no longer forwards them from SpeculativeRunResult.

  inference_engine/server/metrics.py
    -74 lines: path_selection_total, continuation_tokens_skipped_total,
    verifier_prefill_duration_seconds, cache_invariant_violations_total
    metrics removed from Metrics + factory; record_path_selection and
    record_cache_invariant_violation methods removed.

  inference_engine/server/app.py
    -53 lines: _session_acceptance_rate and _emit_path_selection_metric
    helpers removed; the two call sites in the streaming + non-
    streaming completion paths now pass acceptance_rate=None to
    record_completion. The OpenAI response loses its acceptance_rate
    field as a result \u2014 acceptable on a feature-frozen deprecated
    shim per ADR 0008 \u00a72.7. Migrate to gRPC for richer telemetry.

  inference_engine/scheduler/session.py
    -7 lines: engine_result field on Session removed. The scheduler
    worker (scheduler.py) no longer stashes engine.generate()'s
    result on the session \u2014 the only reader was app.py's removed
    helpers.

  inference_engine/scheduler/scheduler.py
    \u00b13 lines (renamed assignment to del): the line that wrote
    session.engine_result = result is gone.

  scripts/bench_agentic/bench_long_session.py
    -211/+30 lines: removed _PATH_SELECTION_METRIC /
    _CONTINUATION_TOKENS_SKIPPED_METRIC / _CACHE_INVARIANT_VIOLATIONS_METRIC
    constants, the labeled-line regex, the labeled-metric branch in
    _parse_prom_text, the _extract_label helper (no callers after
    PR-D1), the _adr_0007_summary aggregator, the adr_0007 payload
    field, and the §2.10 block in render_summary.
    Module docstring updated to point at PR-E1's
    bench_session_long_run.py for the replacement bench.

Files modified \u2014 tests:

  tests/inference_engine/server/test_metrics.py
    -107 lines: 4 entries dropped from test_build_registers_all_documented_metrics
    expected set; 9 tests removed from the 'ADR 0007 \u00a72.10 \u2014
    path_selection observability' section.

  tests/inference_engine/server/test_app_metrics_and_auth.py
    -98 lines: 4 ADR-0007-specific tests removed
    (test_metrics_path_selection_metrics_present_on_idle_metrics_scrape,
    test_metrics_path_selection_recorded_after_completion,
    test_session_acceptance_rate_returns_none_when_result_missing_rate,
    test_emit_path_selection_metric_noop_when_path_unset).

Local verification (Linux VM, py3.12):
  PYTHONPATH=.:sdks/python pytest <Linux CI gate set>
  682 passed (was 695 - 13 ADR-0007-specific tests),
  TOTAL  1660 stmts  100.00 % coverage (was 1694 - 34 dead stmts).

Per ADR 0008 \u00a79: this PR is pure deletion / cleanup on the Linux-
runnable surface. Zero MLX runtime code touched. \u00a79 carve-out
applies; no Mac M4 report needed.

Next PR after merge:
  PR-D2 (\u00a76.4 amended, queued): HTTP-shim refactor onto SessionStore
        proper. Each /v1/chat/completions request becomes a single-
        shot session; PooledVerifier retires; Deprecation / Sunset
        headers added per \u00a72.7.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
cursor Bot pushed a commit that referenced this pull request Jun 1, 2026
Stacks on PR-D1 (#49). When this PR merges, PR-D1 lands along with
it.

Per ADR 0008 \u00a76.5, PR-E1 ships:

  1. tests/integration/ \u2014 new test directory with pytest.mark.integration
     marker auto-applied by tests/integration/conftest.py. Every
     test in this directory is opt-in via 'pytest -m integration';
     a bare pytest invocation skips them.

  2. tests/integration/test_inv3_session_determinism_gate.py \u2014
     the INV-3 byte-exact GA gate (ADR 0008 \u00a77 G3). Drives two
     independent SinkWindowVerifier instances (real Qwen3-0.6B
     weights) through identical history fed via different
     chunkings, asserts the resulting greedy token streams are
     byte-identical. Three tests:

       test_one_call_vs_two_calls_yield_byte_identical_tokens
         Minimal gate: 1\u00d710 vs 2\u00d75 chunking on a 10-token history,
         12 tokens of greedy generation.

       test_chunking_invariance_across_three_splits
         Stronger version: 1\u00d720 / 3\u00d7medium / 10\u00d7small chunkings on
         a 20-token history, 8 tokens of greedy generation. Catches
         chunk-boundary bugs the 1-vs-2 case might miss (e.g., a
         bug that only triggers when a chunk crosses a sink+window
         trim boundary).

       test_repeated_runs_with_same_history_byte_identical
         Sanity: same workload run twice on the same verifier
         produces the same output. Greedy decoding has no
         legitimate source of nondeterminism.

  3. pytest.ini \u2014 minimal new file registering the 'integration'
     marker so it doesn't trigger PytestUnknownMarkWarning. Tests
     opt in via 'pytest -m integration'.

Replaces the deleted tests/core/test_determinism_gate.py (PR-A3
removed it together with verifier.path_select; per ADR 0008 \u00a76.6
PR-E1's replacement is in tests/integration/, not tests/core/,
because integration is where Mac-M4-only GA gates belong per \u00a79).

NOT in this PR (deferred):

  scripts/bench_agentic/bench_session_long_run.py
    \u00a76.5 also mentions a Mac M4 long-session bench using the gRPC
    SDK. Splitting that out as PR-E1b (or rolled into PR-E2's CI
    workflow PR) so PR-E1's diff stays focused on the GA gate.

  PR-E2: GitHub Actions self-hosted Mac M4 runner workflow that
    invokes 'pytest -m integration' on every PR labelled
    'needs-mac-m4'. Until that lands, the gate is run manually via
    scripts/review_pr_e1_on_mac.sh.

Mac M4 reviewer aid:

  scripts/review_pr_e1_on_mac.sh
    Runs 'pytest -m integration tests/integration/' and produces
    one JSON artifact (pr-e1-mac-integration-tests-<unix>.json)
    under results/platform-tests/. Same coverage-free pattern as
    review_pr_b3_on_mac.sh + commit 9d1a250 hotfixes already
    folded in.

Local verification (Linux VM, py3.12):
  Linux CI gate (existing 8 paths): 682 passed, 100% coverage,
                                    UNCHANGED. tests/integration/
                                    is not in the Linux paths.
  Integration suite collection:     3 tests collected from
                                    tests/integration/, marker
                                    auto-applied via conftest.
  Bare 'pytest' from repo root:     would still pick up
                                    tests/integration/ if
                                    discovered, but the existing
                                    project convention is explicit
                                    test paths in CI; the marker
                                    is the safety net.

Per ADR 0008 \u00a79: this PR ships the test suite that IS the GA gate.
Linux CI gate does not exercise the integration tests (HF-cache-
bound, real-model-numerics-dependent), so PR-E1's true validation
happens on Mac M4. Reviewer pushes scripts/review_pr_e1_on_mac.sh's
JSON output to the PR branch as evidence.

Next PR after merge:
  PR-E2: GitHub Actions self-hosted Mac M4 runner workflow.
  PR-E1b (or rolled into PR-E2): bench_session_long_run.py.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
@FluffyAIcode FluffyAIcode marked this pull request as ready for review June 2, 2026 04:02
@FluffyAIcode FluffyAIcode merged commit d12af68 into main Jun 2, 2026
6 checks passed
@FluffyAIcode FluffyAIcode deleted the AgentMemory/v030-pr-d1-remove-adr-0007-server-deadcode-8e7f branch June 2, 2026 04:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants