Skip to content

feat(app): F4.5 — completion logic (offer-to-submit gate + sweep + close)#35

Merged
JohnD-EE merged 3 commits into
mainfrom
feat/f4.5-completion-logic
Jun 5, 2026
Merged

feat(app): F4.5 — completion logic (offer-to-submit gate + sweep + close)#35
JohnD-EE merged 3 commits into
mainfrom
feat/f4.5-completion-logic

Conversation

@JohnD-EE
Copy link
Copy Markdown
Contributor

@JohnD-EE JohnD-EE commented Jun 5, 2026

F4.5 — Completion logic

The close primitive of P4 (the non-streaming conversational engine), after selection (F4.1), extraction (F4.2), contradiction detection (F4.3), and refinement (F4.4). F4.5 decides when the agent may offer to submit, phrases that offer, resolves the respondent's accept / hold, drives the F4.3 contradiction completion-sweep at the moment of offer, and transitions the session active → completed on a clean submit.

Built as pure core → capability + agent + seeds → routes + docs, mirroring F4.1–F4.4. Delivered as 3 risk-sequenced commits on one branch. No schema migration — it builds on F4.4's AppQuestionnaireSession / AppAnswerSlot.

Tracker: planning/features/f4.5.md · Docs: completion-logic.md

Design (decisions confirmed at planning)

  • Two layers. Eligibility (assessCompletion) is a deterministic gatewhether to offer. The LLM capability only phrases the offer (the "agent contract") — how to say it. The model never decides eligibility, so the gate (incl. required questions) stays authoritative.
  • Required gate blocks the offer. An unanswered required slot → blocked_on_required, even when weighted coverage already meets the threshold — the gate selection's terminalDecision lacks.
  • Sweep conflict → hold for review. On accept, if the completion-sweep finds contradictions, do not auto-submit: return findings, leave the session active, reconcile via F4.4, re-offer. Consistent with F4.3 never auto-overwriting.
  • Two routes. completion-status (read-only assess + optional composed offer, no persistence) and complete (accept/hold action; persists status; runs the sweep).
  • No new config / no sweep_only mode. Maps onto the committed flat config fields (minQuestionsAnswered, coverageThreshold, maxQuestionsPerSession) and the existing shouldRunDetection(mode, windowN, 'completion-sweep').
  • Completion sub-flag does NOT 404 its route. The free deterministic assessment is always available under the master flag; only the paid LLM phrasing is gated.

What's in each commit

PR1 — pure core (lib/app/questionnaire/completion/, Prisma-free): assessCompletion (reuses the F4.1 coverage helpers + the new required gate; cap → required → thresholds ordering) and resolveCompletion (accept+clean/skipped→submit, accept+conflicts→hold_for_review, hold→continue). Zod offer contract + provider-agnostic prompt builder. Narrowed the selection coverage helpers' param to a structural CoverageContext so completion reuses them — backward-compatible (SelectionContext still satisfies it; all F4.1 callers unaffected).

PR2 — capability + agent + seeds (AppComposeCompletionOfferCapability): provider-agnostic runStructuredCompletion → parse/retry → cost-sum, resolves the chat tier; processesPii with a counts/flags-only redactProvenance (the recap echoes prompts + recent messages). Constants, isCompletionEnabled(), registration, seeds 015-completion-agent / 016-completion-capability / 017-completion-flag (disabled by default). The sweep reuses F4.3's app_detect_contradictions — no new detection capability.

PR3 — routes + session seam + docs: markSessionCompleted added to the F4.4 _lib/answer-slots.ts seam; completionLimiter (60/min per admin). completion-status route (assess + optional offer, fail-soft, sub-flag-soft) and complete route (seed answers → eligible-accept sweep → resolve → persist). Both reuse buildSelectionContext (no new context builder) and buildContradictionContext for the sweep. New completion-logic.md + cross-refs (contradiction-detection.md, configuration.md, README) + tracker + dev-plan status flips.

Routes

Route Purpose Persists
POST …/versions/:vid/completion-status Deterministic assessment + (when eligible & sub-flag on) composed offer No
POST …/versions/:vid/complete accept/hold → resolve, run completion-sweep, transition active→completed on submit Yes (preview session status)

Both: master-flag-gated, admin-only, fail-soft (a failed offer composition → assessment + diagnostic; a failed/disabled sweep → treated clean). accept while blocked_on_required refuses to submit.

Tests — 61 new cases, full questionnaire suite green (960)

  • Unit (29): assessCompletion edge cases incl. the required-blocks-despite-coverage headline + epsilon boundary + cap override; resolveCompletion branches; schema round-trip; prompt structure; vocab parity.
  • Integration — capability (14): real dispatcher + runStructuredCompletion, mocked provider — happy path, optional remainingNote, retry, fail-closed paths, cost-log isolation, binding coercion, counts-only redaction.
  • Integration — seam (+2): markSessionCompleted transition + boundary narrowing.
  • Integration — routes (12 + 16): gate order, the three assessment kinds, composer + sweep wiring, accept→submit / accept→hold_for_review (status stays active, findings returned) / hold→continue / accept-while-blocked, sub-flag-off-returns-assessment-not-404, fail-soft sweep, sub-flag/agent-missing handling.

npm run validate clean (type-check + lint + format). Seeds 015–017 apply idempotently. No CHANGELOG entry — app surface, not the Sunrise platform public surface (consistent with F1.1–F4.4).

Operator note

The deterministic assessment is on under APP_QUESTIONNAIRES_ENABLED. For a composed offer, also enable APP_QUESTIONNAIRES_COMPLETION_ENABLED + db:seed. The completion-sweep runs only when APP_QUESTIONNAIRES_CONTRADICTION_DETECTION_ENABLED is on and the version's mode is flag/probe; otherwise it's skipped (treated clean). The complete route writes to a per-version preview session (isPreview, excluded from P8).

🤖 Generated with Claude Code

JohnD-EE and others added 2 commits June 5, 2026 12:08
The deterministic "may we offer to submit?" gate for the conversational
engine (P4), built Prisma-free and exhaustively unit-testable by hand,
mirroring F4.1–F4.4.

- lib/app/questionnaire/completion/: types, completion-logic
  (assessCompletion + resolveCompletion), completion-schema (Zod offer
  contract), completion-prompt (offer composer), index barrel.
- assessCompletion reuses the F4.1 coverage helpers (coverageRatio,
  answeredCount, unansweredQuestions) and adds the required-questions
  gate selection lacks: an unanswered required slot → blocked_on_required,
  even when weighted coverage already meets the threshold. Ordering
  mirrors terminalDecision (cap first, then thresholds).
- resolveCompletion: accept + clean/skipped sweep → submit; accept +
  sweep contradictions → hold_for_review (never auto-submit over a
  conflict); hold → continue.
- Narrowed the selection coverage helpers' param to a structural
  CoverageContext so completion reuses them without dragging in
  selection-only fields. Backward-compatible — SelectionContext still
  satisfies it; all F4.1 callers unaffected.
- 29 unit tests (logic edge cases incl. the required gate + epsilon
  boundary, schema round-trip, prompt structure, vocab parity).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The LLM layer that *phrases* the offer-to-submit, on top of PR1's
deterministic gate. The capability never decides whether to offer (that
stays deterministic) — only how to say it.

- AppComposeCompletionOfferCapability (compose-completion-offer.ts):
  provider-agnostic runStructuredCompletion → parse/retry → cost-sum,
  resolves the chat tier from entityContext.completionAgent; processesPii
  with a counts-only redactProvenance (the recap echoes prompts + recent
  messages). Registered in lib/app/capabilities.ts + barrel.
- constants.ts: COMPOSE_COMPLETION_OFFER_* slugs/handler/function-def,
  QUESTIONNAIRE_COMPLETION_AGENT_SLUG, APP_QUESTIONNAIRES_COMPLETION_FLAG.
- feature-flag.ts: isCompletionEnabled() — master AND completion sub-flag.
- Seeds 015 (completion agent), 016 (capability + binding), 017 (sub-flag,
  disabled by default).
- 14 integration-capability tests (real dispatcher + runStructuredCompletion,
  mocked provider): happy path, optional remainingNote, retry, fail-closed
  paths, cost-log isolation, binding coercion, counts-only redaction.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@JohnD-EE JohnD-EE force-pushed the feat/f4.5-completion-logic branch from 711833d to fe4209a Compare June 5, 2026 12:13
@JohnD-EE
Copy link
Copy Markdown
Contributor Author

JohnD-EE commented Jun 5, 2026

Test review

8 file pairs reviewed (4 unit · 4 integration) across assertion quality, coverage, mock realism, brittleness, and alignment. Mock realism was clean (provider/dispatch/limiter/session mock shapes all match their real contracts). Two assertion-quality findings ≥80 surfaced — both fixed in fe4209a.

  1. .find().toBeDefined() then optional-chained access (90, assertion-quality) — fixed

    );
    expect(chatCall, 'a CHAT cost log for this capability should have been emitted').toBeDefined();
    // Non-null after the guard so a regression points at these lines, not a silent ?. no-op.
    const chatArg = chatCall![0];
    expect(chatArg).toMatchObject({
    operation: CostOperation.CHAT,
    metadata: { capability: SLUG },
    });
    expect(chatArg).not.toHaveProperty('agentId');

    Follow-up assertions used chatCall?.[0], so a regression could weaken the failure signal. Replaced with a non-null chatArg local after the toBeDefined() guard.

  2. .toBeDefined() where the exact value is the contract (82, assertion-quality) — fixed

    expect(data.offer.coveredSummary).toContain('goals');
    expect(data.offer.remainingNote).toBe('You can still add anything else before we wrap up.');
    expect(provider.chat).toHaveBeenCalledTimes(1);

    remainingNote was asserted only present; the capability passes a concrete string through. Now asserts the exact value (round-trip fidelity).

~13 sub-threshold items (minor branch-coverage adds, cosmetic clock-in-mock) were left per the 80 threshold. Full local report retained at .reviews/tests-branch-f4.5-completion-logic.md.


🤖 Generated with Claude Code.

React with 👍 or 👎 on this comment to help calibrate future reviews.

The admin-facing surface for completion logic: two preview routes, the
active→completed transition, and docs. Completes F4.5.

- _lib/answer-slots.ts: markSessionCompleted(sessionId) — the accept→submit
  transition (idempotent, status-narrowed). Documented as the F4.6
  session-event seam.
- _lib/rate-limit.ts: completionLimiter (60/min per admin).
- completion-status/route.ts (read-only): assessCompletion over a
  hand-supplied answer state → returns the assessment; when offer + the
  completion sub-flag is on, composes the offer (fail-soft). The sub-flag
  does NOT 404 — the free assessment stays available, only LLM phrasing is
  gated.
- complete/route.ts (accept/hold): seeds answers into the preview session,
  runs the F4.3 completion-sweep on an eligible accept (gated by the
  contradiction sub-flag, fail-soft = clean), resolves via resolveCompletion,
  and on a clean submit transitions the session to completed. hold /
  hold_for_review / blocked leave it active.
- Both routes reuse buildSelectionContext + buildContradictionContext.
- Docs: new completion-logic.md + cross-refs; tracker planning/features/f4.5.md;
  dev-plan F4.5 status + P4 status lines. No CHANGELOG entry (app surface).

PR-gate fixes folded in (/pre-pr, /code-review, /test-review, /security-review):
- compose-completion-offer.ts capability test: +6 cases lifting branch
  coverage 70%→96%; +2 assertion-quality fixes.
- complete/route.ts: completionLimiter check moved to gate only the paid sweep
  dispatch; completion-status: removed unused `byId`.
- /security-review: clean (no findings).

Post-review hardening (the two accepted code-review observations):
- Preview-session race (TOCTOU): new raw-SQL PARTIAL UNIQUE INDEX
  (idx_app_questionnaire_session_preview_per_version, WHERE isPreview = true;
  migration 20260605141500) + a defensive dedupe; getOrCreatePreviewSession now
  catches P2002 and resolves to the winning row, so concurrent first-touch can't
  split a version's preview answers. Drift-probed in lib/app/db-drift.ts;
  schema carries a DRIFT WARNING. +2 seam tests.
- Completion-sweep size safety: the sweep now sends only ANSWERED slots to the
  detector (an unanswered slot can't contradict and is never prompted), so the
  MAX_CONTRADICTION_SLOTS cap tracks answer count not questionnaire size; if the
  trimmed input still exceeds the detector caps the sweep is skipped with an
  explicit `sweep_skipped_oversized` diagnostic + warn instead of a silent
  fail-soft submit. +2 route tests.

Full questionnaire suite green (972). validate + db:drift-check (11/11) + migrate
status all clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@JohnD-EE JohnD-EE force-pushed the feat/f4.5-completion-logic branch from fe4209a to 002d67f Compare June 5, 2026 13:20
@JohnD-EE
Copy link
Copy Markdown
Contributor Author

JohnD-EE commented Jun 5, 2026

Hardening (review follow-up) — 002d67f

Addressed the two accepted code-review observations:

  1. Preview-session race (TOCTOU) — added a raw-SQL partial unique index idx_app_questionnaire_session_preview_per_version (WHERE isPreview = true; migration 20260605141500, with a defensive dedupe). getOrCreatePreviewSession now catches the resulting P2002 and resolves to the winning row, so two concurrent first-touch requests can't split a version's preview answers across two sessions. Drift-probed in lib/app/db-drift.ts (11/11 probes pass); schema carries a DRIFT WARNING.

  2. Completion-sweep size safety — the sweep now sends only the answered slots to the detector (an unanswered slot can't contradict and was never prompted), so MAX_CONTRADICTION_SLOTS tracks the answer count rather than the questionnaire's size. If the trimmed input still exceeds the detector caps, the sweep is skipped with an explicit sweep_skipped_oversized diagnostic + warn log instead of a silent fail-soft submit.

+4 tests (2 seam race-safety, 2 route sweep-shaping). Full questionnaire suite green (972); validate, db:drift-check, and migrate status all clean.


🤖 Generated with Claude Code.

@JohnD-EE JohnD-EE merged commit 8bddf8d into main Jun 5, 2026
14 checks passed
@JohnD-EE JohnD-EE deleted the feat/f4.5-completion-logic branch June 5, 2026 13:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant