feat(app): F4.5 — completion logic (offer-to-submit gate + sweep + close)#35
The deterministic "may we offer to submit?" gate for the conversational engine (P4), built Prisma-free and exhaustively unit-testable by hand, mirroring F4.1–F4.4.

- lib/app/questionnaire/completion/: types, completion-logic (assessCompletion + resolveCompletion), completion-schema (Zod offer contract), completion-prompt (offer composer), index barrel.
- assessCompletion reuses the F4.1 coverage helpers (coverageRatio, answeredCount, unansweredQuestions) and adds the required-questions gate that selection lacks: an unanswered required slot → blocked_on_required, even when weighted coverage already meets the threshold. Ordering mirrors terminalDecision (cap first, then thresholds).
- resolveCompletion: accept + clean/skipped sweep → submit; accept + sweep contradictions → hold_for_review (never auto-submit over a conflict); hold → continue.
- Narrowed the selection coverage helpers' param to a structural CoverageContext so completion reuses them without dragging in selection-only fields. Backward-compatible: SelectionContext still satisfies it; all F4.1 callers unaffected.
- 29 unit tests (logic edge cases incl. the required gate + epsilon boundary, schema round-trip, prompt structure, vocab parity).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
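The gate and the resolver described in this commit can be sketched as two pure functions. This is a minimal illustration, not the code in lib/app/questionnaire/completion/: the `Slot` and `CompletionContext` shapes, the `askedCount` input, the `not_yet` and `blocked` labels, the `EPSILON` value, and the choice that a reached cap offers even with a required slot open are all assumptions made for the example.

```typescript
// Hypothetical shapes for illustration only; the real types may differ.
interface Slot {
  questionId: string;
  required: boolean;
  weight: number;
  answered: boolean;
}

interface CompletionContext {
  slots: Slot[];
  askedCount: number; // assumed input for the cap check
  minQuestionsAnswered: number;
  coverageThreshold: number; // weighted ratio in [0, 1]
  maxQuestionsPerSession: number;
}

type Assessment =
  | { status: "offer"; reason: "cap_reached" | "thresholds_met" }
  | { status: "blocked_on_required"; missing: string[] }
  | { status: "not_yet" };

type SweepResult = "clean" | "skipped" | "contradictions";
type Resolution = "submit" | "hold_for_review" | "continue" | "blocked";

const EPSILON = 1e-9; // assumed tolerance for the coverage boundary

// Weighted share of answered slots; 0 when there is nothing to weigh.
function coverageRatio(slots: Slot[]): number {
  const total = slots.reduce((sum, s) => sum + s.weight, 0);
  if (total === 0) return 0;
  const answered = slots.reduce((sum, s) => sum + (s.answered ? s.weight : 0), 0);
  return answered / total;
}

function assessCompletion(ctx: CompletionContext): Assessment {
  // 1. Cap first (assumed here to override the later checks).
  if (ctx.askedCount >= ctx.maxQuestionsPerSession) {
    return { status: "offer", reason: "cap_reached" };
  }
  // 2. Required gate: blocks even when coverage is already high enough.
  const missing = ctx.slots
    .filter((s) => s.required && !s.answered)
    .map((s) => s.questionId);
  if (missing.length > 0) return { status: "blocked_on_required", missing };
  // 3. Thresholds: both the answered count and the weighted coverage must pass.
  const answeredCount = ctx.slots.filter((s) => s.answered).length;
  const covered = coverageRatio(ctx.slots) + EPSILON >= ctx.coverageThreshold;
  if (answeredCount >= ctx.minQuestionsAnswered && covered) {
    return { status: "offer", reason: "thresholds_met" };
  }
  return { status: "not_yet" };
}

function resolveCompletion(
  action: "accept" | "hold",
  assessment: Assessment,
  sweep: SweepResult,
): Resolution {
  if (action === "hold") return "continue";
  if (assessment.status !== "offer") return "blocked"; // accept without eligibility
  // A clean or skipped sweep submits; contradictions never auto-submit.
  return sweep === "contradictions" ? "hold_for_review" : "submit";
}
```

Keeping both functions free of I/O is what makes the edge cases (required gate despite high coverage, the epsilon boundary) checkable by hand.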
The LLM layer that *phrases* the offer-to-submit, on top of PR1's deterministic gate. The capability never decides whether to offer (that stays deterministic), only how to say it.

- AppComposeCompletionOfferCapability (compose-completion-offer.ts): provider-agnostic runStructuredCompletion → parse/retry → cost-sum, resolves the chat tier from entityContext.completionAgent; processesPii with a counts-only redactProvenance (the recap echoes prompts + recent messages). Registered in lib/app/capabilities.ts + barrel.
- constants.ts: COMPOSE_COMPLETION_OFFER_* slugs/handler/function-def, QUESTIONNAIRE_COMPLETION_AGENT_SLUG, APP_QUESTIONNAIRES_COMPLETION_FLAG.
- feature-flag.ts: isCompletionEnabled(), which requires the master flag AND the completion sub-flag.
- Seeds 015 (completion agent), 016 (capability + binding), 017 (sub-flag, disabled by default).
- 14 integration-capability tests (real dispatcher + runStructuredCompletion, mocked provider): happy path, optional remainingNote, retry, fail-closed paths, cost-log isolation, binding coercion, counts-only redaction.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
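The counts-only redaction mentioned for this capability can be illustrated with a small sketch. The `OfferProvenance` and `RedactedProvenance` shapes and their field names are hypothetical; only the idea (log how many prompts and messages were echoed, never their text) comes from the commit message.

```typescript
// Hypothetical provenance shape: the recap echoes prompts and recent messages,
// so the raw record may contain respondent PII.
interface OfferProvenance {
  prompts: string[];
  recentMessages: string[];
  remainingNote?: string;
}

// What is safe to log: counts and flags only, never the text itself.
interface RedactedProvenance {
  promptCount: number;
  recentMessageCount: number;
  hasRemainingNote: boolean;
}

function redactProvenance(provenance: OfferProvenance): RedactedProvenance {
  return {
    promptCount: provenance.prompts.length,
    recentMessageCount: provenance.recentMessages.length,
    hasRemainingNote:
      provenance.remainingNote !== undefined && provenance.remainingNote.length > 0,
  };
}
```

Because the redacted record is built field by field rather than by deleting keys from the original, a field added to the provenance later does not leak into logs by default.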
711833d to fe4209a
Test review
8 file pairs reviewed (4 unit · 4 integration) across assertion quality, coverage, mock realism, brittleness, and alignment. Mock realism was clean (provider/dispatch/limiter/session mock shapes all match their real contracts). Two assertion-quality findings ≥80 surfaced; both were fixed in fe4209a.
~13 sub-threshold items (minor branch-coverage adds, cosmetic clock-in-mock) were left per the 80 threshold. Full local report retained at
🤖 Generated with Claude Code. React with 👍 or 👎 on this comment to help calibrate future reviews.
The admin-facing surface for completion logic: two preview routes, the active→completed transition, and docs. Completes F4.5.

- _lib/answer-slots.ts: markSessionCompleted(sessionId), the accept→submit transition (idempotent, status-narrowed). Documented as the F4.6 session-event seam.
- _lib/rate-limit.ts: completionLimiter (60/min per admin).
- completion-status/route.ts (read-only): assessCompletion over a hand-supplied answer state → returns the assessment; when offer + the completion sub-flag is on, composes the offer (fail-soft). The sub-flag does NOT 404: the free assessment stays available, only LLM phrasing is gated.
- complete/route.ts (accept/hold): seeds answers into the preview session, runs the F4.3 completion-sweep on an eligible accept (gated by the contradiction sub-flag, fail-soft = clean), resolves via resolveCompletion, and on a clean submit transitions the session to completed. hold / hold_for_review / blocked leave it active.
- Both routes reuse buildSelectionContext + buildContradictionContext.
- Docs: new completion-logic.md + cross-refs; tracker planning/features/f4.5.md; dev-plan F4.5 status + P4 status lines. No CHANGELOG entry (app surface).

PR-gate fixes folded in (/pre-pr, /code-review, /test-review, /security-review):

- compose-completion-offer.ts capability test: +6 cases lifting branch coverage 70%→96%; +2 assertion-quality fixes.
- complete/route.ts: completionLimiter check moved to gate only the paid sweep dispatch; completion-status: removed unused `byId`.
- /security-review: clean (no findings).

Post-review hardening (the two accepted code-review observations):

- Preview-session race (TOCTOU): new raw-SQL PARTIAL UNIQUE INDEX (idx_app_questionnaire_session_preview_per_version, WHERE isPreview = true; migration 20260605141500) + a defensive dedupe; getOrCreatePreviewSession now catches P2002 and resolves to the winning row, so concurrent first-touch can't split a version's preview answers. Drift-probed in lib/app/db-drift.ts; schema carries a DRIFT WARNING. +2 seam tests.
- Completion-sweep size safety: the sweep now sends only ANSWERED slots to the detector (an unanswered slot can't contradict and is never prompted), so the MAX_CONTRADICTION_SLOTS cap tracks answer count, not questionnaire size; if the trimmed input still exceeds the detector caps, the sweep is skipped with an explicit `sweep_skipped_oversized` diagnostic + warn instead of a silent fail-soft submit. +2 route tests.

Full questionnaire suite green (972). validate + db:drift-check (11/11) + migrate status all clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
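The sweep-shaping rule from the hardening notes (send only answered slots, skip loudly when the trimmed input is still too large) could look roughly like the sketch below. The `AnswerSlot` and `SweepPlan` shapes and the `planCompletionSweep` name are hypothetical, and the cap is passed in as a parameter because the actual value of MAX_CONTRADICTION_SLOTS is not stated here.

```typescript
// Hypothetical slot shape: a null or blank answer means "unanswered".
interface AnswerSlot {
  questionId: string;
  answer: string | null;
}

type SweepPlan =
  | { kind: "run"; slots: AnswerSlot[] }
  | { kind: "skipped"; diagnostic: "sweep_skipped_oversized"; answeredCount: number };

function planCompletionSweep(slots: AnswerSlot[], maxSlots: number): SweepPlan {
  // An unanswered slot cannot contradict anything, so it is never sent.
  const answered = slots.filter((s) => s.answer !== null && s.answer.trim() !== "");
  // The cap is applied to the trimmed list, so it tracks answer count.
  if (answered.length > maxSlots) {
    return {
      kind: "skipped",
      diagnostic: "sweep_skipped_oversized",
      answeredCount: answered.length,
    };
  }
  return { kind: "run", slots: answered };
}
```

Returning a tagged result rather than an empty list lets the caller distinguish "nothing to check" from "too much to check" and surface the diagnostic.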
fe4209a to 002d67f
Hardening (review follow-up): 002d67f
Addressed the two accepted code-review observations:
+4 tests (2 seam race-safety, 2 route sweep-shaping). Full questionnaire suite green (972).
F4.5 — Completion logic
The close primitive of P4 (the non-streaming conversational engine), after selection (F4.1), extraction (F4.2), contradiction detection (F4.3), and refinement (F4.4). F4.5 decides when the agent may offer to submit, phrases that offer, resolves the respondent's accept / hold, drives the F4.3 contradiction completion-sweep at the moment of offer, and transitions the session `active → completed` on a clean submit.

Built as pure core → capability + agent + seeds → routes + docs, mirroring F4.1–F4.4. Delivered as 3 risk-sequenced commits on one branch. No schema migration: it builds on F4.4's `AppQuestionnaireSession` / `AppAnswerSlot`.

Tracker: `planning/features/f4.5.md` · Docs: `completion-logic.md`

Design (decisions confirmed at planning)
- The pure core (`assessCompletion`) is a deterministic gate: whether to offer. The LLM capability only phrases the offer (the "agent contract"): how to say it. The model never decides eligibility, so the gate (incl. required questions) stays authoritative.
- An unanswered `required` slot → `blocked_on_required`, even when weighted coverage already meets the threshold. This is the gate that selection's `terminalDecision` lacks.
- On `accept`, if the completion-sweep finds contradictions, do not auto-submit: return findings, leave the session `active`, reconcile via F4.4, re-offer. Consistent with F4.3 never auto-overwriting.
- Two routes: `completion-status` (read-only assess + optional composed offer, no persistence) and `complete` (accept/hold action; persists status; runs the sweep).
- `sweep_only` mode. Maps onto the committed flat config fields (`minQuestionsAnswered`, `coverageThreshold`, `maxQuestionsPerSession`) and the existing `shouldRunDetection(mode, windowN, 'completion-sweep')`.

What's in each commit
PR1, pure core (`lib/app/questionnaire/completion/`, Prisma-free): `assessCompletion` (reuses the F4.1 coverage helpers + the new required gate; cap → required → thresholds ordering) and `resolveCompletion` (accept+clean/skipped→submit, accept+conflicts→hold_for_review, hold→continue). Zod offer contract + provider-agnostic prompt builder. Narrowed the selection coverage helpers' param to a structural `CoverageContext` so completion reuses them; backward-compatible (`SelectionContext` still satisfies it; all F4.1 callers unaffected).

PR2, capability + agent + seeds (`AppComposeCompletionOfferCapability`): provider-agnostic `runStructuredCompletion` → parse/retry → cost-sum, resolves the `chat` tier; `processesPii` with a counts/flags-only `redactProvenance` (the recap echoes prompts + recent messages). Constants, `isCompletionEnabled()`, registration, seeds `015-completion-agent` / `016-completion-capability` / `017-completion-flag` (disabled by default). The sweep reuses F4.3's `app_detect_contradictions`; no new detection capability.

PR3, routes + session seam + docs: `markSessionCompleted` added to the F4.4 `_lib/answer-slots.ts` seam; `completionLimiter` (60/min per admin). `completion-status` route (assess + optional offer, fail-soft, sub-flag-soft) and `complete` route (seed answers → eligible-accept sweep → resolve → persist). Both reuse `buildSelectionContext` (no new context builder) and `buildContradictionContext` for the sweep. New `completion-logic.md` + cross-refs (`contradiction-detection.md`, `configuration.md`, README) + tracker + dev-plan status flips.

Routes
- `POST …/versions/:vid/completion-status`: read-only assess + optional composed offer, no persistence
- `POST …/versions/:vid/complete`: `accept` / `hold` → resolve, run completion-sweep, transition `active→completed` on submit

Both: master-flag-gated, admin-only, fail-soft (a failed offer composition → assessment + diagnostic; a failed/disabled sweep → treated clean). `accept` while `blocked_on_required` refuses to submit.

Tests: 61 new cases, full questionnaire suite green (960)
- `assessCompletion` edge cases incl. the required-blocks-despite-coverage headline + epsilon boundary + cap override; `resolveCompletion` branches; schema round-trip; prompt structure; vocab parity.
- Real dispatcher + `runStructuredCompletion`, mocked provider: happy path, optional remainingNote, retry, fail-closed paths, cost-log isolation, binding coercion, counts-only redaction.
- `markSessionCompleted` transition + boundary narrowing.

`npm run validate` clean (type-check + lint + format). Seeds 015–017 apply idempotently. No CHANGELOG entry: this is app surface, not the Sunrise platform public surface (consistent with F1.1–F4.4).

Operator note
The deterministic assessment is on under `APP_QUESTIONNAIRES_ENABLED`. For a composed offer, also enable `APP_QUESTIONNAIRES_COMPLETION_ENABLED` + `db:seed`. The completion-sweep runs only when `APP_QUESTIONNAIRES_CONTRADICTION_DETECTION_ENABLED` is on and the version's mode is flag/probe; otherwise it's skipped (treated clean). The `complete` route writes to a per-version preview session (`isPreview`, excluded from P8).
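The flag combinations in the operator note can be summarized in a short sketch. The environment is passed in as a plain record so the example stays self-contained; the `"true"` encoding of an enabled flag, the `"off"` mode, and the `shouldRunCompletionSweep` helper name are assumptions, while the three variable names and the flag/probe modes come from the note.

```typescript
// Assumed flag encoding: a flag is on only when its value is exactly "true".
type Env = Record<string, string | undefined>;
type DetectionMode = "off" | "flag" | "probe"; // "off" is a hypothetical third mode

const isOn = (value: string | undefined): boolean => value === "true";

// Composed (LLM-phrased) offers need the master flag AND the completion sub-flag.
function isCompletionEnabled(env: Env): boolean {
  return (
    isOn(env["APP_QUESTIONNAIRES_ENABLED"]) &&
    isOn(env["APP_QUESTIONNAIRES_COMPLETION_ENABLED"])
  );
}

// The completion-sweep needs the contradiction flag and a flag/probe mode;
// in every other case it is skipped, which the routes treat as clean.
function shouldRunCompletionSweep(env: Env, mode: DetectionMode): boolean {
  const detectionOn = isOn(env["APP_QUESTIONNAIRES_CONTRADICTION_DETECTION_ENABLED"]);
  return detectionOn && (mode === "flag" || mode === "probe");
}
```

With only the master flag set, the deterministic assessment is still served; the two helpers above only decide whether the paid LLM phrasing and the sweep are attempted.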