From f8f873aad170820c369deae1d02260f0961de31f Mon Sep 17 00:00:00 2001 From: Clint Berry Date: Mon, 8 Jun 2026 23:48:32 +0000 Subject: [PATCH 1/7] docs: requirements + plan for interactive typed agent questions --- ...nteractive-agent-questions-requirements.md | 201 +++++++ ...1-feat-interactive-agent-questions-plan.md | 495 ++++++++++++++++++ 2 files changed, 696 insertions(+) create mode 100644 docs/brainstorms/2026-06-08-interactive-agent-questions-requirements.md create mode 100644 docs/plans/2026-06-08-001-feat-interactive-agent-questions-plan.md diff --git a/docs/brainstorms/2026-06-08-interactive-agent-questions-requirements.md b/docs/brainstorms/2026-06-08-interactive-agent-questions-requirements.md new file mode 100644 index 0000000..e3d86de --- /dev/null +++ b/docs/brainstorms/2026-06-08-interactive-agent-questions-requirements.md @@ -0,0 +1,201 @@ +--- +date: 2026-06-08 +topic: interactive-agent-questions +--- + +# Interactive Agent Questions + +## Summary + +When an agent needs input mid-task, render its question as an interactive +inline prompt in the agent thread — a text field for open questions, +selectable buttons when the agent offers choices, with a free-text fallback — +and guarantee that a question is never shown to the user as raw JSON, +regardless of which path produced it. + +## Problem Frame + +Today an agent question can reach the user as a literal tool-call string in +the thread, e.g.: + +``` +Ask_user({"question":"What file would you like me to create? Please provide:\n1. The filename...\n2. The content..."}) +``` + +This is the structured question path *failing*. The intended pipeline is +`extension_ui_request` → `KindAwaitingInput` → `task.pendingQuestion`, which +the Super Thread drawer/card already consumes. The JSON appears when that +structured event never fires — the model emits the call as assistant prose +instead. 
Known triggers: the Pi `ask_user` extension install silently failed +(the installer is non-fatal and returns `nil` on error), the legacy `claude` +harness has no `ask_user` tool at all, or the model narrates the call in text +even when the tool exists. + +Two distinct quality gaps stack here: + +1. **Even on the happy path, the question renders as plain text.** The drawer + and task card print `task.pendingQuestion` verbatim — there is no input + affordance attached to it, and no way for the agent to offer concrete + choices. +2. **On the failure path, the user sees raw JSON** — which reads as a broken + product, not a question. + +Other harnesses (including Claude Code's own question UI) present questions as +typed, answerable controls. Deuce should meet that bar and additionally +guarantee the JSON failure mode can't surface. + +## Key Decisions + +- **Typed prompts, not just clean text.** The agent can attach a question + *kind* (free-text / pick-one / confirm) and, for choice kinds, a set of + options. The thread renders the matching control: a text field, selectable + buttons, or a yes/no affirmation. This is a deliberate step beyond "strip the + JSON and show prose" — chosen because concrete choices lower answer friction + and let agents ask better questions. + +- **A two-layer no-JSON guarantee.** Layer one hardens the structured path so a + question reliably arrives as a structured event (install failures become + loud/recoverable rather than silently dropping the tool). Layer two is a + text-shaped-question backstop: when an agent message *looks* like a tool call + (`ask_user(...)` / `Ask_user({...})`), it is intercepted, the question is + extracted, and it renders through the same prompt widget instead of as JSON. 
+ +- **The guaranteed floor is "never raw JSON," not "always a rich prompt."** + When only text leaks and the structured event never fired, the backstop can + reconstruct a *clean question prompt* but not choices the agent never emitted + structurally. Degrading to a free-text prompt on those paths is acceptable; + showing JSON is not. + +- **Rich choices are Pi-path only.** The legacy `claude` harness has no + extension mechanism, so it cannot emit structured options. The backstop keeps + it from leaking JSON, but its questions stay free-text. + +- **Questions stay in the agent thread, not the main chat timeline.** This + matches the existing `awaiting_input` routing; the prompt is modal to the + task, answered in place. + +## Actors + +- A1. **Agent** — running inside a session's DevPod, blocks on a question when + it needs a decision only the human can make. +- A2. **Human collaborator** — sees the prompt in the agent thread and answers + it in place; their answer resumes the agent. + +## Key Flows + +- F1. **Structured typed question (happy path).** + - **Trigger:** the agent calls `ask_user` with a question and (optionally) a + kind and choices. + - The task enters `awaiting_input`; the thread renders the matching control + inline. + - The human answers in place (types, picks a button, confirms, or uses the + free-text fallback on a choice question). + - The answer is delivered back to the agent and the task resumes. + +- F2. **Leaked question (backstop path).** + - **Trigger:** a question reaches the client as assistant text shaped like a + tool call (legacy harness, failed extension, or model narration). + - The text is detected, the question string extracted, and a free-text prompt + rendered in place of the raw JSON. + - The human answers; the answer is routed back through the normal reply path. + - **Floor:** if extraction fails, the surfaced content is still a readable + question, never a JSON blob. 
+ +## Requirements + +**Prompt rendering & interaction** + +- R1. An agent question in `awaiting_input` renders as an interactive prompt in + the agent thread, with an affordance to answer it in place (no copy-pasting, + no separate composer hunt). +- R2. For a free-text question, the prompt presents a text input and a send + action. +- R3. For a pick-one question, the prompt presents the agent's options as + selectable buttons, plus a free-text "Other" fallback so the human is never + trapped by the offered set. +- R4. For a confirm question, the prompt presents an affirm/decline control. +- R5. Selecting or submitting an answer delivers it back to the waiting agent + and transitions the task out of `awaiting_input`. + +**Question data model** + +- R6. The `ask_user` capability supports an optional question *kind* (free-text + / pick-one / confirm) and, for choice kinds, an optional list of options. +- R7. A question with no kind/options behaves as free-text — the change is + additive and backward-compatible with the current question-only shape. +- R8. The kind and options survive end to end (agent → structured event → + thread render) without being flattened back into a text string. + +**Leak prevention (the no-JSON guarantee)** + +- R9. A question is never displayed to the user as raw JSON or as a literal + tool-call string, on any path. +- R10. The structured-path failure modes that currently cause leaks are + hardened: a failed `ask_user` extension install is surfaced (loud / + recoverable), not silently dropped such that the agent narrates the call as + text. +- R11. A backstop detects agent text shaped like an `ask_user` tool call, + extracts the question, and renders it as a free-text prompt (R2) instead of + the raw string. +- R12. When the backstop cannot parse the leaked text into a question, the + surfaced content is still readable prose, never a JSON blob. + +## Acceptance Examples + +- AE1. **Covers R3.** Agent asks "Which framework?" 
with options + `[React, Vue, Svelte]`. The thread shows three buttons plus an "Other" field. + Picking "Vue" resumes the agent with "Vue"; typing "Solid" in Other resumes + it with "Solid". +- AE2. **Covers R2, R7.** Agent asks a question with no kind or options. The + thread shows a single text field — identical to today's question semantics, + now interactive. +- AE3. **Covers R9, R11.** A legacy-harness run emits + `Ask_user({"question":"What file should I create?"})` as text. The user sees + a free-text prompt asking "What file should I create?", not the JSON. +- AE4. **Covers R12.** A malformed leak (truncated/garbled tool-call text) that + can't be parsed surfaces as readable text, not a JSON fragment. +- AE5. **Covers R4.** Agent asks a confirm-kind question ("Proceed with the + force-push?"). The thread shows affirm/decline; declining resumes the agent + with a negative answer. + +## Scope Boundaries + +**Deferred for later** + +- Multi-question batches (asking several questions in one prompt) — the model + stays one question per `awaiting_input`. +- Surfacing prompts in the main chat timeline as a distinct message type — they + remain scoped to the agent thread. + +**Outside this product's identity** + +- Full typed-prompt support inside the legacy `claude` harness — it has no + extension channel to carry structured choices. Its only guarantee is the + no-JSON backstop with free-text prompts. (If/when the legacy harness is + retired, this boundary dissolves.) + +## Dependencies / Assumptions + +- The Pi `ask_user` extension is the carrier for structured kind/options; this + assumes Pi's extension UI channel can convey option sets back through the + `extension_ui_request` event (the decoder already carries a `RequestKind`). +- Assumes `ctx.hasUI` is true in Deuce's Pi RPC mode (the extension's happy + path calls `ctx.ui.input` directly); the headless `hasUI === false` branch + remains the correct behavior for genuinely non-interactive contexts. 
+- Assumes the agent-thread drawer/card is the right and only surface for the + prompt (consistent with current `awaiting_input` routing). + +## Outstanding Questions + +**Deferred to planning** + +- Exact shape of the options payload through Pi's extension UI channel + (whether `ctx.ui` exposes a select/confirm primitive or whether options ride + inside the input request) — confirm against Pi's extension API during + planning. +- How "loud / recoverable" extension-install failure should manifest + operationally (retry, surfaced session warning, or agent-disable) — a + reliability design choice for planning. +- The precise detection boundary for the text backstop (which patterns count as + a leaked `ask_user` call without false-positiving on legitimate prose that + mentions the tool). diff --git a/docs/plans/2026-06-08-001-feat-interactive-agent-questions-plan.md b/docs/plans/2026-06-08-001-feat-interactive-agent-questions-plan.md new file mode 100644 index 0000000..cacaa6b --- /dev/null +++ b/docs/plans/2026-06-08-001-feat-interactive-agent-questions-plan.md @@ -0,0 +1,495 @@ +--- +title: "feat: Interactive typed agent questions with no-JSON guarantee" +type: feat +status: active +date: 2026-06-08 +origin: docs/brainstorms/2026-06-08-interactive-agent-questions-requirements.md +--- + +# feat: Interactive typed agent questions with no-JSON guarantee + +## Summary + +Render agent questions as interactive typed prompts in the agent thread — a +text field for open questions, selectable buttons when the agent offers +choices, a confirm control for yes/no, each with a free-text "Other" fallback — +and guarantee a question is never shown as raw JSON. Scoped entirely to the Pi +harness; the legacy `claude` executor is being removed and gets no work here. + +--- + +## Problem Frame + +Today an agent question can reach the user as a literal tool-call string in the +thread, e.g. `Ask_user({"question":"What file would you like me to create?"})`. 
+That is the structured question path *failing*: the intended pipeline is +`extension_ui_request` → `KindAwaitingInput` → `task.pendingQuestion`, which the +Super Thread drawer/card already consume. The JSON appears when that structured +event never fires — the Pi `ask_user` extension install silently failed +(`InstallPiExtension` returns `nil` on error), or the model narrated the call as +assistant prose (which streams into the reply buffer and posts as a chat +message). + +Two quality gaps stack: (1) even on the happy path the question renders as plain +text with no input affordance and no way for the agent to offer choices +(`AgentTaskCard.tsx` / `AgentThreadDrawer.tsx` print `task.pendingQuestion` +verbatim), and (2) on the failure path the user sees raw JSON, which reads as a +broken product. See origin: `docs/brainstorms/2026-06-08-interactive-agent-questions-requirements.md`. + +--- + +## Requirements Traceability + +Carried from the origin requirements doc (R1–R12, A1–A2, F1–F2, AE1–AE5): + +- **Typed prompts (R1–R5, R6–R8):** interactive prompt per kind (free-text / + pick-one / confirm) with "Other" fallback; `ask_user` extended additively to + carry `kind` + `options`; kind/options survive end to end without flattening. +- **No-JSON guarantee (R9–R12):** never display raw JSON; harden the silent + install failure (R10); backstop narrated tool-call text into a clean question + (R11) with a readable-prose floor when unparseable (R12). +- **Actors:** A1 agent (asks), A2 human (answers in place). +- **Flows:** F1 structured typed question; F2 leaked question → backstop. + +**Origin note — AE3 reframed.** The origin's AE3 described a *legacy-harness* +leak. Since the `claude` harness is being removed, AE3 is covered here as a +**Pi-path narration leak** (the model writes the call into its assistant text +instead of invoking the tool). 
The "Outside this product's identity" origin +boundary about legacy-harness typed prompts is moot — that harness is going +away, not being supported at a lower tier. + +--- + +## Key Technical Decisions + +- **Plumb `kind` + `options` through the existing `pendingQuestion` channel, not + a new event.** `task_awaiting_input` is part of the append-only, seq-ordered + AgentRunEvent family (KTD6) and the decoder already extracts `RequestKind` + (`decoder.go` `Event.RequestKind`). The gap is that `ws.TaskEventPayload` drops + it. Add `pendingQuestionKind` + `pendingQuestionOptions` fields mirroring + `pendingQuestion` exactly, additive and backward-compatible (a question with no + kind behaves as free-text, R7). + +- **Answers keep flowing through the existing steer → `ExtensionUIResponse` + path.** Clicking a choice or confirm sends the chosen value as the steer + message; `RouteOrEnqueue` already routes it to Pi as + `ExtensionUIResponse{id, response}` keyed by the tracked request id (KTD15), + under the per-`(session,agent)` lock (KTD9), clearing the awaiting-input + ceiling (KTD8). No new answer transport. `ExtensionUIResponse.Response` is + already `any`, so a string value needs no protocol change. + +- **The backstop lives in the Pi reply-finalize path, not a shared post path.** + With the legacy harness removed, the only remaining leak source is Pi-path + assistant-text narration, which accumulates via `appendReply` and is taken at + `takeReply` before `replyPoster` fires in `finalizeLocked`. Intercepting there + sanitizes a tool-call-shaped reply into a clean question before it ever posts. + Floor is clean readable text (the user answers via the normal composer), not a + synthesized interactive prompt — a narrated question means the agent is not + actually blocked on a structured request. 
+ +- **Install-failure hardening is loud, not self-healing.** `InstallPiExtension` + escalates a failure to error-level logging and surfaces a session-visible + notice that the agent cannot ask questions, rather than retrying or disabling + the agent. Preserve the deliberate base64-over-the-wire encoding. + +- **Rich choices depend on verifying the live Pi `ctx.ui` API.** The + `@earendil-works/pi-coding-agent` types are not vendored locally, so whether + `ctx.ui.select` / `ctx.ui.confirm` exist is unverified. The extension verifies + at implementation time and falls back to `ctx.ui.input` with options rendered + into the prompt text when the richer primitives are absent — still no JSON, + still answerable. Keep the `ctx.hasUI === false` headless branch intact. + +--- + +## High-Level Technical Design + +Two layers, one shared prompt surface. The happy path (F1) carries structured +kind/options end to end; the backstop (F2) sanitizes a narrated leak before it +reaches chat. + +```mermaid +flowchart TD + subgraph Pi["Pi container"] + EXT["ask_user extension<br/>kind + options (U1)"] + end + EXT -->|extension_ui_request| DEC["decoder: RequestKind + options (U2)"] + DEC --> RT["runtime: SetAwaitingInput + setPending"] + RT -->|task_awaiting_input<br/>+kind +options| WS["ws.TaskEventPayload (U2)"] + WS --> RED["agent-runs reducer<br/>+kind +options (U3)"] + RED --> UI["drawer / card typed controls (U4)"] + UI -->|steer: chosen value| ROE["RouteOrEnqueue"] + ROE -->|extension_ui_response{id,response}| EXT + + subgraph Leak["F2 — narration leak (Pi only)"] + TXT["assistant text narrates Ask_user(...)"] --> AR["appendReply → takeReply"] + AR --> BS{"backstop:<br/>tool-call-shaped? (U6)"} + BS -->|yes, parseable| CLEAN["post clean question text"] + BS -->|yes, unparseable| PROSE["post readable prose (floor)"] + BS -->|no| NORMAL["post reply unchanged"] + end + + INST["InstallPiExtension fails → loud notice (U5)"] -.prevents.-> TXT +``` + +Diagram is authoritative alongside the prose below. + +--- + +## Implementation Units + +### U1. Extend the `ask_user` Pi extension to carry kind + options + +**Goal:** Let the agent attach a question kind (free-text / pick-one / confirm) +and, for choice kinds, a list of options — rendering the matching Pi UI +primitive, with a graceful fallback when the richer primitive is unavailable. + +**Requirements:** R6, R7, R8. Advances F1. + +**Dependencies:** none. + +**Files:** +- `server/internal/agent/pirun/extension/ask-user.ts` (modify) +- (embed is automatic via `server/internal/agent/pirun/extension/embed.go` — no change unless a new file is added) + +**Approach:** +- Add optional `kind` (`"input" | "select" | "confirm"`, default `input`) and + `options: string[]` parameters to the tool's typebox schema. Keep `question` + required. Additive — omitting kind/options preserves today's free-text + behavior (R7). +- Dispatch on kind: `select` → `ctx.ui.select` (or equivalent) with options; + `confirm` → `ctx.ui.confirm`; `input`/default → existing `ctx.ui.input`. +- **Verify the real `ctx.ui` surface against the live Pi API first** (see + Verification). If `select`/`confirm` are not exposed, fall back to + `ctx.ui.input` with the options enumerated in the prompt string and return the + raw typed answer — never emit JSON. +- Return the chosen option / confirm result as `content: [{type:"text", text}]`, + matching the current contract. +- Preserve the `!ctx.hasUI` headless branch unchanged. + +**Patterns to follow:** the current `registerTool` + typebox shape in the same +file; the existing `ctx.ui.input` call and text-content return. + +**Test scenarios:** +- Covers AE2.
`ask_user` with no kind/options → behaves as free-text `input`, + identical request shape to today. +- Covers AE1. `ask_user` with `kind:"select"`, options `[React, Vue, Svelte]` → + emits a select-style request carrying all three options. +- Covers AE5. `ask_user` with `kind:"confirm"` → emits a confirm-style request. +- Fallback: when the select/confirm primitive is unavailable, the tool still + returns a typed answer and never returns a JSON blob. +- Headless: `ctx.hasUI === false` returns the "proceed on best judgment" text + for every kind. + +**Execution note:** Verify the Pi `ctx.ui` API shape before committing the +dispatch — the primitive names are unverified locally. + +**Verification:** Run an agent in a real session; confirm a `select` question +renders options and a `confirm` question renders yes/no, and that the answer +returns to the agent as text. If primitives are missing, confirm the +input-fallback path produces a clean (non-JSON) prompt. + +--- + +### U2. Propagate kind + options through the Go event pipeline + +**Goal:** Carry the question kind and options from the decoded +`extension_ui_request` through the runtime to the `task_awaiting_input` WS +payload, mirroring the existing `pendingQuestion` plumbing. + +**Requirements:** R8. Advances F1, R3, R4. + +**Dependencies:** U1 (the extension must emit kind/options to decode). + +**Files:** +- `server/internal/agent/pirun/decoder.go` (modify — extract options alongside the existing `RequestKind`) +- `server/internal/ws/events.go` (modify — add `PendingQuestionKind`, `PendingQuestionOptions` to `TaskEventPayload`) +- `server/internal/agent/runtime.go` (modify — pass kind/options into the `TypeTaskAwaitingInput` broadcast) +- `server/internal/agent/pirun/decoder_test.go` (modify/add) + +**Approach:** +- Extend `decodeUIRequest` to pull an options list from the request params + (best-effort per KTD2 — tolerate absence; normalize the unverified field names + during U1 verification). 
`Event` already carries `RequestKind`; add an + `Options []string` field. +- Add `PendingQuestionKind string` and `PendingQuestionOptions []string` + (both `omitempty`) to `ws.TaskEventPayload`. +- In `runtime.go` `translate` (the `KindAwaitingInput` case), include + `ev.RequestKind` and the options in the existing broadcast. No change to + `SetAwaitingInput` persistence semantics or the pending-request id tracking. + +**Patterns to follow:** the existing `PendingQuestion: ev.Prompt` flow in the +`TypeTaskAwaitingInput` broadcast; `omitempty` JSON tags on `TaskEventPayload`. + +**Test scenarios:** +- Decoder: an `extension_ui_request` with kind `select` + options decodes to + `Event{RequestKind:"select", Options:[...]}`. +- Decoder (KTD2 tolerance): a request missing kind/options decodes to free-text + with empty options, no error. +- Decoder: malformed/extra fields are tolerated, stream continues. +- Payload: marshaled `TaskEventPayload` includes camelCase + `pendingQuestionKind` / `pendingQuestionOptions` only when present (omitempty). + +**Verification:** Unit tests pass; a live `select` question produces a +`task_awaiting_input` WS frame carrying kind + options. + +--- + +### U3. Carry kind + options in the frontend store and types + +**Goal:** Thread the new fields into the client `TaskEventPayload`/`AgentTask` +types and the `agent-runs` reducer so they survive both the live-event and +snapshot-replace paths. + +**Requirements:** R8. Advances R3, R4. + +**Dependencies:** U2 (wire fields exist). 
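The wire fields this unit depends on (added in U2) can be sketched as below. The struct is a trimmed, hypothetical stand-in for `ws.TaskEventPayload`: only the three question fields are shown, and the `omitempty` tag on `pendingQuestion` is an assumption. The point is the additive behavior: a question with no kind or options marshals to the same frame shape as today.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TaskEventPayload is a trimmed stand-in for the real payload type;
// the actual struct carries more fields than shown here.
type TaskEventPayload struct {
	PendingQuestion        string   `json:"pendingQuestion,omitempty"`
	PendingQuestionKind    string   `json:"pendingQuestionKind,omitempty"`
	PendingQuestionOptions []string `json:"pendingQuestionOptions,omitempty"`
}

// encode marshals a payload; these field types cannot fail to encode,
// so an error here would indicate a programming mistake.
func encode(p TaskEventPayload) string {
	b, err := json.Marshal(p)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// Free-text question: the kind and options keys are omitted entirely.
	fmt.Println(encode(TaskEventPayload{PendingQuestion: "Which framework?"}))

	// Typed question: both new keys appear alongside the question.
	fmt.Println(encode(TaskEventPayload{
		PendingQuestion:        "Which framework?",
		PendingQuestionKind:    "select",
		PendingQuestionOptions: []string{"React", "Vue", "Svelte"},
	}))
}
```

Because `omitempty` also drops a zero-length slice, an empty options list and a missing one look identical on the wire, which is why the client treats an absent kind as free-text.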
+ +**Files:** +- `src/types/index.ts` (modify — `TaskEventPayload` + `AgentTask`) +- `src/stores/agent-runs.ts` (modify — `task_awaiting_input` reduction; keep `AGENT_RUN_EVENT_TYPES` in sync if touched) +- `src/stores/agent-runs.test.ts` (new) +- `package.json` / `vitest.config.ts` (new — establish a vitest runner) + +**Approach:** +- Add `pendingQuestionKind?: "input" | "select" | "confirm"` and + `pendingQuestionOptions?: string[]` to the client `TaskEventPayload` and + `AgentTask`. +- In the `task_awaiting_input` reducer case, set the new fields next to + `pendingQuestion`. Ensure the snapshot-apply path (`applySnapshot`) carries + them too, so a snapshot refetch doesn't clobber them (per the residual-findings + flicker window). +- Establish a vitest runner — the reducer is currently untested and no frontend + test runner exists. This is a prerequisite for testing the feature-bearing + reducer change, not adjacent cleanup. + +**Patterns to follow:** the existing `pendingQuestion` field on both types and +its reducer assignment; the existing snapshot vs event reconcile logic. + +**Test scenarios:** +- Covers AE1. `task_awaiting_input` with kind `select` + options reduces to an + `AgentTask` carrying both. +- Covers AE2. `task_awaiting_input` with no kind reduces to free-text (kind + undefined), backward-compatible. +- Snapshot path: applying a snapshot that contains an awaiting-input task + preserves kind/options (no clobber after a seq-gap refetch). +- Event ordering: a later `task_started` for the same task clears the pending + fields as today. + +**Execution note:** Establish the vitest runner first, then add the reducer +test (test-first for the new field reduction). + +**Verification:** `npx vitest run` passes; `npx tsc --noEmit` clean. + +--- + +### U4. 
Render typed prompt controls and wire answers back + +**Goal:** Replace the plain-text `pendingQuestion` rendering with interactive +controls — text field, choice buttons, or confirm — each with an "Other" +free-text fallback, answered in place and routed through the existing steer +path. + +**Requirements:** R1, R2, R3, R4, R5. Advances F1; A1, A2. + +**Dependencies:** U3 (store carries kind/options). + +**Files:** +- `src/components/super-threads/AgentThreadDrawer.tsx` (modify — render controls; reuse the existing composer as the "Other" / free-text input) +- `src/components/super-threads/AgentTaskCard.tsx` (modify — reflect kind in the inline summary) +- `src/components/super-threads/ThreadDrawerPanel.tsx` (modify if the answer wiring needs the chosen value) +- relevant CSS (e.g. the `q-pending-q` / `q-drawer-composer` styles) +- `src/components/super-threads/AgentThreadDrawer.test.tsx` (new) + +**Approach:** +- Branch the awaiting-input render on `pendingQuestionKind`: + - `input`/undefined → existing text field + send (R2). + - `select` → a button per option (R3); selecting one sends that option's text + via `steer` (the existing `onSend` → `steer` → `sendSteer` path). Keep the + composer visible as the "Other" fallback (R3). + - `confirm` → affirm / decline controls (R4); each sends its value via the + same path. +- On submit/selection, reuse `steer(sessionId, agentId, value)` — no new + transport. The backend `RouteOrEnqueue` already turns it into + `ExtensionUIResponse` when the task is `awaiting_input` (R5), and the answer is + also posted as the human's chat message as today. +- Confirm the exact value Pi expects for select/confirm answers during U1 + verification (option text vs index; "yes"/"no" vs boolean) and send that. + +**Patterns to follow:** the existing composer (`val`/`send()`/`onSend`) in +`AgentThreadDrawer.tsx`; the `awaiting_input` render branch in both components; +`AgentAvatar` + agent color usage. + +**Test scenarios:** +- Covers AE1. 
A `select` task renders three option buttons plus a free-text + field; clicking "Vue" calls `steer` with "Vue"; typing "Solid" in Other calls + `steer` with "Solid". +- Covers AE2. A free-text task renders a single text field; submitting calls + `steer` with the typed value. +- Covers AE5. A `confirm` task renders affirm/decline; declining calls `steer` + with the decline value. +- Empty/whitespace input is not sendable (send disabled), matching today. +- The card summary reflects the question across kinds without rendering JSON. + +**Verification:** Manual run of a real `select`/`confirm`/free-text question in +a session — each renders the right control, answering resumes the agent, and the +awaiting-input state clears. + +--- + +### U5. Harden the silent extension-install failure + +**Goal:** Make a failed `ask_user` extension install loud and visible instead of +silently leaving the agent unable to ask, which is what lets a narrated leak +happen. + +**Requirements:** R10. Advances R9. + +**Dependencies:** none. + +**Files:** +- `server/internal/workspace/manager.go` (`InstallPiExtension`, ~lines 592–616) +- `server/internal/handler/workspace.go` and/or `server/internal/handler/sessions.go` (the `provisionAgentTools` call sites — surface the failure to the session) +- relevant manager/handler test file + +**Approach:** +- Escalate the install failure from `slog.Warn` to `slog.Error` and return/ + propagate a signal so the caller can surface it (rather than swallowing to + `nil`). Keep provisioning non-fatal for the *workspace* (the session still + comes up) but no longer silent. +- Surface a session-visible notice that the agent cannot ask questions + (mechanism: a system/agent chat message or an existing notice channel — choose + the lightest existing surface during implementation). +- Preserve the base64-over-the-wire install command (shell-quoting safety). 
+ +**Patterns to follow:** the existing `logFn` warning in `InstallPiExtension`; how +other provisioning failures are surfaced, if any; the residual-findings note +about a shared `InstallPi`/`InstallTools` installer (coordinate but do not +expand scope into that refactor). + +**Test scenarios:** +- Install exec failure logs at error level and returns a non-nil signal (not + swallowed). +- Workspace/session still reaches ready state on install failure (non-fatal + preserved). +- The session receives a visible notice when the install fails. +- Success path is unchanged and emits no spurious notice. + +**Verification:** Simulate an install failure (e.g. force the exec to fail) and +confirm the error log + session notice appear and the agent does not silently +narrate questions. + +--- + +### U6. Backstop: sanitize narrated tool-call text before it posts + +**Goal:** Detect a reply that is shaped like an `ask_user` tool call and turn it +into a clean question (or, if unparseable, readable prose) before it reaches +chat — so a narrated question is never shown as raw JSON. + +**Requirements:** R9, R11, R12. Advances F2. + +**Dependencies:** none (independent of U1–U4). + +**Files:** +- `server/internal/agent/runtime.go` (intercept in `finalizeLocked` between `takeReply` and `replyPoster`) +- a small detector/sanitizer helper (new file under `server/internal/agent/`, e.g. `agent/question_backstop.go`) +- corresponding `_test.go` + +**Approach:** +- Add a detector that recognizes a reply whose content is dominated by an + `ask_user(...)` / `Ask_user({...})` tool-call shape (case-insensitive, + tolerant of whitespace/escaping). +- When matched and the `question` value is extractable, replace the posted reply + with the clean question text (R11). The user answers via the normal composer; + this is the clean-text floor, not a synthesized interactive prompt. +- When matched but unparseable, strip to readable prose — never post a JSON + fragment (R12). 
+- When not matched, post the reply unchanged. +- Scope the pattern narrowly to avoid false positives on prose that merely + mentions `ask_user` (e.g. require the call-shape to be the substantive content + of the reply, not an inline mention). + +**Patterns to follow:** the `takeReply` → `replyPoster` call sequence in +`finalizeLocked`; existing helper/test layout in `server/internal/agent/`. + +**Test scenarios:** +- Covers AE3 (reframed). A reply of + `Ask_user({"question":"What file should I create?"})` posts as the clean + question "What file should I create?", not the JSON. +- Covers AE4. A truncated/garbled tool-call reply posts as readable prose, never + a JSON fragment. +- Negative: a normal reply that mentions the words "ask_user" in a sentence is + posted unchanged (no false positive). +- Negative: an empty reply / fallback "(agent finished without a text + response.)" path is unaffected. +- A reply that is partly prose and partly a trailing tool-call shape extracts + the question without dropping meaningful prose (or posts clean text per the + chosen rule — pin the rule in the test). + +**Verification:** Unit tests pass; in a live session, force a narrated +`ask_user` (e.g. before/without the extension) and confirm the chat shows a +clean question, never JSON. + +--- + +## Scope Boundaries + +**Deferred for later** +- Multi-question batches (several questions in one prompt) — one question per + `awaiting_input`. +- Surfacing prompts in the main chat timeline as a distinct message type — they + stay scoped to the agent thread (the U6 backstop's clean-text floor is the one + exception, and it is a sanitized chat message, not a thread prompt). + +**Outside this product's identity** +- Any typed-prompt or backstop work inside the legacy `claude` executor — that + harness is being removed (see [[legacy-claude-harness-removal]] / origin). All + units here target the Pi path only. 
+ +**Deferred to Follow-Up Work** +- Extracting the shared `InstallPi` / `InstallTools` / `InstallPiExtension` + installer (residual-findings P2) — touch-adjacent to U5 but out of scope for + this plan unless it falls out naturally. + +--- + +## Dependencies / Assumptions + +- **Pi `ctx.ui` API shape is unverified locally.** U1 must verify whether + `select`/`confirm` primitives exist; the fallback path keeps the feature + shippable either way. +- **No frontend test runner exists yet.** U3 establishes vitest; U3/U4 test + scenarios depend on it. +- **The answer transport is unchanged.** Choice/confirm answers ride the + existing `steer` WS path and resolve as `ExtensionUIResponse` keyed by the + tracked request id (KTD15), under the per-key lock (KTD9), clearing the + awaiting-input ceiling (KTD8). Verify the exact `response` value Pi expects for + select/confirm during U1. +- **KTD constraints honored:** KTD2 (decoder tolerates drift), KTD6 (kind/options + stay in the AgentRunEvent seq family, not `session_update`), KTD14 (the answer + path stays membership-gated as today). + +--- + +## Outstanding Questions + +**Deferred to Planning → resolved** +- Backstop placement → Pi reply-finalize path (legacy harness removed). +- Install-failure contract → loud (error log + session notice), non-fatal. + +**Deferred to Implementation** +- Exact Pi `ctx.ui.select`/`confirm` request + response shapes (verify against + the live API in U1). +- The precise lightest session-visible surface for the U5 install-failure notice. +- The exact extraction rule for a mixed prose+tool-call reply in U6 (pin via + test). 
+
+---
+
+## Sources & Research
+
+- Origin requirements: `docs/brainstorms/2026-06-08-interactive-agent-questions-requirements.md`
+- Pi harness KTDs: `docs/plans/2026-06-03-002-feat-pi-harness-integration-plan.md` (KTD15 ask-user mechanism, KTD8 ceiling, KTD9 lock, KTD6 event family)
+- Residual findings (test gaps, install dedup): `docs/residual-review-findings/feat-pi-harness-integration.md`
+- Answer-path trace: `server/internal/handler/websocket.go` (`handleSteer`), `server/internal/agent/runtime.go` (`RouteOrEnqueue`, `setPending`/`pendingRequest`/`clearPending`, `finalizeLocked`), `server/internal/agent/pirun/protocol.go` (`ExtensionUIResponse`)
+- Render sites: `src/components/super-threads/AgentThreadDrawer.tsx`, `AgentTaskCard.tsx`; reducer `src/stores/agent-runs.ts`; payload `server/internal/ws/events.go`

From e8ba462a89fd1922b7561dab908e40f4d4fddc72 Mon Sep 17 00:00:00 2001
From: Clint Berry
Date: Mon, 8 Jun 2026 23:48:32 +0000
Subject: [PATCH 2/7] feat(agent): carry typed question kind + options through the Pi pipeline

U1: ask_user extension accepts optional kind (input/select/confirm) and
options, feature-detecting ctx.ui.select/confirm with an input fallback.

U2: decoder extracts options; kind+options persist on tasks (migration 012)
and flow through SetAwaitingInput, the task_awaiting_input WS payload, and
the snapshot response so a reconnect reconstructs the typed prompt.
--- server/internal/agent/dbstore.go | 13 ++- server/internal/agent/pirun/decoder.go | 24 ++++-- server/internal/agent/pirun/decoder_test.go | 28 +++++++ .../agent/pirun/extension/ask-user.ts | 79 ++++++++++++++++++- server/internal/agent/runtime.go | 5 +- server/internal/agent/runtime_test.go | 2 +- server/internal/agent/store.go | 6 +- .../db/migrations/012_task_question_kind.sql | 15 ++++ server/internal/db/models.go | 28 ++++--- server/internal/db/queries/tasks.sql | 6 +- server/internal/db/tasks.sql.go | 38 ++++++--- server/internal/handler/agent_run.go | 15 ++-- server/internal/ws/events.go | 8 +- 13 files changed, 214 insertions(+), 53 deletions(-) create mode 100644 server/internal/db/migrations/012_task_question_kind.sql diff --git a/server/internal/agent/dbstore.go b/server/internal/agent/dbstore.go index 98d4079..51164d4 100644 --- a/server/internal/agent/dbstore.go +++ b/server/internal/agent/dbstore.go @@ -135,13 +135,22 @@ func (s *DBStore) CompleteAction(ctx context.Context, sessionID, taskID, callID, }) } -func (s *DBStore) SetAwaitingInput(ctx context.Context, sessionID, taskID, question string) (int64, error) { +func (s *DBStore) SetAwaitingInput(ctx context.Context, sessionID, taskID, question, kind string, options []string) (int64, error) { tid, err := uuid.Parse(taskID) if err != nil { return 0, err } + if options == nil { + options = []string{} + } return s.withSeq(ctx, sessionID, func(q *db.Queries, seq int64) error { - return q.SetTaskAwaitingInput(ctx, db.SetTaskAwaitingInputParams{ID: tid, PendingQuestion: question, Seq: seq}) + return q.SetTaskAwaitingInput(ctx, db.SetTaskAwaitingInputParams{ + ID: tid, + PendingQuestion: question, + PendingQuestionKind: kind, + PendingQuestionOptions: options, + Seq: seq, + }) }) } diff --git a/server/internal/agent/pirun/decoder.go b/server/internal/agent/pirun/decoder.go index a925157..41e0775 100644 --- a/server/internal/agent/pirun/decoder.go +++ b/server/internal/agent/pirun/decoder.go @@ -58,8 
+58,9 @@ type Event struct { // Awaiting-input (KindAwaitingInput). RequestID string - RequestKind string // select / confirm / input / editor + RequestKind string // select / confirm / input / editor Prompt string + Options []string // choice labels for a select request (empty otherwise) // Command reply (KindCommandReply). Command string @@ -191,13 +192,15 @@ func decodeUIRequest(line []byte) (Event, error) { // extension (U12) lands; decode best-effort by id + common prompt/kind keys // so the awaiting-input transition fires regardless of minor field naming. var p struct { - ID string `json:"id"` - Method string `json:"method"` - Kind string `json:"kind"` - Prompt string `json:"prompt"` - Params struct { - Prompt string `json:"prompt"` - Message string `json:"message"` + ID string `json:"id"` + Method string `json:"method"` + Kind string `json:"kind"` + Prompt string `json:"prompt"` + Options []string `json:"options"` + Params struct { + Prompt string `json:"prompt"` + Message string `json:"message"` + Options []string `json:"options"` } `json:"params"` } if err := json.Unmarshal(line, &p); err != nil { @@ -205,12 +208,17 @@ func decodeUIRequest(line []byte) (Event, error) { } prompt := firstNonEmpty(p.Prompt, p.Params.Prompt, p.Params.Message) kind := firstNonEmpty(p.Kind, p.Method) + options := p.Options + if len(options) == 0 { + options = p.Params.Options + } return Event{ Kind: KindAwaitingInput, RawType: "extension_ui_request", RequestID: p.ID, RequestKind: kind, Prompt: prompt, + Options: options, }, nil } diff --git a/server/internal/agent/pirun/decoder_test.go b/server/internal/agent/pirun/decoder_test.go index fc77a94..f641dda 100644 --- a/server/internal/agent/pirun/decoder_test.go +++ b/server/internal/agent/pirun/decoder_test.go @@ -139,6 +139,34 @@ func TestDecodeExtensionUIRequest(t *testing.T) { if ev.RequestID != "ui-7" || ev.RequestKind != "input" || ev.Prompt != "Which environment?" 
{ t.Errorf("ui request decoded as %+v", ev) } + if len(ev.Options) != 0 { + t.Errorf("free-text request should carry no options, got %v", ev.Options) + } +} + +func TestDecodeExtensionUIRequestSelectOptions(t *testing.T) { + // A select-kind request carries choice options; decode them best-effort + // whether they ride top-level or under params. + ev, err := Decode([]byte(`{"type":"extension_ui_request","id":"ui-9","kind":"select","prompt":"Which framework?","options":["React","Vue","Svelte"]}`)) + if err != nil { + t.Fatalf("decode select request: %v", err) + } + if ev.Kind != KindAwaitingInput || ev.RequestKind != "select" { + t.Fatalf("kind=%q requestKind=%q, want awaiting_input/select", ev.Kind, ev.RequestKind) + } + if got := ev.Options; len(got) != 3 || got[0] != "React" || got[2] != "Svelte" { + t.Errorf("options = %v, want [React Vue Svelte]", got) + } +} + +func TestDecodeExtensionUIRequestParamsOptions(t *testing.T) { + ev, err := Decode([]byte(`{"type":"extension_ui_request","id":"ui-10","method":"select","params":{"prompt":"Pick","options":["a","b"]}}`)) + if err != nil { + t.Fatalf("decode: %v", err) + } + if len(ev.Options) != 2 || ev.Options[1] != "b" || ev.Prompt != "Pick" { + t.Errorf("params-options request decoded as %+v", ev) + } } func TestNormalizeTool(t *testing.T) { diff --git a/server/internal/agent/pirun/extension/ask-user.ts b/server/internal/agent/pirun/extension/ask-user.ts index 0914d4f..68bf53f 100644 --- a/server/internal/agent/pirun/extension/ask-user.ts +++ b/server/internal/agent/pirun/extension/ask-user.ts @@ -2,12 +2,21 @@ // // Pi has no native "agent is waiting on the human" event (verified in the U1 // spike). This extension gives the agent a blocking `ask_user` tool: when the -// agent calls it, ctx.ui.input emits an `extension_ui_request` on the RPC +// agent calls it, a ctx.ui primitive emits an `extension_ui_request` on the RPC // stdout stream and blocks until the client sends a matching // `extension_ui_response`. 
The Deuce runtime maps that request to the task's // `awaiting_input` state and routes the human's drawer reply back as the // response (KTD15 / R7 / R16 / AE3). // +// The tool optionally carries a `kind` (free-text / pick-one / confirm) and, +// for choice kinds, an `options` list, so the client can render a typed prompt +// (text field / buttons / yes-no) instead of a bare text box. `kind`/`options` +// are additive: omitting them preserves the original free-text behavior. The +// richer ctx.ui primitives (select/confirm) are feature-detected at runtime — +// when the running Pi build does not expose them, the tool falls back to +// ctx.ui.input with the options enumerated in the prompt. Either way it returns +// the answer as plain text and never emits raw JSON to the user. +// // Auto-discovered when placed at ~/.pi/agent/extensions/ in the container. import type { ExtensionAPI } from "@earendil-works/pi-coding-agent"; @@ -21,11 +30,31 @@ export default function (pi: ExtensionAPI) { "Ask the human a clarifying question and block until they answer. " + "Use this whenever you are blocked on a decision only the user can make " + "(ambiguous requirements, a risky action needing approval, missing " + - "context) instead of guessing. Returns the user's answer as text.", + "context) instead of guessing. " + + "Set kind to 'select' and provide options when the answer is one of a " + + "small set of choices, or kind 'confirm' for a yes/no decision; omit " + + "kind for an open-ended text answer. 
Returns the user's answer as text.",
     parameters: Type.Object({
       question: Type.String({
         description: "The question to ask the user, phrased clearly.",
       }),
+      kind: Type.Optional(
+        Type.Union([
+          Type.Literal("input"),
+          Type.Literal("select"),
+          Type.Literal("confirm"),
+        ], {
+          description:
+            "How the user answers: 'input' (free text, default), 'select' " +
+            "(pick one of options), or 'confirm' (yes/no).",
+        }),
+      ),
+      options: Type.Optional(
+        Type.Array(Type.String(), {
+          description:
+            "Choices to offer when kind is 'select'. Ignored for other kinds.",
+        }),
+      ),
     }),
     async execute(toolCallId, params, signal, onUpdate, ctx) {
       // In headless contexts with no UI channel, don't block forever — tell the
@@ -42,9 +71,51 @@ export default function (pi: ExtensionAPI) {
         };
       }
 
-      const answer = await ctx.ui.input("A question for you", params.question);
+      const ui = ctx.ui as Record<string, unknown>;
+      const options = Array.isArray(params.options) ? params.options : [];
+      // Infer select when options were supplied without an explicit kind.
+      const kind =
+        params.kind ?? (options.length > 0 ? "select" : "input");
+
+      const text = (answer: unknown): string =>
+        answer == null ? "" : String(answer);
+
+      let answer: unknown;
+      if (kind === "select" && options.length > 0) {
+        if (typeof ui.select === "function") {
+          answer = await (ui.select as (
+            title: string,
+            prompt: string,
+            options: string[],
+          ) => Promise<string>)("A question for you", params.question, options);
+        } else {
+          // Fallback: enumerate the options in a text prompt. The answer is
+          // still plain text — never JSON.
+          const list = options.map((o, i) => `${i + 1}. ${o}`).join("\n");
+          answer = await ctx.ui.input(
+            "A question for you",
+            `${params.question}\n\nOptions:\n${list}`,
+          );
+        }
+      } else if (kind === "confirm") {
+        if (typeof ui.confirm === "function") {
+          const ok = await (ui.confirm as (
+            title: string,
+            prompt: string,
+          ) => Promise<boolean>)("A question for you", params.question);
+          answer = ok ?
"yes" : "no"; + } else { + answer = await ctx.ui.input( + "A question for you", + `${params.question} (yes/no)`, + ); + } + } else { + answer = await ctx.ui.input("A question for you", params.question); + } + return { - content: [{ type: "text", text: answer ?? "" }], + content: [{ type: "text", text: text(answer) }], details: {}, }; }, diff --git a/server/internal/agent/runtime.go b/server/internal/agent/runtime.go index c28d3e3..e2b867f 100644 --- a/server/internal/agent/runtime.go +++ b/server/internal/agent/runtime.go @@ -328,7 +328,7 @@ func (r *Runtime) translate(key pirun.Key, ev pirun.Event) { case pirun.KindAssistantText: r.appendReply(taskID, ev.Text) case pirun.KindAwaitingInput: - seq, err := r.store.SetAwaitingInput(ctx, key.SessionID, taskID, ev.Prompt) + seq, err := r.store.SetAwaitingInput(ctx, key.SessionID, taskID, ev.Prompt, ev.RequestKind, ev.Options) if err != nil { slog.Error("runtime: set awaiting input", "task", taskID, "error", err) return @@ -336,7 +336,8 @@ func (r *Runtime) translate(key pirun.Key, ev pirun.Event) { r.setPending(taskID, ev.RequestID) r.enterAwaiting(key, taskID) // suspend active timeout, start ceiling (KTD8) r.broadcastTask(ws.TypeTaskAwaitingInput, ws.TaskEventPayload{ - Seq: seq, TaskID: taskID, AgentID: key.AgentID, State: StateAwaitingInput, PendingQuestion: ev.Prompt, + Seq: seq, TaskID: taskID, AgentID: key.AgentID, State: StateAwaitingInput, + PendingQuestion: ev.Prompt, PendingQuestionKind: ev.RequestKind, PendingQuestionOptions: ev.Options, }, key.SessionID) case pirun.KindRunCompleted: unlock := r.keys.lock(key) diff --git a/server/internal/agent/runtime_test.go b/server/internal/agent/runtime_test.go index 98ab8ec..dc54976 100644 --- a/server/internal/agent/runtime_test.go +++ b/server/internal/agent/runtime_test.go @@ -66,7 +66,7 @@ func (s *fakeStore) setState(sessionID, taskID, state string) int64 { func (s *fakeStore) MarkRunning(_ context.Context, sessionID, taskID string) (int64, error) { return 
s.setState(sessionID, taskID, StateRunning), nil } -func (s *fakeStore) SetAwaitingInput(_ context.Context, sessionID, taskID, _ string) (int64, error) { +func (s *fakeStore) SetAwaitingInput(_ context.Context, sessionID, taskID, _, _ string, _ []string) (int64, error) { return s.setState(sessionID, taskID, StateAwaitingInput), nil } func (s *fakeStore) ResolveAwaitingInput(_ context.Context, sessionID, taskID string) (int64, error) { diff --git a/server/internal/agent/store.go b/server/internal/agent/store.go index bafad31..8381040 100644 --- a/server/internal/agent/store.go +++ b/server/internal/agent/store.go @@ -27,8 +27,10 @@ type Store interface { CompleteAction(ctx context.Context, sessionID, taskID, callID, text string, isError bool) (seq int64, err error) // SetAwaitingInput transitions running→awaiting_input with the pending - // question and returns the event seq. - SetAwaitingInput(ctx context.Context, sessionID, taskID, question string) (seq int64, err error) + // question and returns the event seq. kind is the question type (input / + // select / confirm; empty means free-text input) and options are the choice + // labels for a select question (nil otherwise). + SetAwaitingInput(ctx context.Context, sessionID, taskID, question, kind string, options []string) (seq int64, err error) // ResolveAwaitingInput transitions awaiting_input→running and returns the seq. ResolveAwaitingInput(ctx context.Context, sessionID, taskID string) (seq int64, err error) diff --git a/server/internal/db/migrations/012_task_question_kind.sql b/server/internal/db/migrations/012_task_question_kind.sql new file mode 100644 index 0000000..0051a07 --- /dev/null +++ b/server/internal/db/migrations/012_task_question_kind.sql @@ -0,0 +1,15 @@ +-- +goose Up + +-- Typed agent questions: a question carries a kind (free-text / pick-one / +-- confirm) and, for pick-one, the offered options, so the client can render a +-- typed prompt instead of a bare text box. 
Persisted alongside pending_question +-- so a snapshot refetch (seq-gap reconcile, reconnect) reconstructs the typed +-- prompt rather than degrading it to free text. Empty kind ('') means free-text +-- input — the backward-compatible default for questions that predate this +-- column or omit the kind. +ALTER TABLE tasks ADD COLUMN pending_question_kind TEXT NOT NULL DEFAULT ''; +ALTER TABLE tasks ADD COLUMN pending_question_options TEXT[] NOT NULL DEFAULT '{}'; + +-- +goose Down +ALTER TABLE tasks DROP COLUMN pending_question_options; +ALTER TABLE tasks DROP COLUMN pending_question_kind; \ No newline at end of file diff --git a/server/internal/db/models.go b/server/internal/db/models.go index 646657b..ed895a2 100644 --- a/server/internal/db/models.go +++ b/server/internal/db/models.go @@ -88,19 +88,21 @@ type SessionMember struct { } type Task struct { - ID uuid.UUID `json:"id"` - SessionID uuid.UUID `json:"session_id"` - AgentID uuid.UUID `json:"agent_id"` - RequestedBy pgtype.UUID `json:"requested_by"` - AnchorMessageID pgtype.UUID `json:"anchor_message_id"` - Prompt string `json:"prompt"` - State string `json:"state"` - Seq int64 `json:"seq"` - PendingQuestion string `json:"pending_question"` - Reply string `json:"reply"` - Work []byte `json:"work"` - CreatedAt time.Time `json:"created_at"` - UpdatedAt time.Time `json:"updated_at"` + ID uuid.UUID `json:"id"` + SessionID uuid.UUID `json:"session_id"` + AgentID uuid.UUID `json:"agent_id"` + RequestedBy pgtype.UUID `json:"requested_by"` + AnchorMessageID pgtype.UUID `json:"anchor_message_id"` + Prompt string `json:"prompt"` + State string `json:"state"` + Seq int64 `json:"seq"` + PendingQuestion string `json:"pending_question"` + Reply string `json:"reply"` + Work []byte `json:"work"` + CreatedAt time.Time `json:"created_at"` + UpdatedAt time.Time `json:"updated_at"` + PendingQuestionKind string `json:"pending_question_kind"` + PendingQuestionOptions []string `json:"pending_question_options"` } type TaskAction 
struct { diff --git a/server/internal/db/queries/tasks.sql b/server/internal/db/queries/tasks.sql index 240b519..d21cf09 100644 --- a/server/internal/db/queries/tasks.sql +++ b/server/internal/db/queries/tasks.sql @@ -25,13 +25,13 @@ SELECT * FROM tasks WHERE id = $1; UPDATE tasks SET state = $2, seq = $3, updated_at = now() WHERE id = $1; -- name: SetTaskAwaitingInput :exec -UPDATE tasks SET state = 'awaiting_input', pending_question = $2, seq = $3, updated_at = now() WHERE id = $1; +UPDATE tasks SET state = 'awaiting_input', pending_question = $2, pending_question_kind = $3, pending_question_options = $4, seq = $5, updated_at = now() WHERE id = $1; -- name: ResolveTaskInput :exec -UPDATE tasks SET state = 'running', pending_question = '', seq = $2, updated_at = now() WHERE id = $1; +UPDATE tasks SET state = 'running', pending_question = '', pending_question_kind = '', pending_question_options = '{}', seq = $2, updated_at = now() WHERE id = $1; -- name: FinishTask :exec -UPDATE tasks SET state = $2, reply = $3, work = $4, pending_question = '', seq = $5, updated_at = now() WHERE id = $1; +UPDATE tasks SET state = $2, reply = $3, work = $4, pending_question = '', pending_question_kind = '', pending_question_options = '{}', seq = $5, updated_at = now() WHERE id = $1; -- name: AppendAction :exec -- Idempotent on (task_id, call_id): a replayed tool-start after re-attach is a diff --git a/server/internal/db/tasks.sql.go b/server/internal/db/tasks.sql.go index cab52bb..2a5201a 100644 --- a/server/internal/db/tasks.sql.go +++ b/server/internal/db/tasks.sql.go @@ -110,7 +110,7 @@ func (q *Queries) CompleteAction(ctx context.Context, arg CompleteActionParams) const createTask = `-- name: CreateTask :one INSERT INTO tasks (session_id, agent_id, requested_by, anchor_message_id, prompt, state, seq) VALUES ($1, $2, $3, $4, $5, $6, $7) -RETURNING id, session_id, agent_id, requested_by, anchor_message_id, prompt, state, seq, pending_question, reply, work, created_at, updated_at 
+RETURNING id, session_id, agent_id, requested_by, anchor_message_id, prompt, state, seq, pending_question, reply, work, created_at, updated_at, pending_question_kind, pending_question_options ` type CreateTaskParams struct { @@ -148,6 +148,8 @@ func (q *Queries) CreateTask(ctx context.Context, arg CreateTaskParams) (Task, e &i.Work, &i.CreatedAt, &i.UpdatedAt, + &i.PendingQuestionKind, + &i.PendingQuestionOptions, ) return i, err } @@ -164,7 +166,7 @@ func (q *Queries) FailStuckTasks(ctx context.Context) error { } const finishTask = `-- name: FinishTask :exec -UPDATE tasks SET state = $2, reply = $3, work = $4, pending_question = '', seq = $5, updated_at = now() WHERE id = $1 +UPDATE tasks SET state = $2, reply = $3, work = $4, pending_question = '', pending_question_kind = '', pending_question_options = '{}', seq = $5, updated_at = now() WHERE id = $1 ` type FinishTaskParams struct { @@ -214,7 +216,7 @@ func (q *Queries) GetPiSessionID(ctx context.Context, arg GetPiSessionIDParams) } const getTask = `-- name: GetTask :one -SELECT id, session_id, agent_id, requested_by, anchor_message_id, prompt, state, seq, pending_question, reply, work, created_at, updated_at FROM tasks WHERE id = $1 +SELECT id, session_id, agent_id, requested_by, anchor_message_id, prompt, state, seq, pending_question, reply, work, created_at, updated_at, pending_question_kind, pending_question_options FROM tasks WHERE id = $1 ` func (q *Queries) GetTask(ctx context.Context, id uuid.UUID) (Task, error) { @@ -234,6 +236,8 @@ func (q *Queries) GetTask(ctx context.Context, id uuid.UUID) (Task, error) { &i.Work, &i.CreatedAt, &i.UpdatedAt, + &i.PendingQuestionKind, + &i.PendingQuestionOptions, ) return i, err } @@ -258,7 +262,7 @@ func (q *Queries) IsSessionMember(ctx context.Context, arg IsSessionMemberParams } const listAgentTasks = `-- name: ListAgentTasks :many -SELECT id, session_id, agent_id, requested_by, anchor_message_id, prompt, state, seq, pending_question, reply, work, created_at, 
updated_at FROM tasks WHERE session_id = $1 AND agent_id = $2 ORDER BY created_at ASC +SELECT id, session_id, agent_id, requested_by, anchor_message_id, prompt, state, seq, pending_question, reply, work, created_at, updated_at, pending_question_kind, pending_question_options FROM tasks WHERE session_id = $1 AND agent_id = $2 ORDER BY created_at ASC ` type ListAgentTasksParams struct { @@ -291,6 +295,8 @@ func (q *Queries) ListAgentTasks(ctx context.Context, arg ListAgentTasksParams) &i.Work, &i.CreatedAt, &i.UpdatedAt, + &i.PendingQuestionKind, + &i.PendingQuestionOptions, ); err != nil { return nil, err } @@ -346,7 +352,7 @@ func (q *Queries) ListSessionTaskActions(ctx context.Context, sessionID uuid.UUI } const listSessionTasks = `-- name: ListSessionTasks :many -SELECT id, session_id, agent_id, requested_by, anchor_message_id, prompt, state, seq, pending_question, reply, work, created_at, updated_at FROM tasks WHERE session_id = $1 ORDER BY created_at ASC +SELECT id, session_id, agent_id, requested_by, anchor_message_id, prompt, state, seq, pending_question, reply, work, created_at, updated_at, pending_question_kind, pending_question_options FROM tasks WHERE session_id = $1 ORDER BY created_at ASC ` func (q *Queries) ListSessionTasks(ctx context.Context, sessionID uuid.UUID) ([]Task, error) { @@ -372,6 +378,8 @@ func (q *Queries) ListSessionTasks(ctx context.Context, sessionID uuid.UUID) ([] &i.Work, &i.CreatedAt, &i.UpdatedAt, + &i.PendingQuestionKind, + &i.PendingQuestionOptions, ); err != nil { return nil, err } @@ -435,7 +443,7 @@ func (q *Queries) PeekEventSeq(ctx context.Context, sessionID uuid.UUID) (int64, } const resolveTaskInput = `-- name: ResolveTaskInput :exec -UPDATE tasks SET state = 'running', pending_question = '', seq = $2, updated_at = now() WHERE id = $1 +UPDATE tasks SET state = 'running', pending_question = '', pending_question_kind = '', pending_question_options = '{}', seq = $2, updated_at = now() WHERE id = $1 ` type 
ResolveTaskInputParams struct { @@ -449,17 +457,25 @@ func (q *Queries) ResolveTaskInput(ctx context.Context, arg ResolveTaskInputPara } const setTaskAwaitingInput = `-- name: SetTaskAwaitingInput :exec -UPDATE tasks SET state = 'awaiting_input', pending_question = $2, seq = $3, updated_at = now() WHERE id = $1 +UPDATE tasks SET state = 'awaiting_input', pending_question = $2, pending_question_kind = $3, pending_question_options = $4, seq = $5, updated_at = now() WHERE id = $1 ` type SetTaskAwaitingInputParams struct { - ID uuid.UUID `json:"id"` - PendingQuestion string `json:"pending_question"` - Seq int64 `json:"seq"` + ID uuid.UUID `json:"id"` + PendingQuestion string `json:"pending_question"` + PendingQuestionKind string `json:"pending_question_kind"` + PendingQuestionOptions []string `json:"pending_question_options"` + Seq int64 `json:"seq"` } func (q *Queries) SetTaskAwaitingInput(ctx context.Context, arg SetTaskAwaitingInputParams) error { - _, err := q.db.Exec(ctx, setTaskAwaitingInput, arg.ID, arg.PendingQuestion, arg.Seq) + _, err := q.db.Exec(ctx, setTaskAwaitingInput, + arg.ID, + arg.PendingQuestion, + arg.PendingQuestionKind, + arg.PendingQuestionOptions, + arg.Seq, + ) return err } diff --git a/server/internal/handler/agent_run.go b/server/internal/handler/agent_run.go index 754d2db..1ff03cb 100644 --- a/server/internal/handler/agent_run.go +++ b/server/internal/handler/agent_run.go @@ -33,10 +33,12 @@ type agentTaskResp struct { AnchorMessageID string `json:"anchorMessageId,omitempty"` Prompt string `json:"prompt"` State string `json:"state"` - Seq int64 `json:"seq"` - PendingQuestion string `json:"pendingQuestion,omitempty"` - Reply string `json:"reply,omitempty"` - Actions []agentActionResp `json:"actions"` + Seq int64 `json:"seq"` + PendingQuestion string `json:"pendingQuestion,omitempty"` + PendingQuestionKind string `json:"pendingQuestionKind,omitempty"` + PendingQuestionOptions []string `json:"pendingQuestionOptions,omitempty"` + Reply string 
`json:"reply,omitempty"` + Actions []agentActionResp `json:"actions"` } type agentRunSnapshotResp struct { @@ -123,7 +125,10 @@ func buildSnapshot(tasks []db.Task, actions []db.TaskAction) agentRunSnapshotRes RequestedBy: uuidStr(t.RequestedBy.Bytes, t.RequestedBy.Valid), AnchorMessageID: uuidStr(t.AnchorMessageID.Bytes, t.AnchorMessageID.Valid), Prompt: t.Prompt, State: t.State, Seq: t.Seq, - PendingQuestion: t.PendingQuestion, Reply: t.Reply, Actions: acts, + PendingQuestion: t.PendingQuestion, + PendingQuestionKind: t.PendingQuestionKind, + PendingQuestionOptions: t.PendingQuestionOptions, + Reply: t.Reply, Actions: acts, }) } resp.LatestSeq = latest diff --git a/server/internal/ws/events.go b/server/internal/ws/events.go index 4bed373..ba9f9df 100644 --- a/server/internal/ws/events.go +++ b/server/internal/ws/events.go @@ -61,8 +61,12 @@ type TaskEventPayload struct { State string `json:"state,omitempty"` Position int `json:"position,omitempty"` // queue #N for queued tasks PendingQuestion string `json:"pendingQuestion,omitempty"` // awaiting_input - Reply string `json:"reply,omitempty"` // completed - Status string `json:"status,omitempty"` // completed: done|failed|cancelled + // Typed-question metadata (awaiting_input): kind is input|select|confirm + // (empty means free-text input); options are the choice labels for a select. + PendingQuestionKind string `json:"pendingQuestionKind,omitempty"` + PendingQuestionOptions []string `json:"pendingQuestionOptions,omitempty"` + Reply string `json:"reply,omitempty"` // completed + Status string `json:"status,omitempty"` // completed: done|failed|cancelled } // ActionEventPayload is the JSON payload for action_started / action_completed. 
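To make the effect of the `omitempty` tags on the new payload fields concrete, here is a small Go sketch. `questionPayload` mirrors only the three question fields of `TaskEventPayload` with the JSON tags shown in this patch; the trimmed struct and the `encode` helper are assumptions for the demo, and the real payload carries more fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// questionPayload mirrors only the question-related fields of TaskEventPayload.
// The JSON tags are copied from the patch; the struct itself is a demo stand-in.
type questionPayload struct {
	PendingQuestion        string   `json:"pendingQuestion,omitempty"`
	PendingQuestionKind    string   `json:"pendingQuestionKind,omitempty"`
	PendingQuestionOptions []string `json:"pendingQuestionOptions,omitempty"`
}

// encode is a hypothetical helper that returns the JSON text of a payload.
func encode(p questionPayload) string {
	b, err := json.Marshal(p)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// Free text: an empty kind and an empty (even non-nil) options slice are
	// both dropped from the encoded payload.
	fmt.Println(encode(questionPayload{
		PendingQuestion:        "Which environment?",
		PendingQuestionOptions: []string{},
	}))
	// Select: kind and options travel with the question.
	fmt.Println(encode(questionPayload{
		PendingQuestion:        "Which framework?",
		PendingQuestionKind:    "select",
		PendingQuestionOptions: []string{"React", "Vue", "Svelte"},
	}))
}
```

Because the empty values are omitted on the wire, a client that reads these fields sees them as absent for a free-text question, which is consistent with the backward-compatible default described in the migration comment.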
From b0773b78392bb540c1e61fc300ffbb0e9694ba1e Mon Sep 17 00:00:00 2001 From: Clint Berry Date: Tue, 9 Jun 2026 00:01:28 +0000 Subject: [PATCH 3/7] feat(super-threads): render typed agent question prompts U3: carry pendingQuestionKind + pendingQuestionOptions through the AgentTask / TaskEventPayload types and the agent-runs reducer (live event + snapshot paths). Adds a Vitest-style reducer spec following the repo's tsc-checked convention (no runner wired yet). U4: the thread drawer renders choice buttons for a select question and yes/no for a confirm, with the composer as the free-text / Other fallback; answers route through the existing steer path. --- .../super-threads/AgentThreadDrawer.tsx | 57 +++++++++++- src/stores/agent-runs.test.ts | 90 +++++++++++++++++++ src/stores/agent-runs.ts | 6 ++ src/styles/globals.css | 29 ++++++ src/types/index.ts | 10 +++ 5 files changed, 191 insertions(+), 1 deletion(-) create mode 100644 src/stores/agent-runs.test.ts diff --git a/src/components/super-threads/AgentThreadDrawer.tsx b/src/components/super-threads/AgentThreadDrawer.tsx index 0ffd08b..d674105 100644 --- a/src/components/super-threads/AgentThreadDrawer.tsx +++ b/src/components/super-threads/AgentThreadDrawer.tsx @@ -51,16 +51,67 @@ function RequesterAvatar({ user }: { user?: Pick }) { ); } +// QuestionControls renders the typed-prompt affordances for an awaiting-input +// task: choice buttons for a select question, yes/no for a confirm, and nothing +// extra for free text (the drawer composer below is the text input, and also +// serves as the "Other" fallback for a select). Answering routes through the +// same steer path as a typed reply (onAnswer → onSend → steer). +function QuestionControls({ + task, + onAnswer, +}: { + task: AgentTask; + onAnswer: (message: string) => void; +}) { + const kind = task.pendingQuestionKind; + const options = task.pendingQuestionOptions ?? []; + + if (kind === "select" && options.length > 0) { + return ( +
+ {options.map((opt) => ( + + ))} + or type another answer below +
+ ); + } + + if (kind === "confirm") { + return ( +
+ + +
+ ); + } + + return null; +} + function Turn({ agent, task, queuePos, lookupUser, + onAnswer, }: { agent: Agent; task: AgentTask; queuePos?: number; lookupUser: UserLookup; + onAnswer: (message: string) => void; }) { const [open, setOpen] = useState(false); const requester = lookupUser(task.requestedBy); @@ -100,7 +151,10 @@ function Turn({ {agent.name} needs your input - {task.pendingQuestion ?? "Reply below to continue."} +
+ {task.pendingQuestion ?? "Reply below to continue."} +
+          )}
@@ -246,6 +300,7 @@ export function AgentThreadDrawer({
             task={t}
             queuePos={queuePositions[t.id]}
             lookupUser={lookupUser}
+            onAnswer={onSend}
           />
         ))
       )}
diff --git a/src/stores/agent-runs.test.ts b/src/stores/agent-runs.test.ts
new file mode 100644
index 0000000..4e97f83
--- /dev/null
+++ b/src/stores/agent-runs.test.ts
@@ -0,0 +1,90 @@
+import { describe, expect, it } from "vitest";
+
+import { applyEvent, applySnapshot } from "./agent-runs";
+import type { AgentRunSnapshot, TaskEventPayload } from "@/types";
+
+// NOTE: The frontend has no test runner wired up yet (see CLAUDE.md / repo.test.ts).
+// These Vitest-style specs capture the intended behavior and run as soon as a
+// runner is added. Until then, `npx tsc --noEmit` keeps them type-checked.
+
+const empty = { tasks: {}, lastSeq: 0, nextOrder: 0 };
+
+function awaitingEvent(extra: Partial<TaskEventPayload>): TaskEventPayload {
+  return {
+    seq: 1,
+    taskId: "t1",
+    agentId: "a1",
+    pendingQuestion: "Which framework?",
+    ...extra,
+  };
+}
+
+describe("agent-runs reducer — typed questions", () => {
+  it("carries kind + options on task_awaiting_input (select)", () => {
+    const { state } = applyEvent(
+      empty,
+      "task_awaiting_input",
+      awaitingEvent({
+        pendingQuestionKind: "select",
+        pendingQuestionOptions: ["React", "Vue", "Svelte"],
+      }),
+    );
+    const task = state.tasks["t1"];
+    expect(task.state).toBe("awaiting_input");
+    expect(task.pendingQuestionKind).toBe("select");
+    expect(task.pendingQuestionOptions).toEqual(["React", "Vue", "Svelte"]);
+  });
+
+  it("is backward-compatible: no kind reduces to free text", () => {
+    const { state } = applyEvent(empty, "task_awaiting_input", awaitingEvent({}));
+    const task = state.tasks["t1"];
+    expect(task.state).toBe("awaiting_input");
+    expect(task.pendingQuestion).toBe("Which framework?");
+    expect(task.pendingQuestionKind).toBeUndefined();
+    expect(task.pendingQuestionOptions).toBeUndefined();
+  });
+
+  it("clears kind + options when the task resumes (task_started)", () => {
+    const afterAsk = applyEvent(
+      empty,
+      "task_awaiting_input",
+      awaitingEvent({
+        pendingQuestionKind: "confirm",
+      }),
+    ).state;
+    const { state } = applyEvent(afterAsk, "task_started", {
+      seq: 2,
+      taskId: "t1",
+      agentId: "a1",
+    });
+    const task = state.tasks["t1"];
+    expect(task.state).toBe("running");
+    expect(task.pendingQuestion).toBeUndefined();
+    expect(task.pendingQuestionKind).toBeUndefined();
+    expect(task.pendingQuestionOptions).toBeUndefined();
+  });
+
+  it("preserves kind + options through a snapshot (reconnect / seq-gap refetch)", () => {
+    const snapshot: AgentRunSnapshot = {
+      tasks: [
+        {
+          id: "t1",
+          sessionId: "s1",
+          agentId: "a1",
+          prompt: "@coder build it",
+          state: "awaiting_input",
+          seq: 5,
+          pendingQuestion: "Which framework?",
+          pendingQuestionKind: "select",
+          pendingQuestionOptions: ["React", "Vue"],
+          actions: [],
+        },
+      ],
+      latestSeq: 5,
+    };
+    const state = applySnapshot(snapshot);
+    const task = state.tasks["t1"];
+    expect(task.pendingQuestionKind).toBe("select");
+    expect(task.pendingQuestionOptions).toEqual(["React", "Vue"]);
+  });
+});
diff --git a/src/stores/agent-runs.ts b/src/stores/agent-runs.ts
index fcf2b4e..3d82770 100644
--- a/src/stores/agent-runs.ts
+++ b/src/stores/agent-runs.ts
@@ -136,14 +136,20 @@ function reduceTask(
       next.state = "running";
       next.position = undefined;
       next.pendingQuestion = undefined;
+      next.pendingQuestionKind = undefined;
+      next.pendingQuestionOptions = undefined;
       break;
     case "task_awaiting_input":
       next.state = "awaiting_input";
       next.pendingQuestion = p.pendingQuestion;
+      next.pendingQuestionKind = p.pendingQuestionKind;
+      next.pendingQuestionOptions = p.pendingQuestionOptions;
       break;
     case "task_completed":
       next.state = (p.status as TaskState) ?? p.state ?? "done";
       next.pendingQuestion = undefined;
+      next.pendingQuestionKind = undefined;
+      next.pendingQuestionOptions = undefined;
       if (p.reply) next.reply = p.reply;
       break;
   }
diff --git a/src/styles/globals.css b/src/styles/globals.css
index 7ca83ee..3f194bf 100644
--- a/src/styles/globals.css
+++ b/src/styles/globals.css
@@ -696,6 +696,35 @@ button,
   text-transform: uppercase;
   letter-spacing: 0.03em;
 }
+.q-pending-q .q-text {
+  white-space: pre-wrap;
+}
+/* typed-question controls (select buttons / confirm) */
+.q-choices {
+  display: flex;
+  flex-wrap: wrap;
+  align-items: center;
+  gap: 6px;
+  margin-top: 8px;
+}
+.q-choice {
+  border: 1px solid color-mix(in srgb, var(--color-warning) 45%, var(--color-border));
+  border-radius: var(--radius-md);
+  background: var(--color-background-input);
+  color: var(--color-foreground);
+  padding: 5px 11px;
+  font-size: 12px;
+  font-weight: 500;
+  cursor: pointer;
+}
+.q-choice:hover {
+  background: color-mix(in srgb, var(--color-warning) 14%, var(--color-background-input));
+  border-color: var(--color-warning);
+}
+.q-choice-hint {
+  font-size: 11px;
+  color: var(--color-foreground-subtle);
+}
 
 /* drawer composer */
 .q-drawer-composer {
diff --git a/src/types/index.ts b/src/types/index.ts
index 4a991a1..f71a684 100644
--- a/src/types/index.ts
+++ b/src/types/index.ts
@@ -170,6 +170,10 @@ export interface AgentAction {
 }
 
 // AgentTask is one @mention-spawned agent run, anchored to a channel message.
+// How the user answers an awaiting-input question: free text, pick one of
+// options, or yes/no. Mirrors the Pi ask_user extension's `kind`.
+export type QuestionKind = "input" | "select" | "confirm";
+
 export interface AgentTask {
   id: string;
   sessionId: string;
@@ -181,6 +185,10 @@ export interface AgentTask {
   seq: number;
   position?: number; // queue #N while queued
   pendingQuestion?: string;
+  // Typed-question metadata. kind is undefined/"input" for free text, "select"
+  // for a pick-one (options populated), or "confirm" for yes/no.
+  pendingQuestionKind?: QuestionKind;
+  pendingQuestionOptions?: string[];
   reply?: string;
   actions: AgentAction[];
   // order is a client-only stable creation-order index assigned by the reducer
@@ -201,6 +209,8 @@ export interface TaskEventPayload {
   state?: TaskState;
   position?: number;
   pendingQuestion?: string;
+  pendingQuestionKind?: QuestionKind;
+  pendingQuestionOptions?: string[];
   reply?: string;
   status?: "done" | "failed" | "cancelled";
 }

From dd6db10ac38fd0fa139502992495a67d49715f17 Mon Sep 17 00:00:00 2001
From: Clint Berry
Date: Tue, 9 Jun 2026 00:05:17 +0000
Subject: [PATCH 4/7] fix(workspace): make ask-user extension install failure loud (R10)

Escalate a failed InstallPiExtension to error level, return the error so
the caller can react, and tell the user via logFn that agents in the
session cannot ask questions and may guess instead. Provisioning stays
non-fatal.

---
 server/internal/handler/workspace.go |  5 ++++-
 server/internal/workspace/manager.go | 17 +++++++++++------
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/server/internal/handler/workspace.go b/server/internal/handler/workspace.go
index 82ffd37..f75e8b3 100644
--- a/server/internal/handler/workspace.go
+++ b/server/internal/handler/workspace.go
@@ -27,7 +27,10 @@ func (h *Handler) provisionAgentTools(ctx context.Context, workspaceID string, l
 		slog.Warn("pi installation failed", "workspace", workspaceID, "error", err)
 	}
 	if err := h.workspaces.InstallPiExtension(ctx, workspaceID, extension.AskUserFilename, extension.AskUser, logFn); err != nil {
-		slog.Warn("pi extension installation failed", "workspace", workspaceID, "error", err)
+		// Loud, not fatal: the workspace still comes up, but the user has been
+		// told via logFn that agents can't ask questions here (R10). Error level
+		// so it stands out from routine provisioning warnings.
+		slog.Error("pi extension installation failed", "workspace", workspaceID, "error", err)
 	}
 }
 
diff --git a/server/internal/workspace/manager.go b/server/internal/workspace/manager.go
index f3d5e1f..9687e8a 100644
--- a/server/internal/workspace/manager.go
+++ b/server/internal/workspace/manager.go
@@ -591,9 +591,14 @@ func (m *Manager) symlinkPi(ctx context.Context, workspaceID string) {
 
 // InstallPiExtension writes a Pi extension file to the container's auto-discovery
 // path (~/.pi/agent/extensions/) so Pi loads it on launch. Content is
-// base64-encoded over the wire to avoid any shell-quoting hazards. Non-fatal:
-// without the extension, agents simply lose the ask-user (awaiting-input)
-// capability rather than failing the workspace.
+// base64-encoded over the wire to avoid any shell-quoting hazards.
+//
+// A failure is loud, not silent: without the ask-user extension Pi has no way to
+// surface a blocking question, so the agent narrates the call as plain text
+// (raw `ask_user(...)` in the chat) or guesses instead of asking. The failure is
+// logged at error level and surfaced to the user through logFn with the concrete
+// consequence, and the error is returned so the caller can react — but
+// provisioning stays non-fatal at the call site (the workspace still comes up).
 func (m *Manager) InstallPiExtension(ctx context.Context, workspaceID, filename, content string, logFn LogFunc) error {
 	encoded := base64.StdEncoding.EncodeToString([]byte(content))
 	cmd := fmt.Sprintf(
@@ -603,10 +608,10 @@ func (m *Manager) InstallPiExtension(ctx context.Context, workspaceID, filename,
 	out, err := m.ExecInWorkspace(ctx, workspaceID, cmd).CombinedOutput()
 	if err != nil {
 		if logFn != nil {
-			logFn("WARNING: failed to install Pi ask-user extension")
+			logFn("ERROR: failed to install the Pi ask-user extension — agents in this session cannot ask you questions and may proceed on assumptions instead. Rebuild the workspace to retry.")
 		}
-		slog.Warn("pi extension install failed", "workspace", workspaceID, "error", err, "output", strings.TrimSpace(string(out)))
-		return nil // Non-fatal
+		slog.Error("pi extension install failed", "workspace", workspaceID, "error", err, "output", strings.TrimSpace(string(out)))
+		return fmt.Errorf("install pi ask-user extension %q: %w", filename, err)
 	}
 	if logFn != nil {
 		logFn(fmt.Sprintf("Pi extension installed: %s", filename))

From 8ddcfbdbb4b8ed798bf683d25a402d5aea1bd141 Mon Sep 17 00:00:00 2001
From: Clint Berry
Date: Tue, 9 Jun 2026 00:05:17 +0000
Subject: [PATCH 5/7] feat(agent): backstop narrated ask_user calls so chat never shows raw JSON

When the structured extension_ui_request never fires (extension missing,
or the model narrates the call as text), the reply arrives shaped like
ask_user({"question":"..."}). sanitizeNarratedQuestion rewrites it to the
plain question before persist/broadcast/post, preserving surrounding
prose and degrading a truncated call to a readable floor rather than a
JSON fragment (R9/R11/R12, AE3/AE4).

---
 server/internal/agent/question_backstop.go    | 98 +++++++++++++++++++
 .../internal/agent/question_backstop_test.go  | 79 +++++++++++++++
 server/internal/agent/runtime.go              |  4 +
 3 files changed, 181 insertions(+)
 create mode 100644 server/internal/agent/question_backstop.go
 create mode 100644 server/internal/agent/question_backstop_test.go

diff --git a/server/internal/agent/question_backstop.go b/server/internal/agent/question_backstop.go
new file mode 100644
index 0000000..ebda23a
--- /dev/null
+++ b/server/internal/agent/question_backstop.go
@@ -0,0 +1,98 @@
+package agent
+
+import (
+	"encoding/json"
+	"regexp"
+	"strings"
+)
+
+// The no-JSON backstop (R9/R11/R12). When the structured extension_ui_request
+// never fires — the ask-user extension failed to install, or the model narrated
+// the call instead of invoking the tool — the question arrives as assistant text
+// shaped like `ask_user({"question":"..."})`. Left alone it posts to chat as raw
+// JSON, which reads as a broken product. sanitizeNarratedQuestion rewrites that
+// text into the plain question before it is persisted, broadcast, or posted.
+var (
+	// Leak shape: an ask_user call opening with a JSON object. Requires `({`
+	// so a bare prose mention of "ask_user" never matches. Catches truncated
+	// calls too (no closing required here — that's the AE4 floor).
+	askUserLeakRe = regexp.MustCompile(`(?is)ask_user\s*\(\s*\{`)
+	// A complete `ask_user({...})` call, captured so surrounding prose survives.
+	askUserCallRe = regexp.MustCompile(`(?is)ask_user\s*\(\s*\{.*\}\s*\)`)
+	// The JSON object within a matched call.
+	askUserObjRe = regexp.MustCompile(`(?s)\{.*\}`)
+	// Best-effort "question" value extraction when the object won't parse as
+	// JSON (handles a closed string even if the call braces are truncated).
+	askUserQuestionRe = regexp.MustCompile(`(?is)"question"\s*:\s*"((?:[^"\\]|\\.)*)"`)
+)
+
+const malformedQuestionFloor = "(The agent tried to ask you a question, but the request was malformed.)"
+
+// looksLikeNarratedQuestion reports whether reply contains an ask_user tool call
+// rendered as text rather than a normal prose reply.
+func looksLikeNarratedQuestion(reply string) bool {
+	return askUserLeakRe.MatchString(reply)
+}
+
+// sanitizeNarratedQuestion turns a narrated ask_user call into the plain
+// question text. A complete call is replaced in place so any surrounding prose
+// is preserved; a truncated/garbled call degrades to the extracted question or,
+// failing that, a readable placeholder — never a JSON fragment. A reply with no
+// leak shape is returned unchanged.
+func sanitizeNarratedQuestion(reply string) string {
+	if !looksLikeNarratedQuestion(reply) {
+		return reply
+	}
+	if askUserCallRe.MatchString(reply) {
+		out := strings.TrimSpace(askUserCallRe.ReplaceAllStringFunc(reply, replaceNarratedCall))
+		if out != "" {
+			return out
+		}
+	}
+	// Truncated / garbled call (no complete ({...}) to replace): salvage the
+	// question if a closed "question" string survives, else the floor.
+	if q := extractQuestionText(reply); q != "" {
+		return q
+	}
+	return malformedQuestionFloor
+}
+
+// replaceNarratedCall maps one complete ask_user(...) call to its question text,
+// or the floor when the object can't yield a question.
+func replaceNarratedCall(match string) string {
+	obj := askUserObjRe.FindString(match)
+	if obj != "" {
+		var parsed struct {
+			Question string `json:"question"`
+		}
+		if err := json.Unmarshal([]byte(obj), &parsed); err == nil {
+			if q := strings.TrimSpace(parsed.Question); q != "" {
+				return q
+			}
+		}
+	}
+	if q := extractQuestionText(match); q != "" {
+		return q
+	}
+	return malformedQuestionFloor
+}
+
+// extractQuestionText pulls a "question":"..." value out of arbitrary text by
+// regex (a fallback for malformed JSON) and unescapes it.
+func extractQuestionText(s string) string {
+	m := askUserQuestionRe.FindStringSubmatch(s)
+	if m == nil {
+		return ""
+	}
+	return strings.TrimSpace(unescapeJSONString(m[1]))
+}
+
+// unescapeJSONString decodes JSON string escapes (\n, \", \\, …) in a raw
+// captured value, falling back to the input when it isn't decodable.
+func unescapeJSONString(s string) string {
+	var out string
+	if err := json.Unmarshal([]byte(`"`+s+`"`), &out); err == nil {
+		return out
+	}
+	return s
+}
diff --git a/server/internal/agent/question_backstop_test.go b/server/internal/agent/question_backstop_test.go
new file mode 100644
index 0000000..bbb637e
--- /dev/null
+++ b/server/internal/agent/question_backstop_test.go
@@ -0,0 +1,79 @@
+package agent
+
+import "testing"
+
+func TestSanitizeNarratedQuestion(t *testing.T) {
+	cases := []struct {
+		name  string
+		reply string
+		want  string
+	}{
+		{
+			// AE3: the canonical leak — a bare narrated call posts as the question.
+			name:  "bare call",
+			reply: `Ask_user({"question":"What file should I create?"})`,
+			want:  "What file should I create?",
+		},
+		{
+			// Lowercase tool name, the literal extension name.
+			name:  "lowercase call",
+			reply: `ask_user({"question": "Which framework?"})`,
+			want:  "Which framework?",
+		},
+		{
+			// Escaped newlines in the question are decoded, not shown raw.
+			name:  "escaped multiline question",
+			reply: `Ask_user({"question":"What file would you like me to create? Please provide:\n1. The filename\n2. The content"})`,
+			want:  "What file would you like me to create? Please provide:\n1. The filename\n2. The content",
+		},
+		{
+			// Leading prose is preserved; only the call shape is rewritten.
+			name:  "prose then call",
+			reply: `Sure — let me check. ask_user({"question":"Which env?"})`,
+			want:  "Sure — let me check. Which env?",
+		},
+		{
+			// AE4: truncated call still yields the question (closed string), no JSON.
+			name:  "truncated but quoted",
+			reply: `ask_user({"question":"What file?"`,
+			want:  "What file?",
+		},
+	}
+	for _, tc := range cases {
+		t.Run(tc.name, func(t *testing.T) {
+			if got := sanitizeNarratedQuestion(tc.reply); got != tc.want {
+				t.Errorf("sanitize(%q) = %q, want %q", tc.reply, got, tc.want)
+			}
+		})
+	}
+}
+
+func TestSanitizeNarratedQuestionFloor(t *testing.T) {
+	// AE4: a garbled call with no recoverable question degrades to readable
+	// prose, never a JSON fragment.
+	got := sanitizeNarratedQuestion(`ask_user({"questi`)
+	if got != malformedQuestionFloor {
+		t.Errorf("garbled call = %q, want the readable floor", got)
+	}
+	if containsJSONFragment(got) {
+		t.Errorf("floor should not contain a JSON fragment: %q", got)
+	}
+}
+
+func TestSanitizeNarratedQuestionPassthrough(t *testing.T) {
+	cases := []string{
+		"",
+		"(The agent finished without a text response.)",
+		"I considered using ask_user to confirm, but proceeded with the default.",
+		"Here is your summary: all tests pass and the build is green.",
+	}
+	for _, reply := range cases {
+		if got := sanitizeNarratedQuestion(reply); got != reply {
+			t.Errorf("passthrough reply changed: sanitize(%q) = %q", reply, got)
+		}
+	}
+}
+
+func containsJSONFragment(s string) bool {
+	return len(s) > 0 && (s[0] == '{' || s[len(s)-1] == '}')
+}
diff --git a/server/internal/agent/runtime.go b/server/internal/agent/runtime.go
index e2b867f..3a0018c 100644
--- a/server/internal/agent/runtime.go
+++ b/server/internal/agent/runtime.go
@@ -390,6 +390,10 @@ func (r *Runtime) finalizeLocked(ctx context.Context, key pirun.Key, taskID, sta
 	if ok && isTerminal(cur) {
 		return // already terminal — second signal is a no-op (idempotent, KTD12)
 	}
+	// No-JSON backstop: if the agent narrated an ask_user tool call as text
+	// (the ask-user extension didn't fire), rewrite it to the plain question
+	// before it is persisted, broadcast, and posted to chat (R9/R11/R12).
+	reply = sanitizeNarratedQuestion(reply)
 	seq, err := r.store.FinishTask(ctx, key.SessionID, taskID, state, reply)
 	if err != nil {
 		slog.Error("runtime: finish task", "task", taskID, "state", state, "error", err)

From e28f3632564f8f29a8d091ea5cff200f2204ecb5 Mon Sep 17 00:00:00 2001
From: Clint Berry
Date: Tue, 9 Jun 2026 00:05:29 +0000
Subject: [PATCH 6/7] chore: gofmt struct alignment for question kind/options fields

---
 server/internal/agent/pirun/decoder.go |  2 +-
 server/internal/handler/agent_run.go   | 14 +++++++-------
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/server/internal/agent/pirun/decoder.go b/server/internal/agent/pirun/decoder.go
index 41e0775..a42ea54 100644
--- a/server/internal/agent/pirun/decoder.go
+++ b/server/internal/agent/pirun/decoder.go
@@ -58,7 +58,7 @@ type Event struct {
 
 	// Awaiting-input (KindAwaitingInput).
 	RequestID   string
-	RequestKind string // select / confirm / input / editor
+	RequestKind string   // select / confirm / input / editor
 	Prompt      string
 	Options     []string // choice labels for a select request (empty otherwise)
 
diff --git a/server/internal/handler/agent_run.go b/server/internal/handler/agent_run.go
index 1ff03cb..d4ad737 100644
--- a/server/internal/handler/agent_run.go
+++ b/server/internal/handler/agent_run.go
@@ -26,13 +26,13 @@ type agentActionResp struct {
 }
 
 type agentTaskResp struct {
-	ID              string `json:"id"`
-	SessionID       string `json:"sessionId"`
-	AgentID         string `json:"agentId"`
-	RequestedBy     string `json:"requestedBy,omitempty"`
-	AnchorMessageID string `json:"anchorMessageId,omitempty"`
-	Prompt          string `json:"prompt"`
-	State           string `json:"state"`
+	ID                     string   `json:"id"`
+	SessionID              string   `json:"sessionId"`
+	AgentID                string   `json:"agentId"`
+	RequestedBy            string   `json:"requestedBy,omitempty"`
+	AnchorMessageID        string   `json:"anchorMessageId,omitempty"`
+	Prompt                 string   `json:"prompt"`
+	State                  string   `json:"state"`
 	Seq                    int64    `json:"seq"`
 	PendingQuestion        string   `json:"pendingQuestion,omitempty"`
 	PendingQuestionKind    string   `json:"pendingQuestionKind,omitempty"`

From cbdae59bf4bd1f84f75de2dbb7f961a78f66a7f2 Mon Sep 17 00:00:00 2001
From: Clint Berry
Date: Tue, 9 Jun 2026 00:06:51 +0000
Subject: [PATCH 7/7] docs: mark interactive-agent-questions plan completed

---
 .../2026-06-08-001-feat-interactive-agent-questions-plan.md     | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/plans/2026-06-08-001-feat-interactive-agent-questions-plan.md b/docs/plans/2026-06-08-001-feat-interactive-agent-questions-plan.md
index cacaa6b..17047f2 100644
--- a/docs/plans/2026-06-08-001-feat-interactive-agent-questions-plan.md
+++ b/docs/plans/2026-06-08-001-feat-interactive-agent-questions-plan.md
@@ -1,7 +1,7 @@
 ---
 title: "feat: Interactive typed agent questions with no-JSON guarantee"
 type: feat
-status: active
+status: completed
 date: 2026-06-08
 origin: docs/brainstorms/2026-06-08-interactive-agent-questions-requirements.md
 ---
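
Reviewer note: as a rough illustration of how a client might consume the typed-question fields this series adds (`pendingQuestionKind`, `pendingQuestionOptions`), the sketch below maps a pending question to the control to render. `promptControl` and `PromptControl` are hypothetical names that do not appear in these patches, and falling back to a text field when a `select` arrives without options is an assumption, not behavior the series confirms.

```typescript
// Mirrors the QuestionKind union added in src/types/index.ts.
type QuestionKind = "input" | "select" | "confirm";

// Hypothetical description of the control a thread could render.
type PromptControl =
  | { control: "text" }
  | { control: "buttons"; options: string[] }
  | { control: "yes-no" };

// Hypothetical helper: pick a control from the typed-question metadata.
function promptControl(kind?: QuestionKind, options?: string[]): PromptControl {
  if (kind === "confirm") {
    return { control: "yes-no" };
  }
  if (kind === "select" && options !== undefined && options.length > 0) {
    return { control: "buttons", options };
  }
  // kind is undefined or "input", or a select arrived with no options
  // (assumed fallback): render a plain text field.
  return { control: "text" };
}
```

Keeping the decision in one pure function would let the drawer and the task card share it, and it keeps the backward-compatible case (no kind at all) on the free-text path that the reducer tests describe.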