No way to author a HITL / client-resolved tool: authored tools require execute, which yields a duplicate tool_result when the same call parks for input #203

Description

@keesvandorp

Summary

There is no first-class way to author a tool that participates in eve's
human-in-the-loop (HITL) input-request flow (input.requested →
session.waiting → resume via inputResponses) the way the built-in
ask_question does.

  • The built-in ask_question works because it has no execute — the model
    emits the call, nothing runs it, the harness parks it, and the user's answer
    becomes its single tool_result.

  • Any authored tool is required by the compiler to have an execute
    function. That execute produces a tool_result for the call. If the same
    call also parks for input (e.g. an authored override of ask_question, or any
    attempt at an HITL tool), the resumed turn then carries two tool_result
    blocks for one tool_use id, and the provider rejects it:

    GatewayInternalServerError: messages.18.content.1: each tool_use must have a
    single result. Found multiple `tool_result` blocks with id:
    toolu_01QQFSTbsnsdR6qXFt5EFaCS
    

Net effect: you cannot widen/customize ask_question (its input schema is fixed
and strict), and you cannot build your own parking/HITL tool, because authoring
forces an execute whose result collides with the input-response result.

Environment

  • eve 0.11.6
  • Next.js 16 (App Router, Turbopack), useEveAgent over a same-origin
    /eve/v1/* proxy (withEve)
  • Model: anthropic/claude-sonnet-4.6 via AI Gateway (@ai-sdk/gateway)
  • Default harness, conversation mode (ask_question enabled)

Goal / use case

We want rich, typed HITL pickers in a chat UI — e.g. "choose media", "pick a
template", "pick an aspect ratio", a multi-select, or a multi-question form. The
ideal is the exact semantics of ask_question, but with a richer, typed input
schema, so the client can render a dedicated widget from the parked request
and the user's structured choice resumes the same turn.

ask_question's built-in input schema is fixed and .strict()
({ prompt, options?, allowFreeform? },
runtime/framework-tools/ask-question.js → inputRequestSchema.omit(...)), so
extra fields can't be smuggled through it. The natural approach is to author
agent/tools/ask_question.ts to override it with a superset schema, which the
graph supports (an authored tool with a framework name replaces the framework
tool; see runtime/resolve-agent-graph.js).

The override's schema half works perfectly: the model calls our widened
ask_question with a typed ui payload, the harness emits input.requested
with our input intact, the client renders the widget, the user answers, and the
turn resumes (turn.started). The only failure is the duplicate
tool_result described below.
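
To make the "client renders the widget" step concrete, here is a minimal client-side sketch that narrows the loosely typed ui field of a parked request into a discriminated union a chat UI could switch on. The widget kinds ("media-picker", "aspect-ratio") and the names UiPayload, ParkedQuestionInput, and parseUi are hypothetical; the issue only fixes the shape ui: { kind: string, ... }.

```typescript
// Hypothetical widget kinds; only `ui: { kind: string, ... }` comes from the issue.
type UiPayload =
  | { kind: "media-picker"; multiple: boolean }
  | { kind: "aspect-ratio"; ratios: string[] }
  | { kind: "unknown"; raw: unknown };

// Mirrors the widened inputSchema from the reproduction.
interface ParkedQuestionInput {
  prompt: string;
  options?: { id: string; label: string }[];
  allowFreeform?: boolean;
  ui?: { kind: string; [key: string]: unknown };
}

// Narrow the passthrough `ui` object into something a renderer can switch on.
function parseUi(input: ParkedQuestionInput): UiPayload | undefined {
  const ui = input.ui;
  if (ui === undefined) return undefined; // plain ask_question, no widget
  if (ui.kind === "media-picker") {
    return { kind: "media-picker", multiple: ui.multiple === true };
  }
  if (ui.kind === "aspect-ratio" && Array.isArray(ui.ratios)) {
    // Keep only string entries; anything else is dropped defensively.
    const ratios = ui.ratios.filter((r): r is string => typeof r === "string");
    return { kind: "aspect-ratio", ratios };
  }
  return { kind: "unknown", raw: ui }; // fall back to a generic prompt
}
```

Falling back to an "unknown" variant rather than throwing lets the client still show the plain prompt and options when the model sends a kind it does not recognize.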

Reproduction

  1. Author agent/tools/ask_question.ts overriding the built-in with a wider
    inputSchema (any superset is fine). A no-op execute is required to satisfy
    the compiler:

    import { defineTool } from 'eve/tools'
    import { z } from 'zod'
    
    export default defineTool({
      description: 'Ask the user a question (widened).',
      inputSchema: z.object({
        prompt: z.string(),
        options: z.array(z.object({ id: z.string(), label: z.string() })).optional(),
        allowFreeform: z.boolean().optional(),
        ui: z.object({ kind: z.string() }).passthrough().optional(), // the widening
      }),
      // Required by the compiler. This is the whole problem — see below.
      execute: () => ({ status: 'ignored' as const }),
    })

  2. Have the agent call ask_question (with or without the ui field) and answer
    it from the client via agent.send({ inputResponses: [{ requestId, text }] }).

  3. The turn parks and resumes, then fails on the next model call with
    each tool_use must have a single result … multiple tool_result blocks with id <ask_question callId>.

The built-in ask_question (no override) does not fail — because it has no
execute.

Observed event trace (client onEvent)

… reasoning/message/step events …
input.requested            ← harness parks the ask_question call
turn.completed
session.waiting            ← stream closes, status → ready  ✅ (parking works)
onFinish                   ← client resumes with inputResponses
turn.started               ← resume accepted  ✅
step.started
step.failed                ← model call rejected: duplicate tool_result  ❌
turn.failed
session.failed

So pause/resume is functioning; the failure is purely the two results for one
tool_use in the reconstructed history sent to the model.
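
For debugging, the condition the provider rejects can be checked locally before a request is sent. The following is a rough sketch that assumes the history is available in an Anthropic-style block format (tool_use blocks with id, tool_result blocks with tool_use_id); how to obtain that history from eve is not covered here, and the names ContentBlock, Message, and findDuplicateToolResults are illustrative only.

```typescript
// Simplified Anthropic-style content blocks (assumed shape, not eve's internal one).
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string }
  | { type: "tool_result"; tool_use_id: string; content: string };

interface Message {
  role: "user" | "assistant";
  content: ContentBlock[];
}

// Return every tool_use id that has more than one tool_result in the history.
function findDuplicateToolResults(messages: Message[]): string[] {
  const counts = new Map<string, number>();
  for (const message of messages) {
    for (const block of message.content) {
      if (block.type === "tool_result") {
        counts.set(block.tool_use_id, (counts.get(block.tool_use_id) ?? 0) + 1);
      }
    }
  }
  return Array.from(counts.entries())
    .filter(([, count]) => count > 1)
    .map(([id]) => id);
}

// Example history reproducing the failure: one call, two results.
const exampleHistory: Message[] = [
  { role: "assistant", content: [{ type: "tool_use", id: "toolu_1", name: "ask_question" }] },
  { role: "user", content: [
    { type: "tool_result", tool_use_id: "toolu_1", content: '{"status":"ignored"}' },
    { type: "tool_result", tool_use_id: "toolu_1", content: "16:9" },
  ] },
];
const duplicatedIds = findDuplicateToolResults(exampleHistory);
```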

Root-cause analysis (source references, eve 0.11.6)

  1. Authored tools must have an execute (function).
    compiler/normalize-tool.js → compileToolEntry →
    normalizeToolDefinition (internal/authored-definition/schema-backed.js)
    calls expectFunction(n.execute, t). Omitting execute (or setting it to
    undefined) fails compilation with "Expected the tool export 'default' …
    to match the public eve shape." So an executeless authored tool is
    impossible.

  2. The built-in ask_question has no execute.
    runtime/framework-tools/ask-question.js defines
    ASK_QUESTION_TOOL_DEFINITION with only description / inputSchema /
    outputSchema and no execute. In harness/tools.js, buildToolSet uses
    execute: wrapToolExecute(s), and wrapToolExecute returns undefined when
    s.execute === undefined. So the model-facing built-in tool has no executor
    and is never auto-run; it only ever resolves via the input-request path.

  3. The override replaces the framework tool with one that has an execute.
    runtime/resolve-agent-graph.js builds the registry as
    createRuntimeToolRegistry({ tools: [...framework.filter(notOverridden), ...authored] }),
    so the authored ask_question (which, per (1), has an execute) replaces
    the built-in. Now buildToolSet wraps a real executor onto the model-facing
    ask_question, and a tool_result is produced for the call in addition to
    the input-request resolution.

  4. ask_question is excluded from eve's own action runner, but that does not
    prevent the result.
    harness/tool-loop.js passes
    excludedActionToolNames: new Set([ASK_QUESTION_TOOL_NAME, CODE_MODE_TOOL_NAME, FINAL_OUTPUT_TOOL_NAME])
    to emitStepActions, and harness/input-extraction.js
    extractQuestionInputRequests keys the park on toolName === ASK_QUESTION_TOOL_NAME.
    So the call is both parked for input and (because the tool now carries an
    execute) resolved with an execute-derived tool_result. On resume the
    input answer adds a second tool_result for the same id → the provider 400.

In short: HITL parking and "authored tool ⇒ has execute ⇒ produces a
result" are mutually exclusive, and there is no supported way to opt an authored
tool out of producing a result.
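
The interaction of points (2) through (4) can be illustrated with a toy model. This is a deliberately simplified sketch, not eve's harness code: ToyTool, ToyResult, and simulateTurn are invented for illustration, and the only behaviors modeled are "a tool with execute yields a result" and "a call named ask_question parks and later receives the user's answer as a result".

```typescript
// Toy stand-ins; none of these types exist in eve.
interface ToyTool {
  name: string;
  execute?: (input: unknown) => unknown;
}

interface ToyResult {
  callId: string;
  source: "execute" | "inputResponse";
  value: unknown;
}

const PARKING_TOOL_NAME = "ask_question";

// Collect every result that ends up attached to one tool call across park + resume.
function simulateTurn(tool: ToyTool, callId: string, answer: string): ToyResult[] {
  const results: ToyResult[] = [];
  // A tool that carries an executor produces an execute-derived result.
  if (tool.execute !== undefined) {
    results.push({ callId, source: "execute", value: tool.execute({}) });
  }
  // Parking is keyed on the tool name, independent of whether execute exists.
  if (tool.name === PARKING_TOOL_NAME) {
    results.push({ callId, source: "inputResponse", value: answer });
  }
  return results;
}

// Built-in: no execute, so the user's answer is the only result.
const builtIn: ToyTool = { name: PARKING_TOOL_NAME };
// Authored override: the compiler-mandated no-op execute adds a second result.
const authoredOverride: ToyTool = {
  name: PARKING_TOOL_NAME,
  execute: () => ({ status: "ignored" }),
};

const builtInResults = simulateTurn(builtIn, "call_1", "16:9");
const overrideResults = simulateTurn(authoredOverride, "call_1", "16:9");
```

In this model the built-in yields one result and the override yields two for the same call id, which is the shape the provider rejects.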

Why the obvious workarounds don't work

  • Omit execute → compiler rejects the tool (requirement (1)).
  • execute: undefined as never → same compiler rejection (expectFunction).
  • execute returns {} / throws / no-ops → it still yields a tool_result
    block; the duplicate remains.
  • Smuggle rich fields through the built-in ask_question → its input schema
    is fixed and .strict(); unknown keys (ui, etc.) aren't part of the
    model-facing schema and don't survive.
  • needsApproval instead of ask_question → approval defers execute
    until after the park (so no duplicate), but execute only receives the
    original tool input via ToolContext; the user's structured answer
    (inputResponse / approval text) is not exposed to execute
    (public/definitions/tool.d.ts ToolContext = SessionContext + token
    accessors only). So an approval-gated tool cannot return the user's selection
    as its result.
  • awaiting_selection + relay (execute returns a marker, turn completes,
    client relays the choice as a follow-up message) does avoid the duplicate,
    but it abandons native HITL: it relies on a synthetic out-of-band user message
    instead of inputResponses, loses the deterministic single-turn pause/resume,
    and pollutes session history.

Proposed solutions (any one would unblock this)

  1. First-class authored HITL / client-resolved tools (preferred).
    Allow an authored tool to declare it has no executor and is resolved by the
    client/HITL channel, e.g. defineClientTool({...}), a clientResolved: true
    flag, or letting defineTool accept a definition without execute (the
    compiler would need to allow it; buildToolSet/wrapToolExecute already handle
    execute === undefined). Such a tool would behave exactly like the built-in
    ask_question: park on call, resolve via inputResponses, single result.

  2. Allow customizing the built-in ask_question input schema.
    A config hook to widen/replace ASK_QUESTION_INPUT_SCHEMA (e.g. on
    defineAgent, or a defineAskQuestion({ inputSchema }) helper) so apps can
    pass typed payloads through the existing, working (executeless) HITL tool
    without authoring a replacement.

  3. Suppress the auto-result when a call is parked for input.
    When extractQuestionInputRequests (or an approval) parks a tool call, ensure
    the harness does not also emit/persist an execute-derived tool_result
    for that same call id — i.e. parking wins and the input response is the sole
    result. This would make an authored ask_question override "just work."

  4. Expose the input/approval response to execute.
    Pass the resolved inputResponse ({ optionId?, text? }) into ToolContext
    so a needsApproval/HITL tool's execute can read the user's structured
    answer and return it. Enables clean authored HITL tools whose result is the
    user's choice.
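
As an illustration of proposal 3 ("parking wins"), the rule could be expressed as a filter over the results collected for a step. This is only a sketch of the intended semantics under assumed data shapes; PendingResult and applyParkingWins are hypothetical names, and where such a filter would sit inside the harness is left open.

```typescript
// Hypothetical shape for results gathered before history is persisted.
interface PendingResult {
  callId: string;
  source: "execute" | "inputResponse";
  value: unknown;
}

// Drop execute-derived results for any call that was parked for input,
// so the input response remains the sole result for that call id.
function applyParkingWins(
  results: PendingResult[],
  parkedCallIds: ReadonlySet<string>,
): PendingResult[] {
  return results.filter(
    (result) => !(result.source === "execute" && parkedCallIds.has(result.callId)),
  );
}

// Example: call_1 was parked, call_2 was an ordinary executed tool.
const collected: PendingResult[] = [
  { callId: "call_1", source: "execute", value: { status: "ignored" } },
  { callId: "call_1", source: "inputResponse", value: "16:9" },
  { callId: "call_2", source: "execute", value: { ok: true } },
];
const persisted = applyParkingWins(collected, new Set(["call_1"]));
```

Results for calls that were never parked pass through unchanged, so ordinary authored tools would be unaffected by such a rule.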

Acceptance criteria

  • An app can author a tool that:
    • parks the turn on call (input.requested / session.waiting),
    • carries a custom, typed input schema the client can render from,
    • resolves via agent.send({ inputResponses }), and
    • produces exactly one tool_result for its call id (no provider 400 on
      resume).
  • Equivalently: a supported way to widen ask_question's input schema, or to
    opt an authored tool out of producing an execute result when it is parked.

Appendix — exact provider error

GatewayInternalServerError: messages.18.content.1: each tool_use must have a
single result. Found multiple `tool_result` blocks with id:
toolu_01QQFSTbsnsdR6qXFt5EFaCS
  at createGatewayErrorFromResponse (@ai-sdk/gateway/src/errors/create-gateway-error.ts:121)
  …
upstreamType: 'AI_APICallError'
gatewayType: 'internal_server_error'
resolvedProvider: 'anthropic', canonicalSlug: 'anthropic/claude-sonnet-4.6'
