feat(tools): add defineClientTool for client-resolved (HITL) tools #204

Open
keesvandorp wants to merge 4 commits into vercel:main from keesvandorp:feat/client-resolved-tools

@keesvandorp

Fixes #203.

Problem

Authored tools are required to provide an execute: both the compiler
(normalizeToolDefinition → expectFunction(record.execute)) and the runtime
(resolveToolDefinition → expectFunction(resolvedRecord.execute)) reject a
tool without one. That makes it impossible to author a tool that participates in
the human-in-the-loop input flow the way the built-in ask_question does — no
executor, the model emits the call, the harness parks it, and the user's answer
becomes its single tool_result.

The practical consequence (from #203): overriding ask_question to widen its
(fixed, .strict()) input schema for typed HITL pickers forces an execute,
whose auto-result collides with the input response. The resumed turn then
carries two tool_result blocks for one tool_use id and the provider
rejects it:

each tool_use must have a single result. Found multiple `tool_result` blocks with id: toolu_…
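
The collision can be made concrete with a small check over the message history. This is a minimal sketch, not eve's or the provider's code: the `Block` shape and `findResultViolations` are hypothetical names introduced only for illustration.

```typescript
// Hypothetical, simplified history blocks (not eve's or a provider's real types).
type Block =
  | { type: "tool_use"; id: string; name: string }
  | { type: "tool_result"; toolUseId: string; content: string };

// Returns every tool_use id that does not have exactly one tool_result.
function findResultViolations(blocks: Block[]): string[] {
  const counts = new Map<string, number>();
  for (const block of blocks) {
    if (block.type === "tool_use") counts.set(block.id, 0);
  }
  for (const block of blocks) {
    if (block.type === "tool_result" && counts.has(block.toolUseId)) {
      counts.set(block.toolUseId, (counts.get(block.toolUseId) ?? 0) + 1);
    }
  }
  return Array.from(counts.entries())
    .filter(([, count]) => count !== 1)
    .map(([id]) => id);
}

const toolUse: Block = { type: "tool_use", id: "call_1", name: "ask_question" };
const autoResult: Block = { type: "tool_result", toolUseId: "call_1", content: "auto-result from execute" };
const userAnswer: Block = { type: "tool_result", toolUseId: "call_1", content: "user's answer" };

// Override with an execute: the auto-result and the user's answer share one call id.
const collidedHistory: Block[] = [toolUse, autoResult, userAnswer];
// Client-resolved: the user's answer is the only result.
const clientResolvedHistory: Block[] = [toolUse, userAnswer];
```

Under these assumptions, `findResultViolations(collidedHistory)` flags `call_1`, while the client-resolved history has no violations.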

Change

Add defineClientTool({ description, inputSchema, outputSchema? }) — an
authored tool with no execute, stamped clientResolved: true. eve never
runs it; the call parks for input and resolves out-of-band, producing exactly
one result.

  • internal/authored-definition/schema-backed.ts: allow omitting execute
    when clientResolved; every other tool still requires it.
  • runtime/resolve-tool.ts: skip reattaching a live execute for
    client-resolved tools.
  • public/definitions/tool.ts: defineClientTool + ClientToolDefinition;
    passing execute throws.
  • public/tools/index.ts: export from eve/tools.
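
The authoring-side contract in the list above (no execute, stamped clientResolved: true, passing execute throws) could look roughly like this. It is a sketch under stated assumptions, not the code in public/definitions/tool.ts: `SchemaLike`, `ClientToolSketch`, and `defineClientToolSketch` are hypothetical names, and the real helper takes zod schemas and also brands the definition.

```typescript
// Hypothetical stand-in for a schema; the real helper takes zod schemas.
type SchemaLike = { parse(value: unknown): unknown };

interface ClientToolInput {
  description: string;
  inputSchema: SchemaLike;
  outputSchema?: SchemaLike;
}

interface ClientToolSketch extends ClientToolInput {
  readonly clientResolved: true;
}

function defineClientToolSketch(input: ClientToolInput): ClientToolSketch {
  // The type has no execute, but untyped callers could still pass one: reject it.
  if ((input as { execute?: unknown }).execute !== undefined) {
    throw new TypeError("client-resolved tools must not define execute");
  }
  const { description, inputSchema, outputSchema } = input;
  return {
    description,
    inputSchema,
    ...(outputSchema ? { outputSchema } : {}),
    clientResolved: true,
  };
}
```

Copying only the known fields means no stray execute can ride along on the returned definition.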

No harness change is needed: the runtime already surfaces executeless tools as
client-side (buildToolSet / wrapToolExecute return undefined) and
ResolvedToolDefinition.execute is already optional. This PR just lets
authored tools reach that existing path. defineTool is unchanged and still
requires execute.

Authoring agent/tools/ask_question.ts with defineClientTool overrides the
built-in question tool with a wider, typed schema while keeping native
pause/resume — the parked input.requested carries the full typed input, so a
client can render a dedicated widget from it.

```ts
import { defineClientTool } from "eve/tools";
import { z } from "zod";

export default defineClientTool({
  description: "Ask the user to pick a template.",
  inputSchema: z.object({
    prompt: z.string(),
    ui: z.object({ kind: z.literal("template_picker") }).passthrough(),
  }),
});
```
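
On the client side, rendering a dedicated widget from the parked input might amount to switching on ui.kind. This is only a sketch: `ParkedQuestionInput`, `chooseWidget`, and the widget names are hypothetical, and the way a client actually receives the parked input from eve is not shown here.

```typescript
// Hypothetical client-side view of the parked call's input; mirrors the schema above.
interface ParkedQuestionInput {
  prompt: string;
  ui: { kind: string; [extra: string]: unknown };
}

// Picks a widget name from the typed input, falling back to a plain text prompt.
function chooseWidget(input: ParkedQuestionInput): "TemplatePicker" | "TextPrompt" {
  switch (input.ui.kind) {
    case "template_picker":
      return "TemplatePicker";
    default:
      return "TextPrompt";
  }
}
```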

Tests

  • defineClientTool brands the definition, marks it clientResolved, carries
    no execute, and throws when execute is supplied.
  • normalizeToolDefinition accepts a client-resolved tool without execute and
    still rejects a non-client tool that omits it.
  • pnpm --filter eve typecheck / oxlint / unit tests green.

Verified end-to-end in a downstream app (Next.js + useEveAgent): an
executeless ask_question override parks (input.requested → session.waiting),
resumes from the user's structured answer, produces a single tool_result, and
the turn continues — no duplicate-result 400.

Notes

@vercel

vercel Bot commented Jun 23, 2026

@keesvandorp is attempting to deploy a commit to the Vercel Team on Vercel.

A member of the Team first needs to authorize it.

@keesvandorp

Author

Thanks — addressed in d839f7b.

Reject mixed shapes (compiler + runtime). A client-resolved tool that also defines execute now throws at both normalizeToolDefinition (compile) and resolveToolDefinition (runtime), rather than silently dropping the executor. The opposite direction, a non-client tool that omits execute, was already rejected. (defineClientTool still throws at authoring time too.) Unit-tested.
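
As a rough illustration of the rule both layers now enforce, the check reduces to three cases. This is a sketch under stated assumptions, not the code in normalizeToolDefinition or resolveToolDefinition: `NormalizedToolSketch` and `normalizeToolRecordSketch` are hypothetical names, and the real functions validate much more than the executor.

```typescript
// Hypothetical normalized shape; only the executor rule is modelled here.
interface NormalizedToolSketch {
  clientResolved: boolean;
  execute?: (input: unknown) => unknown;
}

function normalizeToolRecordSketch(record: Record<string, unknown>): NormalizedToolSketch {
  const clientResolved = record["clientResolved"] === true;
  const execute = record["execute"];

  if (clientResolved) {
    // Mixed shape: fail loudly rather than silently dropping the executor.
    if (execute !== undefined) {
      throw new TypeError("client-resolved tool must not define execute");
    }
    return { clientResolved: true };
  }

  // Every other tool still requires an executor.
  if (typeof execute !== "function") {
    throw new TypeError("tool must define execute");
  }
  return { clientResolved: false, execute: execute as (input: unknown) => unknown };
}
```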

Regression 1 — the one that matters. New e2e/fixtures/agent-tools-hitl eval client-resolved-question: an authored, widened ask_question (defineClientTool + typed ui) parks, resumes from a structured answer, and continues into a downstream note tool. The single-result invariant is asserted operationally — before the fix the authored override carried an execute, so the resume reconstructed two tool_result blocks for one id and the provider 400'd; a green expectOk() + downstream calledTool('note') + completed() can only happen with exactly one result for the call.

Regression 2 — separation. New eval approval-vs-client-resolved: guarded-echo (approval-gated executable) parks for approval and resolves via its executor (token in the result); ask_question (client-resolved) parks for input and the user's answer is the result, no executor. Same parking machinery, opposite result sources.
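
The "same parking machinery, opposite result sources" distinction can be sketched as a single settle step. The types and the `settle` function below are hypothetical and simplified for illustration; they are not the harness's real resume path.

```typescript
// Hypothetical shapes for a parked call and the response that resumes it.
type ParkedCall =
  | { kind: "approval"; callId: string; input: string; execute: (input: string) => string }
  | { kind: "input"; callId: string };

type Resume =
  | { kind: "approved"; callId: string }
  | { kind: "answered"; callId: string; answer: string };

interface ToolResult {
  callId: string;
  content: string;
  source: "executor" | "client";
}

function settle(call: ParkedCall, resume: Resume): ToolResult {
  if (call.callId !== resume.callId) {
    throw new Error("resume does not match the parked call id");
  }
  if (call.kind === "approval" && resume.kind === "approved") {
    // Approval only permits the executor to run; the executor supplies the result.
    return { callId: call.callId, content: call.execute(call.input), source: "executor" };
  }
  if (call.kind === "input" && resume.kind === "answered") {
    // No executor exists; the user's answer is the result.
    return { callId: call.callId, content: resume.answer, source: "client" };
  }
  throw new Error("resume kind does not match how the call was parked");
}
```

Either branch returns exactly one result for the call id, which is the property the two evals exercise from opposite directions.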

Verified: eve typecheck + unit tests (incl. the mixed-shape rejection) + oxlint; the HITL fixture typechecks and eve builds with the override. I haven't run the evals themselves (they need a gateway) — they mirror the existing ask-question-select / approve-then-no-regate evals.

@rpelevin

Nice update. The important thing I would preserve before merge is that the mixed-shape rejection and the resume regression stay paired.

The compiler and runtime checks close the construction path, but the fixture is what proves the runtime history is actually reconstructing one result for the original parked call.

The remaining review lens I would use is:

  1. The client-resolved marker is the only path that allows no executor.
  2. Any exported execute on that shape fails before it can be silently dropped.
  3. The resumed structured input is bound to the same call id.
  4. The reconstructed provider history contains exactly one result for that call.
  5. The approval-gated executable path still proves the opposite source of truth: approval permits the executor to run; client input supplies the result.
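
Points 3 and 4 can be read as a guard on the resume step itself. The following is a minimal sketch with hypothetical names (`HistoryBlock`, `appendAnswer`), assuming a simplified history shape rather than the project's real one.

```typescript
// Hypothetical, simplified history blocks.
type HistoryBlock =
  | { type: "tool_use"; id: string }
  | { type: "tool_result"; toolUseId: string; content: string };

// Appends the user's answer as the single result for the parked call.
function appendAnswer(
  history: HistoryBlock[],
  response: { callId: string; answer: string },
): HistoryBlock[] {
  const parked = history.some((b) => b.type === "tool_use" && b.id === response.callId);
  if (!parked) {
    throw new Error("no parked call with this id");
  }
  const settled = history.some(
    (b) => b.type === "tool_result" && b.toolUseId === response.callId,
  );
  if (settled) {
    throw new Error("call already has a result");
  }
  return [
    ...history,
    { type: "tool_result", toolUseId: response.callId, content: response.answer },
  ];
}
```

A guard like this fails at resume time instead of letting a second result reach the provider.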

If the gateway-backed evals cannot run in ordinary CI, I would at least keep the fixture build/typecheck plus the mixed-shape unit tests as merge blockers, and treat live eval execution as release evidence before closing the original issue.

Boundary: architecture and test feedback only; no claim about using this project or running its code.

@keesvandorp

Author

Agreed on all five, and on keeping the rejection and the resume regression paired — the compiler/runtime checks guard construction, the fixture proves the reconstructed history settles to one result for the parked call. Neither substitutes for the other.

The gating maps cleanly onto the existing CI:

  • Deterministic merge blockers (no gateway): ci.yml runs lint → typecheck → test-unit (pnpm test:unit, where the mixed-shape rejection tests live) → test-integration on every PR. That covers invariants 1–2 (only the marker permits no executor; an exported execute on that shape fails rather than being silently dropped) plus the construction surface.
  • Release evidence (gateway-backed): e2e-local.yml triggers on the e2e/** change. It builds eve and the HITL fixture (so construction errors still gate there), then runs eve eval --strict. The eval execution (invariants 3–5: same-call-id binding, exactly-one-result on resume, and the approval path proving the opposite source of truth) needs a model gateway, so I'd treat a green eval run as the evidence before closing #203 rather than as a PR gate.

Locally verified on the branch: pnpm --filter eve typecheck + test:unit (incl. the mixed-shape rejection) + oxlint, and the agent-tools-hitl fixture both tsc-typechecks and eve builds with the defineClientTool override.

If it'd help, I'm happy to split the fixture's typecheck/build into an explicit gateway-free job so the construction guarantee gates independently of eval execution — just say the word.

@rpelevin

Yes, I would split that gateway-free job.

The useful boundary is:

  1. A merge-blocking deterministic job proves construction: client-resolved tools can omit execute, mixed client-resolved plus execute fails, and the authored HITL fixture typechecks and builds.
  2. Gateway-backed eval remains release evidence: it proves resumed history creates exactly one result for the parked call and keeps approval-gated executable output separate from client-resolved input.
  3. Closing the original bug should wait for both pieces: deterministic construction gate green, and strict eval evidence green.

That split keeps CI fast while preserving the invariant that no executor-less path ships without a concrete fixture compiled against the authored surface.

I would make the job name explicit enough that future maintainers know what it protects, for example client-resolved-hitl-construction, and keep it pinned to the authored ask_question override plus the approval-vs-client-resolved fixture build.

Boundary: architecture and test feedback only; no claim about using this project or running its code.

keesvandorp added a commit to keesvandorp/eve that referenced this pull request Jun 23, 2026
Splits the construction contract from the gateway-backed evals (per review on
vercel#204). New merge-blocking job proves, with no model gateway:
- the client-resolved omit-execute + mixed-shape rejection unit guards, and
- that the authored HITL fixture (ask_question override + approval-vs-
  client-resolved fixture) typechecks and builds against the authored surface.

The gateway-backed `eve eval` (e2e-local / e2e-vercel) stays as runtime release
evidence (single result on resume; approval-gated execution kept separate from
client-resolved input). Keeps CI fast while guaranteeing no executor-less path
ships without a fixture compiled against it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Kees van Dorp <keesvandorp@me.com>

@keesvandorp

Author

Done — split out as a dedicated merge-blocking job in 32782ea: client-resolved-hitl-construction (in ci.yml, gateway-free).

It proves construction, deterministically:

  • the omit-execute + mixed-shape rejection unit guards (define-client-tool + schema-backed), and
  • pnpm --filter agent-tools-hitl run typecheck (eve build && tsc) — so the authored ask_question override and the approval-vs-client-resolved fixture must compile and build against the authored surface.

The gateway-backed eve eval (e2e-local / e2e-vercel) stays as runtime release evidence: exactly one result on resume, and approval-gated execution kept separate from client-resolved input. So closing #203 waits for both — construction gate green here, strict eval green as release evidence.

Job name is intentionally explicit and pinned to that fixture pair, with a header comment stating what it protects, so it's legible to future maintainers. Verified locally: unit guards 18/18, and the fixture eve build && tsc clean.

keesvandorp force-pushed the feat/client-resolved-tools branch from 32782ea to c1f8a63 on June 23, 2026 15:07

keesvandorp and others added 4 commits on June 23, 2026 17:20
Authored tools previously had to provide an `execute` (the compiler's
`normalizeToolDefinition` and the runtime's `resolveToolDefinition` both
called `expectFunction(execute)`). That made it impossible to author a
human-in-the-loop tool the way the built-in `ask_question` works — no
executor, the call parks for input and resolves out-of-band. Overriding
`ask_question` to widen its input schema forced an `execute`, whose
auto-result collided with the input response: two `tool_result` blocks for
one `tool_use` id, which the provider rejects on resume ("each tool_use
must have a single result").

Add `defineClientTool({ description, inputSchema, outputSchema? })`, which
stamps `clientResolved: true` and carries no `execute`:

- normalize-tool / schema-backed: allow omitting `execute` when
  `clientResolved`; every other tool still requires it.
- resolve-tool: skip reattaching a live `execute` for client-resolved tools.
- The runtime already surfaces executeless tools as client-side (buildToolSet
  / wrapToolExecute return undefined), so no harness change is needed; the
  resolved definition's `execute` is already Optional.

`defineTool` is unchanged and still requires `execute`. Passing `execute` to
`defineClientTool` throws.

Fixes vercel#203

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Kees van Dorp <keesvandorp@me.com>

…ressions

Address review feedback on the defineClientTool contract:

- Reject mixed shapes at BOTH the compiler (normalize-tool) and runtime
  (resolve-tool): a client-resolved tool that also defines `execute` now throws
  instead of silently dropping the executor. (A non-client tool that omits
  `execute` was already rejected.)
- e2e HITL fixture regressions:
  - client-resolved-question: an authored, widened `ask_question`
    (defineClientTool + typed `ui`) parks, resumes from a structured answer, and
    continues into a downstream `note` tool — exactly one tool_result for the
    parked call id. Before the fix this resume 400'd ("each tool_use must have a
    single result"); a green resume + downstream call proves the single result.
  - approval-vs-client-resolved: proves executable-with-approval and
    client-resolved input are separate paths — approval runs the executor;
    client input supplies the result.

Verified: eve typecheck + unit tests (incl. the mixed-shape rejection) + oxlint;
the HITL fixture typechecks (tsc) and `eve build`s with the override.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Kees van Dorp <keesvandorp@me.com>

Splits the construction contract from the gateway-backed evals (per review on
vercel#204). New merge-blocking job proves, with no model gateway:
- the client-resolved omit-execute + mixed-shape rejection unit guards, and
- that the authored HITL fixture (ask_question override + approval-vs-
  client-resolved fixture) typechecks and builds against the authored surface.

The gateway-backed `eve eval` (e2e-local / e2e-vercel) stays as runtime release
evidence (single result on resume; approval-gated execution kept separate from
client-resolved input). Keeps CI fast while guaranteeing no executor-less path
ships without a fixture compiled against it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Kees van Dorp <keesvandorp@me.com>

Public API added in this PR needs docs (per CONTRIBUTING). Add a
"Custom client-resolved tools" section to the human-in-the-loop page
covering defineClientTool: no execute, the ask_question override for typed
pickers, the parked-input contract, and that defineTool/defineClientTool are
mutually exclusive (exactly one result). Cross-link from the tools overview.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Kees van Dorp <keesvandorp@me.com>

Development

Successfully merging this pull request may close these issues.

No way to author a HITL / client-resolved tool: authored tools require execute, which yields a duplicate tool_result when the same call parks for input
