23 changes: 23 additions & 0 deletions .changeset/client-resolved-tools.md
@@ -0,0 +1,23 @@
---
"eve": minor
---

Add `defineClientTool` for authoring client-resolved (human-in-the-loop) tools.

A client-resolved tool has **no `execute`**: eve surfaces it to the model, parks
the turn when the model calls it, and resolves the call from the client/HITL
channel (e.g. an `inputResponses` answer) rather than running server code — the
same mechanism the built-in `ask_question` uses, now available to authored
tools.

This unblocks widening/overriding `ask_question` with a richer, typed input
schema (typed HITL pickers) without the duplicate `tool_result` that a
`defineTool` override produced — authoring previously forced an `execute`, so a
parked call yielded two `tool_result` blocks for one `tool_use` id and the
provider rejected the resumed turn with "each tool_use must have a single
result". See #203.
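
The single-result rule quoted above can be checked mechanically. The sketch below is illustrative only: the block types and the `findResultCountViolations` helper are hypothetical simplifications, not eve's or any provider's real types. It counts results per `tool_use` id and reports any id that does not have exactly one.

```typescript
// Hypothetical, simplified content blocks (assumption: not eve's real types).
type ToolUseBlock = { type: "tool_use"; id: string; name: string };
type ToolResultBlock = { type: "tool_result"; toolUseId: string; content: string };
type Block = ToolUseBlock | ToolResultBlock;

/** Returns the ids of tool_use blocks that do not have exactly one tool_result. */
function findResultCountViolations(history: readonly Block[]): string[] {
  const counts = new Map<string, number>();
  // First pass: register every tool_use id with a count of zero.
  for (const block of history) {
    if (block.type === "tool_use" && !counts.has(block.id)) {
      counts.set(block.id, 0);
    }
  }
  // Second pass: count the results that point at a registered id.
  for (const block of history) {
    if (block.type === "tool_result" && counts.has(block.toolUseId)) {
      counts.set(block.toolUseId, (counts.get(block.toolUseId) ?? 0) + 1);
    }
  }
  return Array.from(counts.entries())
    .filter(([, count]) => count !== 1)
    .map(([id]) => id);
}

// The pre-fix shape: an executor result plus the user's answer for one call id.
const duplicated: Block[] = [
  { type: "tool_use", id: "call_1", name: "ask_question" },
  { type: "tool_result", toolUseId: "call_1", content: "executor output" },
  { type: "tool_result", toolUseId: "call_1", content: "user answer" },
];

// The client-resolved shape: the user's answer is the only result.
const fixed: Block[] = [
  { type: "tool_use", id: "call_1", name: "ask_question" },
  { type: "tool_result", toolUseId: "call_1", content: "user answer" },
];
```

Here `findResultCountViolations(duplicated)` reports `call_1`, while `findResultCountViolations(fixed)` reports nothing.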

`defineTool` is unchanged and still requires `execute`; only `defineClientTool`
may omit it. The two shapes are mutually exclusive: the compiler and runtime
reject a client-resolved tool that also defines `execute` (and a non-client tool
that omits it).
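
One way to picture the mutual exclusivity is as a discriminated union with a matching runtime check. This is a minimal sketch under stated assumptions: the type names and `classifyToolShape` are hypothetical and are not eve's internal types; only the `clientResolved` and `execute` field names come from this change.

```typescript
// Hypothetical shapes for illustration (assumption: not eve's real definitions).
type BaseTool = { description: string; inputSchema: unknown };

// An executable tool must carry an executor.
type ExecutableTool = BaseTool & {
  clientResolved?: false;
  execute: (input: unknown) => unknown;
};

// A client-resolved tool must not: `execute?: never` rules it out at the type level.
type ClientResolvedTool = BaseTool & { clientResolved: true; execute?: never };

// A literal with both `clientResolved: true` and an `execute` function is not
// assignable to this union.
type AuthoredTool = ExecutableTool | ClientResolvedTool;

/** Runtime counterpart for untyped input: classify the shape or throw. */
function classifyToolShape(value: {
  clientResolved?: unknown;
  execute?: unknown;
}): "client-resolved" | "executable" {
  const clientResolved = value.clientResolved === true;
  if (clientResolved && value.execute !== undefined) {
    throw new Error('A client-resolved tool must not define an "execute" function.');
  }
  if (!clientResolved && typeof value.execute !== "function") {
    throw new Error('An executable tool must define an "execute" function.');
  }
  return clientResolved ? "client-resolved" : "executable";
}

const picker: AuthoredTool = {
  description: "Ask the user to pick a template.",
  inputSchema: {},
  clientResolved: true,
};
```

The type-level check catches the mixed shape while authoring; the runtime check covers values that arrive as `unknown`, for example modules loaded by a compiler.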
45 changes: 45 additions & 0 deletions .github/workflows/ci.yml
@@ -91,6 +91,51 @@ jobs:
      - name: Run unit tests
        run: pnpm test:unit

  client-resolved-hitl-construction:
    # Gateway-free, merge-blocking proof of the client-resolved / HITL
    # CONSTRUCTION contract (vercel/eve#203): a client-resolved tool may omit
    # `execute`, the mixed `clientResolved + execute` shape is rejected at both
    # compile and runtime, and the authored HITL fixture (the `ask_question`
    # override + the approval-vs-client-resolved fixture) typechecks and builds
    # against the authored surface. The gateway-backed `eve eval`
    # (e2e-local / e2e-vercel) stays as the RUNTIME evidence — that the resumed
    # history yields exactly one result and keeps approval-gated execution
    # separate from client-resolved input. This job guarantees no executor-less
    # path ships without a fixture compiled against it.
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
        with:
          persist-credentials: false

      - name: Setup pnpm
        uses: pnpm/action-setup@0e279bb959325dab635dd2c09392533439d90093 # v6

      - name: Setup Node.js
        uses: actions/setup-node@48b55a011bda9f5d6aeb4c2d9c7362e8dae4041e # v6
        with:
          node-version-file: .nvmrc
          cache: "pnpm"

      - name: Install dependencies
        run: pnpm install --frozen-lockfile

      - name: Build eve
        run: pnpm --filter eve build

      - name: Construction guards (omit-execute + mixed-shape rejection)
        # Also covered by the full suite in test-unit; pinned here so this job is
        # a self-contained statement of the construction contract.
        run: >-
          pnpm --filter eve exec vitest run --config vitest.unit.config.ts
          src/public/definitions/define-client-tool.test.ts
          src/internal/authored-definition/schema-backed.test.ts

      - name: Build + typecheck the authored HITL fixture
        run: pnpm --filter agent-tools-hitl run typecheck

  test-integration:
    name: test-integration (${{ matrix.os }})
    runs-on: ${{ matrix.os }}
21 changes: 21 additions & 0 deletions docs/tools/human-in-the-loop.md
@@ -56,6 +56,27 @@ The built-in `ask_question` tool lets the model pause and ask the user, rather t

`ask_question` is part of the [default harness](/docs/concepts/default-harness), so it is available without you defining anything. It produces the same `input.requested` pause as an approval, and resumes the same way.

### Custom client-resolved tools

`ask_question` is a _client-resolved_ tool: it has no `execute`, so eve never runs it — the model emits the call, the turn parks, and the user's answer becomes its single result. Author your own with `defineClientTool` when you want that exact pause-and-resume but with a **richer, typed input schema** — a "pick a template", "choose media", or multi-field form that a channel or your frontend renders as a dedicated widget:

```ts title="agent/tools/ask_question.ts"
import { defineClientTool } from "eve/tools";
import { z } from "zod";

export default defineClientTool({
  description: "Ask the user to pick a template.",
  inputSchema: z.object({
    prompt: z.string(),
    ui: z.object({ kind: z.literal("template_picker") }).passthrough(),
  }),
});
```

Naming the file `ask_question.ts` overrides the built-in question tool with your wider schema while keeping the same parking behavior (parking is keyed on the `ask_question` name). The parked `input.requested` carries your full typed input, so the client renders the picker from it and resumes the turn with the user's structured choice.

A `defineClientTool` tool must **not** declare an `execute` — that is the whole point, and passing one is rejected at compile and runtime. [`defineTool`](/docs/tools) is the opposite: it always requires an `execute` and runs on the server. The two shapes are mutually exclusive, so a parked call always resolves to exactly one result.
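
On the client side, resolving a parked picker amounts to reading the typed input and sending back a structured answer. The following is a minimal sketch, assuming a hypothetical `ParkedInputRequest` shape and a hypothetical `buildTemplatePickerAnswer` helper; the `picked_ids` payload is just an illustrative convention, and the transport that delivers the answer is not shown.

```typescript
// Hypothetical client-side view of a parked request (assumption: not eve's real type).
type ParkedInputRequest = {
  requestId: string;
  toolName: string;
  input: { prompt: string; ui?: { kind: string; [key: string]: unknown } };
};

// Hypothetical answer envelope: the structured choice travels as JSON text.
type InputAnswer = { requestId: string; text: string };

/** Builds the answer for a template picker, refusing requests it cannot render. */
function buildTemplatePickerAnswer(
  request: ParkedInputRequest,
  pickedIds: readonly string[],
): InputAnswer {
  if (request.input.ui?.kind !== "template_picker") {
    throw new Error(`Cannot render ui kind: ${String(request.input.ui?.kind)}`);
  }
  if (pickedIds.length === 0) {
    throw new Error("The user must pick at least one template.");
  }
  return {
    requestId: request.requestId,
    text: JSON.stringify({ picked_ids: pickedIds }),
  };
}

const parked: ParkedInputRequest = {
  requestId: "req_1",
  toolName: "ask_question",
  input: { prompt: "Pick a template.", ui: { kind: "template_picker" } },
};

const answer = buildTemplatePickerAnswer(parked, ["tpl_blue"]);
```

Because the tool has no `execute`, this answer is the only result the parked call ever receives.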

## How pause and resume works

Approvals and questions share one protocol:
2 changes: 1 addition & 1 deletion docs/tools/overview.mdx
@@ -63,7 +63,7 @@ export default defineTool({
});
```

-Approval is one half of eve's [human-in-the-loop](./human-in-the-loop) model — the page covers the `always/once/never` helpers, input-dependent predicates, and how a gated call pauses and resumes durably.
+Approval is one half of eve's [human-in-the-loop](./human-in-the-loop) model — the page covers the `always/once/never` helpers, input-dependent predicates, and how a gated call pauses and resumes durably. For the other half — a tool with **no** `execute` that parks for a typed user answer (`defineClientTool`, e.g. a custom picker) — see [client-resolved tools](./human-in-the-loop#custom-client-resolved-tools).

## Shape what the model sees with `toModelOutput`

25 changes: 25 additions & 0 deletions e2e/fixtures/agent-tools-hitl/agent/tools/ask_question.ts
@@ -0,0 +1,25 @@
import { defineClientTool } from "eve/tools";
import { z } from "zod";

/**
 * Authored override of the built-in `ask_question`, widened with a typed `ui`
 * payload. This is the exact shape that regressed in #203: as a plain
 * `defineTool` it was forced to carry an `execute`, so a parked call produced a
 * second `tool_result` for its `tool_use` id and the resumed turn was rejected
 * ("each tool_use must have a single result"). As a `defineClientTool` it has no
 * executor — the call parks for input and resolves to exactly one result.
 *
 * The schema is a strict superset of the built-in (prompt / options /
 * allowFreeform), so the existing `ask-question-select` eval keeps passing while
 * exercising the authored path.
 */
export default defineClientTool({
  description:
    "Ask the user a question and wait for their answer. Plain choice: prompt (+ options, allowFreeform). Rich input: set a typed `ui` payload to render a picker.",
  inputSchema: z.object({
    prompt: z.string(),
    options: z.array(z.object({ id: z.string(), label: z.string() })).optional(),
    allowFreeform: z.boolean().optional(),
    ui: z.object({ kind: z.string() }).passthrough().optional(),
  }),
});
19 changes: 19 additions & 0 deletions e2e/fixtures/agent-tools-hitl/agent/tools/note.ts
@@ -0,0 +1,19 @@
import { defineTool } from "eve/tools";
import { z } from "zod";

/**
 * Trivial downstream executable tool. Used by the client-resolved regression to
 * continue the turn AFTER a parked `ask_question` resumes — exercising the
 * reconstructed provider history (a duplicate result for the parked call would
 * reject that follow-up model call).
 */
export default defineTool({
  description:
    "Record a short note. Call this once, after the user answers a question, with their answer as `text`.",
  inputSchema: z.object({
    text: z.string().describe("The text to record."),
  }),
  async execute(input) {
    return { recorded: input.text };
  },
});
Original file line number Diff line number Diff line change
@@ -0,0 +1,69 @@
import { defineEval } from "eve/evals";

import { GUARDED_ECHO_TOKEN, guardedEchoResults } from "./shared.js";

/**
 * Separation regression: executable-with-approval and client-resolved are two
 * distinct paths.
 *
 * - Path A — `guarded-echo` is an approval-gated *executable* tool: it parks for
 *   APPROVAL (confirmation), then its `execute` runs and the tool's own output
 *   (the executor token) is the result.
 * - Path B — `ask_question` is client-resolved: it parks for INPUT (not
 *   approval) and the user's answer IS the result; no executor runs.
 *
 * Same parking machinery, opposite result sources — this asserts they don't
 * collapse into one another.
 */
export default defineEval({
  description:
    "Approval-gated execution and client-resolved input are separate paths: approval runs the executor; client input supplies the result.",
  async test(t) {
    // Path A — approval gate on an executable tool.
    await t.send('Call the guarded-echo tool with note "sep".');
    const [approvalRequest] = t.expectInputRequests({ toolName: "guarded-echo" });
    if (approvalRequest === undefined) {
      throw new Error("Expected a guarded-echo approval request.");
    }
    if (approvalRequest.display !== undefined && approvalRequest.display !== "confirmation") {
      throw new Error(
        `Approval must present as a confirmation, got ${String(approvalRequest.display)}.`,
      );
    }
    const approved = await t.respondAll("approve");
    approved.expectOk();
    const [echoed] = guardedEchoResults(t.events);
    if (echoed === undefined || !echoed.includes(GUARDED_ECHO_TOKEN)) {
      throw new Error("Approved executable tool did not run its executor.");
    }

    // Path B — client-resolved input on the same session.
    await t.send(
      [
        "Now use the `ask_question` tool exactly once to ask me which color I prefer.",
        "Set prompt to: 'Pick a color.'",
        'Provide exactly two options: - id "red", label "Red" - id "blue", label "Blue"',
        "Do not answer the question yourself, wait for my response.",
      ].join("\n"),
    );
    const [inputRequest] = t.expectInputRequests({ toolName: "ask_question" });
    if (inputRequest === undefined) {
      throw new Error("Expected an ask_question input request.");
    }
    if (inputRequest.display === "confirmation") {
      throw new Error("Client-resolved input must not present as an approval confirmation.");
    }
    const answered = await t.respondAll("blue");
    answered.expectOk();

    // The client-resolved tool produced no executor result of its own — only the
    // earlier approved guarded-echo did.
    if (guardedEchoResults(t.events).length !== 1) {
      throw new Error("Unexpected extra executor result from the client-resolved path.");
    }

    t.didNotFail();
    t.completed();
    t.messageIncludes(/\bblue\b/i);
  },
});
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
import { defineEval } from "eve/evals";

/**
 * Regression for #203 — the one that matters.
 *
 * An AUTHORED, widened `ask_question` (`defineClientTool` with a typed `ui`
 * payload) must: park the turn on call, resume from a structured answer, and let
 * the turn continue into a downstream executable tool — producing exactly ONE
 * `tool_result` for the parked `tool_use` id.
 *
 * The single-result invariant is asserted operationally: before the fix the
 * authored override carried an `execute`, so the resumed turn reconstructed two
 * `tool_result` blocks for one id and the provider rejected it with a 400. So a
 * green resume (`expectOk`) that continues into `note` and `completed()` can only
 * happen if the reconstructed history held a single result for the call.
 */
export default defineEval({
  description:
    "Client-resolved ask_question override parks, resumes from a structured answer, and continues to a downstream tool — one result, no duplicate-result failure on resume.",
  async test(t) {
    await t.send(
      [
        "Use the `ask_question` tool exactly once to ask me to pick a template.",
        "Set prompt to: 'Pick a template.' and set ui to { kind: 'template' }.",
        "After I answer, call the `note` tool exactly once with text set to my answer.",
        "Do not answer the question yourself, wait for my response.",
      ].join("\n"),
    );

    const [request] = t.expectInputRequests({ toolName: "ask_question" });
    if (request === undefined) {
      throw new Error("Expected a pending ask_question input request.");
    }

    // Resume with a structured (JSON) answer, the way a typed picker resolves.
    const resumed = await t.respond({
      requestId: request.requestId,
      text: '{"picked_ids":["tpl_blue"]}',
    });
    resumed.expectOk();

    // The resumed turn flowed into a downstream executable tool, so the
    // reconstructed provider history (with the parked call's single result) was
    // accepted by the model call.
    t.calledTool("note");
    t.didNotFail();
    t.completed();
  },
});
Original file line number Diff line number Diff line change
@@ -2,6 +2,7 @@ import { describe, expect, it } from "vitest";
import { z } from "#compiled/zod/index.js";

import {
  defineClientTool,
  defineTool,
  defineDynamic,
  disableTool,
@@ -109,6 +110,45 @@ describe("normalizeToolDefinition", () => {
    });
  });

  it("normalizes a client-resolved tool without an execute function", () => {
    const tool = defineClientTool({
      description: "Ask the user to pick a template.",
      inputSchema: z.object({ prompt: z.string() }),
    });

    const entry = normalizeToolDefinition(tool, FAILURE_MESSAGE);

    expect(entry.kind).toBe("tool");
    if (entry.kind !== "tool") {
      throw new Error("expected tool kind");
    }
    expect(entry.definition.execute).toBeUndefined();
    expect(entry.definition.description).toBe("Ask the user to pick a template.");
  });

  it("still requires execute on tools that are not client-resolved", () => {
    expect(() =>
      normalizeToolDefinition(
        { description: "Echo.", inputSchema: { type: "object" } },
        FAILURE_MESSAGE,
      ),
    ).toThrow(FAILURE_MESSAGE);
  });

  it("rejects a client-resolved tool that also defines execute", () => {
    expect(() =>
      normalizeToolDefinition(
        {
          description: "Mixed shape.",
          inputSchema: { type: "object" },
          clientResolved: true,
          execute: () => ({ status: "ignored" as const }),
        },
        FAILURE_MESSAGE,
      ),
    ).toThrow(/must not define an "execute"/);
  });

  it("types approval context input from the tool input schema", () => {
    const tool = defineTool({
      description: "Requires city-scoped approval.",
26 changes: 24 additions & 2 deletions packages/eve/src/internal/authored-definition/schema-backed.ts
@@ -6,6 +6,7 @@ import {
   expectString,
 } from "#internal/authored-module.js";
 import type { InternalToolDefinitionWithExecuteFn } from "#shared/tool-definition.js";
+import type { Optional } from "#shared/optional.js";
 import { normalizeJsonSchemaDefinition } from "#internal/json-schema.js";
 import { isDynamicSentinel, type DynamicToolEventName } from "#shared/dynamic-tool-definition.js";

@@ -15,7 +16,9 @@ import { isDynamicSentinel, type DynamicToolEventName } from "#shared/dynamic-to
  * Identity is path-derived — the compiler stamps the filename slug onto
  * the compiled entry. This shape never carries an authored `name`.
  */
-type NormalizedAuthoredTool = Readonly<Omit<InternalToolDefinitionWithExecuteFn, "name">>;
+type NormalizedAuthoredTool = Readonly<
+  Omit<Optional<InternalToolDefinitionWithExecuteFn, "execute">, "name">
+>;
 type MutableNormalizedAuthoredTool = {
   -readonly [K in keyof NormalizedAuthoredTool]: NormalizedAuthoredTool[K];
 };
@@ -62,6 +65,7 @@ export function normalizeToolDefinition(value: unknown, message: string): Normal
     record,
     [
       "auth",
+      "clientResolved",
       "description",
       "execute",
       "inputSchema",
@@ -77,11 +81,29 @@ export function normalizeToolDefinition(value: unknown, message: string): Normal
     record.outputSchema === undefined
       ? undefined
       : normalizeJsonSchemaDefinition(record.outputSchema, "output");
+  /*
+   * Client-resolved tools (`defineClientTool`, e.g. an `ask_question` override)
+   * have no executor: the model emits the call, the turn parks for input, and
+   * the result is supplied out-of-band. They are the one authored shape allowed
+   * to omit `execute`; every other tool must provide one. The two shapes are
+   * mutually exclusive — a client-resolved tool that also carried an `execute`
+   * would yield a second result for its call id, so reject that mix outright
+   * rather than silently dropping the executor.
+   */
+  const clientResolved = record.clientResolved === true;
+  if (clientResolved && record.execute !== undefined) {
+    throw new Error(
+      `A client-resolved tool must not define an "execute" function — its result ` +
+        `comes from the client/HITL input channel, not a server executor. ${message}`,
+    );
+  }
   const definition: MutableNormalizedAuthoredTool = {
     description: expectString(record.description, message),
-    execute: expectFunction(record.execute, message),
     inputSchema,
   };
+  if (!clientResolved) {
+    definition.execute = expectFunction(record.execute, message);
+  }
   if (outputSchema !== undefined) {
     definition.outputSchema = outputSchema;
   }