feat(0.22.0): OpenAI-compat backend exposes tools + transport errors fail loud by tangletools · Pull Request #52 · tangle-network/agent-runtime

tangletools · 2026-05-24T22:10:29Z

Root cause

Live full-loop smoke against gtm-agent's delegation eval on 2026-05-24 (`pnpm eval:delegation --scenario competitor-dashboard --backend tcloud`) returned composite 0.07 with `delegate_research: 0/3` calls — the agent never fired any delegation tool. Diagnostic: `/tmp/live-smoke-evidence.md`.

Two distinct substrate bugs collided:

1. `createOpenAICompatibleBackend` could not advertise tools to the LLM. The streaming request body was hard-coded to `{ model, stream, stream_options, messages }` with no `tools` field. Even when the caller's AgentProfile mounted the 5 MCP delegation tools, the OpenAI-compat backend had no way to tell the LLM they existed — production routes through MCP-aware Claude Code / Codex / etc., but the eval routes through the bare LLM via this backend. The eval rubric was checking for tool_call events that physically could not fire on this transport.

2. Transport errors silently swallowed. When the router returned HTTP 402 `Model "claude-sonnet-4-6" requires credits`, `BackendTransportError` fired inside the backend, the runtime emitted a `backend_error` event, the stream drained cleanly, and the resulting `RunRecord` showed `finalText: ""`, `toolCalls: []`, `error: null`. Operator sees "agent produced nothing" with no signal pointing at credit exhaustion. Direct violation of the runtime's fail-loud doctrine — "External-boundary calls return typed outcomes; callers MUST inspect succeeded before using value."

Changes

`createOpenAICompatibleBackend` — `tools` + `toolChoice`

```ts
createOpenAICompatibleBackend({
apiKey, baseUrl, model,
tools?: ReadonlyArray,
toolChoice?: 'auto' | 'none' | 'required' | { type: 'function'; function: { name: string } },
// ... existing fields
})
```

`OpenAIChatTool` mirrors the OpenAI Chat Completions spec exactly (`{ type: 'function', function: { name, description?, parameters? } }`) so callers pass tool definitions through without runtime translation. The router proxies this shape verbatim to Anthropic, DeepSeek, Groq, OpenAI, Gemini.

Streaming tool call assembly handles both:

OpenAI shape: incremental `delta.tool_calls[index].function.arguments` chunks per index, finalized by `finish_reason: 'tool_calls'`, plus the non-streamed `message.tool_calls` collapse.
Anthropic shape (router-proxied): `content_block_start` (`tool_use` block with id+name), `content_block_delta` (`input_json_delta.partial_json`), `content_block_stop`.

Each finalized call surfaces as a single `tool_call` RuntimeStreamEvent with `{ toolName, toolCallId, args }`. Args JSON-parsed when valid, raw string fallback when truncated.

The backend does not execute tools — surfacing the call is the contract; dispatch is the caller's problem (typically through their MCP / sandbox runtime).

Transport errors fail loud

`BackendTransportError` now carries:

```ts
class BackendTransportError extends AgentEvalError {
readonly backend: string
readonly status?: number
readonly body?: string // truncated to 2 KiB
}
```

Non-success HTTP responses read `response.text()` (best-effort, truncated) before throwing — so `free_tier_limit`, `invalid_api_key`, `model_not_found` reach the operator's eye.

`backend_error` and `final` events both carry a typed `error` field:

```ts
interface BackendErrorDetail {
kind: 'transport' | 'backend'
message: string
status?: number
body?: string
}
```

`run.ts` discriminates: `BackendTransportError` → `kind: 'transport'` with `status` + `body`; any other thrown error → `kind: 'backend'` (custom adapter / sandbox crashes).

Sanitized telemetry exposes `error.kind` + `error.status` always (operators need failure classification regardless of payload opt-in), but redacts `error.body` behind `RuntimeTelemetryOptions.includeControlPayloads` — a 4xx body can echo user-visible text from the provider's error page.

Tests

`tests/backends-openai-tools.test.ts` (11 new): request shape with/without tools; `toolChoice` variants; OpenAI streaming tool_calls — fragmented args reassembly, parallel calls, truncated JSON fallback, no-finish-reason flush, mixed text+tool streams, single-chunk `message.tool_calls` collapse; Anthropic `tool_use` content blocks assembled + interleaved with text.
`tests/backends-fail-loud.test.ts` (10 new): 402 free-tier denial with typed detail + upstream body; 401/403/404 parametric; 5xx after retry exhaustion; `fetch failed` (status=0); 2 KiB body truncation cap; non-transport custom-backend errors as `kind: 'backend'`; success path leaves `final.error` undefined; `BackendTransportError` catchable directly with `status` + `body`.
235 prior tests unchanged. Final count: 256 passing.

```
Test Files 26 passed (26)
Tests 256 passed (256)
```

`pnpm typecheck` clean. `pnpm build` clean. `biome check` clean.

Migration

Consumers that silently treated empty `finalText` on a transport failure as "agent produced nothing" will now see typed errors on `final.error`. Usually correct behavior; occasionally requires updating downstream error handling.

Concretely, the gtm-agent (and every other delegation-eval consumer) needs to:

Pass its delegation tools through `tools:` on `createOpenAICompatibleBackend` for the eval to be structurally capable of measuring tool dispatch.
Map `final.error` onto `RunRecord.error` so 402/401/5xx no longer becomes silent `error: null`.

Version

`0.22.0` minor. Additive surface (`tools`, `toolChoice`, `BackendErrorDetail`, `OpenAIChatTool`, `OpenAIChatToolChoice` exports). Behavior change is fail-loud-on-error — strictly more truthful telemetry, technically observable to consumers.

Test plan

Existing 235 tests pass unchanged
21 new tests pass
`pnpm typecheck` clean
`pnpm build` clean
`biome check` clean
Downstream smoke: gtm-agent delegation eval reruns with this version + tools wired through the eval-backend factory

…fail loud Two substrate fixes surfaced by the gtm-agent delegation eval smoke (2026-05-24). The smoke ran composite 0.07 with delegate_research 0/3 because the eval's OpenAI-compat backend (a) had no way to advertise MCP tools to the LLM and (b) silently swallowed the router's HTTP 402 free-tier denial into an empty finalText + null error. createOpenAICompatibleBackend - New `tools?: OpenAIChatTool[]` option — forwarded verbatim on every /chat/completions request when set. OpenAI Chat Completions tools[] shape; router proxies it to Anthropic, DeepSeek, Groq, etc. - New `toolChoice?` option — 'auto' | 'none' | 'required' | function pin. Omitted by default; provider falls back to its own default. - Streamed `tool_calls` deltas (OpenAI shape) and `tool_use` content blocks (Anthropic shape proxied by the router) are accumulated across SSE chunks and emitted as a single `tool_call` RuntimeStreamEvent per call. Args JSON-parsed when valid, raw string otherwise (truncation safety). - Parallel tool calls, mixed text+tool streams, and routers that drop the terminal `finish_reason` chunk all flush cleanly. Transport errors fail loud - BackendTransportError now carries `status` + truncated upstream `body` (≤2 KiB). Non-success HTTP responses capture the response text before throwing so `free_tier_limit`, `invalid_api_key`, etc. surface in the error envelope. - backend_error and final events both carry a typed `error: { kind: 'transport' | 'backend', message, status?, body? }` field. Consumers building a RunRecord MUST map final.error onto RunRecord.error — silent empty finalText hides credit exhaustion. - Sanitized telemetry exposes `error.kind` + `error.status` by default; `error.body` is gated behind `includeControlPayloads` because it can echo user-visible text from a provider's error page. Tests - 11 new in backends-openai-tools.test.ts: request shape with/without tools, tool_choice variants, OpenAI streaming tool_calls (fragmented args, parallel calls, truncated JSON, no-finish-reason flush, mixed text+tool, single-chunk message.tool_calls), Anthropic tool_use content blocks (assembled + interleaved with text). - 10 new in backends-fail-loud.test.ts: 402/401/403/404 with typed detail, 5xx after retry exhaustion, network failure (status=0), body truncation, non-transport custom-backend errors, success path leaves error undefined, BackendTransportError catchable directly. - 235 prior tests unchanged. Migration - Consumers that silently treated empty finalText on transport failure as "agent produced nothing" will now see typed errors on final.error. Usually correct behavior; sometimes requires updating downstream error handling.

tangletools merged commit bba2aeb into main May 24, 2026
1 check passed

tangletools deleted the feat/openai-compat-tools-and-fail-loud-errors branch May 24, 2026 22:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(0.22.0): OpenAI-compat backend exposes tools + transport errors fail loud#52

feat(0.22.0): OpenAI-compat backend exposes tools + transport errors fail loud#52
tangletools merged 1 commit into
mainfrom
feat/openai-compat-tools-and-fail-loud-errors

tangletools commented May 24, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

tangletools commented May 24, 2026

Root cause

Changes

`createOpenAICompatibleBackend` — `tools` + `toolChoice`

Transport errors fail loud

Tests

Migration

Version

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants