Skip to content

FYI: OpenRouter+DeepSeek emits corrupted second tool_call when tool_choice forces a function #21

@samkeen

Description

@samkeen

Summary

Discovered while probing whether the v1 judge verdict can be a forced tool call (see proposals/v1-implementation-plan.md Phase 1 and proposals/probes/phase1_verdict_tool_call_probe.py).

When calling deepseek/deepseek-v4-pro via OpenRouter with:

tool_choice={"type": "function", "function": {"name": "submit_verdict"}}

the response contains two tool_calls:

  1. A clean, schema-valid OpenAI-shape tool call (well-formed JSON in arguments).
  2. A second tool call whose arguments is corrupted with DeepSeek's internal template syntax leaking through OpenRouter's normalisation, e.g.:
{"verdict": "accept</|DSML|parameter_name>\n<|DSML|parameter name="concern" string="true">...

Without tool_choice (default auto), the same model returns exactly one clean tool call. So this isn't a model-side problem — it surfaces only when OpenRouter is normalising a forced-function response.

Why this is FYI, not actionable for v1

  • v1 won't force tool_choice (decided in the plan: "auto" produces cleaner results and the system prompt is enough to elicit the call).
  • The implementation will defensively pick the first parseable, schema-valid submit_verdict call from message.tool_calls and ignore siblings, so even if the quirk recurs under auto it's neutralised.
  • Filing this so we don't lose the finding — and so future judge-model changes (different provider/model) come with a probe pass that can detect equivalent quirks.

Repro

uv run python proposals/probes/phase1_verdict_tool_call_probe.py

The "accept-case (forced)" and "reject-case (forced)" scenarios reproduce; "reject-case (unforced)" is the clean control.

Possible follow-ups (not blocking)

  • Reproduce in isolation (minimal SDK script, no Tilth dependency) and report upstream to OpenRouter.
  • Confirm whether other DeepSeek models (v4-flash, etc.) have the same leakage.
  • Confirm whether forcing tool_choice on non-DeepSeek models via OpenRouter is clean.

Related

  • proposals/v1-implementation-plan.md — Phase 1 scope and risks now reflect this finding.
  • proposals/probes/phase1_verdict_tool_call_probe.py — the probe script (kept in-repo so future judge changes can re-run it).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions