FYI: OpenRouter+DeepSeek emits corrupted second tool_call when tool_choice forces a function

## Summary

Discovered while probing whether the v1 judge verdict can be a forced tool call (see `proposals/v1-implementation-plan.md` Phase 1 and `proposals/probes/phase1_verdict_tool_call_probe.py`).

When calling `deepseek/deepseek-v4-pro` via OpenRouter with:

```python
tool_choice={"type": "function", "function": {"name": "submit_verdict"}}
```

the response contains **two** `tool_calls`:

1. A clean, schema-valid OpenAI-shape tool call (well-formed JSON in `arguments`).
2. A second tool call whose `arguments` is corrupted with DeepSeek's internal template syntax leaking through OpenRouter's normalisation, e.g.:

```
{"verdict": "accept</｜DSML｜parameter_name>\n<｜DSML｜parameter name="concern" string="true">...
```

Without `tool_choice` (default `auto`), the same model returns **exactly one** clean tool call. So this isn't a model-side problem — it surfaces only when OpenRouter is normalising a forced-function response.

## Why this is FYI, not actionable for v1

- v1 won't force `tool_choice` (decided in the plan: "auto" produces cleaner results and the system prompt is enough to elicit the call).
- The implementation will defensively pick the first parseable, schema-valid `submit_verdict` call from `message.tool_calls` and ignore siblings, so even if the quirk recurs under `auto` it's neutralised.
- Filing this so we don't lose the finding — and so future judge-model changes (different provider/model) come with a probe pass that can detect equivalent quirks.

## Repro

```bash
uv run python proposals/probes/phase1_verdict_tool_call_probe.py
```

The "accept-case (forced)" and "reject-case (forced)" scenarios reproduce; "reject-case (unforced)" is the clean control.

## Possible follow-ups (not blocking)

- Reproduce in isolation (minimal SDK script, no Tilth dependency) and report upstream to OpenRouter.
- Confirm whether other DeepSeek models (v4-flash, etc.) have the same leakage.
- Confirm whether forcing tool_choice on non-DeepSeek models via OpenRouter is clean.

## Related

- `proposals/v1-implementation-plan.md` — Phase 1 scope and risks now reflect this finding.
- `proposals/probes/phase1_verdict_tool_call_probe.py` — the probe script (kept in-repo so future judge changes can re-run it).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FYI: OpenRouter+DeepSeek emits corrupted second tool_call when tool_choice forces a function #21

Summary

Why this is FYI, not actionable for v1

Repro

Possible follow-ups (not blocking)

Related

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

FYI: OpenRouter+DeepSeek emits corrupted second tool_call when tool_choice forces a function #21

Description

Summary

Why this is FYI, not actionable for v1

Repro

Possible follow-ups (not blocking)

Related

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions