fix: stop prepending reasoning_content to model output by li-xiu-qi · Pull Request #64 · stepfun-ai/gelab-zero

li-xiu-qi · 2026-06-16T10:00:51Z

Problem

Fixes #62

When using cloud reasoning models (e.g. step-3.7-flash) via the Chat Completions API, the parser fails on the first step:

ValueError: ui_action must contain 'action' or 'action_type'. Got keys: ['cot']

Root Cause

ask_llm_v2.py detects reasoning_content in the API response and wraps it as a <think> block, prepending it to the model's content:

result = "<think>" + reasoning + "</think>" + "\n" + llm_content

But the model's content already contains <THINK> tags (required by the parser prompt), so this creates two layers:

<think>reasoning_content from API</think>       ← prepended by ask_llm_v2.py
<THINK>model's own thinking</THINK>             ← from model content
explain:...  action:CLICK  point:500,300  ...   ← parser needs this

The parser's str2action() uses split("</THINK>")[1] to extract key-value pairs, which takes the text after the first </THINK>. With the prepended block in front, that text is the model's own THINK block, not the action parameters, so no 'action' key is found.
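The failure can be reproduced in isolation. This is a minimal sketch, not the project's parser: `normalize_think_tags` and `tail_after_first_think` are hypothetical helpers, and the sketch assumes the parser treats `<think>` and `<THINK>` alike before splitting. The sample strings are illustrative only.

```python
import re


def normalize_think_tags(text: str) -> str:
    # Assumption: tag case is ignored by the parser. This helper is
    # hypothetical and only exists to make the sketch self-contained.
    return re.sub(
        r"<(/?)think>",
        lambda m: f"<{m.group(1)}THINK>",
        text,
        flags=re.IGNORECASE,
    )


def tail_after_first_think(text: str) -> str:
    # Mirrors the split("</THINK>")[1] step described above.
    return normalize_think_tags(text).split("</THINK>")[1]


# Model content in the format the parser prompt asks for.
content = (
    "<THINK>tap the Settings icon</THINK>"
    "\texplain:open settings\taction:CLICK\tpoint:500,300\tsummary:opened"
)

# What the old code returned: reasoning_content wrapped and prepended.
doubled = "<think>" + "api reasoning" + "</think>" + "\n" + content

single_tail = tail_after_first_think(content)
doubled_tail = tail_after_first_think(doubled)

print(repr(single_tail))   # key-value pairs, including action:CLICK
print(repr(doubled_tail))  # the model's own THINK block, no action key
```

With a single THINK layer, index 1 holds the key-value pairs. With two layers, index 1 holds the second THINK block and the action parameters move to index 2, which the parser never reads.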

Comparison Across Three APIs

| API | Reasoning location | Two-layer THINK? |
| --- | --- | --- |
| Chat Completions | `reasoning_content` field → prepended by `ask_llm_v2.py` | Yes |
| Messages (Anthropic) | `thinking` type block (separate) | No |
| Responses | `reasoning` type output item (separate) | No |

The model's content always has exactly one <THINK> layer. The problem is purely in ask_llm_v2.py's handling.
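To make the comparison concrete, the sketch below pulls only the answer text out of simplified payloads for each API. The payload shapes are trimmed-down assumptions for illustration, and `answer_text` is a hypothetical helper, not code from this repository.

```python
def answer_text(api: str, payload: dict) -> str:
    """Return only the model's answer text, leaving reasoning out."""
    if api == "chat_completions":
        # reasoning_content sits next to content in the same message;
        # only content is returned.
        return payload["choices"][0]["message"].get("content") or ""
    if api == "messages":
        # Reasoning arrives as separate "thinking" blocks; keep "text" blocks.
        return "".join(
            block["text"]
            for block in payload["content"]
            if block.get("type") == "text"
        )
    if api == "responses":
        # Reasoning arrives as separate "reasoning" output items; skip them.
        parts = []
        for item in payload["output"]:
            if item.get("type") != "message":
                continue
            for part in item.get("content", []):
                if part.get("type") == "output_text":
                    parts.append(part["text"])
        return "".join(parts)
    raise ValueError(f"unknown api: {api}")


ANSWER = "<THINK>plan</THINK>\texplain:e\taction:CLICK\tsummary:s"

# Simplified, assumed payload shapes for each API.
chat_payload = {
    "choices": [{"message": {"reasoning_content": "r", "content": ANSWER}}]
}
messages_payload = {
    "content": [
        {"type": "thinking", "thinking": "r"},
        {"type": "text", "text": ANSWER},
    ]
}
responses_payload = {
    "output": [
        {"type": "reasoning", "summary": []},
        {"type": "message", "content": [{"type": "output_text", "text": ANSWER}]},
    ]
}

results = [
    answer_text("chat_completions", chat_payload),
    answer_text("messages", messages_payload),
    answer_text("responses", responses_payload),
]
```

In all three cases the answer text carries exactly one THINK layer, as long as the reasoning is not merged back in.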

Fix

Stop prepending reasoning_content to the result. Log it for debugging, but return content as-is. The content already follows the expected <THINK>...</THINK>\texplain:...\taction:...\tsummary:... format.
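The shape of the fix can be sketched as follows. This is an illustration under assumptions, not the actual diff: `extract_result` and the logger name are hypothetical, and the message is modeled as a plain dict with the `content` and `reasoning_content` keys named above.

```python
import logging

logger = logging.getLogger("ask_llm_sketch")  # hypothetical logger name


def extract_result(message: dict) -> str:
    """Return the model's content unchanged and only log the reasoning."""
    reasoning = message.get("reasoning_content")
    if reasoning:
        # Kept for debugging; never merged into the returned text.
        logger.debug("reasoning_content (%d chars): %s", len(reasoning), reasoning)
    return message.get("content") or ""


message = {
    "reasoning_content": "the user wants the Settings app",
    "content": "<THINK>tap Settings</THINK>\texplain:open\taction:CLICK\tsummary:done",
}
result = extract_result(message)
```

Because the returned string is the model's content as-is, the first `</THINK>` the parser sees is the model's own, and the key-value pairs follow it directly.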

Only tools/ask_llm_v2.py is modified (1 file, 19 insertions, 26 deletions).

Verification

Tested with step-3.7-flash controlling a real Android device (iQOO Z9 Turbo+):

| Task | Steps | Result |
| --- | --- | --- |
| Open Settings > About Phone > read model info | 5 steps | Completed successfully |
| Open WeChat > find and open specific group chat | 5 steps | Completed successfully |

Both tasks failed before the fix (parser error on step 1) and succeeded after.

When using cloud reasoning models (e.g. step-3.7-flash) via Chat Completions API, ask_llm_v2.py was wrapping reasoning_content as a <think> block and prepending it to the model's content output.

But the model's content already contains <THINK> tags (required by the parser prompt), so this created two layers of THINK tags:

<think>reasoning_content from API field</think>
<THINK>model's own THINK from prompt</THINK>
explain:... action:CLICK point:500,300 summary:...

The parser's str2action() uses split('</THINK>')[1] to extract key-value pairs, which grabbed the second THINK block text instead of the actual action parameters, causing:

ValueError: ui_action must contain 'action' or 'action_type'

Fix: log reasoning_content for debugging but return model's content as-is. The content already follows the expected format:

<THINK>...</THINK>\texplain:...\taction:...\tsummary:...

Verified with step-3.7-flash controlling Android device:
- Task: open WeChat and find specific group chat
- Result: 5 steps, completed successfully (was failing at step 1 before fix)

li-xiu-qi mentioned this pull request Jun 16, 2026

Parser fails when using step-3.7-flash via Chat Completions API #62

Open

li-xiu-qi wants to merge 1 commit into stepfun-ai:main from li-xiu-qi:fix/ask-llm-reasoning-content
