fix: stop prepending reasoning_content to model output #64

Open
li-xiu-qi wants to merge 1 commit into stepfun-ai:main from li-xiu-qi:fix/ask-llm-reasoning-content

@li-xiu-qi


Problem

Fixes #62

When a cloud reasoning model (e.g. step-3.7-flash) is used via the Chat Completions API, the parser fails on the first step:

ValueError: ui_action must contain 'action' or 'action_type'. Got keys: ['cot']

Root Cause

ask_llm_v2.py detects reasoning_content in the API response and wraps it as a <think> block, prepending it to the model's content:

result = "<think>" + reasoning + "</think>" + "\n" + llm_content

But the model's content already contains <THINK> tags (required by the parser prompt), so this creates two layers:

<think>reasoning_content from API</think>       ← prepended by ask_llm_v2.py
<THINK>model's own thinking</THINK>             ← from model content
explain:...  action:CLICK  point:500,300  ...   ← parser needs this

The parser's str2action() uses split("</THINK>")[1] to extract the key-value pairs. With two layers present, that index returns the text after the first closing tag, which is the second THINK block rather than the action parameters.
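
The failure can be reproduced in isolation. The sketch below is not the project's parser: `take_after_first_close` is a hypothetical stand-in that mirrors only the split("</THINK>")[1] step, and it assumes the lower-case and upper-case closing tags are treated as equivalent (the real str2action() may differ in detail). The sample strings are made up for illustration.

```python
import re


def take_after_first_close(raw: str) -> str:
    """Hypothetical stand-in for the split("</THINK>")[1] step in str2action()."""
    # Assumption: tag case is treated as equivalent, so normalize closing tags first.
    normalized = re.sub(r"</think>", "</THINK>", raw, flags=re.IGNORECASE)
    return normalized.split("</THINK>")[1]


# Illustrative model content in the format the parser prompt asks for.
content = "<THINK>model's own thinking</THINK>\texplain:tap icon\taction:CLICK\tpoint:500,300"
reasoning = "reasoning_content from API"

# Old behaviour: reasoning_content is wrapped and prepended to the content.
two_layer = "<think>" + reasoning + "</think>" + "\n" + content

# With two layers, index 1 is the second THINK block and carries no action fields.
print(repr(take_after_first_close(two_layer)))

# With the content alone, index 1 is the key-value section the parser needs.
print(repr(take_after_first_close(content)))
```

With only the THINK text available, a parser of this shape has nothing to map to `action` or `action_type`, which is consistent with the ValueError reported above.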

Comparison of the Three APIs

| API | Reasoning location | Two-layer THINK? |
| --- | --- | --- |
| Chat Completions | reasoning_content field → prepended by ask_llm_v2.py | Yes |
| Messages (Anthropic) | thinking type block (separate) | No |
| Responses | reasoning type output item (separate) | No |

The model's content always has exactly one <THINK> layer. The problem is purely in ask_llm_v2.py's handling.
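
As a rough illustration of where each API keeps the reasoning, the sketch below separates reasoning from content for three response shapes. It works on already-decoded JSON dicts; `split_reasoning` and the sample payloads are hypothetical, and the field layouts are simplified assumptions about each API rather than code taken from ask_llm_v2.py.

```python
from typing import Optional, Tuple


def split_reasoning(api: str, response: dict) -> Tuple[Optional[str], str]:
    """Return (reasoning, content) from an already-decoded JSON response."""
    if api == "chat_completions":
        # Reasoning arrives as a sibling field of content on the message.
        message = response["choices"][0]["message"]
        return message.get("reasoning_content"), message.get("content") or ""
    if api == "messages":
        # Reasoning arrives as separate "thinking" blocks next to "text" blocks.
        blocks = response["content"]
        thinking = "".join(b["thinking"] for b in blocks if b["type"] == "thinking")
        text = "".join(b["text"] for b in blocks if b["type"] == "text")
        return thinking or None, text
    if api == "responses":
        # Reasoning arrives as separate "reasoning" output items.
        reasoning_parts, text_parts = [], []
        for item in response["output"]:
            if item["type"] == "reasoning":
                reasoning_parts += [s["text"] for s in item.get("summary", [])]
            elif item["type"] == "message":
                text_parts += [c["text"] for c in item["content"] if c["type"] == "output_text"]
        return "".join(reasoning_parts) or None, "".join(text_parts)
    raise ValueError(f"unknown api: {api}")


ACTION = "<THINK>plan</THINK>\texplain:tap\taction:CLICK"

# Made-up payloads, reduced to the fields the sketch reads.
chat = {"choices": [{"message": {"reasoning_content": "r", "content": ACTION}}]}
msgs = {"content": [{"type": "thinking", "thinking": "r"}, {"type": "text", "text": ACTION}]}
resp = {"output": [
    {"type": "reasoning", "summary": [{"type": "summary_text", "text": "r"}]},
    {"type": "message", "content": [{"type": "output_text", "text": ACTION}]},
]}

for api, payload in [("chat_completions", chat), ("messages", msgs), ("responses", resp)]:
    reasoning, content = split_reasoning(api, payload)
    # The content carries exactly one <THINK> layer regardless of the API.
    print(api, content.count("<THINK>"))
```

In all three cases the reasoning can be kept apart from the content, so nothing forces it to be merged into the string handed to the parser.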

Fix

Stop prepending reasoning_content to the result. Log it for debugging, but return content as-is. The content already follows the expected <THINK>...</THINK>\texplain:...\taction:...\tsummary:... format.
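
A minimal sketch of the changed behaviour, assuming the reasoning and the content have already been pulled out of the API response. `build_result` and the logger name are hypothetical and stand in for the relevant part of tools/ask_llm_v2.py; the actual diff may be structured differently.

```python
import logging
from typing import Optional

logger = logging.getLogger("ask_llm_v2_sketch")  # hypothetical logger name


def build_result(llm_content: str, reasoning: Optional[str]) -> str:
    """Return the model's content unchanged; reasoning is only logged."""
    if reasoning:
        # Keep the reasoning available for debugging without touching the output.
        logger.debug("reasoning_content (%d chars): %s", len(reasoning), reasoning)
    return llm_content


content = "<THINK>plan</THINK>\texplain:tap icon\taction:CLICK\tsummary:opened"
result = build_result(content, "reasoning_content from API")

# The parser now sees a single THINK layer followed by the key-value pairs.
print(result.count("</THINK>"))
```

Returning the content untouched keeps the parser's input format under the control of the parser prompt alone, whichever API produced the reasoning.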

Only tools/ask_llm_v2.py is modified (1 file, 19 insertions, 26 deletions).

Verification

Tested with step-3.7-flash controlling a real Android device (iQOO Z9 Turbo+):

| Task | Steps | Result |
| --- | --- | --- |
| Open Settings > About Phone > read model info | 5 steps | Completed successfully |
| Open WeChat > find and open specific group chat | 5 steps | Completed successfully |

Both tasks failed before the fix (parser error on step 1) and succeeded after.

When using cloud reasoning models (e.g. step-3.7-flash) via Chat Completions
API, ask_llm_v2.py was wrapping reasoning_content as a <think> block and
prepending it to the model's content output. But the model's content already
contains <THINK> tags (required by the parser prompt), so this created two
layers of THINK tags:

  <think>reasoning_content from API field</think>
  <THINK>model's own THINK from prompt</THINK>
  explain:...  action:CLICK  point:500,300  summary:...

The parser's str2action() uses split('</THINK>')[1] to extract key-value
pairs, which grabbed the second THINK block text instead of the actual
action parameters, causing:
  ValueError: ui_action must contain 'action' or 'action_type'

Fix: log reasoning_content for debugging but return model's content as-is.
The content already follows the expected format:
  <THINK>...</THINK>\texplain:...\taction:...\tsummary:...

Verified with step-3.7-flash controlling Android device:
- Task: open WeChat and find specific group chat
- Result: 5 steps, completed successfully (was failing at step 1 before fix)
Development

Successfully merging this pull request may close these issues.

Parser fails when using step-3.7-flash via Chat Completions API