pilot: recover from invalid RTDL turns #88
Retry RTDL parse/expand once with a corrective prompt, gracefully finish if retry still fails, and roll back session history when a turn errors.
Pull request overview
This PR improves Pilot’s resilience to malformed RTDL by adding a single retry with a corrective prompt, gracefully ending the turn with final text if the retry still fails, and attempting to roll back per-session history on failed turns so later rbnx chat turns aren’t affected.
Changes:
- Add a one-time RTDL parse/expand retry that includes the exact capability names in the corrective prompt.
- On repeated invalid RTDL, emit a final user-facing message and return an empty sequence plan instead of failing the stream.
- On turn failure, attempt to restore session history to its pre-turn state.
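The rollback in the last bullet can be sketched as follows. This is a minimal illustration, not the code in `service.rs`: the `Message` type and the `run_turn_with_rollback` helper are hypothetical, and the sketch is synchronous, whereas the PR holds the history behind an async lock.

```rust
/// Hypothetical stand-in for one chat message in the session history.
#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: String,
    text: String,
}

/// Runs `turn` against `history`. If the turn fails, every message the
/// failed turn appended is dropped so the history matches its pre-turn state.
fn run_turn_with_rollback<F>(history: &mut Vec<Message>, turn: F) -> Result<(), String>
where
    F: FnOnce(&mut Vec<Message>) -> Result<(), String>,
{
    // Remember where the turn started.
    let len_before_turn = history.len();
    let result = turn(history);
    if result.is_err() {
        // `Vec::truncate` does nothing if the turn appended no messages.
        history.truncate(len_before_turn);
    }
    result
}
```

Truncating by length only undoes appended messages. If a turn could edit earlier entries in place, a snapshot of the history would be needed instead.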
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| system/pilot/src/service.rs | Adds per-turn history rollback on run_turn error. |
| system/pilot/src/planner.rs | Implements RTDL retry + graceful fallback using an empty sequence plan and a corrective prompt. |
    let mut history = history_arc.lock().await;
    let history_len_before_turn = history.len();
    if let Err(e) = planner::run_turn(
        &task,
        &mut history,
        let mut p = format!(
            "Your previous RTDL response could not be parsed or expanded by Pilot.\n\
             Error: {err:#}\n\
             Previous response preview: {}\n\n\
             Retry the same user request exactly once. The previous `cap` value is invalid; \
             do not repeat it. Return only a JSON object with exactly \
             `content` and `rtdl`. Use only capability_name values from this list; do not invent \
             provider names, method names, or aliases:\n",
            raw_preview(raw_content)
        );
        for cap in display_caps {
            p.push_str("- ");
            p.push_str(&cap.display_name);
            p.push('\n');
        }
        p.push_str(
            "\nIf no further capability call is needed, use \
             {\"op\":\"sequence\",\"children\":[]} as `rtdl`. If the user's requested action cannot \
             be performed using the listed capabilities, explain the missing capability in `content` \
             and return an empty RTDL sequence instead of inventing a capability.\n",
        );
        p
    }
    fn rtdl_recovery_final_text(err: &anyhow::Error) -> String {
        format!(
            "I could not safely continue because the planner produced an invalid capability call after retrying once. {err:#}"
        )
    }
Copilot is right. The RTDL plan itself may be wrong, not only the capability call.
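One possible shape for a more general fallback text, sketched under assumptions: the `RecoveryText` struct and the `rtdl_recovery_text` function are hypothetical names, not what the PR ships. The user-facing message refers to the RTDL plan as a whole and leaves out the internal error, which is kept in a separate line meant for logs.

```rust
use std::fmt::Display;

/// Hypothetical pair of messages produced when the retry also fails.
#[derive(Debug)]
struct RecoveryText {
    /// Shown to the user; contains no internal error detail.
    user_text: String,
    /// Intended for logs only; keeps the full error for debugging.
    log_line: String,
}

/// Builds both messages from whatever error the second attempt returned.
fn rtdl_recovery_text(err: &dyn Display) -> RecoveryText {
    RecoveryText {
        user_text: "I could not safely continue because the planner produced \
                    an invalid RTDL plan after retrying once."
            .to_string(),
        log_line: format!("RTDL recovery failed after one retry: {err}"),
    }
}
```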
    )
    .await
    {
        history.truncate(history_len_before_turn);
After a detailed discussion with @enkerewpo, we think the history should not be truncated. At this point the error can only come from a crash of atlas or a failure of the LLM API provider, so the previous RTDL results should be preserved.
Updated based on the review. I removed the history rollback, made the retry prompt more general, changed the fallback final text so it does not expose internal errors, and added tests for the prompt/fallback text. Verified with |
Summary
rbnx chat turns.
Motivation
Some OpenAI-compatible VLMs occasionally produce malformed RTDL or invent capability names, for example:
Before this change, Pilot aborted the stream with an internal error:
In interactive chat, a failed turn could also leave partial model/user context in the session history, making later turns fail more often.
Behavior
With this change, Pilot first retries the same planning step once using a corrective prompt.
If the retry succeeds, execution continues normally.
If the retry still fails, Pilot emits a final user-facing explanation and completes the turn gracefully instead of breaking the stream.
If the turn still returns an error, the session history is truncated back to the state before the turn started.
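The retry-then-fallback flow described above can be sketched as follows. This is a simplified illustration under stated assumptions: `Plan`, `TurnOutcome`, `parse_plan`, and `plan_with_one_retry` are hypothetical names, the toy parser accepts a comma-separated list of capability names rather than real RTDL JSON, and the model is passed in as a plain closure.

```rust
/// Hypothetical, simplified plan type; real RTDL is richer than this.
#[derive(Debug, PartialEq)]
enum Plan {
    /// Capability names to call in order; empty means "do nothing".
    Sequence(Vec<String>),
}

/// Outcome of one planning step.
#[derive(Debug, PartialEq)]
struct TurnOutcome {
    plan: Plan,
    /// Set only when the turn ends on the fallback path.
    final_text: Option<String>,
}

/// Toy stand-in for RTDL parse/expand: rejects any unknown capability name.
fn parse_plan(raw: &str, known_caps: &[&str]) -> Result<Plan, String> {
    let mut calls = Vec::new();
    for name in raw.split(',').map(str::trim).filter(|s| !s.is_empty()) {
        if !known_caps.contains(&name) {
            return Err(format!("unknown capability `{name}`"));
        }
        calls.push(name.to_string());
    }
    Ok(Plan::Sequence(calls))
}

/// Asks the model once, retries exactly once with a corrective prompt,
/// and falls back to an empty plan plus an explanation if both fail.
fn plan_with_one_retry<M>(mut ask_model: M, user_prompt: &str, known_caps: &[&str]) -> TurnOutcome
where
    M: FnMut(&str) -> String,
{
    let first = ask_model(user_prompt);
    let first_err = match parse_plan(&first, known_caps) {
        Ok(plan) => return TurnOutcome { plan, final_text: None },
        Err(e) => e,
    };

    // The corrective prompt spells out the error and the allowed names.
    let corrective = format!(
        "Previous response was invalid: {first_err}\n\
         Use only these capabilities: {}\n\
         Request: {user_prompt}",
        known_caps.join(", ")
    );
    let second = ask_model(&corrective);
    match parse_plan(&second, known_caps) {
        Ok(plan) => TurnOutcome { plan, final_text: None },
        // Still invalid: finish the turn instead of returning an error.
        Err(_) => TurnOutcome {
            plan: Plan::Sequence(Vec::new()),
            final_text: Some(
                "I could not produce a valid plan after retrying once.".to_string(),
            ),
        },
    }
}
```

Returning a `TurnOutcome` rather than a `Result` on the second failure is what keeps the stream intact in this sketch: the caller always receives a plan it can execute, even if that plan is empty.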
Testing
qwen-vl-max: "what can you see?", "please turn around"