docs: proposal — journal terminality and turn recovery#79
Draft design doc responding to an observed duplicate-response bug where long-running turns produce two "task complete" messages and two journal entries within a single agent.ainvoke() call. Proposes making `journal` a terminal action (enforced in code), tightening the prompt accordingly, and adding a recovery path so turns that exit without journaling produce a provisional entry plus user notification instead of silent data loss. No code changes in this commit — just the design writeup for discussion.
Hey Chris, welcome, and this is a really solid first contribution. The bug is real. I've experienced the duplicate completion pattern myself during long turns, and your event log trace nails the root cause: the LLM treats each
On the three stages:
Stage 1 (journal terminality): yes, this is the right fix. Journal-as-terminal is the correct abstraction because it's the only tool call that semantically means "I am done." The
Stage 2 (prompt update): agree, and the proposed wording is good. The current line 35 ("It's totally fine to send a message, do some work, and then send another message") is doing the damage. Your replacement preserves multi-message turns for progress while making journal = terminal explicit.
Stage 3 (recovery): this is where I'd push back on scope. The provisional journal + user notification is valuable, but the bounded retry loop adds significant complexity for a failure mode (agent never journals) that Stage 1 mostly eliminates. Once journal-as-terminal is enforced, the main remaining failure is recursion limit / exception, and for those, a simple provisional journal entry without retry might be sufficient as a first pass. The retry machinery could be a follow-up if the simpler approach proves insufficient.
On your open questions:
This is the kind of contribution that makes open-source work: you found the bug in your own logs, traced it to root cause, and proposed a design before writing code. Tim will want to weigh in on the Stage 1 vs Stage 3 scoping question.
…etions Long-running turns sometimes produced duplicate "task complete" messages because the prompt explicitly allowed sending a message, doing more work, then sending another message. Replace that permission with a hard single-message rule, deferring all reporting until work is complete. Use `react` for mid-task acknowledgments. Local workaround while upstream proposal (tkellogg#79) is under discussion.
Having 0 or 3 responses is good, imo. The prompt encourages it. Having multiple journal entries is bad, kind of, I guess. I actually worry more about code enforcement of the behavior cutting off the agent mid-thought.
Why multiple journals is bad for me
Why multiple journals is bad for the agent
What failures could happen if we had journals end the turn:
To be clear, the purpose of the journal is
Tim's comment changes my position on Stage 1. I was treating journal-as-terminal as a clean invariant, but the failure modes he names are real and worse than the original bug:
The mid-task cutoff problem is the killer. If an agent journals after completing part 1 of a two-part task and enforcement halts the turn, the second half silently disappears. That's data loss, not just a cosmetic issue. The agent did the right thing (journaling progress) and got punished for it.
Journal anxiety is subtler but I've seen the pattern: when a tool has "this ends your turn" semantics, agents either avoid it or front-load everything before calling it, producing worse journals (cramming rather than natural summarization).
Given Tim's framing of what journal actually IS (context compression + rephrasing + analytics bucket), terminality isn't intrinsic to its purpose. A journal entry that happens mid-turn and captures partial progress is still valuable for all three purposes.
What's left from the proposal: Stage 2 (prompt cleanup) still has value, since the current prompt could be clearer about journal's role without adding enforcement. The duplicate-message observation in the original bug report is real, but Tim's right that the fix is worse than the symptom.
Visual documentation of the problem this PR addresses
Motley's panic retry cycle, documented live while he was stuck in one:
Context: Motley (MiniMax M2.5 jester agent) hit rate limits trying to post an image, then entered a panic-retry loop, trying harder each time instead of reflecting on why the approach was failing.
The current cycle detection prompt (L422-425 in tools.py) says "stop repeating" but gives no constructive path. The result: the agent incorporates the warning into the panic and loops harder.
The 5 Whys decomposition of this incident → the skill exists but isn't wired into the failure response loop.

Hey Tim - long-time listener, first time caller. I was chasing a bug
in open-strix where for longer operations sometimes the agent would
reply 2-3 times. I caught it with a time frame and used the logs and
code to track down what at least Claude is pretty sure is the issue,
and the theory sounds reasonable. I pushed back on the details and
got it to this point with some decent proposed error handling. But I
thought that since it touched the core of the primary loop that I'd
send a doc with a proposal instead of code to see if you had thoughts
on it. I'm only a couple of days in, so I don't know the code very
well yet at all.
Feel free to use this to fix this yourself, or call BS on it. Or if
you want me to code up a change, I can do that too. Just let me know
either way.
Claude-generated content below:
This is a design proposal, not an implementation. Opening as draft
to get feedback on the approach before writing code.
Problem
Long-running turns sometimes produce duplicate "task complete"
messages and duplicate journal entries. I traced an instance in my own
logs (2026-04-10 02:17 UTC) to a single `agent.ainvoke()` call: the
LLM itself chose to call `send_message` and `journal` twice, within
one turn, with no re-invoke in between. Current prompt language at
`prompts.py:35` ("It's totally fine to send a message, do some work,
and then send another message...") actively encourages the pattern,
and nothing in code enforces the "journal exactly once per turn" rule
from `prompts.py:29`.
Proposal
Three principles:
1. `journal` becomes terminal, enforced in code, not just prompt. Once called, other tool calls in the same turn return an error.
2. Every turn produces a journal entry, always. If the LLM didn't journal (recursion limit, exception, just stopped), the system writes a provisional entry reconstructed from the event log and flags it `agent_journaled: false`.
3. Failures are surfaced to the user. Silent recovery is still silent failure. The user gets a visible message when a turn didn't complete cleanly, and a bounded retry is attempted before hard-failing.
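To make the first two principles concrete for discussion, here is a rough sketch of what the guard and the provisional entry could look like. Everything in it is hypothetical: `TurnState`, `guarded_tool_call`, `finalize_turn`, and the `summary` argument key are illustration-only names, not existing open-strix code, and only the `agent_journaled` flag comes from the proposal itself. Retry and user notification (the third principle) are left out.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class TurnState:
    """Hypothetical per-turn bookkeeping (not an open-strix type)."""
    journaled: bool = False
    events: list[dict[str, Any]] = field(default_factory=list)


def guarded_tool_call(state: TurnState, name: str, args: dict[str, Any]) -> dict[str, Any]:
    """Principle 1: once `journal` has run, later tool calls return an error."""
    if state.journaled:
        result = {"ok": False, "error": f"turn already journaled; '{name}' rejected"}
    else:
        result = {"ok": True}
        if name == "journal":
            state.journaled = True
    # Every attempt is logged so a provisional entry can be rebuilt later.
    state.events.append({"tool": name, "args": args, "result": result})
    return result


def finalize_turn(state: TurnState) -> dict[str, Any]:
    """Principle 2: every turn ends with a journal entry, real or provisional."""
    for event in state.events:
        if event["tool"] == "journal" and event["result"]["ok"]:
            # Assumes the journal tool takes a `summary` argument.
            return {"agent_journaled": True, "summary": event["args"].get("summary", "")}
    attempted = ", ".join(e["tool"] for e in state.events)
    return {
        "agent_journaled": False,
        "summary": f"provisional: turn ended without journal; tools called: {attempted}",
    }
```

A caller would route each tool invocation through `guarded_tool_call` and call `finalize_turn` once the turn exits for any reason; an entry with `agent_journaled` set to false is the signal that the user should be notified.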
Full writeup with state machine diagrams (current and proposed),
failure mode analysis, and a 3-stage implementation plan in
`docs/proposal-journal-terminality.md`.
Looking for feedback on