fix: prevent runaway status/event loop in multi-agent workspaces #508
Open
GautamKumarOffical wants to merge 1 commit into
Add queue safety limits to BaseAdapter to prevent the self-amplifying event loop that occurs when agents restart into busy historical workspaces.

Key changes:
- Add max queue size per channel (10 messages) with backpressure
- Add 5-minute TTL for queued messages to expire stale work
- Add startup reconciliation to clear stale queues from prior lifecycles
- Emit terminal states (expired) for dropped/expired queued messages
- Add startup grace period to suppress replay of stale queued work

These guards prevent the scenario where transient status/thinking/todos messages accumulate unboundedly, exhaust the DB connection pool, and render the workspace unusable after restart.

Signed-off-by: Gautam Kumar <gautamkumarofficial@users.noreply.github.com>
Someone is attempting to deploy a commit to the Raphael's projects Team on Vercel. A member of the Team first needs to authorize it.
Summary
Adds queue safety limits to BaseAdapter to prevent the self-amplifying event loop that occurs when agents restart into busy historical workspaces.
Problem
A busy multi-agent workspace can enter a self-amplifying loop after restart: agents resume old channel work, emit large numbers of transient status/thinking/todos messages, poll /v1/events concurrently, and exhaust the backend DB connection pool. This is because transient execution state is stored as durable events without bounds.
Solution
This fix adds three layers of protection, plus terminal-state reporting for the messages they discard:
- Queue bounds: Per-channel queues are now capped at 10 messages. When a queue is full, the oldest message is dropped with an 'expired' terminal state, providing backpressure.
- Queue TTL: Queued messages older than 5 minutes are expired rather than processed, preventing stale work from replaying after delays.
- Startup reconciliation: On adapter start, any leftover channel queues from a prior lifecycle are cleared. Combined with the existing cursor advance, this prevents replay of orphaned work from a previous daemon session.
- Terminal states: Dropped and expired queued messages now emit a terminal event, so the frontend doesn't treat them as active queue state.
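For reviewers who want a concrete picture of how these guards interact, here is a minimal Python sketch. It is not the code in this PR: `ChannelQueues`, `QueuedMessage`, the `emit` callback, and the method names are hypothetical, and the implementation language is assumed. Only the 10-message cap, the 5-minute TTL, and the 'expired' state come from the description above. Whether startup reconciliation emits an event for each cleared message is also an assumption.

```python
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, Optional

MAX_QUEUE_SIZE = 10        # per-channel cap, from the PR description
QUEUE_TTL_SECONDS = 300.0  # 5-minute TTL, from the PR description


@dataclass
class QueuedMessage:
    message_id: str
    enqueued_at: float


@dataclass
class ChannelQueues:
    """Hypothetical stand-in for the queue state held by BaseAdapter."""

    # emit(channel_id, message_id, state) reports a terminal state.
    emit: Callable[[str, str, str], None]
    clock: Callable[[], float] = time.monotonic
    queues: Dict[str, Deque[QueuedMessage]] = field(default_factory=dict)

    def enqueue(self, channel_id: str, message_id: str) -> None:
        queue = self.queues.setdefault(channel_id, deque())
        # Queue bounds: drop the oldest entries until there is room.
        while len(queue) >= MAX_QUEUE_SIZE:
            dropped = queue.popleft()
            self.emit(channel_id, dropped.message_id, "expired")
        queue.append(QueuedMessage(message_id, self.clock()))

    def dequeue(self, channel_id: str) -> Optional[str]:
        queue = self.queues.get(channel_id)
        now = self.clock()
        while queue:
            item = queue.popleft()
            # Queue TTL: expire stale entries instead of processing them.
            if now - item.enqueued_at > QUEUE_TTL_SECONDS:
                self.emit(channel_id, item.message_id, "expired")
                continue
            return item.message_id
        return None

    def reconcile_on_startup(self) -> None:
        # Startup reconciliation: clear everything left from a prior
        # lifecycle. Emitting 'expired' per message is an assumption.
        for channel_id, queue in self.queues.items():
            for item in queue:
                self.emit(channel_id, item.message_id, "expired")
        self.queues.clear()
```

Injecting the clock keeps the TTL path testable without sleeping. Dropping the oldest entry rather than rejecting the newest follows the description above, which favors fresh work over stale work.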
Changes
Testing
Fixes #492