fix: prevent runaway status/event loop in multi-agent workspaces #508

Open
GautamKumarOffical wants to merge 1 commit into openagents-org:develop from GautamKumarOffical:fix/runaway-event-loop

@GautamKumarOffical

Summary

Adds queue safety limits to BaseAdapter to prevent the self-amplifying event loop that occurs when agents restart into busy historical workspaces.

Problem

A busy multi-agent workspace can enter a self-amplifying loop after restart: agents resume old channel work, emit large numbers of transient status/thinking/todos messages, poll /v1/events concurrently, and exhaust the backend DB connection pool. The root cause is that transient execution state is stored as durable events without bounds.

Solution

This fix adds four layers of protection:

  1. Queue bounds: Per-channel queues are now capped at 10 messages. When full, the oldest message is dropped with an 'expired' terminal state, providing backpressure.

  2. Queue TTL: Queued messages older than 5 minutes are expired rather than processed, preventing stale work from replaying after delays.

  3. Startup reconciliation: On adapter start, any leftover channel queues from a prior lifecycle are cleared. Combined with the existing cursor advance, this prevents replay of orphaned work from a previous daemon session.

  4. Terminal states: Dropped and expired queued messages now emit proper events, so the frontend doesn't treat them as active queue state.
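
The queue bound, TTL, and terminal-state behavior described above could look roughly like the sketch below. It is not the code from this PR: `ChannelQueues`, `QueuedMessage`, the `emit` callback, and the constant names are hypothetical, and Python is assumed as the adapter's language. Only the limits (10 messages, 5 minutes) and the 'expired' state come from the description.

```python
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List

MAX_QUEUE_SIZE = 10          # per-channel cap from the description
QUEUE_TTL_SECONDS = 5 * 60   # 5-minute TTL from the description


@dataclass
class QueuedMessage:
    message_id: str
    enqueued_at: float


@dataclass
class ChannelQueues:
    """Hypothetical stand-in for the adapter's per-channel queues."""
    emit: Callable[[str, str], None]           # (message_id, state) -> None
    clock: Callable[[], float] = time.monotonic
    queues: Dict[str, Deque[QueuedMessage]] = field(default_factory=dict)

    def enqueue(self, channel: str, message_id: str) -> None:
        queue = self.queues.setdefault(channel, deque())
        if len(queue) >= MAX_QUEUE_SIZE:
            # Queue is full: drop the oldest entry and give it a terminal state.
            dropped = queue.popleft()
            self.emit(dropped.message_id, "expired")
        queue.append(QueuedMessage(message_id, self.clock()))

    def drain(self, channel: str) -> List[str]:
        """Return ids still within the TTL; expire everything older."""
        queue = self.queues.get(channel, deque())
        fresh: List[str] = []
        now = self.clock()
        while queue:
            msg = queue.popleft()
            if now - msg.enqueued_at > QUEUE_TTL_SECONDS:
                # Stale work is skipped, but still reported as terminal.
                self.emit(msg.message_id, "expired")
            else:
                fresh.append(msg.message_id)
        return fresh
```

Injecting the clock keeps the TTL check testable without sleeping, and emitting on every drop is what lets a consumer stop showing the message as queued.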

Changes

  • :
    • Add , , constants
    • Add method called on startup
    • Add startup grace check in for stale queued messages
    • Add overflow backpressure in when queue is full
    • Add TTL expiry check in queue drain loop
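
As a rough illustration of the startup-side changes, the sketch below clears leftover queues and applies a grace check. All names here (`reconcile_on_startup`, `is_stale_at_startup`, `STARTUP_GRACE_SECONDS`) and the 30-second value are hypothetical. The PR text does not spell out how the grace check works, so the rule used here (within the grace window, anything queued before the adapter started counts as stale) is an assumption.

```python
from typing import Callable, Dict, List

STARTUP_GRACE_SECONDS = 30.0  # hypothetical value; the PR does not state one


def reconcile_on_startup(
    queues: Dict[str, List[str]],
    emit: Callable[[str, str], None],
) -> int:
    """Clear queues left over from a prior lifecycle; return the number expired."""
    expired = 0
    for channel in list(queues):
        for message_id in queues.pop(channel):
            # Report each discarded message as terminal rather than dropping it silently.
            emit(message_id, "expired")
            expired += 1
    return expired


def is_stale_at_startup(queued_at: float, started_at: float, now: float) -> bool:
    """Assumed rule: inside the grace window, pre-start messages are stale."""
    in_grace_window = (now - started_at) <= STARTUP_GRACE_SECONDS
    return in_grace_window and queued_at < started_at
```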

Testing

  • Verified that queue overflow drops oldest messages with proper terminal state
  • Verified that stale queued messages are discarded on startup
  • Verified that expired queued messages are skipped during queue drain
  • No regressions to normal message processing flow

Fixes #492

Add queue safety limits to BaseAdapter to prevent the self-amplifying
event loop that occurs when agents restart into busy historical workspaces.

Key changes:
- Add max queue size per channel (10 messages) with backpressure
- Add 5-minute TTL for queued messages to expire stale work
- Add startup reconciliation to clear stale queues from prior lifecycles
- Emit terminal states (expired) for dropped/expired queued messages
- Add startup grace period to suppress replay of stale queued work

These guards prevent the scenario where transient status/thinking/todos
messages accumulate unboundedly, exhaust the DB connection pool, and
render the workspace unusable after restart.

Signed-off-by: Gautam Kumar <gautamkumarofficial@users.noreply.github.com>
@vercel

vercel Bot commented Jun 20, 2026

Someone is attempting to deploy a commit to the Raphael's projects Team on Vercel.

A member of the Team first needs to authorize it.

Development

Successfully merging this pull request may close these issues.

Multi-agent workspace can enter runaway status/event loop after restart
