The gateway is positioned as the always-on front door to agents (Telegram/Discord/Slack/WhatsApp bots, web clients, schedulers). For that role, the client↔server connection lifecycle is a first-class robustness surface — and today it is thin in two ways that bite in production:
No protocol version negotiation. Client and server never exchange a wire-protocol version on connect. If the message shape changes between releases, a newer client against an older gateway (or vice-versa) fails silently or with an opaque error instead of negotiating a common version or rejecting cleanly. For a fleet that rolls out gateway and client updates independently, this is a latent breakage.
No resilient reconnecting client. The server already supports cursor-based event replay on reconnect, but no shipped client actually reconnects: integrators must hand-roll the socket loop, exponential backoff, re-join, and cursor bookkeeping themselves. There is also no gap-detection signal, so a client that reconnects after the bounded event window cannot tell it missed events.
A world-class gateway should make "connect, lose the network, transparently resume where you left off, across compatible versions" the default — not a DIY exercise for every integrator.
Note: this is complementary to and distinct from #2103. That issue covers server-side durability of the pending inbox / in-flight execution and graceful drain on shutdown. This issue covers the wire-protocol contract and client resilience: version negotiation, gap detection, and a built-in auto-reconnect/resume client.
Current behaviour
Version is informational only, never negotiated. The only "version" on the wire is a static string, "version": "1.0.0", exposed on the /info endpoint (gateway/server.py:547).
The join handshake carries no protocol version, and the core protocol contract declares none:
# src/praisonai/praisonai/gateway/server.py:954
if msg_type == "join":
    ...
    since_cursor = data.get("since")  # cursor for replay, but no protocol version exchanged
Server-side resume exists but is incomplete. A client may pass a since cursor and the server replays missed events:
# src/praisonai/praisonai/gateway/server.py:1486-1488
if since_cursor is not None:
    replay_events = session.get_events_since(since_cursor)
    logger.info(f"Replaying {len(replay_events)} events since cursor {since_cursor}")
However:
The per-session event history is bounded (persisted history is clamped to the last ~100 events), so a client reconnecting after the window cannot detect which events it missed — there is no gap-detection signal back to the client.
The resume payload restores message/event history only; it carries no presence/health snapshot, so a reconnecting client cannot re-sync gateway state in one round trip.
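To make the first limitation concrete, here is a minimal sketch of a bounded replay buffer that reports when the requested cursor has already fallen out of the window. `BoundedHistory`, `ReplayResult`, and the integer sequence cursor are hypothetical names and assumptions for illustration, not the current `server.py` implementation.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class ReplayResult:
    events: List[Tuple[int, Any]]  # retained (seq, payload) pairs newer than the cursor
    gap: bool                      # True if some events after the cursor were evicted
    oldest_seq: Optional[int]      # oldest sequence still retained, if any


class BoundedHistory:
    """Hypothetical per-session buffer keeping only the last `max_events` events."""

    def __init__(self, max_events: int = 100) -> None:
        self._events: deque = deque(maxlen=max_events)
        self._next_seq = 1

    def append(self, payload: Any) -> int:
        seq = self._next_seq
        self._events.append((seq, payload))  # the oldest entry is evicted when full
        self._next_seq += 1
        return seq

    def replay_since(self, cursor: int) -> ReplayResult:
        oldest = self._events[0][0] if self._events else None
        # The client needs cursor + 1 next; if that is older than the oldest
        # retained event, part of the range is gone and must be reported.
        gap = oldest is not None and cursor + 1 < oldest
        events = [(seq, payload) for seq, payload in self._events if seq > cursor]
        return ReplayResult(events=events, gap=gap, oldest_seq=oldest)
```

With a window of 3 and five appended events, a client resuming from cursor 1 would get events 3 to 5 with `gap=True`, while a client resuming from cursor 2 would get the same events with `gap=False`.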
No client actually reconnects. The TypeScript gateway client is an interface plus two unused config fields — there is no reconnect loop, no backoff, no cursor tracking:
// src/praisonai-ts/src/gateway/index.ts (450 lines, interface only)
connect(): Promise<void>;
disconnect(): Promise<void>;
retryAttempts?: number; // declared, never used to drive reconnection
retryDelay?: number;    // declared, never used
There is no equivalent reconnecting client in the Python wrapper either. (gateway/supervisor.py's reconnect is for channel bot adapters, not the gateway WebSocket client.)
Desired behaviour
Version negotiation on connect: client advertises a supported protocol range; server replies with the agreed version (or a clear, typed version_unsupported rejection). A mismatched client and server degrade or refuse cleanly instead of failing opaquely.
First-class reconnecting client (TS + Python): automatic reconnect with bounded exponential backoff + jitter, automatic re-join with the last cursor, and a connected/reconnecting/disconnected state callback — so integrators get durable connectivity for free.
Gap detection: events carry a monotonic sequence; the client surfaces an on_gap(expected, received) signal when it detects a missed range (e.g. reconnect beyond the replay window), so callers can trigger a full re-sync rather than silently dropping events.
Resume snapshot completeness: the reconnect/joined payload includes a compact presence/health snapshot alongside history, so one round trip restores client state.
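One possible shape for the negotiation step is a simple range intersection. This is a sketch under assumptions: protocol versions are modelled as plain integers, and `VersionRange`, `Negotiation`, and `negotiate` are hypothetical names; only the `version_unsupported` rejection code comes from the proposal above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VersionRange:
    min_version: int  # oldest protocol version this side can speak
    max_version: int  # newest protocol version this side can speak


@dataclass(frozen=True)
class Negotiation:
    ok: bool
    version: Optional[int] = None  # agreed version when ok is True
    error: Optional[str] = None    # typed rejection code when ok is False


def negotiate(client: VersionRange, server: VersionRange) -> Negotiation:
    """Pick the highest version both sides support, or reject cleanly."""
    lowest_common = max(client.min_version, server.min_version)
    highest_common = min(client.max_version, server.max_version)
    if lowest_common > highest_common:
        # The ranges do not overlap: refuse with a typed error
        # rather than letting the session fail opaquely later.
        return Negotiation(ok=False, error="version_unsupported")
    return Negotiation(ok=True, version=highest_common)
```

Choosing the highest common version is a design choice in this sketch; a stricter server (for example with gateway.protocol_strict enabled) could instead refuse anything other than an exact match.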
Layer placement
Primary layer: core (src/praisonai-agents/praisonaiagents/gateway/protocols.py) — the version-negotiation handshake, the sequence/gap contract, and the resume-cursor shape are protocol contracts that belong in the protocol-driven core, with no heavy imports.
Why not core (the impl part): only the contract lives in core; the socket/backoff mechanics live in the wrapper (secondary touch).
Why not wrapper (as primary): if the negotiation/sequence/gap contract is defined only in the wrapper server, third-party clients have nothing portable to implement against; the contract must be core so every client (TS, Python, external) agrees on it.
Why not tools: connection lifecycle is gateway transport, never an agent-callable integration.
Why not plugins: this is intrinsic gateway runtime continuity, not an optional cross-cutting lifecycle policy.
Secondary touch: wrapper — implement the negotiated handshake + gap/sequence emission in gateway/server.py, and ship reconnecting clients in src/praisonai-ts/src/gateway/ and a Python gateway client.
3-way surface (CLI + YAML + Python): partial/no — this is a protocol + SDK contract rather than a user feature toggle; the only user-facing knobs are optional (e.g. gateway.protocol_strict to reject unsupported versions, client reconnect/backoff options).
Proposed approach
Extension point: protocol contract in core + handshake/sequence implementation in the wrapper server + a reusable reconnecting client.
# wrapper: a reconnecting client integrators can just use
client = GatewayClient(url, reconnect=True, backoff=Backoff(initial=1, max=30, jitter=0.2))
client.on_gap(lambda expected, received: client.resync())
await client.connect()  # negotiates version, auto-resumes via cursor on every reconnect
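The `Backoff(initial=1, max=30, jitter=0.2)` policy used above could be sketched roughly as follows. The field names mirror the proposed call, but the semantics are assumptions: delays double per attempt, are capped at `max` seconds, and `jitter` is interpreted as a plus-or-minus fraction of the capped delay. The `delay` method is a hypothetical helper.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Backoff:
    initial: float = 1.0  # delay before the first retry, in seconds
    max: float = 30.0     # upper bound on the un-jittered delay
    jitter: float = 0.2   # assumed meaning: +/- fraction of the delay

    def delay(self, attempt: int, rng: Optional[random.Random] = None) -> float:
        """Return the wait time before retry number `attempt` (0-based)."""
        source = rng or random
        # Cap the exponent so very long outages cannot overflow the float math.
        base = min(self.max, self.initial * (2 ** min(attempt, 32)))
        spread = base * self.jitter
        # Jitter spreads reconnects out so a fleet of clients does not
        # hit the gateway at the same instant after an outage.
        return base + source.uniform(-spread, spread)
```

A reconnect loop would call `delay(attempt)` after each failed connect, sleep for the returned number of seconds, and reset `attempt` to 0 once a join succeeds.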
Resolution sketch
# Before (today): integrator hand-rolls everything
ws = await websockets.connect(url)
await ws.send(json.dumps({"type": "join", "since": last_cursor}))
# no version check; on socket drop the integrator must write their own
# reconnect loop, backoff, re-join and cursor tracking; missed events
# beyond the ~100-event window vanish with no signal.

# After (proposed): durable by default, version-safe
client = GatewayClient(url, reconnect=True)
await client.connect()  # client/server agree a protocol version or reject cleanly
async for event in client.events():  # transparently survives disconnects, resumes from cursor
    handle(event)  # on_gap fires if a range was missed -> client.resync()
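The client-side half of gap detection can be small. The sketch below assumes each event carries a monotonically increasing integer sequence starting at 1; `SequenceTracker` and `observe` are hypothetical names, and only the `on_gap(expected, received)` callback shape comes from the proposal.

```python
from typing import Callable


class SequenceTracker:
    """Tracks the last delivered sequence and reports missed ranges."""

    def __init__(self, on_gap: Callable[[int, int], None], last_seq: int = 0) -> None:
        self.last_seq = last_seq  # doubles as the cursor to send on re-join
        self._on_gap = on_gap

    def observe(self, seq: int) -> bool:
        """Return True if the event is new and should be delivered to the caller."""
        if seq <= self.last_seq:
            return False  # duplicate, e.g. replay overlapping what was already seen
        expected = self.last_seq + 1
        if seq > expected:
            # Events expected..seq-1 never arrived; let the caller decide
            # whether to trigger a full re-sync.
            self._on_gap(expected, seq)
        self.last_seq = seq
        return True
```

Feeding the sequences 1, 2, 2, 3, 7, 8 through `observe` would deliver 1, 2, 3, 7, 8 and fire `on_gap(4, 7)` once; `last_seq` is then the value a reconnecting client would send as `since`.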
Severity
High — for a component whose entire value proposition is being the robust, always-on entry point to agents, the connection layer currently offers no version safety across releases and forces every integrator to re-implement reconnect/resume. Both are table stakes for a world-class gateway, and the server already does the hard half (cursor replay) — the contract and client just aren't there.
Validation
gateway/server.py:547 exposes a static "version": "1.0.0" on /info only; the join handshake (server.py:954) exchanges no protocol version, and praisonaiagents/gateway/protocols.py declares no version/negotiation field (repo-wide grep for protocol_version/negotiate/min_protocol returns nothing).
Server-side cursor replay confirmed at gateway/server.py:959,966 (since accepted) and :1486-1488 (get_events_since), but the persisted event history is clamped (~100 events) and no gap-detection signal is returned to the client; the resume payload carries no presence/health snapshot.
src/praisonai-ts/src/gateway/index.ts (450 lines) is an interface declaring connect()/disconnect() plus unused retryAttempts/retryDelay; no reconnect loop, backoff, or cursor tracking exists, and there is no Python reconnecting client. gateway/supervisor.py's reconnect applies to channel bot adapters, not the gateway WebSocket client.