[IBKR] Dead Gateway connection reports health=healthy; orders accepted but never transmitted until manual reconnect (no keepalive) #294

@rudyll

Summary

IBKR connection health is derived purely from query failure-counting (_consecutiveFailures), with no active keepalive/heartbeat. When the socket to IB Gateway dies silently (Gateway daily restart, network blip, half-open TCP), account.health stays "healthy" (false positive). Orders placed in this window are accepted at the UTA/wallet layer (get an orderId, status submitted) but never reach IBKR — they only transmit after a manual reconnect, at which point queued orders fill immediately.

Impact

  • health: healthy while the connection is actually dead → callers place orders that silently don't execute.
  • Account/position queries return stale cached data during the dead-but-"healthy" window.
  • No auto-recovery; requires a manual reconnect.

Root cause (v0.40.0-beta.2)

  1. No active heartbeat. requestCurrentTime() (services/uta/src/domain/trading/brokers/ibkr/request-bridge.ts:216) is on-demand only (e.g. getMarketClock), never on a timer. Nothing probes the idle socket.
  2. connectionClosed() only rejects in-flight requests (request-bridge.ts:377, rejectAll); it does not mark the account offline or trigger a reconnect.
  3. Connectivity errors 502/504/1100 are effectively a no-op (request-bridge.ts:394-398 — comment "These will be followed by connectionClosed() which rejects all", then returns).
  4. Health = passive failure count. UnifiedTradingAccount derives health from _consecutiveFailures (incremented only on a failed query, reset on success). A socket dying while idle never trips it → stays healthy. Order submits on the dead socket don't reliably trip it either.

Net: a silently-dropped connection is invisible to the health model until some query happens to fail — and order submits don't reliably trip it.
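
To illustrate point 4, here is a minimal model of failure-count-based health. It is a sketch, not the project's code: only _consecutiveFailures comes from the source; the class name PassiveHealth, the recordQuery method, the "degraded"/"offline" states, and the threshold of 3 are assumptions made for the example.

```typescript
// Hypothetical model of passive, failure-count-based health.
// Only the field name _consecutiveFailures is taken from the issue;
// everything else here is an assumption for illustration.
type Health = "healthy" | "degraded" | "offline";

class PassiveHealth {
  private _consecutiveFailures = 0;

  // Runs only when a caller actually issues a query.
  recordQuery(ok: boolean): void {
    this._consecutiveFailures = ok ? 0 : this._consecutiveFailures + 1;
  }

  get health(): Health {
    if (this._consecutiveFailures === 0) return "healthy";
    // The threshold of 3 is an assumed value.
    return this._consecutiveFailures < 3 ? "degraded" : "offline";
  }
}

const account = new PassiveHealth();
account.recordQuery(true); // last successful query before the socket dies

// The socket drops here while idle. No query runs, so recordQuery is
// never called and the counter never moves.
console.log(account.health); // prints "healthy"
```

The counter can only change inside recordQuery, so an idle dead socket is indistinguishable from an idle live one.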

Repro

  1. Connect an IBKR paper account via IB Gateway.
  2. Let the OpenAlice↔Gateway socket drop silently while idle (e.g. Gateway daily auto-restart, or a brief network drop).
  3. GET /api/trading/uta → account still health: healthy.
  4. POST .../wallet/place-order → returns orderId, status submitted; it does not fill.
  5. POST /uta/:id/reconnect → the order fills immediately.

Observed repeatedly on 0.40.0-beta.2 with an IB Gateway paper account: orders sat submitted (no fill) until we manually reconnected, then filled instantly.

Suggested fix

  • Add an active heartbeat (periodic reqCurrentTime with timeout) that marks the account offline on failure.
  • In connectionClosed() / on error 1100, mark the account offline and trigger (or schedule) a reconnect — not only reject in-flight requests.
  • Treat error 1102 (connectivity restored) as a recovery signal.
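
The three suggestions above could be combined roughly as follows. This is a sketch under stated assumptions, not a patch against request-bridge.ts: LinkMonitor, Probe, withTimeout, and the interval and timeout values are all hypothetical names and numbers, and the probe is assumed to be a promise-returning wrapper around reqCurrentTime.

```typescript
// Sketch only: LinkMonitor, Probe, withTimeout and the default timings
// are hypothetical and do not exist in the codebase.
type LinkState = "online" | "offline";
type Probe = () => Promise<unknown>; // assumed wrapper around reqCurrentTime

// Reject if `p` does not settle within `ms` milliseconds.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    p.then(
      (v) => { clearTimeout(timer); resolve(v); },
      (e) => { clearTimeout(timer); reject(e); },
    );
  });
}

class LinkMonitor {
  state: LinkState = "online";
  private timer?: ReturnType<typeof setInterval>;
  private readonly probe: Probe;
  private readonly reconnect: () => void;
  private readonly intervalMs: number;
  private readonly timeoutMs: number;

  constructor(probe: Probe, reconnect: () => void, intervalMs = 30_000, timeoutMs = 5_000) {
    this.probe = probe;
    this.reconnect = reconnect;
    this.intervalMs = intervalMs;
    this.timeoutMs = timeoutMs;
  }

  // Start the periodic heartbeat.
  start(): void {
    this.timer = setInterval(() => void this.beat(), this.intervalMs);
  }

  stop(): void {
    if (this.timer !== undefined) clearInterval(this.timer);
    this.timer = undefined;
  }

  // One heartbeat: probe the socket even when no caller is querying.
  async beat(): Promise<void> {
    try {
      await withTimeout(this.probe(), this.timeoutMs);
      this.state = "online";
    } catch {
      this.markOffline();
    }
  }

  // Intended to be called from connectionClosed(), in addition to
  // rejecting in-flight requests.
  onConnectionClosed(): void {
    this.markOffline();
  }

  // 1100 = connectivity lost, 1102 = connectivity restored.
  onErrorCode(code: number): void {
    if (code === 1100) this.markOffline();
    else if (code === 1102) this.state = "online";
  }

  private markOffline(): void {
    const wasOnline = this.state === "online";
    this.state = "offline";
    if (wasOnline) this.reconnect(); // request a reconnect once per outage
  }
}

const monitor = new LinkMonitor(
  async () => Date.now(), // stand-in probe for the demo
  () => console.log("reconnect requested"),
);
monitor.onErrorCode(1100); // logs "reconnect requested"; state is "offline"
monitor.onErrorCode(1102); // state returns to "online"
```

In real use start() would be called after a successful connect and stop() on shutdown. Health reported to callers would then need to reflect monitor.state as well as the failure counter, so that order submission can be refused while the link is offline.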
