Proposal 7 — ApConnection.reconnectImplConnect(): Guard Against Dead Passive Socket #9

@scottf

Problem — Critical availability bug

ApConnection.reconnectImplConnect() guards against passive == null but not against passive.isConnected() == false. When the entire NATS cluster is stopped and restarted, both the active and passive connections are disconnected. At this point:

  • passive is not null
  • passive.dataPort is a dead, closed socket

Because the null check passes, the steal proceeds and updateStatus(Status.CONNECTED) fires, so the app believes it is connected. The reader then fails immediately with IOException: Read channel closed. Next, newPassive() is called, but passive.connect(true) fails because the cluster is still restarting, and it throws a RuntimeException. Both connections end up in a broken state, and the passive connection is never re-established until the app is fully restarted.
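
The sequence above can be modeled in isolation. The following is only a minimal sketch, not the library's code: FakePassive and UnguardedSteal are hypothetical stand-ins, and the event strings are invented for illustration. It shows how a steal guarded only by a null check reports CONNECTED before the dead socket is ever read.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the passive connection; not the real library class.
class FakePassive {
    private final boolean connected;

    FakePassive(boolean connected) {
        this.connected = connected;
    }

    boolean isConnected() {
        return connected;
    }

    // Models the reader hitting a closed channel on a dead socket.
    void read() throws IOException {
        if (!connected) {
            throw new IOException("Read channel closed.");
        }
    }
}

class UnguardedSteal {
    // Records what happens when the steal is guarded only by a null check.
    static List<String> run(FakePassive passive) {
        List<String> events = new ArrayList<>();
        if (passive == null) {
            events.add("NO_PASSIVE");
            return events;
        }
        // Status is reported before the stolen socket is ever used.
        events.add("CONNECTED");
        try {
            passive.read();
            events.add("READ_OK");
        } catch (IOException e) {
            events.add("READ_FAILED: " + e.getMessage());
        }
        return events;
    }

    public static void main(String[] args) {
        // A non-null but disconnected passive, as after a full cluster restart.
        System.out.println(run(new FakePassive(false)));
    }
}
```

In this model, a non-null but disconnected passive yields a CONNECTED event followed by a read failure, which mirrors the false "connected" status described above.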

Observed in practice

Exception: Read channel closed.
    at io.nats.client.impl.NatsConnectionReader.run(NatsConnectionReader.java:186)

Proposed fix

@Override
protected void reconnectImplConnect() throws InterruptedException {
    if (passive == null) {
        return; // existing guard
    }

    if (!passive.isConnected()) {
        // Passive is also disconnected (whole cluster down).
        // Delegate to the standard reconnect so the server pool can
        // find a live broker. newPassive() will run after active reconnects.
        super.reconnectImplConnect();
        return;
    }
    // ... existing steal logic unchanged ...
}
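
To make the three branches of the proposed guard easy to unit test, the decision can be pulled out into a pure function. This is a sketch under assumptions: ReconnectAction, PassiveView, and ReconnectGuard are hypothetical names introduced here for illustration and do not exist in java-active-passive.

```java
// Hypothetical outcomes of the guarded decision; names are illustrative only.
enum ReconnectAction { NO_PASSIVE, STANDARD_RECONNECT, STEAL_PASSIVE }

// Hypothetical minimal view of the passive connection.
interface PassiveView {
    boolean isConnected();
}

class ReconnectGuard {
    // Mirrors the order of checks in the proposed fix:
    // null first, then liveness, then the existing steal path.
    static ReconnectAction decide(PassiveView passive) {
        if (passive == null) {
            return ReconnectAction.NO_PASSIVE;
        }
        if (!passive.isConnected()) {
            // Whole cluster down: do not steal a dead socket.
            return ReconnectAction.STANDARD_RECONNECT;
        }
        return ReconnectAction.STEAL_PASSIVE;
    }

    public static void main(String[] args) {
        System.out.println(decide(null));
        System.out.println(decide(() -> false));
        System.out.println(decide(() -> true));
    }
}
```

Keeping the decision separate from the side effects (status updates, socket hand-off) is one possible design; it lets the cluster-down case be covered by a test without starting a NATS server.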

Current workaround

The fix above is applied directly to a local copy of the java-active-passive source until a release is available.

Labels: enhancement