Proposal 6 — ApPassiveServerPool: Resolve Hostnames Before Comparing to Active Server #8

@scottf

Problem

ApPassiveServerPool.nextServer() skips the active server using NatsUri.equivalent(), which is a pure string comparison: host.toLowerCase() + ":" + port.

This means the following URIs are all treated as different servers even when they are the same physical broker:

| URI | String compared | Same broker? |
| --- | --- | --- |
| localhost:4222 (active) | "localhost:4222" | ✅ active itself |
| 127.0.0.1:4222 (gossip-discovered) | "127.0.0.1:4222" | ✅ same broker, but not skipped |
| [2001:bb6:...]:4222 (gossip-discovered) | "2001:bb6:...:4222" | ✅ same broker, but not skipped |
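
To make the mismatch concrete, here is a minimal sketch of a host-and-port string key like the one described above. `StringKeyDemo` and `key` are hypothetical names used only for illustration; this is not the actual NatsUri code.

```java
import java.util.Locale;

final class StringKeyDemo {
    // Builds the comparison key described above: lower-cased host, a colon, then the port.
    static String key(String host, int port) {
        return host.toLowerCase(Locale.ROOT) + ":" + port;
    }

    public static void main(String[] args) {
        String active = key("localhost", 4222);
        // Case differences are normalized, so these match.
        System.out.println(active.equals(key("LOCALHOST", 4222))); // prints true
        // A loopback IP for the same broker yields a different string, so no match.
        System.out.println(active.equals(key("127.0.0.1", 4222))); // prints false
    }
}
```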

Observed failure

We connected to a local 3-node cluster using hostnames and saw this in production logs:

[AP-metrics] startup_complete active=[nats://localhost:4222] passive=[nats://[2001:bb6:5f4c:6800:6490:85b:5286:76ac]:4222]

Both connections landed on port 4222 of the same machine, that is, on the same physical broker. This violates the AP guarantee: if that broker fails, the active and passive connections fail simultaneously.

Root cause

When active connects to localhost:4222, the NATS server's gossip INFO message advertises all cluster member addresses — including their actual IPv4 and IPv6 addresses (127.0.0.1:4222, [2001:bb6:...]:4222). acceptDiscoveredUrls() adds those to the pool. nextServer() skips localhost:4222 but picks [2001:bb6:...]:4222 as passive — same physical broker, different URI string.

This also affects production environments using domain names. If nats.company.com resolves to multiple IPs (round-robin DNS), gossip may add each IP individually. Active may be on nats.company.com:4222 while passive lands on 10.1.2.3:4222 — same broker.
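
The selection failure can be reproduced without a NATS server. The sketch below is a simplified stand-in for the pool, assuming entries are plain `host:port` strings scanned in order; `NaivePoolDemo` and its methods are hypothetical, and the real pool may order or shuffle entries differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class NaivePoolDemo {
    // String-only identity: the lower-cased "host:port" text.
    static String key(String hostPort) {
        return hostPort.toLowerCase(Locale.ROOT);
    }

    // Returns the first pool entry whose string key differs from the active one, or null.
    static String nextServer(List<String> pool, String active) {
        for (String candidate : pool) {
            if (!key(candidate).equals(key(active))) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The second entry stands in for a gossip-discovered address of the active broker.
        List<String> pool = Arrays.asList("localhost:4222", "127.0.0.1:4222", "localhost:4223");
        System.out.println(nextServer(pool, "localhost:4222")); // prints 127.0.0.1:4222
    }
}
```

The genuinely different broker on port 4223 is never reached, because the alias passes the string check first.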

Proposed fix

When setActiveServer(NatsUri) is called, resolve the hostname to all of its IP addresses and cache them. In nextServer() and peekNextServer(), skip any candidate that uses the same port as the active server and whose resolved IPs overlap with the active server's cached IPs:

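// In ApPassiveServerPool
public void setActiveServer(NatsUri activeNuri) {
    activeServerRef.set(activeNuri);
    // Resolve once per active-server change and cache the result.
    activeResolvedIps.set(resolveToIpSet(activeNuri));
}

// Returns true when the candidate is the active server under another name or address.
private boolean isSameBrokerAsActive(NatsUri candidate) {
    NatsUri active = activeServerRef.get();
    if (active == null) return false;
    if (candidate.equivalent(active)) return true; // fast path: identical host:port string
    if (candidate.getPort() != active.getPort()) return false; // different port, different broker
    Set<String> activeIps = activeResolvedIps.get();
    // No cached IPs (not set or resolution failed): fall back to the string result above.
    if (activeIps == null || activeIps.isEmpty()) return false;
    // Resolve the candidate and look for any IP shared with the active server.
    return resolveToList(candidate).stream().anyMatch(activeIps::contains);
}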

Replace the equivalent(active) check in nextServer() and peekNextServer() with isSameBrokerAsActive(server).
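
The resolution helpers are not shown above. One possible shape, sketched with the standard `InetAddress.getAllByName`, is given below. It works on plain host strings rather than NatsUri so that it is self-contained; `BrokerIdentity`, `sameBroker`, and the choice to return an empty set when resolution fails are assumptions for illustration, not part of the proposal's code.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class BrokerIdentity {
    // Resolves a host name or IP literal to the textual form of every address it maps to.
    // Returns an empty set if the host cannot be resolved (assumed policy).
    static Set<String> resolveToIpSet(String host) {
        Set<String> ips = new HashSet<>();
        try {
            for (InetAddress addr : InetAddress.getAllByName(host)) {
                ips.add(addr.getHostAddress());
            }
        } catch (UnknownHostException e) {
            // Unresolvable host: leave the set empty so callers fall back to string comparison.
        }
        return ips;
    }

    // True when both endpoints use the same port and either share a host string
    // or resolve to at least one common IP address.
    static boolean sameBroker(String activeHost, int activePort, String candHost, int candPort) {
        if (activePort != candPort) return false;
        if (activeHost.equalsIgnoreCase(candHost)) return true; // fast path, no lookup
        Set<String> activeIps = resolveToIpSet(activeHost);
        if (activeIps.isEmpty()) return false;
        return !Collections.disjoint(activeIps, resolveToIpSet(candHost));
    }
}
```

With this check, `sameBroker("localhost", 4222, "127.0.0.1", 4222)` should return true on a host where localhost resolves to 127.0.0.1. DNS lookups can block, so a real implementation would likely keep the cached active set from setActiveServer() rather than resolving the active host on every call as this sketch does.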
