Problem
ApPassiveServerPool.nextServer() skips the active server using NatsUri.equivalent(), which is a pure string comparison: host.toLowerCase() + ":" + port.
This means the following URIs are all treated as different servers even when they are the same physical broker:
| URI | String compared | Same broker? |
| --- | --- | --- |
| localhost:4222 (active) | "localhost:4222" | ✅ active itself |
| 127.0.0.1:4222 (gossip-discovered) | "127.0.0.1:4222" | ✅ same (not skipped) |
| [2001:bb6:...]:4222 (gossip-discovered) | "2001:bb6:...:4222" | ✅ same (not skipped) |
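To make the mismatch concrete, here is a minimal sketch of a host:port string comparison. It is a stand-in written for this issue, not the actual `NatsUri.equivalent()` code; the class name `HostPortKeyDemo` and the `key` helper are hypothetical and only mirror the `host.toLowerCase() + ":" + port` behaviour described above.

```java
import java.util.Locale;

public class HostPortKeyDemo {
    // Hypothetical stand-in for the comparison described above:
    // lower-cased host, a colon, then the port. No DNS resolution happens.
    static String key(String host, int port) {
        return host.toLowerCase(Locale.ROOT) + ":" + port;
    }

    public static void main(String[] args) {
        String active = key("localhost", 4222);
        // Case differences are normalized, so these two keys match.
        System.out.println(active.equals(key("LOCALHOST", 4222)));
        // The loopback IP produces a different key, so it is not skipped.
        System.out.println(active.equals(key("127.0.0.1", 4222)));
    }
}
```

Under this kind of comparison, any alias of the active broker (an IP literal, a second hostname) looks like a separate server.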
Observed failure
We connected to a local 3-node cluster using hostnames and saw this in production logs:
[AP-metrics] startup_complete active=[nats://localhost:4222] passive=[nats://[2001:bb6:5f4c:6800:6490:85b:5286:76ac]:4222]
Both connections landed on port 4222 — the same physical broker. The AP guarantee is violated: if that broker fails, both active and passive fail simultaneously.
Root cause
When active connects to localhost:4222, the NATS server's gossip INFO message advertises all cluster member addresses — including their actual IPv4 and IPv6 addresses (127.0.0.1:4222, [2001:bb6:...]:4222). acceptDiscoveredUrls() adds those to the pool. nextServer() skips localhost:4222 but picks [2001:bb6:...]:4222 as passive — same physical broker, different URI string.
This also affects production environments using domain names. If nats.company.com resolves to multiple IPs (round-robin DNS), gossip may add each IP individually. Active may be on nats.company.com:4222 while passive lands on 10.1.2.3:4222 — same broker.
Proposed fix
When setActiveServer(NatsUri) is called, resolve the hostname to all its IP addresses and cache them. In nextServer() and peekNextServer(), skip any candidate whose resolved IPs overlap with the active server's resolved IPs (same port):
    // In ApPassiveServerPool
    public void setActiveServer(NatsUri activeNuri) {
        activeServerRef.set(activeNuri);
        // Resolve once here and cache, so the skip check does not re-resolve the active host.
        activeResolvedIps.set(resolveToIpSet(activeNuri));
    }

    private boolean isSameBrokerAsActive(NatsUri candidate) {
        NatsUri active = activeServerRef.get();
        if (active == null) return false;
        if (candidate.equivalent(active)) return true; // fast path: identical host:port string
        if (candidate.getPort() != active.getPort()) return false; // different port, different broker
        Set<String> activeIps = activeResolvedIps.get();
        if (activeIps == null || activeIps.isEmpty()) return false; // resolution failed, fall back to string check
        // Same helper as in setActiveServer (still to be written); any shared IP means same broker.
        return resolveToIpSet(candidate).stream().anyMatch(activeIps::contains);
    }
Replace the equivalent(active) check in nextServer() and peekNextServer() with isSameBrokerAsActive(server).
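The resolution helper is not spelled out above, so here is one possible shape for it, offered as a sketch rather than the intended implementation. It works on plain host strings instead of `NatsUri` to stay self-contained; the class name `BrokerIdentity` and the `sameBroker` method are hypothetical. It relies only on `java.net.InetAddress.getAllByName`, and treats a failed lookup as "unknown" (empty set) so the caller falls back to the string comparison.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class BrokerIdentity {
    // Hypothetical helper: resolve a host (name or IP literal) to the textual
    // form of every address it maps to. Returns an empty set if lookup fails.
    static Set<String> resolveToIpSet(String host) {
        // Strip the brackets used around IPv6 literals in URIs, e.g. "[::1]".
        String bare = (host.startsWith("[") && host.endsWith("]"))
                ? host.substring(1, host.length() - 1)
                : host;
        Set<String> ips = new HashSet<>();
        try {
            for (InetAddress address : InetAddress.getAllByName(bare)) {
                ips.add(address.getHostAddress());
            }
        } catch (UnknownHostException e) {
            return Collections.emptySet();
        }
        return ips;
    }

    // Two endpoints count as the same broker when the ports match and
    // their resolved address sets share at least one IP.
    static boolean sameBroker(String activeHost, int activePort,
                              String candidateHost, int candidatePort) {
        if (activePort != candidatePort) return false;
        Set<String> activeIps = resolveToIpSet(activeHost);
        if (activeIps.isEmpty()) return false;
        return !Collections.disjoint(activeIps, resolveToIpSet(candidateHost));
    }
}
```

One design point to weigh: `getAllByName` can block on DNS, so resolving candidates inside `nextServer()` may add latency on the reconnect path. Caching candidate resolutions when `acceptDiscoveredUrls()` adds them would be one way to avoid that, at the cost of stale entries if DNS changes.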