Skip to content

Resolve Hostnames Before Comparing to Active Server#2

Merged
scottf merged 5 commits into
mainfrom
better-host-comparison
Jun 16, 2026
Merged

Resolve Hostnames Before Comparing to Active Server#2
scottf merged 5 commits into
mainfrom
better-host-comparison

Conversation

@scottf

@scottf scottf commented Jun 13, 2026

Copy link
Copy Markdown
Collaborator

No description provided.

@scottf

scottf commented Jun 13, 2026

Copy link
Copy Markdown
Collaborator Author

@claude review

@synadia-claude-reviewer

synadia-claude-reviewer Bot commented Jun 13, 2026

Copy link
Copy Markdown

Claude finished @scottf's task in 1m 39s —— View job


Code Review

  • Gather context
  • Read changed files and diff
  • Analyze changes
  • Post review feedback

Summary

This PR improves host comparison in ApPassiveServerPool by resolving hostnames to IP addresses and using those IPs when determining if a candidate server is equivalent to the active server. The idea is sound — two hostnames that resolve to the same IP(s) should be treated as the same server. However, there are a few issues worth addressing before merging.


.gitattributes

Adding gradlew text eol=lf is correct — the Gradle wrapper must have Unix line endings to be executable on Linux/macOS. Good addition.


ApPassiveServerPool.java

🔴 Critical: NullPointerException in isEquivalent (line 99–100)

List<String> testResolved = resolvedMap.get(test.getHost());
List<String> activeResolved = resolvedMap.get(active.getHost());
for (String resolved : testResolved) {              // NPE if testResolved is null
    if (activeResolved.contains(resolved) || ...)   // NPE if activeResolved is null

resolveMap.get() returns null for any host that wasn't resolved (e.g., DNS failed, resolution returned empty, or the host is an IP address — since resolve(NatsUri) skips IPs via !nuri.hostIsIpAddress()). Both testResolved and activeResolved can be null here.

A concrete example: if active is already an IP like 10.0.0.1, resolve() skips it (because hostIsIpAddress() is true), so activeResolved is null. Then activeResolved.contains(resolved) throws NPE before the resolved.equals(activeHost) fallback is ever reached.

Fix:

if (resolveMode.resolve) {
    List<String> testResolved = resolvedMap.get(test.getHost());
    List<String> activeResolved = resolvedMap.get(active.getHost());
    if (testResolved != null) {
        for (String resolved : testResolved) {
            if (resolved.equals(activeHost)) return true;
            if (activeResolved != null && activeResolved.contains(resolved)) return true;
        }
    }
}

Fix this →


🟡 Thread Safety: resolvedMap is an unsynchronized HashMap (line 18)

resolvedMap is a plain HashMap but is written to from setActiveServer() (and initialize()) and read from isEquivalent(), which is called from peekNextServer() and nextServer(). If those can be called from different threads, this is a data race.

Consider using ConcurrentHashMap instead:

final Map<String, List<String>> resolvedMap = new ConcurrentHashMap<>();

Fix this →


🟡 resolveMode can be null when setActiveServer is called before initialize (lines 29, 33)

resolveMode is only set in initialize(), but setActiveServer() calls resolve() which dereferences resolveMode.resolve on line 33. If setActiveServer() is invoked before initialize(), this throws NPE.

A guard in resolve(NatsUri) would help:

private void resolve(NatsUri nuri) {
    if (resolveMode == null || !resolveMode.resolve) return;
    ...

🟡 Port is ignored in isEquivalent

The method only compares hostnames and their resolved IPs, not ports. The original NatsUri.equivalent() presumably compared ports too. Two servers at example.com:4222 and example.com:4223 would incorrectly be considered equivalent. If port comparison is intentionally omitted here (e.g., because the pool guarantees servers differ only by hostname in this scenario), a comment explaining the intent would help.


Minor: List.contains() is O(n) for IP lookups

activeResolved.contains(resolved) is a linear scan. For the typical case of 1–4 IPs this is fine, but using Set<String> as the map value type would make lookups O(1) and better communicate the "no duplicates" intent.


Overall

The core idea of DNS-based host equivalence is good. The main blocker is the NPE in isEquivalent — it will reliably crash when either the active server or a pool server is already specified as an IP address, which is a common NATS deployment pattern. Thread safety and the resolveMode null guard are also worth addressing.

@scottf

scottf commented Jun 15, 2026

Copy link
Copy Markdown
Collaborator Author

@claude There was a bit of a re-write after the first review. Please re-review

@synadia-claude-reviewer

synadia-claude-reviewer Bot commented Jun 15, 2026

Copy link
Copy Markdown

Claude finished @scottf's task in 3m 2s —— View job


Code Re-Review

  • Gather context
  • Read changed files and diff
  • Analyze changes against previous review findings
  • Post review feedback

Summary

The rewrite is a significant improvement. The previous NPE in isEquivalent is fully fixed, and the logic is now cleaner and more correct. The new test suite is comprehensive and well-structured. Two of the four previously-flagged issues remain.


Previous Issues — Status

✅ NPE in isEquivalent — FIXED

The rewrite eliminates the resolvedMap.get() calls in isEquivalent entirely. Instead it branches on active.hostIsIpAddress() / test.hostIsIpAddress() and calls _resolveHostToIps(), which now guarantees a non-null List<String> (using Collections.emptyList() as a placeholder). Clean fix.

✅ Port comparison — INTENTIONAL (documented by tests)

Test traversal_F3 explicitly demonstrates that nats://h.example.com:6222 is treated as equivalent to nats://h.example.com:4222. Given that NATS discovery always returns IPs (never hostnames), port-only differences are handled by the direct string equality path. This is correct per the NATS server discovery model.

🟡 resolveMode null guard — STILL PRESENT

setActiveServer() can be called before initialize(), but resolve(NatsUri) still dereferences resolveMode without a null check:

private void resolve(NatsUri nuri) {
    if (!nuri.hostIsIpAddress() && !nuri.isWebsocket() && resolveMode.resolve) { // NPE if resolveMode is null

Same in isEquivalent:

if (resolveMode.resolve) { // NPE if resolveMode is null

The unit tests in ApPassiveServerPoolTests work around this by setting sp.resolveMode directly before calling setActiveServer(). The fix is a one-liner:

if (resolveMode != null && resolveMode.resolve) {

Fix this →

🟡 Thread safety — STILL PRESENT

resolvedMap is still a plain HashMap. The check-then-put in _resolveHostToIps is non-atomic:

List<String> resolved = resolvedMap.get(host);  // read
if (resolved == null) {
    resolved = pool.resolveHostToIps(...);
    resolvedMap.put(host, resolved);              // write
}

If setActiveServer() and nextServer() run concurrently, both a write (from the resolve path) and read (from isEquivalent) happen on the same map without synchronization. Switching to ConcurrentHashMap makes the individual operations safe. The double-resolution risk on first access (two threads both see null) would remain benign — worst case is two DNS lookups for the same host at startup. If that's also a concern, computeIfAbsent on a ConcurrentHashMap makes it atomic:

final Map<String, List<String>> resolvedMap = new ConcurrentHashMap<>();

// in _resolveHostToIps:
return resolvedMap.computeIfAbsent(host, h -> {
    List<String> ips = pool.resolveHostToIps(h, resolveMode.maxOneResult, resolveMode.includeIPV6);
    return (ips == null || ips.isEmpty()) ? Collections.emptyList() : ips;
});

Fix this →


New Observations

isEquivalent logic is clean and correct

The branching logic handles all four hostname/IP combinations correctly:

test active action
hostname hostname only direct-string equality (by design — discovery never produces hostname-to-hostname)
hostname IP resolve test hostname; check if active IP is in the resolved set
IP hostname resolve active hostname; check if test IP is in the resolved set
IP IP only direct-string equality

The comment in isEquivalent explains the design rationale clearly.

_resolveHostToIps lazy resolution from isEquivalent path

When isEquivalent is called for a host not yet in the cache (e.g., a peer discovered after startup), _resolveHostToIps will call pool.resolveHostToIps() synchronously from within peekNextServer()/nextServer(). A slow or timing-out DNS call here will delay the reconnection. The existing initialize() call pre-populates the cache for known servers, which is a mitigation — but newly discovered peers (which NATS reports as IPs) will skip the IP-resolution guard and also be skipped by the test.hostIsIpAddress() / active.hostIsIpAddress() branch, so this only triggers in the hostname-vs-IP case with an uncached hostname. Worth being aware of.

ApTests.java changes look correct

Adding a third server ensures the passive candidate is always distinct from the active server, and changing RECONNECTED + RESUBSCRIBED to CONNECTED fixes the event sequence to match actual reconnection behavior. Both changes address real test reliability issues.


Test Suite Quality

The new ApPassiveServerPoolTests.java is thorough — 7 sections (A–G), 19 tests, deterministic FakeServerPool. Good coverage of the cache, traversal loop-around, and pass-through delegation. One gap: no concurrent-access test, which would be the natural companion to a ConcurrentHashMap fix.


Overall

Blocking concern: the resolveMode null guard (minor risk but cheap fix). The thread-safety issue is a pre-existing risk worth addressing before merge. Everything else looks good.

@scottf

scottf commented Jun 15, 2026

Copy link
Copy Markdown
Collaborator Author

@claude

setActiveServer() can be called before initialize(), but resolve(NatsUri) still dereferences resolveMode without a null check:

initialize is called directly after init (create) because it's the only way to initialize a promised no-op contructor This is a NON ISSUE.

@synadia-claude-reviewer

synadia-claude-reviewer Bot commented Jun 15, 2026

Copy link
Copy Markdown

Claude finished @scottf's task in 50s —— View job


Response to resolveMode null guard clarification

  • Read source code to verify initialization order
  • Post response

Confirmed — you're right, this is a non-issue. Looking at ApConnection.java:

// connect() static factory:
ApConnection apc = new ApConnection(apOptions, activeBuilder.build());
apc.connect();   // ← setActiveServer() only called inside here, after super.connect() succeeds

// ApConnection constructor:
private ApConnection(ApOptions apOptions, Options activeOptions) {
    super(activeOptions);   // ← NatsConnection constructor calls initialize() on the pool
    ...
}

super(activeOptions) (the NatsConnection constructor) calls initialize() on the server pool, and setActiveServer() is only ever called after that — either from connect() at line 70 or from reconnectImplConnect() at line 166. There is no path where setActiveServer() is reachable before initialize() runs.

The previous review treated ApPassiveServerPool as if it were a standalone public API where lifecycle ordering was the caller's responsibility. Given that it's an internal class whose lifecycle is fully controlled by ApConnection, the null guard is unnecessary.

Withdrawing that finding. The remaining open item from the previous review is the thread safety one (resolvedMap as a plain HashMap with concurrent read/write from isEquivalent vs setActiveServer/initialize), which stands independently.


@scottf scottf merged commit 9dec6a3 into main Jun 16, 2026
10 checks passed
@scottf scottf deleted the better-host-comparison branch June 16, 2026 11:20
@scottf scottf changed the title Better host comparison Resolve Hostnames Before Comparing to Active Server Jun 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant