Proposal 1 — ApPassiveServerPool: Add setLameDuckServer() API

### Problem

When the passive connection's broker sends a Lame Duck Mode (LDM) signal, we force the passive to reconnect to a healthy broker before the active connection needs to steal its socket.

However, `ApPassiveServerPool.nextServer()` skips only the active server; it does not skip the current passive server (the one that just sent LDM). During the broker's drain window the LDM broker continues to accept new connections, so `nextServer()` may return it again and trigger a redundant reconnect cycle. In the worst case, the active connection steals a socket that is already dying.

### Concrete failure sequence (3 nodes: B1=active, B3=passive+LDM)

1. `connectSucceeded(B3)` → pool randomizes → `entryList = [B3, B1, B2]`
2. `ApPassiveServerPool.nextServer()` → B3 ≠ active(B1) → returns B3 (the dying broker)
3. Passive reconnects to the same draining broker → receives LDM again
4. If active steals passive's socket during this window, it inherits a dying connection

### Proposed API addition

```java
// In ApPassiveServerPool
/**
 * Marks the given URI as a lame-duck server.
 * nextServer() and peekNextServer() will skip this URI (in addition to the active server)
 * until clearLameDuckServer() is called.
 */
public void setLameDuckServer(NatsUri uri) { ... }

/**
 * Clears any previously-set lame-duck server so it becomes eligible for selection again.
 */
public void clearLameDuckServer() { ... }
```

Update `nextServer()` and `peekNextServer()` to skip both `activeServerRef` AND `lameDuckServerRef`.
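
To make the intended skip rule concrete, here is a minimal sketch of how the two methods might behave once `lameDuckServerRef` exists. It is not the real `ApPassiveServerPool`: the class name `SketchPassivePool`, the use of `String` in place of `NatsUri`, the rotate-to-back selection policy, and returning `null` when every entry is skipped are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for ApPassiveServerPool. String replaces NatsUri
// so the sketch compiles without the NATS client on the classpath.
class SketchPassivePool {
    private final List<String> entryList = new ArrayList<>();
    private final AtomicReference<String> activeServerRef = new AtomicReference<>();
    private final AtomicReference<String> lameDuckServerRef = new AtomicReference<>();

    SketchPassivePool(List<String> servers) {
        entryList.addAll(servers);
    }

    void setActiveServer(String uri) { activeServerRef.set(uri); }

    // Marks one server as lame duck; it stays skipped until cleared.
    void setLameDuckServer(String uri) { lameDuckServerRef.set(uri); }

    void clearLameDuckServer() { lameDuckServerRef.set(null); }

    // A server is skipped if it is the active server OR the lame-duck server.
    private boolean isSkipped(String uri) {
        return uri.equals(activeServerRef.get()) || uri.equals(lameDuckServerRef.get());
    }

    // Returns the first eligible server without changing the order,
    // or null if every entry is skipped (assumed behaviour).
    synchronized String peekNextServer() {
        for (String uri : entryList) {
            if (!isSkipped(uri)) {
                return uri;
            }
        }
        return null;
    }

    // Returns the first eligible server and rotates it to the back of the
    // list (assumed policy), or null if every entry is skipped.
    synchronized String nextServer() {
        String next = peekNextServer();
        if (next != null) {
            entryList.remove(next);
            entryList.add(next);
        }
        return next;
    }
}
```

With `entryList = [B3, B1, B2]` and B1 active, marking B3 as lame duck makes both methods return B2, regardless of where B3 sits in the list. One open design question the sketch surfaces is what to return when the active and lame-duck servers are the only entries; `null` is used here, but falling back to the lame-duck server may be preferable in a two-node cluster.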

### Current workaround

We demote the LDM server in pool ordering before triggering the passive reconnect, which reduces the chance of it being selected again. However, this is a positional heuristic — it relies on ordering behaviour rather than an explicit skip rule. If the pool ordering changes (e.g. due to a shuffle on the next `connectSucceeded()`), the protection may not hold. A first-class `setLameDuckServer()` API that guarantees the server is skipped until explicitly cleared is the correct and reliable fix.
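
The difference between the positional heuristic and an explicit skip rule can be checked by enumerating every ordering a shuffle of the three-node pool could produce. The sketch below is illustrative only: `LameDuckOrderingCheck` and its helper methods are hypothetical names, and the "first entry that is not the active server" selection is a simplification of `nextServer()` based on the description above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical check: B1 is active, B3 is the lame-duck server, B2 is healthy.
class LameDuckOrderingCheck {
    // All six orderings a shuffle of three servers can produce.
    static final List<List<String>> ORDERINGS = Arrays.asList(
        Arrays.asList("B1", "B2", "B3"),
        Arrays.asList("B1", "B3", "B2"),
        Arrays.asList("B2", "B1", "B3"),
        Arrays.asList("B2", "B3", "B1"),
        Arrays.asList("B3", "B1", "B2"),
        Arrays.asList("B3", "B2", "B1"));

    // Selection that skips only the active server (simplified current behaviour).
    static String firstNotActive(List<String> order, String active) {
        for (String s : order) {
            if (!s.equals(active)) {
                return s;
            }
        }
        return null;
    }

    // Selection that skips both the active and the lame-duck server (proposed rule).
    static String firstEligible(List<String> order, String active, String lameDuck) {
        for (String s : order) {
            if (!s.equals(active) && !s.equals(lameDuck)) {
                return s;
            }
        }
        return null;
    }

    // Counts the orderings in which the lame-duck server B3 would be selected.
    static int countLameDuckPicks(boolean explicitSkip) {
        int picks = 0;
        for (List<String> order : ORDERINGS) {
            String chosen = explicitSkip
                ? firstEligible(order, "B1", "B3")
                : firstNotActive(order, "B1");
            if ("B3".equals(chosen)) {
                picks++;
            }
        }
        return picks;
    }
}
```

Under these assumptions, any ordering that places B3 ahead of B2 selects the lame-duck server when only the active server is skipped, so demotion protects the passive connection only until the next reshuffle. The explicit rule selects B2 in every ordering.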
