[Bug][client] Regression in #25929: broker-assigned redirect URL is permanently pinned by ConnectionHandler, breaking lookup fallback on reconnect

### Search before reporting

- [x] I searched in the [issues](https://github.com/apache/pulsar/issues) and found nothing similar.

### Read release policy

- [x] I understand that [unsupported versions](https://pulsar.apache.org/contribute/release-policy/#supported-versions) don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.

### User environment

- master branch (unreleased), regression introduced by #25929 (merged 2026-06-04, commit 17ead1d691c)
- Java client (`pulsar-client`), affects any client embedding it — observed with the broker's internal client
- Reproduced on GitHub Actions CI (ubuntu) and locally on macOS (Darwin 24.6, JDK 21)

### Issue Description

#25929 (PIP-473 P5.3) added an `explicitHostURI` field to `ConnectionHandler` so the v5 transaction coordinator's metadata-store discovery can re-dial the known leader on retry instead of falling back to the service URL. However, the change also persists the broker-assigned redirect URL from PIP-307 unload notifications into the same field:

- `connectionClosed(cnx, initialConnectionDelayMs, hostUrl)` stores `hostUrl` (the `assignedBrokerServiceUrl` from `CommandCloseProducer`/`CommandCloseConsumer`) into `explicitHostURI`: https://github.com/apache/pulsar/blob/67ec614b8be/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConnectionHandler.java#L227-L233
- `reconnectLater()` then re-dials `explicitHostURI` directly on **every** subsequent retry, bypassing topic lookup: https://github.com/apache/pulsar/blob/67ec614b8be/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConnectionHandler.java#L205-L217
- Nothing ever clears the field on failure, so a producer/consumer that was redirected once is pinned to that broker address permanently.

Before #25929, the redirect URL was honored for the immediate reconnect attempt only; if connecting to the assigned broker failed, the next retry went through a normal topic lookup and recovered.

The assigned broker in a PIP-307 redirect can be wrong or stale. Concrete case that now fails: when an `ExtensibleLoadManagerImpl` leader steps down (`playFollower()` → `closeInternalTopics()`), the disconnects it sends for the internal topics carry `assignedBrokerLookupData` pointing at **itself** (the channel-table assignment for the `pulsar/system` bundle is stale at that moment). Pre-#25929 this was harmless — one failed reconnect, then lookup found the new owner. Post-#25929 every internal-topic client (service-unit-state table view readers/writers, load-data store clients) is pinned to the old leader and rejects forever with `ServiceUnitNotReadyException`, until topic ownership coincidentally returns to the pinned broker.

This is the cause of the new `ExtensibleLoadManagerImplTest.testRoleChange` flakiness on master, e.g. https://github.com/apache/pulsar/actions/runs/27284412071/job/80587641224 — the test's 30s await times out while the clients are wedged. The regression is not test-specific: any broker-initiated redirect to a broker that fails or refuses the topic (unload races, stale ownership views, broker restarts) now wedges the client instead of recovering via lookup.

### Error messages

```
2026-06-10T14:59:27,294 - WARN - [pulsar-io-17-4:BrokerService] - Namespace bundle (pulsar/system/0x00000000_0xffffffff) for topic (persistent://pulsar/system/loadbalancer-service-unit-state) not served by this instance:localhost:38741. Please redo the lookup.
2026-06-10T14:59:27,294 - WARN - [pulsar-io-17-2:ClientCnx] - Received error from server {channel=[id: 0xcd4919c7, L:/127.0.0.1:59288 - R:localhost/127.0.0.1:44799], message=...ServiceUnitNotReadyException...}
2026-06-10T14:59:27,294 - WARN - [pulsar-io-17-2:ConnectionHandler] - Could not get connection to broker - Will try again {delaySec=0.096, ...}
(repeats for the full 30s window — 226 rejections, every retry re-dialing the same stale address on the same pooled connection, no lookup in between)
```

The client-side pinning is initiated by the redirect:

```
00:19:10,925 - INFO - [pulsar-load-manager-15-1:Producer] - Disconnecting producer {assignedBrokerLookupData=Optional[BrokerLookupData[brokerId=localhost:51736, ...]]}   <- the stepping-down leader redirects clients to itself
00:19:10,929 - INFO - [pulsar-io-17-4:ConnectionHandler] - Closed connection - Will try again {hostUrl=pulsar://localhost:51734, ...}
```

### Reproducing the issue

On current master, run `ExtensibleLoadManagerImplTest.testRoleChange` repeatedly, e.g. temporarily set `invocationCount = 20` on the test and run:

```
./gradlew :pulsar-broker:test --tests "org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImplTest.testRoleChange" -PexcludedTestGroups=''
```

Locally this fails 1-2 invocations per 20 (each failing invocation shows the 30s wedge described above). With the `connectionClosed()` persistence of `hostUrl` into `explicitHostURI` removed (restoring the pre-#25929 lookup fallback while keeping the one-shot redirect and the TC discovery re-dial, which sets `explicitHostURI` via `grabCnx(URI, boolean)`), the same runs pass 80/80 invocations.

A focused client-level reproduction: connect a `ConnectionHandler`-based producer/consumer normally, then deliver a broker disconnect carrying an unreachable `assignedBrokerServiceUrl` — post-#25929 the handler never recovers; pre-#25929 it recovers on the next retry via lookup.

### Additional information

- `git blame` attributes the `explicitHostURI` field, the `reconnectLater()` re-dial, and the `connectionClosed()` persistence to 17ead1d691c (#25929).
- The TC discovery use case does not need the `connectionClosed()` persistence: `TransactionMetaStoreHandler` sets `explicitHostURI` itself via `grabCnx(URI, boolean)` on (re)discovery, and TC connections never receive `CloseProducer`/`CloseConsumer` redirects.
- Separately from this client regression, the broker-side behavior of redirecting internal-topic clients to the stepping-down leader itself (stale `assignedBrokerLookupData` for the `pulsar/system` bundle in `closeInternalTopics()`) is questionable on its own and could be improved (e.g. omit the lookup data when it equals the local broker), but it predates #25929 and was harmless with the lookup fallback in place.

### Are you willing to submit a PR?

- [x] I'm willing to submit a PR\!


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug][client] Regression in #25929: broker-assigned redirect URL is permanently pinned by ConnectionHandler, breaking lookup fallback on reconnect #25997

Search before reporting

Read release policy

User environment

Issue Description

Error messages

Reproducing the issue

Additional information

Are you willing to submit a PR?

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[Bug][client] Regression in #25929: broker-assigned redirect URL is permanently pinned by ConnectionHandler, breaking lookup fallback on reconnect #25997

Description

Search before reporting

Read release policy

User environment

Issue Description

Error messages

Reproducing the issue

Additional information

Are you willing to submit a PR?

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions