feat: self-heal backplane subscriptions after a cluster reconnect #171

Open
d-barker wants to merge 1 commit into OrleansContrib:master from d-barker:feat/backplane-self-heal-on-reconnect
d-barker commented Jun 20, 2026

Summary

Adds an opt-in mechanism to automatically restore the SignalR backplane's stream subscriptions after the Orleans cluster connection drops and recovers.

Problem

When the Orleans client loses its connection to the cluster and later reconnects (for example, a full cluster recycle such as scaling the silo from one replica to many), the backplane's implicit stream subscriptions are not re-established. The client reconnects to a new cluster generation, but the hub server's in-memory SERVER_STREAM / ALL_STREAM observers are orphaned — so affected servers silently stop receiving backplane messages until the hub server process is restarted.

Change

  • OrleansSignalRConnectionMonitor — an IClientConnectionRetryFilter that doubles as a connection-loss signal. On every failed connection attempt it raises ConnectionLost; OrleansHubLifetimeManager subscribes and re-establishes its stream subscriptions as soon as the cluster is reachable again.
  • OrleansSignalRConnectionMonitorOptions — configurable MaxRetryAttempts (0 = retry indefinitely, the default), RetryDelay (default 5s), and an optional InnerRetryFilter to compose with an existing client retry/back-off policy.
  • AddSignalRBackplaneSelfHealing() — client-builder extension to register the monitor.
  • OrleansHubLifetimeManager — wires in the re-subscription path.
  • Unit tests covering retry attempts, retry exhaustion, and inner-filter composition.
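
As a rough illustration of the first two bullets, a retry filter that doubles as a connection-loss signal might look like the sketch below. This is not the code in this PR: the interface is a local stand-in modelled on Orleans' IClientConnectionRetryFilter (check the signature against your Orleans version), the type names ending in Sketch are hypothetical, and the reading of MaxRetryAttempts as "allow N retries, then give up" is an assumption.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Local stand-in modelled on Orleans' IClientConnectionRetryFilter, declared here
// only so the sketch compiles without the Orleans packages (assumption).
public interface IRetryFilterSketch
{
    Task<bool> ShouldRetryConnectionAttempt(Exception exception, CancellationToken cancellationToken);
}

// Hypothetical options type mirroring the three settings described above.
public sealed class MonitorOptionsSketch
{
    public int MaxRetryAttempts { get; set; } = 0;                       // 0 = retry indefinitely
    public TimeSpan RetryDelay { get; set; } = TimeSpan.FromSeconds(5);  // wait between attempts
    public IRetryFilterSketch? InnerRetryFilter { get; set; }            // optional existing policy
}

public sealed class ConnectionMonitorSketch : IRetryFilterSketch
{
    private readonly MonitorOptionsSketch _options;
    private int _attempts;

    public ConnectionMonitorSketch(MonitorOptionsSketch options) => _options = options;

    // Raised on every failed connection attempt; a lifetime manager could
    // subscribe to this and schedule its re-subscription work.
    public event Action? ConnectionLost;

    public async Task<bool> ShouldRetryConnectionAttempt(Exception exception, CancellationToken cancellationToken)
    {
        // Always emit the signal first, whichever policy decides on the retry.
        ConnectionLost?.Invoke();

        // When an inner filter is configured, it owns the retry decision and
        // its own back-off; RetryDelay and MaxRetryAttempts are not consulted.
        if (_options.InnerRetryFilter is { } inner)
            return await inner.ShouldRetryConnectionAttempt(exception, cancellationToken);

        // Assumed semantics: N > 0 allows N retries, then gives up.
        var attempt = Interlocked.Increment(ref _attempts);
        if (_options.MaxRetryAttempts > 0 && attempt > _options.MaxRetryAttempts)
            return false;

        // Throws OperationCanceledException if the token is cancelled while waiting.
        await Task.Delay(_options.RetryDelay, cancellationToken);
        return true;
    }
}
```

Raising the event before consulting any policy is what lets the signal survive composition: the inner filter can change how long the client keeps retrying, but not whether the backplane hears about the failure.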

Client usage

Opt in on the Orleans client (the hub server's client builder), alongside the existing UseSignalR() call:

var clientBuilder = new ClientBuilder()
    .UseSignalR()
    .AddSignalRBackplaneSelfHealing();

Tune the retry policy via options:

clientBuilder
    .UseSignalR()
    .AddSignalRBackplaneSelfHealing(options =>
    {
        options.RetryDelay = TimeSpan.FromSeconds(2);  // back-off between failed attempts (default: 5s)
        options.MaxRetryAttempts = 0;                  // 0 = retry forever (default); >0 gives up after N
    });

Compose with an application's existing IClientConnectionRetryFilter — the monitor still emits the reconnect signal, but defers the keep-retrying decision (and its back-off) to your filter:

clientBuilder
    .UseSignalR()
    .AddSignalRBackplaneSelfHealing(options =>
    {
        options.InnerRetryFilter = new MyExistingRetryFilter();  // RetryDelay / MaxRetryAttempts are ignored when set
    });

Note: AddSignalRBackplaneSelfHealing() registers the monitor as the client's IClientConnectionRetryFilter. If you already register your own retry filter, pass it via InnerRetryFilter so its policy is preserved rather than replaced.

Notes

🤖 Generated with Claude Code

When the Orleans cluster connection drops and later recovers, the SignalR
backplane's implicit stream subscriptions are not automatically restored, so
servers can stop receiving messages until they are restarted.

This adds an opt-in OrleansSignalRConnectionMonitor that observes cluster
connection lifecycle events and re-establishes the backplane subscriptions on
reconnect, with a configurable retry/back-off policy.

- OrleansSignalRConnectionMonitor + OrleansSignalRConnectionMonitorOptions:
  retry count, delay, and an optional InnerRetryFilter to compose with an
  existing client retry policy.
- AddSignalRBackplaneSelfHealing() client-builder extension to register it.
- OrleansHubLifetimeManager wires in the re-subscription path.
- Unit tests covering retry attempts, exhaustion, and inner-filter composition.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>