Skip to content

add FanOut recovery: Subscriber.Bounce and consecutive-poll tracking#251

Open
hugowetterberg wants to merge 1 commit into
mainfrom
feature/fanout-poll-bounce
Open

add FanOut recovery: Subscriber.Bounce and consecutive-poll tracking#251
hugowetterberg wants to merge 1 commit into
mainfrom
feature/fanout-poll-bounce

Conversation

@hugowetterberg
Copy link
Copy Markdown
Contributor

When a Postgres LISTEN connection goes silently bad — PgBouncer eating the LISTEN re-issue, a network partition that survives the keepalive, etc. — the existing ping-based health check stays green while application notifications stop flowing. Consumers paper over the gap with a fallback poll but have no way to flag that polling is the only thing keeping them caught up, and no way to force a reconnect short of restarting the process.

Subscriber.Bounce signals the listen loop to tear down the current connection and reconnect on the next iteration. The channel is buffered to depth 1 so multiple consumers' bounces during one outage coalesce, and the loop drains any pending signal at the start of each fresh connection so stale bounces don't immediately tear down a healthy reconnect.

FanOut.EnableRecovery folds the recovery bookkeeping into the FanOut itself: consumers report fallback-poll findings via FanOut.Polled, the FanOut auto-resets the streak inside NotifyWithPayload on each wire-side delivery, and the FanOut bounces the Subscriber once the streak crosses a configurable threshold. The streak counts consecutive non-empty calls, not items, so a single backlog-clearing poll on a busy pipeline does not trip the bounce. EnableRecovery registers per- channel poll-saved counter and streak gauge metrics through a MetricsHelper.

See docs/fanout-recovery.md for the full pattern and example wiring.

When a Postgres LISTEN connection goes silently bad — PgBouncer eating
the LISTEN re-issue, a network partition that survives the keepalive,
etc. — the existing ping-based health check stays green while
application notifications stop flowing. Consumers paper over the gap
with a fallback poll but have no way to flag that polling is the only
thing keeping them caught up, and no way to force a reconnect short of
restarting the process.

Subscriber.Bounce signals the listen loop to tear down the current
connection and reconnect on the next iteration. The channel is buffered
to depth 1 so multiple consumers' bounces during one outage coalesce,
and the loop drains any pending signal at the start of each fresh
connection so stale bounces don't immediately tear down a healthy
reconnect.

FanOut.EnableRecovery folds the recovery bookkeeping into the FanOut
itself: consumers report fallback-poll findings via FanOut.Polled, the
FanOut auto-resets the streak inside NotifyWithPayload on each
wire-side delivery, and the FanOut bounces the Subscriber once the
streak crosses a configurable threshold. The streak counts consecutive
non-empty *calls*, not items, so a single backlog-clearing poll on a
busy pipeline does not trip the bounce. EnableRecovery registers per-
channel poll-saved counter and streak gauge metrics through a
MetricsHelper.

See docs/fanout-recovery.md for the full pattern and example wiring.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant