add FanOut recovery: Subscriber.Bounce and consecutive-poll tracking by hugowetterberg · Pull Request #251 · ttab/elephantine

hugowetterberg · 2026-05-28T08:43:05Z

When a Postgres LISTEN connection goes silently bad — PgBouncer eating the LISTEN re-issue, a network partition that survives the keepalive, etc. — the existing ping-based health check stays green while application notifications stop flowing. Consumers paper over the gap with a fallback poll but have no way to flag that polling is the only thing keeping them caught up, and no way to force a reconnect short of restarting the process.

Subscriber.Bounce signals the listen loop to tear down the current connection and reconnect on the next iteration. The channel is buffered to depth 1 so multiple consumers' bounces during one outage coalesce, and the loop drains any pending signal at the start of each fresh connection so stale bounces don't immediately tear down a healthy reconnect.
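For reviewers who want the coalescing behaviour spelled out, here is a minimal sketch of the signalling pattern described above. It is not the code in this PR: the `bouncer` type, `newBouncer`, and `drain` are hypothetical names used for illustration, and the boolean return values exist only to make the behaviour observable.

```go
package main

import "fmt"

// bouncer is a hypothetical stand-in for the bounce plumbing on Subscriber.
type bouncer struct {
	// Depth-1 buffer: at most one reconnect request can be pending.
	bounce chan struct{}
}

func newBouncer() *bouncer {
	return &bouncer{bounce: make(chan struct{}, 1)}
}

// Bounce requests a reconnect without blocking. It reports whether the
// signal was queued (true) or coalesced with one already pending (false).
func (b *bouncer) Bounce() bool {
	select {
	case b.bounce <- struct{}{}:
		return true
	default:
		return false
	}
}

// drain discards a pending signal, if any, and reports whether one was
// discarded. A listen loop would call this right after establishing a
// fresh connection so a stale bounce cannot tear it down again.
func (b *bouncer) drain() bool {
	select {
	case <-b.bounce:
		return true
	default:
		return false
	}
}

func main() {
	b := newBouncer()

	// Two consumers bounce during the same outage: only one signal is kept.
	fmt.Println(b.Bounce(), b.Bounce()) // true false

	// After reconnecting, the loop drains the leftover signal exactly once.
	fmt.Println(b.drain(), b.drain()) // true false
}
```

The non-blocking send is what lets any number of consumers call Bounce without coordinating with each other or stalling on the listen loop.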

FanOut.EnableRecovery folds the recovery bookkeeping into the FanOut itself: consumers report fallback-poll findings via FanOut.Polled, the FanOut auto-resets the streak inside NotifyWithPayload on each wire-side delivery, and the FanOut bounces the Subscriber once the streak crosses a configurable threshold. The streak counts consecutive non-empty calls, not items, so a single backlog-clearing poll on a busy pipeline does not trip the bounce. EnableRecovery registers per-channel poll-saved counter and streak gauge metrics through a MetricsHelper.
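The streak bookkeeping can be illustrated with a small standalone sketch. This is not the FanOut implementation: `streakTracker`, `newStreakTracker`, and `Delivered` are hypothetical names, and three behaviours are assumptions the description above does not pin down: the bounce fires when the streak reaches the threshold (`>=`), the streak resets to zero after a bounce, and an empty poll leaves the streak unchanged.

```go
package main

import (
	"fmt"
	"sync"
)

// streakTracker is a hypothetical model of the recovery bookkeeping.
type streakTracker struct {
	mu        sync.Mutex
	threshold int
	streak    int
	bounce    func() // called when the streak trips; e.g. a Subscriber bounce
}

func newStreakTracker(threshold int, bounce func()) *streakTracker {
	return &streakTracker{threshold: threshold, bounce: bounce}
}

// Polled records the outcome of one fallback poll. Each non-empty call
// adds one to the streak regardless of how many items it found.
func (s *streakTracker) Polled(found int) {
	if found <= 0 {
		return // assumption: empty polls neither extend nor reset the streak
	}

	s.mu.Lock()
	s.streak++
	trip := s.streak >= s.threshold // assumption: trip on reaching the threshold
	if trip {
		s.streak = 0 // assumption: start over after requesting a bounce
	}
	s.mu.Unlock()

	if trip {
		s.bounce() // invoked outside the lock
	}
}

// Delivered models a wire-side notification arriving: LISTEN works again,
// so the streak resets.
func (s *streakTracker) Delivered() {
	s.mu.Lock()
	s.streak = 0
	s.mu.Unlock()
}

func main() {
	bounces := 0
	s := newStreakTracker(3, func() { bounces++ })

	s.Polled(500) // one big backlog-clearing poll counts as a single call
	s.Delivered() // a real notification arrives and resets the streak
	s.Polled(1)
	s.Polled(2)
	fmt.Println(bounces) // 0

	s.Polled(1) // third consecutive non-empty poll with no delivery between
	fmt.Println(bounces) // 1
}
```

Counting calls rather than items keeps the threshold meaningful across channels with very different throughput: a value of 3 means three poll intervals with no wire-side delivery, whatever the backlog size.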

See docs/fanout-recovery.md for the full pattern and example wiring.


hugowetterberg requested review from danijelvukoje and fred-o May 28, 2026 08:43

hugowetterberg wants to merge 1 commit into main from feature/fanout-poll-bounce
