
fix(realtime): cap WebSocket clients + plug writer-goroutine leaks #67

Merged: aksOps merged 1 commit into `main` from `fix/websocket-hub-robustness` on Apr 28, 2026.


aksOps commented on Apr 28, 2026


Summary

PR E of 6. Closes two production-readiness gaps in the WebSocket hub:

  1. Max-clients cap (codex round 2 finding B): the hub previously had no admission limit, so 10k connecting clients would exhaust FDs and per-client send-channel memory. A new `WS_MAX_CLIENTS` env var drives an atomic slot reservation; over-cap connects return HTTP 503.

  2. Writer-goroutine leaks on idle disconnect / Stop (exposed by the cap test): the writer blocks on `for msg := range c.send` and never wakes up if the client disconnects without traffic, or if `Stop()` is called while clients are idle. Two CAS-guarded close sites were added:

    • `HandleWebSocket`, after the reader loop exits: wakes the writer on idle disconnect, releasing the admission slot
    • `Hub.Run` stopCh handler: closes all client send channels so `writerWg.Wait()` returns
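A minimal sketch of what an atomic admission reservation in front of the upgrade handler could look like, assuming Go 1.19+ (`atomic.Int64`). The `admission` type and its `tryReserve`, `release`, and `gate` methods are hypothetical names, not the hub's actual API. This sketch releases the slot when the wrapped handler returns, which matches the PR's behaviour only if that handler blocks until the client disconnects.

```go
package main

import (
	"net/http"
	"sync/atomic"
)

// admission is a hypothetical admission gate; maxClients == 0 means unlimited.
type admission struct {
	maxClients int64
	active     atomic.Int64
}

// tryReserve atomically claims a slot. The CAS loop only increments the
// counter when a slot is actually free, so a rejected caller consumes nothing.
func (a *admission) tryReserve() bool {
	if a.maxClients == 0 {
		a.active.Add(1) // unlimited: still count clients for observability
		return true
	}
	for {
		cur := a.active.Load()
		if cur >= a.maxClients {
			return false
		}
		if a.active.CompareAndSwap(cur, cur+1) {
			return true
		}
		// Another goroutine changed the count between Load and CAS; retry.
	}
}

// release returns a slot; call it exactly once per successful tryReserve.
func (a *admission) release() { a.active.Add(-1) }

// gate wraps a handler (e.g. the WebSocket upgrade) and answers 503 when full.
func (a *admission) gate(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !a.tryReserve() {
			http.Error(w, "too many WebSocket clients", http.StatusServiceUnavailable)
			return
		}
		defer a.release()
		next.ServeHTTP(w, r)
	})
}
```

The CAS loop is preferred here over `Add(1)` followed by a rollback on overflow, because the rollback variant briefly over-counts and can spuriously reject a concurrent connect.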

Test plan

  • `go test -race -count=1 ./internal/realtime/...` → 3 passed (was 1, +2 new)
  • Cap rejection: the 3rd connect, past the cap, returns 503 and the slot is NOT consumed
  • Slot release: after `c1.Close`, `ActiveClients` drops to 1 within 2s and a new connect succeeds
  • Unlimited path: 10 concurrent connects all succeed when the cap is unset
  • Negative cap is coerced to 0 (unlimited)

🤖 Generated with Claude Code

Closes two production-readiness gaps:

1. **Max-clients cap** (codex round 2 finding B): The hub had no admission
   limit, so 10k connecting clients would exhaust file descriptors and
   per-client send-channel memory before any backpressure kicked in.
   A new WS_MAX_CLIENTS env var (0 = unlimited, the default) drives an
   atomic reservation in HandleWebSocket; over-cap connects return HTTP 503.

2. **Writer-goroutine leak on idle disconnect / Stop**: Pre-existing
   bug exposed by the cap test. The writer goroutine blocks on
   `for msg := range c.send` and never wakes up when:
   - The client disconnects without traffic (no conn.Write to fail on)
   - Hub.Stop() is called with idle clients (the stopCh branch returns
     without closing any send channels, so writerWg.Wait() hangs forever)

   Fixed by force-closing c.send (CAS-guarded) at two sites:
   - HandleWebSocket, after the reader loop exits: wakes the writer on
     idle disconnect so the admission slot is released promptly
   - Hub.Run stopCh handler: wakes all writers on Stop so
     writerWg.Wait() returns

Tests cover: cap rejection with 503, slot release after disconnect,
the unlimited path when the cap is unset, a negative cap coerced to 0,
and a clean race-detector run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

aksOps merged commit 9200463 into main on Apr 28, 2026
17 checks passed
aksOps deleted the fix/websocket-hub-robustness branch on April 28, 2026, 08:13