fix realtime sticky sessions protection x2#19

Merged
vi70x4 merged 15 commits into main from fix/realtime-sticky on Jun 5, 2026

vi70x4 (Collaborator) commented Jun 5, 2026
  • feat(providers): intercept wrapped error payloads on HTTP 200 responses
  • feat(proxy): replace hardcoded LongCat/Owl Alpha cooldowns with generalized thread protection scanner
  • chore: temporary commit before switching branch
  • feat(thread-protection): implement rules engine and replace hardcoded longcat branches
  • refactor(proxy): replace flat cooldown with real-time active request tracking for LongCat and Owl Alpha
  • fix(proxy): address code review findings for active request tracking
  • chore: standardize on pnpm as package manager
  • chore: rename CLAUE.md to AGENTS.md
  • chore: cleanup repo
  • fix(proxy): address PR feat: generalized thread protection scanner integration #13 code review bugs BUG-05, BUG-06, BUG-10
  • fix: address PR review comments — stream abort guard, SSE error scope, TTL refresh, collapsible pools, doc paths, cleanup
  • fix: address PR Feat/realtime sticky #16 review — sticky key check, SSE headers, keepalive pre-stream, cooldown gating, timer cleanup, a11y, log clarity
  • fix: address PR Feat/realtime sticky #16 review — sticky key check, SSE headers, keepalive pre-stream, cooldown gating, timer cleanup, a11y, log clarity, test fixes
  • ci: switch from npm to pnpm in GitHub Actions workflow
  • fix: address PR Fix/realtime sticky #18 review — pre-stream stall gen.throw, activeRequests concurrency, wrapped-error consistency, heartbeat fake timers, cooldown test accuracy

Summary by Sourcery

Generalize and harden provider thread protection, streaming, and routing behavior while introducing transient model cooldowns and pool-aware fallback UI, along with CI migration to pnpm.

New Features:

  • Introduce a thread-protection rules engine that centralizes provider-ban vs model-skip decisions and integrates with proxy error handling.
  • Add transient model cooldowns shared across requests to temporarily skip unstable models after 5xx or connection failures.
  • Expose model pool metadata (fast, balanced, smart) in the API and group models by pool in the fallback dashboard with collapsible sections.
  • Implement SSE stream heartbeat and stall detection to keep long-lived streams alive and terminate stalled connections gracefully.
  • Detect and surface wrapped error payloads returned with HTTP 200 from multiple providers.

Bug Fixes:

  • Fix multiple PR review issues around sticky sessions, active request tracking, cooldown gating, and retry behavior for provider bans.
  • Ensure balanced-mode routing still respects sticky preferences when the preferred model is otherwise excluded.
  • Correct router analytics to use recency-weighted stats while reporting raw totals for dashboard display.
  • Prevent wrapped provider error payloads from being silently swallowed in streaming paths.
  • Tighten tests and mocks around provider streaming and cooldown behavior to reflect new safeguards.

Enhancements:

  • Refine provider selection by weighting recent request outcomes more heavily in Thompson sampling-based routing.
  • Unify provider skip-model logic to use fallback chains instead of all enabled models, aligning cooldown behavior with actual routing.
  • Improve accessibility and keyboard interaction for the fallback pool UI via collapsible, focusable sections.
  • Standardize sticky session handling in balanced mode to use real session keys and allow bans and stickies to operate consistently across modes.
  • Add detailed specs and design docs for thread protection, transient cooldowns, SSE heartbeats, and wrapped error interception to align implementation with architecture.

Build:

  • Declare pnpm as the project package manager and update scripts to use pnpm for workspace dev, test, and build commands.

CI:

  • Update GitHub Actions workflow to install dependencies and run builds/tests via pnpm instead of npm.

Tests:

  • Add comprehensive tests for transient model cooldowns, SSE stream heartbeat and stall protection, fallback API pool values, and router behavior under recency-weighted analytics.
  • Adjust existing tests to account for new cooldown maps, analytics refresh behavior, and thread protection logging.

vi70x4 added 15 commits June 5, 2026 14:26
- Change activeRequests from Map to Set to allow concurrent requests from same session
- Add stale active request cleanup with 10-minute TTL
- Cache owl-alpha model ID to avoid repeated DB lookups
- Fix active request iteration to use Set-compatible syntax
- Remove package-lock.json (npm lockfile)
- Add packageManager field to package.json
- Create .npmrc with pnpm configuration
BUG-05: Abort upstream provider stream on stall detection by breaking
the for-await loop and calling gen.return() when the keepalive timer
detects MAX_STREAM_STALL_MS has elapsed without data.

BUG-06: Fix cooldown guard to use the actual routable fallback chain
(fallback_config JOIN models) instead of all enabled models, ensuring
transient cooldowns only skip models that would actually be routed to.

BUG-10: Remove double semicolon in proxy.ts.

Also adds SSE keep-alive comments during idle periods, transient model
cooldown injection before retry loops, and LongCat sticky session
cooldown support in balanced routing mode.
sourcery-ai Bot commented Jun 5, 2026

Reviewer's Guide

Refactors proxy sticky-session and thread protection logic into a generalized rules engine, introduces shared transient model cooldowns and active-request tracking for provider-ban platforms, adds SSE stream heartbeat/stall protection, and surfaces model pools in the fallback UI while standardizing pnpm usage in CI.

Sequence diagram for proxy thread protection, active requests, and transient cooldowns

sequenceDiagram
  actor Client
  participant Proxy as proxy.handleChatCompletion
  participant Router as routeRequest
  participant Provider as route.provider
  participant ThreadProtection as evaluateThreadProtection
  participant ActiveRequests as activeRequests
  participant Cooldowns as transientModelCooldowns

  Client->>Proxy: POST /proxy (chat request)
  Proxy->>Router: routeRequest(messages, routingMode, skipModels, skipKeys)
  Router-->>Proxy: route

  alt stream=true
    Proxy->>ActiveRequests: add({sessionKey, platform, modelId, startTime})
    Proxy->>Provider: streamChatCompletion(apiKey, messages, modelId, options)
    loop streaming chunks
      Provider-->>Proxy: ChatCompletionChunk
      Proxy->>Proxy: writeResponseStreamStart / writeResponseStreamChunk
    end
    Proxy->>ActiveRequests: delete({sessionKey, platform, modelId})
  else stream stalled (no chunks)
    Proxy->>Proxy: streamKeepaliveConfig / keepaliveTimer
    Proxy->>Proxy: [MAX_STREAM_STALL_MS exceeded]
    Proxy->>Proxy: writeResponseStreamEvent OR res.write error
    Proxy->>ActiveRequests: delete({sessionKey, platform, modelId})
  end

  alt provider returns 5xx / truncation / retryable error
    Proxy->>ThreadProtection: evaluateThreadProtection({platform, kind, midStream, modelDbId, error})
    ThreadProtection-->>Proxy: ThreadProtectionAction
    alt action.banProvider
      Proxy->>Proxy: banPlatformFromSession(messages, routingMode, platform, modelDbId)
      Proxy->>Proxy: addProviderModelsToSkipModels(skipModels, platform)
    end
    alt action.skipModel
      Proxy->>Proxy: skipModels.add(modelDbId)
    end
    alt !isRetryableError(err) and isTransientCooldownEligible
      Proxy->>Cooldowns: set(modelDbId, now + TRANSIENT_COOLDOWN_MS)
    end
  end

  Note over Proxy,Cooldowns: On next attempts, proxy injects transientModelCooldowns into skipModels before calling routeRequest again
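
The active-request bookkeeping in the diagram can be sketched as follows. This is a minimal illustration, not the code in proxy.ts: the entry fields come from the diagram, the 10-minute TTL comes from the commit notes, and the helper names (`registerRequest`, `deregisterRequest`, `pruneStaleRequests`, `isPlatformBusyForOtherSession`) are hypothetical.

```typescript
// Entry shape taken from the diagram's add({sessionKey, platform, modelId, startTime}).
interface ActiveRequest {
  sessionKey: string;
  platform: string;
  modelId: string;
  startTime: number; // epoch milliseconds
}

// Assumed from the "10-minute TTL" commit note.
const ACTIVE_REQUEST_TTL_MS = 10 * 60 * 1000;

// A Set of entry objects lets one session hold several concurrent requests.
const activeRequests = new Set<ActiveRequest>();

function registerRequest(entry: ActiveRequest): ActiveRequest {
  activeRequests.add(entry);
  return entry; // keep the reference: Set.delete matches objects by identity
}

function deregisterRequest(entry: ActiveRequest): void {
  activeRequests.delete(entry);
}

// Drops entries older than the TTL and returns how many were removed.
function pruneStaleRequests(now: number): number {
  let removed = 0;
  activeRequests.forEach((entry) => {
    if (now - entry.startTime > ACTIVE_REQUEST_TTL_MS) {
      activeRequests.delete(entry);
      removed++;
    }
  });
  return removed;
}

// True when a different session currently has a live request on the platform.
function isPlatformBusyForOtherSession(platform: string, sessionKey: string, now: number): boolean {
  pruneStaleRequests(now);
  let busy = false;
  activeRequests.forEach((entry) => {
    if (entry.platform === platform && entry.sessionKey !== sessionKey) busy = true;
  });
  return busy;
}
```

In this sketch the caller must hold on to the object returned by `registerRequest`, because a Set of objects cannot be searched by field values with `delete`.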

File-Level Changes

Change Details Files
Generalize provider thread protection and sticky cooldown behavior in the proxy, including active-request safeguards and transient model cooldowns.
  • Replace LongCat/Owl Alpha-specific sticky cooldowns with a provider-ban aware cooldown keyed by platform protection level.
  • Introduce activeRequests tracking and use it to gate bandit routing for provider-ban platforms when another session is actively using them.
  • Add transientModelCooldowns map and inject its entries into skipModels with automatic expiry pruning and sticky override.
  • Route truncation, 5xx, and retryable errors through evaluateThreadProtection(), enabling provider-ban vs model-skip decisions per platform.
  • Align balanced-mode filtering to allow a sticky preferred model even when its platform is excluded from balanced routing.
  • Update provider-session-ban tests to treat balanced mode as a first-class sticky mode and validate new behaviors.
server/src/routes/proxy.ts
server/src/services/threadProtection.ts
server/src/__tests__/routes/provider-session-ban.test.ts
server/src/__tests__/routes/transient-cooldown.test.ts
.roo/specs/generalized-thread-protection/*
.roo/specs/transient-model-cooldown/*
.roo/specs/pr13-code-review-fixes/*
.roo/specs/sse-stream-heartbeat-stall-protection/*
Add SSE stream heartbeat and stall protection around streaming responses to avoid hung connections and provide explicit timeout errors.
  • Introduce streamKeepaliveConfig with heartbeat and stall thresholds and use it to drive a keepalive interval.
  • Wrap streamChatCompletion consumption with a keepalive timer that writes SSE comments and aborts on stalls, including pre-stream stalls.
  • Ensure upstream generators are cleaned up and sessions deregistered on normal completion, errors, stalls, or client disconnects.
server/src/routes/proxy.ts
server/src/__tests__/routes/stream-heartbeat-stall.test.ts
.roo/specs/sse-stream-heartbeat-stall-protection/*
Detect wrapped error payloads in HTTP 200 responses across providers and surface them as ProviderApiError into the existing retry logic.
  • Expose extractErrorMessage() as protected and add isWrappedError() / throwWrappedError() helpers on BaseProvider.
  • Invoke wrapped-error detection in chatCompletion() and streamChatCompletion() paths for OpenAI-compatible, Cloudflare, Cohere, and Google providers.
  • Adjust streaming parsers to ignore malformed chunks, detect first wrapped-error chunks, and throw ProviderApiError with appropriate status codes.
server/src/providers/base.ts
server/src/providers/openai-compat.ts
server/src/providers/cloudflare.ts
server/src/providers/cohere.ts
server/src/providers/google.ts
.roo/specs/wrapped-error-interception/*
Introduce recency-biased analytics for Thompson sampling and expose model pools and speeds in fallback APIs and UI.
  • Change router stats aggregation to use recency-weighted totals while preserving raw counts for dashboard display.
  • Ensure analytics scores refresh cache on access and use rawTotal for visible totals while successRate uses weighted stats.
  • Assign models to logical pools (fast/balanced/smart) in fallback API responses and validate pool enums in tests.
server/src/services/router.ts
server/src/routes/fallback.ts
server/src/__tests__/services/router.test.ts
server/src/__tests__/routes/fallback.test.ts
shared/types.ts
.roo/specs/recency-biased-thompson-sampling/*
Group fallback models into collapsible pools in the client and improve accessibility of the pools UI.
  • Extend fallback entries with pool metadata and order models by pool, then metrics.
  • Introduce PoolSection and PoolBadge components to render collapsible, labeled sections for fast/balanced/smart pools.
  • Add keyboard-accessible controls and ARIA attributes for expanding/collapsing pool sections.
client/src/pages/FallbackPage.tsx
client/src/components/pool-section.tsx
client/src/components/pool-badge.tsx
Standardize pnpm usage and CI configuration while cleaning up specs and design docs for routing and thread protection.
  • Switch GitHub Actions workflow and root scripts to pnpm, including workspace builds and tests.
  • Fix design documents (ASCII diagrams, relative paths) for Owl Alpha/LongCat routing and thread protection, and remove npm lockfile.
.github/workflows/ci.yml
package.json
.roo/specs/owl-alpha-longcat-model-routing/*
.roo/specs/*
.npmrc

deepsource-io Bot commented Jun 5, 2026

DeepSource Code Review

We reviewed changes in 1336542...ed5d381 on this pull request. Below is the summary for the review, and you can see the individual issues we found as inline review comments.

See full review on DeepSource ↗

PR Report Card

Overall Grade, Security, Reliability, Complexity, Hygiene (grade values not captured)

Code Review Summary

Analyzer Status Updated (UTC) Details
JavaScript Jun 5, 2026 7:37 p.m. Review ↗

Important

AI Review is run only on demand for your team. We're only showing results of static analysis review right now. To trigger AI Review, comment @deepsourcebot review on this thread.

@qodo-code-review


Review Summary by Qodo

Generalized thread protection with real-time tracking, stream heartbeat, transient cooldowns, and wrapped error interception

✨ Enhancement 🐞 Bug fix 🧪 Tests


Walkthroughs

Description
  **Core Features:**
• Implemented generalized thread protection rules engine (threadProtection.ts) replacing hardcoded
  LongCat/Owl Alpha logic with configurable protection levels (provider-ban, model-skip, off)
• Added real-time active request tracking via activeRequests Set to prevent concurrent sessions
  from overwhelming provider-ban platforms
• Introduced SSE stream heartbeat (15s interval) and stall detection (60s timeout) with graceful
  error recovery and client-disconnect cleanup
• Implemented transient model cooldowns (15s) for non-retryable errors, automatically injected into
  skipModels to temporarily exclude problematic models during outages
  **Provider Enhancements:**
• Added wrapped error detection and handling across all providers (OpenAI, Cohere, Cloudflare,
  Google) to intercept error payloads returned with HTTP 200 status
• Made extractErrorMessage() protected in BaseProvider for subclass access
• Fixed malformed chunk handling to continue instead of silently skipping
  **Analytics & Routing:**
• Implemented recency-weighted analytics using 7-day decay formula for more responsive model
  selection
• Updated balanced mode to support real session hashing instead of empty strings
• Added model pool classification (Fast, Balanced, Smart) to fallback API and UI
  **Testing & Documentation:**
• Added comprehensive test suites for transient cooldowns (539 lines), stream heartbeat/stall
  detection (347 lines), and balanced mode behavior
• Created design specifications for all major features with architecture diagrams and implementation
  guidance
• Fixed test expectations for balanced mode session key behavior and mock fetch handling
  **Infrastructure:**
• Migrated from npm to pnpm as package manager across CI/CD, configuration, and workspace management
• Updated GitHub Actions workflow to use pnpm with frozen lockfile for reproducible builds
• Added .npmrc configuration with pnpm-specific settings
  **Documentation Fixes:**
• Corrected relative file path references in specification documents
• Added PR #13 code review bug verification and fix tracking
Diagram
flowchart LR
  A["HTTP 200<br/>Wrapped Errors"] -->|"isWrappedError()"| B["BaseProvider<br/>Detection"]
  B -->|"throwWrappedError()"| C["ProviderApiError"]
  
  D["Concurrent<br/>Requests"] -->|"Active Request<br/>Tracking"| E["ThreadProtection<br/>Rules Engine"]
  E -->|"Provider-ban<br/>Model-skip<br/>Off"| F["Protection<br/>Decision"]
  
  G["Delayed<br/>Chunks"] -->|"15s Heartbeat<br/>60s Stall"| H["SSE Stream<br/>Protection"]
  H -->|"Keep-alive<br/>Comments"| I["Graceful<br/>Recovery"]
  
  J["5xx/Connection<br/>Errors"] -->|"15s Cooldown"| K["Transient<br/>Cooldowns"]
  K -->|"skipModels<br/>Injection"| L["Model<br/>Exclusion"]
  
  M["Historical<br/>Stats"] -->|"7-day Decay<br/>Weighting"| N["Recency-biased<br/>Analytics"]
  N -->|"Responsive<br/>Selection"| O["Router<br/>Optimization"]



File Changes

1. server/src/routes/proxy.ts ✨ Enhancement +333/-214

Generalized thread protection with heartbeat, stall detection, and transient cooldowns

• Replaced hardcoded LongCat/Owl Alpha cooldown logic with generalized threadProtection rules
 engine supporting configurable protection levels (provider-ban, model-skip, off)
• Implemented real-time active request tracking via activeRequests Set to prevent concurrent
 sessions from overwhelming provider-ban platforms
• Added stream keepalive heartbeat (15s interval) and stall detection (60s timeout) to prevent
 hanging streams and enable graceful error recovery
• Introduced transient model cooldowns (15s) for non-retryable errors, automatically injected into
 skipModels to temporarily exclude problematic models
• Refactored session key logic to support balanced mode with real hashing instead of empty strings
• Updated addProviderModelsToSkipModels() to query fallback chain instead of all enabled models
 for consistency

server/src/routes/proxy.ts


2. server/src/services/threadProtection.ts ✨ Enhancement +119/-0

New thread protection rules engine service module

• New service module implementing configurable thread protection rules engine
• Provides getProtectionLevel() to look up platform protection levels from environment config
• Implements evaluateThreadProtection() decision matrix for 5xx, truncation, and retryable errors
• Supports three protection levels: provider-ban (exclude entire platform), model-skip (skip single
 model), off (no action)
• Defaults to provider-ban for LongCat and model-skip for all other platforms for backward
 compatibility

server/src/services/threadProtection.ts
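
A rough sketch of how such a rules engine could look, under stated assumptions. The three levels and the LongCat default come from the summary above. The `platform=level` config syntax, the `parseProtectionConfig` and `actionForLevel` helpers, and the level-to-action mapping are hypothetical. The summary says the real `evaluateThreadProtection()` also takes the error kind (5xx, truncation, retryable) into account, which this sketch does not model.

```typescript
type ProtectionLevel = "provider-ban" | "model-skip" | "off";

interface ThreadProtectionAction {
  banProvider: boolean;
  skipModel: boolean;
}

// Hypothetical config format "platform=level,platform=level"; the actual
// environment variable name and syntax are not shown in this PR.
function parseProtectionConfig(raw: string): Map<string, ProtectionLevel> {
  const levels = new Map<string, ProtectionLevel>();
  for (const pair of raw.split(",")) {
    const [platform, level] = pair.split("=").map((part) => part.trim());
    if (!platform) continue;
    // Unknown level strings are ignored so the platform falls back to its default.
    if (level === "provider-ban" || level === "model-skip" || level === "off") {
      levels.set(platform.toLowerCase(), level);
    }
  }
  return levels;
}

// Defaults follow the summary: provider-ban for LongCat, model-skip otherwise.
function getProtectionLevel(platform: string, overrides: Map<string, ProtectionLevel>): ProtectionLevel {
  const key = platform.toLowerCase();
  const override = overrides.get(key);
  if (override) return override;
  return key === "longcat" ? "provider-ban" : "model-skip";
}

// Assumed mapping: a provider ban also skips the failing model.
function actionForLevel(level: ProtectionLevel): ThreadProtectionAction {
  switch (level) {
    case "provider-ban":
      return { banProvider: true, skipModel: true };
    case "model-skip":
      return { banProvider: false, skipModel: true };
    case "off":
      return { banProvider: false, skipModel: false };
  }
}
```

Keeping the lookup and the decision as two small pure functions makes the matrix easy to unit test without touching the proxy route.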


3. server/src/__tests__/routes/transient-cooldown.test.ts 🧪 Tests +539/-0

Transient model cooldown test suite with error scenarios

• Comprehensive test suite for transient model cooldown functionality (539 lines)
• Tests cooldown map basics, injection/pruning logic, auto-recovery after expiry
• Validates sticky session override behavior when models are on global cooldown
• Tests cooldown registration via actual proxy errors (501 non-retryable vs 502/429 retryable)
• Verifies integration with addProviderModelsToSkipModels() and session bans

server/src/__tests__/routes/transient-cooldown.test.ts


4. server/src/__tests__/routes/stream-heartbeat-stall.test.ts 🧪 Tests +347/-0

Stream heartbeat and stall detection test suite

• New test suite for SSE stream heartbeat and stall protection (347 lines)
• Tests keep-alive comment emission during delayed first chunks
• Validates stream termination with stream_timeout error on stall detection
• Tests pre-stream stall detection and error retry behavior
• Verifies cleanup on client disconnect and normal streaming with heartbeat enabled

server/src/__tests__/routes/stream-heartbeat-stall.test.ts
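
The heartbeat and stall decision these tests exercise can be illustrated with a small pure function. This is a sketch under assumptions: the 15s and 60s values are the ones quoted for proxy.ts, `: keep-alive\n\n` is the SSE comment format named in the requirements summary, and the config field names, `decideKeepaliveAction`, and `simulateIdleStream` are hypothetical.

```typescript
// SSE comment frame used as the heartbeat; clients ignore lines starting with ":".
const SSE_KEEPALIVE_COMMENT = ": keep-alive\n\n";

interface StreamKeepaliveConfig {
  heartbeatIntervalMs: number;
  maxStreamStallMs: number;
}

// Values quoted in the PR summary; the field names are assumptions.
const streamKeepaliveConfig: StreamKeepaliveConfig = {
  heartbeatIntervalMs: 15000,
  maxStreamStallMs: 60000,
};

type KeepaliveAction = "none" | "heartbeat" | "abort";

// Decides what a timer tick should do, given when the last upstream chunk arrived.
function decideKeepaliveAction(
  lastChunkAtMs: number,
  nowMs: number,
  config: StreamKeepaliveConfig,
): KeepaliveAction {
  const idleMs = nowMs - lastChunkAtMs;
  if (idleMs >= config.maxStreamStallMs) return "abort";
  if (idleMs >= config.heartbeatIntervalMs) return "heartbeat";
  return "none";
}

// Simulates timer ticks against a stream that never produces a chunk and
// records what would be written to the client.
function simulateIdleStream(tickCount: number, config: StreamKeepaliveConfig): string[] {
  const written: string[] = [];
  const startedAtMs = 0;
  for (let tick = 1; tick <= tickCount; tick++) {
    const action = decideKeepaliveAction(startedAtMs, tick * config.heartbeatIntervalMs, config);
    if (action === "abort") {
      written.push("abort");
      break;
    }
    if (action === "heartbeat") written.push(SSE_KEEPALIVE_COMMENT);
  }
  return written;
}
```

Separating the decision from the timer lets tests drive it with plain numbers instead of fake timers, although the PR's own tests are described as using fake timers.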


5. server/src/services/router.ts ✨ Enhancement +21/-10

Recency-weighted analytics and balanced mode sticky exception

• Added recency weighting to analytics stats cache using 7-day decay formula
• Tracks both weighted and raw unweighted totals for accurate success rate calculation
• Fixed getAnalyticsScores() to call refreshStatsCache() and use rawTotal for display
• Updated balanced mode filtering to allow preferred sticky models through exclusion list

server/src/services/router.ts
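
The balanced-mode sticky exception can be pictured as a filter that lets the preferred model through a platform exclusion list. This is a minimal sketch; `RouteCandidate` and `filterBalancedCandidates` are hypothetical names, and the real router may apply further criteria.

```typescript
interface RouteCandidate {
  modelDbId: number;
  platform: string;
}

// Keeps candidates whose platform is not excluded, plus the sticky preferred
// model even when its platform is on the exclusion list.
function filterBalancedCandidates(
  candidates: RouteCandidate[],
  excludedPlatforms: Set<string>,
  preferredModelDbId?: number,
): RouteCandidate[] {
  return candidates.filter(
    (candidate) =>
      candidate.modelDbId === preferredModelDbId || !excludedPlatforms.has(candidate.platform),
  );
}
```

For example, with `longcat` excluded and model 1 preferred, only model 1 among the LongCat models stays routable.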


6. server/src/__tests__/routes/proxy-tools.test.ts 🧪 Tests +16/-11

Test fixes for mock fetch and stream encoding

• Updated mock fetch to handle localhost requests and generic /chat/completions paths
• Fixed stream chunk encoding to batch all data before enqueueing (prevents timing issues)
• Added transientModelCooldowns import and cleanup in beforeEach
• Updated LongCat cooldown test expectation to match new lowercase log message format

server/src/__tests__/routes/proxy-tools.test.ts


7. server/src/__tests__/routes/provider-session-ban.test.ts 🧪 Tests +14/-14

Balanced mode session key behavior test updates

• Updated balanced mode tests to reflect new behavior where getSessionKey() returns real hash
 instead of empty string
• Changed test expectations: balanced mode now creates sticky session entries instead of skipping
 them
• Updated test descriptions to clarify that balanced mode uses real keys but separate from smart
 mode

server/src/__tests__/routes/provider-session-ban.test.ts


8. server/src/providers/base.ts ✨ Enhancement +30/-1

Wrapped error detection and handling in base provider

• Made extractErrorMessage() protected instead of private for subclass access
• Added isWrappedError() method to detect error payloads returned with HTTP 200 status
• Added throwWrappedError() method to throw ProviderApiError from wrapped error payloads

server/src/providers/base.ts
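
A sketch of the wrapped-error check, written as free functions rather than `BaseProvider` methods. The root-level `error` field check follows the requirements summary. The `ProviderApiError` constructor shape, the status-code lookup on `code` / `status`, and the 502 fallback are assumptions made for illustration.

```typescript
// Stand-in for the project's ProviderApiError; the real constructor may differ.
class ProviderApiError extends Error {
  readonly statusCode: number;
  constructor(message: string, statusCode: number) {
    super(message);
    this.name = "ProviderApiError";
    this.statusCode = statusCode;
    Object.setPrototypeOf(this, ProviderApiError.prototype);
  }
}

// A payload counts as a wrapped error when it has a non-null root-level `error` field.
function isWrappedError(payload: unknown): boolean {
  if (typeof payload !== "object" || payload === null) return false;
  const value = (payload as Record<string, unknown>)["error"];
  return value !== undefined && value !== null;
}

function extractErrorMessage(err: unknown): string {
  if (typeof err === "string") return err;
  if (typeof err === "object" && err !== null) {
    const message = (err as Record<string, unknown>)["message"];
    if (typeof message === "string") return message;
  }
  return JSON.stringify(err) ?? "unknown provider error";
}

// Assumption: use a numeric `code` or `status` in the 400-599 range when present.
function extractStatusCode(err: unknown, fallback: number): number {
  if (typeof err !== "object" || err === null) return fallback;
  const fields = err as Record<string, unknown>;
  const code = fields["code"];
  const status = fields["status"];
  const candidate = typeof code === "number" ? code : typeof status === "number" ? status : fallback;
  return candidate >= 400 && candidate <= 599 ? candidate : fallback;
}

// Converts an HTTP 200 body carrying an error into a thrown ProviderApiError.
function throwWrappedError(payload: unknown, fallbackStatus: number = 502): never {
  const err = (payload as Record<string, unknown>)["error"];
  throw new ProviderApiError(extractErrorMessage(err), extractStatusCode(err, fallbackStatus));
}
```

A provider would call `isWrappedError(json)` right after parsing the response body (or the first stream chunk) and call `throwWrappedError(json)` when it returns true, so the existing retry loop sees an ordinary provider error.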


9. server/src/providers/openai-compat.ts ✨ Enhancement +14/-1

Wrapped error detection for OpenAI-compatible providers

• Added wrapped error detection in chatCompletion() after JSON parsing
• Added wrapped error detection in streamChatCompletion() before yielding first chunk
• Fixed malformed chunk handling to continue instead of silently skipping

server/src/providers/openai-compat.ts


10. server/src/providers/cohere.ts ✨ Enhancement +12/-1

Wrapped error detection for Cohere provider

• Added wrapped error detection in chatCompletion() after JSON parsing
• Added wrapped error detection in streamChatCompletion() before yielding chunks
• Fixed malformed chunk handling to continue instead of silently skipping

server/src/providers/cohere.ts


11. server/src/providers/cloudflare.ts ✨ Enhancement +12/-1

Wrapped error detection for Cloudflare provider

• Added wrapped error detection in chatCompletion() after JSON parsing
• Added wrapped error detection in streamChatCompletion() before yielding chunks
• Fixed malformed chunk handling to continue instead of silently skipping

server/src/providers/cloudflare.ts


12. server/src/providers/google.ts ✨ Enhancement +10/-0

Wrapped error detection for Google provider

• Added wrapped error detection in chatCompletion() after JSON parsing
• Added wrapped error detection in streamChatCompletion() before yielding chunks

server/src/providers/google.ts


13. server/src/routes/fallback.ts ✨ Enhancement +10/-0

Model pool classification in fallback API

• Added getModelPool() function to classify models into Fast, Balanced, or Smart pools
• LongCat and Owl Alpha models assigned to Smart pool; all others to Balanced pool
• Added pool field to fallback API response for each model

server/src/routes/fallback.ts
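
A minimal sketch of the pool classification and grouping described above. The rule (LongCat and Owl Alpha go to the smart pool, everything else to balanced) is from the summary. How the real `getModelPool()` recognizes those models is not shown, so the substring match on platform and model ID, the `FallbackModel` shape, and the helper names are assumptions. A string union stands in for the `ModelPool` enum.

```typescript
type ModelPool = "fast" | "balanced" | "smart";

// Display order used when grouping.
const POOL_ORDER: ModelPool[] = ["fast", "balanced", "smart"];

interface FallbackModel {
  platform: string;
  modelId: string;
}

// Hypothetical markers; the real identifiers may differ.
const SMART_MARKERS = ["longcat", "owl-alpha"];

function classifyModelPool(model: FallbackModel): ModelPool {
  const haystack = `${model.platform} ${model.modelId}`.toLowerCase();
  const isSmart = SMART_MARKERS.some((marker) => haystack.indexOf(marker) !== -1);
  return isSmart ? "smart" : "balanced";
}

// Groups models by pool, keeping every pool present (possibly empty) in display order.
function groupByPool(models: FallbackModel[]): Map<ModelPool, FallbackModel[]> {
  const groups = new Map<ModelPool, FallbackModel[]>();
  POOL_ORDER.forEach((pool) => groups.set(pool, []));
  models.forEach((model) => {
    groups.get(classifyModelPool(model))?.push(model);
  });
  return groups;
}
```

Under the stated rule nothing lands in the fast pool, so a UI built on `groupByPool` has to handle an empty section.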


14. server/src/__tests__/routes/fallback.test.ts 🧪 Tests +11/-0

Fallback API pool property tests

• Added test to verify fallback API response includes pool property
• Added test to validate all pool values are valid ModelPool enum values

server/src/__tests__/routes/fallback.test.ts


15. shared/types.ts ✨ Enhancement +8/-0

Model pool type definition

• Added ModelPool enum with three values: Fast, Balanced, Smart
• Exported ModelPool type for use in client and server code

shared/types.ts


16. .roo/specs/sse-stream-heartbeat-stall-protection/design.md 📝 Documentation +330/-0

Design specification for stream heartbeat and stall protection

• New comprehensive design document for SSE stream heartbeat and stall protection feature
• Includes architecture diagram, implementation details, edge cases, and file modification guide
• Documents interaction with existing code paths and Responses API streams

.roo/specs/sse-stream-heartbeat-stall-protection/design.md


17. client/src/pages/FallbackPage.tsx ✨ Enhancement +49/-30

Fallback page UI grouped by model pool

• Refactored model display to group entries by pool (Fast, Balanced, Smart)
• Added PoolSection component wrapper for each pool group
• Added pool-specific titles and ordering
• Updated FallbackEntry interface to include pool field

client/src/pages/FallbackPage.tsx


18. .roo/specs/owl-alpha-longcat-model-routing/requirements.md 📝 Documentation +5/-5

Documentation path reference corrections

• Fixed relative file path references to use correct depth (../../../ instead of ../)

.roo/specs/owl-alpha-longcat-model-routing/requirements.md


19. .npmrc ⚙️ Configuration changes +4/-0

pnpm package manager configuration

• New pnpm configuration file with shamefully-hoist, strict-peer-dependencies, and
 auto-install-peers settings

.npmrc


20. .roo/specs/wrapped-error-interception/design.md 📝 Documentation +337/-0

Design for HTTP 200 wrapped error payload interception

• Comprehensive design document for intercepting wrapped error payloads returned with HTTP 200
 status
• Defines isWrappedError() predicate and throwWrappedError() helper methods on BaseProvider
• Specifies integration points across four provider implementations (OpenAI, Cohere, Cloudflare,
 Google)
• Documents error detection flow, wrapped error formats, and edge case handling

.roo/specs/wrapped-error-interception/design.md


21. .roo/specs/pr13-code-review-fixes/requirements.md 📝 Documentation +268/-0

PR #13 code review bug verification and fix requirements

• Documents 10 verified bugs from PR #13 code review organized by priority (P0, P1, P2)
• Critical bugs include SQL parenthesis mismatch, wrapped error swallowing in streaming, and NaN
 conversion risks
• High-priority issues cover hardcoded platform references, stall detection, and cooldown guard
 logic
• Includes acceptance criteria and priority ordering for fixes

.roo/specs/pr13-code-review-fixes/requirements.md


22. .roo/specs/transient-model-cooldown/design.md 📝 Documentation +197/-0

Design for shared temporary model cooldowns during outages

• Introduces module-level in-memory circuit breaker for shared transient failure state across
 concurrent requests
• Defines transientModelCooldowns Map with 15-second cooldown window for 5xx/connection failures
• Specifies integration points: pre-routing injection, sticky session override, cooldown
 registration, mid-stream handling
• Documents error classification matrix and test strategy with auto-recovery mechanism

.roo/specs/transient-model-cooldown/design.md
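
The cooldown map and its pre-routing injection might look roughly like this. The map name, the 15-second window, and the prune-on-inject behavior follow the summaries in this PR. The numeric `modelDbId` key and the helper names are assumptions, and the sticky-session override is deliberately left out because its exact semantics are not spelled out here.

```typescript
// 15-second window, as stated in the design summary.
const TRANSIENT_COOLDOWN_MS = 15000;

// Module-level state shared by all requests: modelDbId -> expiry time (epoch ms).
const transientModelCooldowns = new Map<number, number>();

// Called after a qualifying failure to put the model on cooldown.
function registerCooldown(modelDbId: number, now: number): void {
  transientModelCooldowns.set(modelDbId, now + TRANSIENT_COOLDOWN_MS);
}

// Called before routing: copies live cooldowns into skipModels and prunes
// expired entries so models recover automatically.
function injectCooldowns(skipModels: Set<number>, now: number): void {
  transientModelCooldowns.forEach((expiresAt, modelDbId) => {
    if (expiresAt <= now) {
      transientModelCooldowns.delete(modelDbId);
    } else {
      skipModels.add(modelDbId);
    }
  });
}
```

Because expiry is checked lazily at injection time, no timer is needed and a model becomes routable again on the first request after its window ends.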


23. .roo/specs/recency-biased-thompson-sampling/design.md 📝 Documentation +238/-0

Design for recency-biased Thompson Sampling with time decay

• Replaces flat request aggregation with linear time-decay weighting in Thompson Sampling router
• Implements SQL CTE with MIN(1.0, MAX(0.0, 1.0 - age_in_days / 7.0)) recency weight formula
• Extends ModelStats interface with rawSuccesses and rawTotal fields for dashboard
 transparency
• Adds Math.max(0.1, ...) safety guards to prevent non-positive Beta distribution parameters

.roo/specs/recency-biased-thompson-sampling/design.md
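
The decay formula above translates directly to code. In this sketch the weight function mirrors `MIN(1.0, MAX(0.0, 1.0 - age_in_days / 7.0))`. The `Outcome` shape, the in-memory aggregation (the design does this in SQL), and the choice of what goes inside the `Math.max(0.1, ...)` guards are assumptions.

```typescript
const DECAY_WINDOW_DAYS = 7;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Linear decay: 1.0 for a request made now, 0.0 at seven days or older.
// The outer min clamps future timestamps (clock drift) to 1.0.
function recencyWeight(requestTimeMs: number, nowMs: number): number {
  const ageInDays = (nowMs - requestTimeMs) / MS_PER_DAY;
  return Math.min(1.0, Math.max(0.0, 1.0 - ageInDays / DECAY_WINDOW_DAYS));
}

interface Outcome {
  timestampMs: number;
  success: boolean;
}

interface ModelStats {
  weightedSuccesses: number;
  weightedTotal: number;
  rawSuccesses: number; // unweighted, for dashboard display
  rawTotal: number;
}

function aggregateStats(outcomes: Outcome[], nowMs: number): ModelStats {
  const stats: ModelStats = { weightedSuccesses: 0, weightedTotal: 0, rawSuccesses: 0, rawTotal: 0 };
  for (const outcome of outcomes) {
    const weight = recencyWeight(outcome.timestampMs, nowMs);
    stats.weightedTotal += weight;
    stats.rawTotal += 1;
    if (outcome.success) {
      stats.weightedSuccesses += weight;
      stats.rawSuccesses += 1;
    }
  }
  return stats;
}

// Assumed guard placement: keep both Beta parameters strictly positive.
function betaParams(stats: ModelStats): { alpha: number; beta: number } {
  return {
    alpha: Math.max(0.1, stats.weightedSuccesses),
    beta: Math.max(0.1, stats.weightedTotal - stats.weightedSuccesses),
  };
}
```

With fractional weights a model whose history is entirely older than seven days ends up with both parameters at the 0.1 floor, which is why the guard is needed at all.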


24. .roo/specs/pr13-code-review-fixes/design.md 📝 Documentation +210/-0

Design and implementation strategy for PR #13 code review fixes

• Provides implementation guidance for fixing 10 verified bugs from PR #13
• Details decision rationale for each bug category (SQL fix, wrapped error propagation, NaN guard,
 hardcoded refs, stall abort, cooldown guard)
• Outlines data flow and risk assessment for each fix
• Emphasizes incremental approach for large refactoring (BUG-04 hardcoded platform references)

.roo/specs/pr13-code-review-fixes/design.md


25. .roo/specs/sse-stream-heartbeat-stall-protection/requirements.md 📝 Documentation +132/-0

Requirements for SSE stream heartbeats and stall protection

• Specifies SSE keep-alive heartbeats (15-second interval) and stall detection (45-second threshold)
• Defines heartbeat format (: keep-alive\n\n SSE comments) and stall behavior (error frame +
 socket close)
• Documents client-disconnect cleanup, heartbeat write failure handling, and pre/mid-stream stall
 paths
• Includes constants configuration, backward compatibility, and race condition safety requirements

.roo/specs/sse-stream-heartbeat-stall-protection/requirements.md


26. .roo/specs/generalized-thread-protection/requirements.md 📝 Documentation +109/-0

Requirements for generalized thread protection scanner

• Addresses hardcoded longcat and owl-alpha platform checks scattered across proxy.ts
• Defines user stories for configurable thread protection via environment variable
• Specifies rules engine API (getProtectionLevel(), evaluateThreadProtection()) and
 configuration format
• Outlines migration plan with four phases and comprehensive test requirements

.roo/specs/generalized-thread-protection/requirements.md


27. .roo/specs/generalized-thread-protection/design.md 📝 Documentation +152/-0

Design for generalized thread protection scanner

• Introduces dynamic, provider-agnostic thread protection decision engine in new
 threadProtection.ts module
• Defines protection levels (provider-ban, model-skip, off) and error context kinds (5xx,
 truncation, retryable)
• Specifies decision matrix mapping protection levels to actions across error types
• Documents six integration points in proxy.ts and sticky cooldown generalization

.roo/specs/generalized-thread-protection/design.md


28. .roo/specs/owl-alpha-longcat-model-routing/design.md 📝 Documentation +49/-49

Formatting improvements to routing design documentation

• Minor formatting improvements: converts code blocks from implicit to explicit text language
 markers
• Adjusts ASCII diagram indentation for consistency and readability
• No functional changes to the routing architecture or decision flows

.roo/specs/owl-alpha-longcat-model-routing/design.md


29. .roo/specs/wrapped-error-interception/tasks.md 📝 Documentation +69/-0

Implementation tasks for wrapped error interception

• Provides 13-step implementation checklist for wrapped error interception feature
• Details method additions to BaseProvider (isWrappedError(), throwWrappedError())
• Specifies integration points across four provider implementations with streaming-specific notes
• Includes TypeScript compilation check and test suite verification steps

.roo/specs/wrapped-error-interception/tasks.md


30. .roo/specs/wrapped-error-interception/requirements.md 📝 Documentation +53/-0

Requirements for HTTP 200 wrapped error payload handling

• Defines requirements for detecting and handling error payloads returned with HTTP 200 status
• Specifies detection logic (root-level error field check) and error handling (throw
 ProviderApiError)
• Documents functional requirements (FR-1 through FR-8) and non-functional requirements (NFR-1
 through NFR-5)
• Clarifies out-of-scope items and integration with existing retry loop

.roo/specs/wrapped-error-interception/requirements.md


31. .roo/specs/recency-biased-thompson-sampling/tasks.md 📝 Documentation +17/-0

Implementation tasks for recency-biased Thompson Sampling

• Breaks down recency-biased Thompson Sampling into 13 implementation tasks
• Tasks T1-T4 cover constant definition, interface extension, SQL rewrite, and cache population
• Tasks T5-T9 add safety guards and dashboard display updates
• Tasks T10-T13 specify test cases for outage sensitivity, fractional evaluation, clock drift, and
 regression testing

.roo/specs/recency-biased-thompson-sampling/tasks.md


32. .roo/specs/recency-biased-thompson-sampling/requirements.md 📝 Documentation +76/-0

Requirements for recency-biased Thompson Sampling

• Specifies linear time-decay weighting formula for historical request aggregation
• Requires Math.max(0.1, ...) guards for Beta distribution parameter safety
• Documents constraints (no schema changes, no new dependencies) and test cases (T-1 outage
 sensitivity, T-2 fractional evaluation)
• Includes edge case and risk mitigation table

.roo/specs/recency-biased-thompson-sampling/requirements.md
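
The weighting and guard described above might look roughly like the following in-memory sketch. The spec applies the decay in SQL; TypeScript is used here only for readability. The window length, the `RequestRecord` shape, and the exact decay formula (`1 - age / window`, clamped to the range 0 to 1) are assumptions, since the summary only states that the decay is linear and that `Math.max(0.1, ...)` protects the Beta parameters.

```typescript
// Hypothetical record shape for one historical request.
interface RequestRecord {
  timestampMs: number;
  success: boolean;
}

// Assumed decay window; the real constant is not given in the summary.
const DECAY_WINDOW_MS = 60 * 60 * 1000;

// Linear decay: weight 1 for a request made now, 0 at or beyond the window.
// Negative ages (clock drift) are clamped so the weight never exceeds 1.
function linearWeight(ageMs: number, windowMs: number): number {
  const age = Math.max(0, ageMs);
  return Math.max(0, 1 - age / windowMs);
}

// Aggregate weighted successes and failures into Beta parameters,
// guarding both with a floor of 0.1 so sampling stays well defined.
function betaParams(
  records: RequestRecord[],
  nowMs: number,
  windowMs: number = DECAY_WINDOW_MS,
): { alpha: number; beta: number } {
  let successes = 0;
  let failures = 0;
  for (const r of records) {
    const w = linearWeight(nowMs - r.timestampMs, windowMs);
    if (r.success) successes += w;
    else failures += w;
  }
  return { alpha: Math.max(0.1, successes), beta: Math.max(0.1, failures) };
}
```

With this weighting, a recent outage contributes almost full weight to `beta` while old successes fade, which is the outage sensitivity the test cases in the spec appear to target.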


33. .roo/specs/transient-model-cooldown/requirements.md 📝 Documentation +38/-0

Requirements for shared temporary model cooldowns

• Defines problem of concurrent requests independently attempting failing models during outages
• Specifies cross-request transient failure state with 15-second global cooldown window
• Documents integration with existing routing logic and sticky session precedence
• Includes auto-recovery via expiry and acceptance criteria for validation

.roo/specs/transient-model-cooldown/requirements.md
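
A rough sketch of a shared cooldown with expiry-based recovery, under the 15-second window stated above. The map name, the `platform:modelId` key format, and both helper functions are hypothetical; the clock is passed in as a parameter so the behavior can be checked without timers.

```typescript
// 15-second window, as stated in the requirements summary.
const TRANSIENT_COOLDOWN_MS = 15000;

// Hypothetical shared state: key -> timestamp at which the cooldown ends.
const cooldownUntil = new Map<string, number>();

function cooldownKey(platform: string, modelId: string): string {
  return `${platform}:${modelId}`;
}

// Record a transient failure so other concurrent requests skip this model.
function markTransientFailure(platform: string, modelId: string, nowMs: number): void {
  cooldownUntil.set(cooldownKey(platform, modelId), nowMs + TRANSIENT_COOLDOWN_MS);
}

// Returns true while the model is cooling down. Expired entries are
// removed on read, which gives auto-recovery without a background timer.
function isCoolingDown(platform: string, modelId: string, nowMs: number): boolean {
  const key = cooldownKey(platform, modelId);
  const until = cooldownUntil.get(key);
  if (until === undefined) return false;
  if (nowMs >= until) {
    cooldownUntil.delete(key);
    return false;
  }
  return true;
}
```

A router could call `isCoolingDown` while filtering candidates and `markTransientFailure` from its error handlers; how this interacts with sticky session precedence is left to the spec.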


34. .roo/specs/sse-stream-heartbeat-stall-protection/tasks.md 📝 Documentation +20/-0

Implementation tasks for SSE heartbeat and stall protection

• Provides 10-step implementation checklist for SSE heartbeat and stall protection
• Specifies constant definitions, state variable setup, and cleanupStream() function
• Details heartbeat interval logic, stall detection paths (pre-stream and mid-stream), and
 client-disconnect handling
• Includes unit test requirements for heartbeat emission, stall detection, and cleanup scenarios

.roo/specs/sse-stream-heartbeat-stall-protection/tasks.md


35. .roo/specs/transient-model-cooldown/tasks.md 📝 Documentation +16/-0

Implementation tasks for transient model cooldowns

• Outlines 8 implementation tasks for transient model cooldown feature
• Tasks T-1 to T-2 cover module-level declarations and exports
• Tasks T-3 to T-6 specify integration points in handleChatCompletion() and error handlers
• Tasks T-7 to T-8 define test file creation and regression testing

.roo/specs/transient-model-cooldown/tasks.md


36. .roo/specs/generalized-thread-protection/tasks.md 📝 Documentation +12/-0

Implementation tasks for generalized thread protection

• Provides 8-step implementation plan for generalizing thread protection scanner
• Tasks T-1 to T-3 involve renaming and removing hardcoded LongCat/Owl Alpha blocks
• Task T-4 inserts generalized thread protection scanner logic
• Tasks T-5 to T-8 cover execution order verification, test creation, regression testing, and smoke
 testing

.roo/specs/generalized-thread-protection/tasks.md


37. client/src/components/pool-section.tsx ✨ Enhancement +41/-0

Add collapsible pool section component with accessibility

• New collapsible pool section component with expand/collapse toggle functionality
• Implements keyboard accessibility with Enter and Space key support
• Uses ARIA attributes (aria-expanded, aria-label, role="button") for screen reader
 compatibility
• Displays visual indicator (▼/▶) and integrates PoolBadge component for pool type display

client/src/components/pool-section.tsx


38. client/src/components/pool-badge.tsx ✨ Enhancement +16/-0

Create pool badge component with theme styling

• New badge component for displaying pool types (fast, balanced, smart)
• Defines PoolType type and poolStyles configuration with theme-aware colors
• Renders styled badge with pool-specific labels and dark mode support

client/src/components/pool-badge.tsx


39. .roo/specs/pr13-code-review-fixes/tasks.md 📝 Documentation +14/-0

Add PR #13 code review fixes task tracking

• New task tracking document for PR #13 code review fixes
• Lists 10 bugs with completion status (3 marked as completed: BUG-05, BUG-06, BUG-10)
• Covers SQL fixes, error propagation, thread protection, and code cleanup tasks

.roo/specs/pr13-code-review-fixes/tasks.md


40. .github/workflows/ci.yml ⚙️ Configuration changes +9/-5

Migrate GitHub Actions CI workflow from npm to pnpm

• Added pnpm/action-setup@v4 step to install pnpm version 10
• Replaced all npm commands with pnpm equivalents throughout workflow
• Updated Node cache from npm to pnpm and changed install command to use --frozen-lockfile
• Updated build and test commands to use pnpm --filter syntax for workspace management

.github/workflows/ci.yml


41. package.json ⚙️ Configuration changes +1/-0

Specify pnpm as required package manager

• Added packageManager field specifying pnpm@11.1.3 to enforce pnpm usage

package.json


42. AGENTS.md Additional files +0/-0

...

AGENTS.md


43. server/src/__tests__/services/router.test.ts Additional files +2/-27

...

server/src/__tests__/services/router.test.ts


@qodo-code-review

qodo-code-review Bot commented Jun 5, 2026

Code Review by Qodo

🐞 Bugs (3) 📘 Rule violations (0)


Action required

1. Premature keepalive writes 🐞 Bug ☼ Reliability
Description
In handleChatCompletion() streaming mode, the keepalive timer writes : keep-alive to res even
when streamStarted is still false, which can commit the HTTP response before SSE headers are set.
This can produce malformed SSE (wrong/missing headers) and undermines the code’s stated “pre-stream
errors stay retryable” behavior.
Code

server/src/routes/proxy.ts[R1379-1381]

+            if (!stalled && elapsed >= streamKeepaliveConfig.KEEPALIVE_INTERVAL_MS) {
+              try { res.write(': keep-alive\n\n'); } catch { /* socket gone */ }
            }
Evidence
The proxy currently writes keepalive comments without checking streamStarted, while SSE headers
are only set inside the first-chunk branch. The design spec in this PR states keepalives must only
be written after SSE headers are sent to avoid malformed output.

server/src/routes/proxy.ts[1309-1402]
.roo/specs/sse-stream-heartbeat-stall-protection/design.md[110-129]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`proxy.ts` writes SSE keepalive comments before SSE headers are sent. This can commit the response prematurely, making later `res.setHeader(...)` calls invalid/ineffective and breaking pre-stream retry semantics.

## Issue Context
The design spec explicitly says keepalive comments should only be written after SSE headers are sent (`streamStarted === true`). The current implementation does not check `streamStarted`.

## Fix Focus Areas
- server/src/routes/proxy.ts[1335-1405]
- .roo/specs/sse-stream-heartbeat-stall-protection/design.md[110-129]

### Implementation direction
- Change the keepalive write to only occur when `streamStarted === true` (or `res.headersSent === true` after you’ve set SSE headers).
- Ensure no `res.write(...)` happens before the first time you set `Content-Type: text/event-stream`, `Cache-Control`, `Connection`, etc.
- If you still need pre-first-chunk liveness, choose one:
 - set SSE headers immediately (accepting that pre-stream errors are no longer “retryable” via normal JSON responses), or
 - don’t emit keepalives until after headers, relying on stall timeout to abort/retry.
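
The second option above (no keepalives until headers are sent) could be sketched as follows. `SseResponseLike`, `StreamState`, and `maybeWriteKeepalive` are hypothetical names for illustration; only `streamStarted`, `stalled`, and the `: keep-alive\n\n` comment format come from the review.

```typescript
// Minimal structural type: only the part of the HTTP response this sketch uses.
interface SseResponseLike {
  write(chunk: string): boolean;
}

// Hypothetical per-request state, mirroring the flags named in the review.
interface StreamState {
  streamStarted: boolean; // true only after SSE headers have been sent
  stalled: boolean;
}

// Writes a keepalive comment only when the stream has started, is not
// stalled, and the idle time has reached the interval.
// Returns true if a keepalive was actually written.
function maybeWriteKeepalive(
  res: SseResponseLike,
  state: StreamState,
  elapsedMs: number,
  keepaliveIntervalMs: number,
): boolean {
  if (!state.streamStarted || state.stalled) return false;
  if (elapsedMs < keepaliveIntervalMs) return false;
  try {
    res.write(': keep-alive\n\n');
    return true;
  } catch {
    // Socket already gone; the caller's cleanup path handles it.
    return false;
  }
}
```

Because nothing is written before `streamStarted` is set, the response stays uncommitted during the pre-stream phase, so a failure there can still be answered with a normal JSON error or retried.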




Remediation recommended

2. ActiveRequests not fully cleared 🐞 Bug ≡ Correctness
Description
activeRequests stores per-request objects in a Set, but the deregistration logic deletes only
the first matching entry and breaks, leaving duplicates when a session has concurrent requests to
the same platform/model. These stale entries can incorrectly trigger the active-request safeguard
and exclude provider-ban platforms from routing after the real requests have finished.
Code

server/src/routes/proxy.ts[R1631-1636]

+          if (sessionKey) {
+            for (const active of activeRequests) {
+              if (active.sessionKey === sessionKey && active.platform === route.platform && active.modelId === route.modelId) {
+                activeRequests.delete(active);
+                break;
+              }
Evidence
The code adds a fresh object to the Set per request, then removes only a single matching entry due
to break. The safeguard consumes activeRequests to exclude provider-ban platforms, so leftover
entries change routing decisions.

server/src/routes/proxy.ts[24-29]
server/src/routes/proxy.ts[1244-1252]
server/src/routes/proxy.ts[1319-1328]
server/src/routes/proxy.ts[1629-1638]
server/src/routes/proxy.ts[1641-1692]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`activeRequests` uses a `Set` of object literals; concurrent requests with identical `{sessionKey, platform, modelId}` create multiple distinct entries (identity-based). The cleanup loop deletes only one matching entry due to `break`, leaving stale entries.

## Issue Context
The active-request safeguard iterates `activeRequests` to exclude provider-ban platforms from routing, so stale entries can affect unrelated sessions.

## Fix Focus Areas
- server/src/routes/proxy.ts[24-29]
- server/src/routes/proxy.ts[1319-1328]
- server/src/routes/proxy.ts[1629-1638]
- server/src/routes/proxy.ts[1641-1692]
- server/src/routes/proxy.ts[1244-1252]

### Implementation direction
Pick one robust approach:
1) **Store and delete by reference**
  - `const active = { ... }` before `activeRequests.add(active)`
  - In `finally`, call `activeRequests.delete(active)`
  - Works correctly for duplicates/concurrency.

2) **Use a counting map**
  - `Map<string, number>` keyed by `${sessionKey}:${platform}:${modelId}`
  - Increment on start, decrement on finish; delete key when count hits 0.

Avoid value-scanning + `break`, which is brittle under concurrency.
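
Both approaches can be sketched briefly. The `ActiveRequest` fields follow the diff above; `registerActive`, `acquire`, `release`, and `activeCounts` are hypothetical helpers, not names from the PR.

```typescript
interface ActiveRequest {
  sessionKey: string;
  platform: string;
  modelId: string;
}

// Approach 1: store the object and delete that same reference later.
const activeRequests = new Set<ActiveRequest>();

// Registers an entry and returns a function that removes exactly that entry,
// so concurrent requests with identical fields cannot remove each other's
// entries or leave duplicates behind.
function registerActive(entry: ActiveRequest): () => void {
  activeRequests.add(entry);
  return () => {
    activeRequests.delete(entry);
  };
}

// Approach 2: a counting map keyed by a composite string.
const activeCounts = new Map<string, number>();

function acquire(key: string): void {
  activeCounts.set(key, (activeCounts.get(key) ?? 0) + 1);
}

function release(key: string): void {
  const remaining = (activeCounts.get(key) ?? 0) - 1;
  if (remaining <= 0) activeCounts.delete(key);
  else activeCounts.set(key, remaining);
}
```

In a request handler, the release function from `registerActive` would typically be called in a `finally` block so that the entry is removed on success, error, and client disconnect alike.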



3. Wrapped error string ignored 🐞 Bug ◔ Observability
Description
BaseProvider.isWrappedError() treats { error: string } as an error payload, but
throwWrappedError() calls extractErrorMessage() which does not extract string-valued error,
causing the thrown message to fall back to 'Unknown wrapped error'. This loses the upstream error
text and reduces debuggability when providers wrap errors in HTTP 200 responses.
Code

server/src/providers/base.ts[R126-150]

+  protected isWrappedError(body: unknown): boolean {
+    if (body === null || typeof body !== 'object' || Array.isArray(body)) return false;
+    const obj = body as Record<string, unknown>;
+    if (!('error' in obj) || obj.error === null) return false;
+    return typeof obj.error === 'string' || typeof obj.error === 'object';
+  }
+
+  /** Throw a ProviderApiError from a detected wrapped error payload.
+   *  Called after isWrappedError() returns true. */
+  protected throwWrappedError(body: unknown): void {
+    const obj = body as Record<string, unknown>;
+    const errPayload = obj.error;
+    const message = this.extractErrorMessage(body, 'Unknown wrapped error');
+    const error = new Error(
+      `${this.name} API error (wrapped in 200): ${message}`,
+    ) as ProviderApiError;
+    const rawCode = (errPayload as Record<string, unknown>).code;
+    const parsedCode = typeof rawCode === 'number' ? rawCode : Number(rawCode);
+    error.status =
+      typeof errPayload === 'object' && errPayload !== null && 'code' in (errPayload as Record<string, unknown>)
+        ? (Number.isFinite(parsedCode) ? parsedCode : 200)
+        : 200;
+    error.provider = this.name;
+    error.responseBody = body;
+    throw error;
Evidence
isWrappedError() explicitly accepts string-valued error, but extractErrorMessage() only reads
nested error.message (object form), so {error: "..."} ends up using the fallback message inside
throwWrappedError().

server/src/providers/base.ts[112-118]
server/src/providers/base.ts[124-150]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
When a wrapped error payload has the shape `{ error: "some string" }`, `isWrappedError()` returns true, but `throwWrappedError()` builds the message via `extractErrorMessage(body, ...)`, which only reads `error.message` and therefore drops the string.

## Issue Context
This PR adds wrapped-error interception in multiple providers (Cloudflare/Cohere/Google/OpenAI-compat). Losing the error message makes these new failures hard to diagnose.

## Fix Focus Areas
- server/src/providers/base.ts[112-151]

### Implementation direction
- In `throwWrappedError()`, prefer the actual `errPayload` when it’s a string:
 - `const message = typeof errPayload === 'string' ? errPayload : this.extractErrorMessage(body, 'Unknown wrapped error');`
- Optionally extend `extractErrorMessage()` to return `err.error` when it is a string.
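
A standalone sketch of the message selection described above. `wrappedErrorMessage` is a hypothetical helper written for illustration; it does not reproduce the body of `extractErrorMessage()`, which is not shown in this review.

```typescript
// Picks the most specific message from a wrapped error payload:
// a string-valued `error` first, then `error.message`, then the fallback.
function wrappedErrorMessage(body: unknown, fallback: string = 'Unknown wrapped error'): string {
  if (body === null || typeof body !== 'object' || Array.isArray(body)) return fallback;
  const err = (body as Record<string, unknown>).error;
  if (typeof err === 'string' && err.length > 0) return err;
  if (err !== null && typeof err === 'object') {
    const msg = (err as Record<string, unknown>).message;
    if (typeof msg === 'string' && msg.length > 0) return msg;
  }
  return fallback;
}
```

With this ordering, `{ error: "rate limited" }` surfaces the upstream text instead of the fallback, while the object form keeps its existing behavior.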



@sourcery-ai sourcery-ai Bot left a comment

Hey - I've found 2 issues, and left some high level feedback:

  • In the SSE keepalive / stall code, the pre-stream stall path currently throws from inside the setInterval callback, which won’t be caught by the surrounding try/catch and can surface as an unhandled exception; instead, consider signalling the stall via shared state (e.g. setting a flag or storing an error) and handling the retry/504 logic in the main async function, rather than throwing directly from the timer callback.
  • The active-request tracking and transient cooldown logic now walks the entire activeRequests/transientModelCooldowns maps on every request; if you expect many concurrent sessions, you might want to index these by a composite key (e.g. ${sessionKey}:${platform}:${modelId}) or cache per-platform model lists to avoid repeated O(n) scans in hot paths.
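
The first point (signal the stall through shared state instead of throwing from the timer) could look roughly like this. `StallWatchdog` and its methods are hypothetical; the clock is passed in explicitly so the sketch does not depend on real timers.

```typescript
// Records a stall as state. tick() is safe to call from a setInterval
// callback because it never throws; the main async function calls
// throwIfStalled() inside its own try/catch.
class StallWatchdog {
  private readonly stallTimeoutMs: number;
  private lastActivityMs: number;
  private stallError: Error | null = null;

  constructor(stallTimeoutMs: number, startMs: number) {
    this.stallTimeoutMs = stallTimeoutMs;
    this.lastActivityMs = startMs;
  }

  // Call whenever a chunk arrives from upstream.
  touch(nowMs: number): void {
    this.lastActivityMs = nowMs;
  }

  // Timer callback: stores the error instead of throwing it.
  tick(nowMs: number): void {
    const idleMs = nowMs - this.lastActivityMs;
    if (this.stallError === null && idleMs >= this.stallTimeoutMs) {
      this.stallError = new Error(`stream stalled for ${idleMs} ms`);
    }
  }

  isStalled(): boolean {
    return this.stallError !== null;
  }

  // Main-loop check: rethrows the stored error where it can be caught.
  throwIfStalled(): void {
    if (this.stallError !== null) throw this.stallError;
  }
}
```

A flag alone does not wake a loop that is blocked awaiting an upstream read, so in practice the timer would also need to abort that read (for example through an `AbortController`) so that the main function reaches `throwIfStalled()`.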
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In the SSE keepalive / stall code, the pre-stream stall path currently throws from inside the `setInterval` callback, which won’t be caught by the surrounding `try`/`catch` and can surface as an unhandled exception; instead, consider signalling the stall via shared state (e.g. setting a flag or storing an error) and handling the retry/504 logic in the main async function, rather than throwing directly from the timer callback.
- The active-request tracking and transient cooldown logic now walks the entire `activeRequests`/`transientModelCooldowns` maps on every request; if you expect many concurrent sessions, you might want to index these by a composite key (e.g. `${sessionKey}:${platform}:${modelId}`) or cache per-platform model lists to avoid repeated O(n) scans in hot paths.

## Individual Comments

### Comment 1
<location path="client/src/components/pool-section.tsx" line_range="16" />
<code_context>
+}) {
+  const [isExpanded, setIsExpanded] = useState(true);
+
+  const handleKeyDown = (e: React.KeyboardEvent) => {
+    if (e.key === 'Enter' || e.key === ' ') {
+      e.preventDefault();
</code_context>
<issue_to_address>
**issue:** Using React.KeyboardEvent requires importing the React type, which currently isn't in scope.

In this file only `useState` and `type ReactNode` are imported, so `React` isn’t defined as a namespace. To fix this, either import the event type directly:

```ts
import { useState, type ReactNode, type KeyboardEvent } from 'react';

const handleKeyDown = (e: KeyboardEvent) => { ... };
```

or add a React namespace type import if you prefer `React.*`:

```ts
import * as React from 'react';
```

and keep `React.KeyboardEvent`.
</issue_to_address>

### Comment 2
<location path="server/src/providers/openai-compat.ts" line_range="115-119" />
<code_context>

     const decoder = new TextDecoder();
     let buffer = '';
+    let hasYielded = false;

</code_context>
<issue_to_address>
**question (bug_risk):** Wrapped-error detection only runs on the first parsed SSE data line; later error payloads may be treated as partial chunks.

Because `hasYielded` is set after the first successful event, only that event is ever checked with `isWrappedError(parsed)`. Any later wrapped error will be treated as a normal chunk.

To support error payloads appearing later in the stream, consider either:
- Removing the `hasYielded` guard and relying only on `isWrappedError`, or
- Limiting the guard to some early window (e.g., until the first non-empty `choices`).

If you keep the current behavior, it would help to document that only “error as first event” is supported.
</issue_to_address>
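
A small sketch of the first option (checking every parsed event rather than only the first). `isWrappedError` copies the logic shown in the `base.ts` diff; `processSseLines` and its callback are hypothetical and operate on already-split lines rather than on a live reader.

```typescript
// Same detection logic as the isWrappedError() shown in the base.ts diff.
function isWrappedError(body: unknown): boolean {
  if (body === null || typeof body !== 'object' || Array.isArray(body)) return false;
  const obj = body as Record<string, unknown>;
  if (!('error' in obj) || obj.error === null) return false;
  return typeof obj.error === 'string' || typeof obj.error === 'object';
}

// Parses SSE `data:` lines and checks each event for a wrapped error,
// with no hasYielded guard. Normal chunks are passed to onChunk.
function processSseLines(lines: string[], onChunk: (chunk: unknown) => void): void {
  for (const line of lines) {
    if (!line.startsWith('data:')) continue;
    const payload = line.slice('data:'.length).trim();
    if (payload === '[DONE]') return;
    let parsed: unknown;
    try {
      parsed = JSON.parse(payload);
    } catch {
      continue; // Skip lines that are not valid JSON.
    }
    if (isWrappedError(parsed)) {
      throw new Error('wrapped error payload received mid-stream');
    }
    onChunk(parsed);
  }
}
```

The trade-off is that an error thrown after chunks have already been forwarded can no longer be retried transparently, which may be why the original code limits the check to the first event.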


@vi70x4 vi70x4 merged commit 7913684 into main Jun 5, 2026
1 of 3 checks passed
@vi70x4 vi70x4 deleted the fix/realtime-sticky branch June 5, 2026 20:00