
Fix/realtime sticky#18

Closed
vi70x3 wants to merge 14 commits into main from fix/realtime-sticky


Conversation

vi70x3 commented Jun 5, 2026

Collaborator

Summary by Sourcery

Refine routing resilience and admin UX by introducing generalized thread protection, transient model cooldowns, and SSE stream heartbeat/stall handling, while surfacing model pools in the fallback dashboard and tightening analytics and error handling across providers.

New Features:

  • Expose model pool metadata via the fallback API and group models into Fast/Balanced/Smart pools with collapsible sections in the admin fallback page.
  • Introduce a generalized thread protection service that centralizes provider/model ban decisions based on error context and configurable protection levels.
  • Add a shared transient model cooldown mechanism so recent 5xx failures temporarily suppress problematic models across concurrent requests.
  • Implement SSE stream keepalive heartbeats and stall detection to protect long-running streams from upstream hangs and intermediary timeouts.
  • Support detection and propagation of wrapped error payloads returned with HTTP 200 across OpenAI-compatible, Cohere, Cloudflare, and Google providers.

Bug Fixes:

  • Correct and harden recency-weighted analytics aggregation for routing decisions to avoid miscomputed statistics.
  • Ensure balanced-mode routing respects sticky preferences by allowing preferred models from otherwise excluded platforms.
  • Fix tests and specs that referenced outdated paths, behaviors, or expectations around sticky sessions, routing, and fallback metadata.

Enhancements:

  • Bias Thompson-sampling-based routing toward recent behavior using time-decayed success/total counts while keeping raw counts for observability.
  • Track active requests per session/platform and apply safeguards for provider-ban platforms to prevent concurrent overload from multiple sessions.
  • Normalize provider error handling so 5xx, truncation, and retryable errors flow through the thread protection rules instead of hardcoded LongCat/Owl Alpha logic.
  • Export and reuse base provider helpers for richer error classification, including wrapped error parsing and safer status code derivation.
  • Improve fallback API validation and tests to assert that pool values conform to the shared ModelPool enum.

Documentation:

  • Document the designs and requirements for generalized thread protection, transient cooldowns, recency-biased Thompson sampling, SSE stall protection, and wrapped error interception in the .roo spec suite.

Tests:

  • Add comprehensive tests for transient model cooldown behavior, including registration, pruning, sticky override, and integration with provider bans.
  • Add SSE streaming tests that cover heartbeats, stall timeouts, pre-stream stall retries, client disconnect cleanup, and normal streaming with keepalive enabled.
  • Extend routing, provider-ban, and fallback API test coverage to reflect balanced-mode stickiness, updated analytics fields, and model pool metadata.

vi70x3 added 13 commits June 5, 2026 14:26
- Change activeRequests from Map to Set to allow concurrent requests from same session
- Add stale active request cleanup with 10-minute TTL
- Cache owl-alpha model ID to avoid repeated DB lookups
- Fix active request iteration to use Set-compatible syntax
- Remove package-lock.json (npm lockfile)
- Add packageManager field to package.json
- Create .npmrc with pnpm configuration
BUG-05: Abort upstream provider stream on stall detection by breaking
the for-await loop and calling gen.return() when the keepalive timer
detects MAX_STREAM_STALL_MS has elapsed without data.

BUG-06: Fix cooldown guard to use the actual routable fallback chain
(fallback_config JOIN models) instead of all enabled models, ensuring
transient cooldowns only skip models that would actually be routed to.

BUG-10: Remove double semicolon in proxy.ts.

Also adds SSE keep-alive comments during idle periods, transient model
cooldown injection before retry loops, and LongCat sticky session
cooldown support in balanced routing mode.
…, TTL refresh, collapsible pools, doc paths, cleanup
… pre-stream, cooldown gating, timer cleanup, a11y, log clarity
… pre-stream, cooldown gating, timer cleanup, a11y, log clarity, test fixes

sourcery-ai Bot commented Jun 5, 2026


Reviewer's Guide

Introduces generalized provider/thread protection for streaming and non-streaming chat completions, adds transient model cooldowns and SSE heartbeat/stall protection on the server, and restructures fallback routing analytics and UI by adding model pools with grouping, while also improving wrapped-error handling and fixing several design/spec/test issues.

Sequence diagram for SSE streaming with stall protection and thread protection

sequenceDiagram
  actor Client
  participant Proxy as handleChatCompletion
  participant Provider as route.provider
  participant ThreadProtection as evaluateThreadProtection

  Client->>Proxy: handleChatCompletion
  Proxy->>Provider: streamChatCompletion(apiKey, messages, modelId, options)
  Proxy->>Proxy: activeRequests.add
  Proxy->>Proxy: setInterval(keepaliveTimer)

  loop for each chunk
    Provider-->>Proxy: ChatCompletionChunk
    Proxy->>Proxy: writeResponseStreamChunk / res.write
  end

  alt [stream stalls]
    Proxy->>Proxy: cleanup
    alt [streamStarted]
      Proxy-->>Client: writeResponseStreamEvent / res.write timeout
      Proxy-->>Client: res.end
    else [pre-stream stall]
      Proxy->>Proxy: throw Error(status=504)
    end
  else [mid-stream 5xx]
    Proxy->>ThreadProtection: evaluateThreadProtection({ platform, kind: '5xx', midStream: true })
    ThreadProtection-->>Proxy: ThreadProtectionAction
    alt [action.banProvider]
      Proxy->>Proxy: banPlatformFromSession
      Proxy->>Proxy: addProviderModelsToSkipModels
    end
    alt [action.skipModel]
      Proxy->>Proxy: skipModels.add(route.modelDbId)
    end
    Proxy->>Proxy: transientModelCooldowns.set(route.modelDbId, expiry)
  end

  Proxy->>Proxy: activeRequests.delete(sessionKey, platform, modelId)
  Proxy-->>Client: stream completes / response

File-Level Changes

Change Details Files
Generalized provider-ban/thread protection, transient model cooldowns, active-request safeguards, and SSE heartbeat/stall handling in chat completion proxy routing.
  • Import and use threadProtection service (getProtectionLevel, evaluateThreadProtection) to replace hardcoded LongCat/Owl Alpha error-handling branches.
  • Introduce activeRequests tracking per session/platform/model and use it to exclude provider-ban platforms when another session is actively streaming from them.
  • Add transientModelCooldowns map and associated TRANSIENT_COOLDOWN_MS, inject cooled-down models into skipModels, and register cooldowns on mid-stream and pre-stream 5xx/connection failures.
  • Implement streamKeepaliveConfig with keepalive/stall intervals, plus SSE heartbeat comments and stall timeout handling that either retries pre-stream or emits structured stream_timeout errors mid-stream.
  • Ensure active requests and stream generators are cleaned up on completion, error, or client disconnect to avoid leaks.
  • Adjust provider-ban sticky cooldown to rely on provider protection level instead of specific platforms, and make balanced mode respect sticky preferred models for excluded platforms.
  • Update and extend tests for provider session bans and proxy tools to reflect the new balanced-mode sticky behavior and transient cooldown logic.
  • Document thread protection, transient cooldowns, stall protection, and PR13 fix plan in new design/requirements specs.
server/src/routes/proxy.ts
server/src/services/threadProtection.ts
server/src/__tests__/routes/proxy-tools.test.ts
server/src/__tests__/routes/provider-session-ban.test.ts
server/src/__tests__/routes/stream-heartbeat-stall.test.ts
.roo/specs/sse-stream-heartbeat-stall-protection/design.md
.roo/specs/sse-stream-heartbeat-stall-protection/requirements.md
.roo/specs/transient-model-cooldown/design.md
.roo/specs/transient-model-cooldown/requirements.md
.roo/specs/generalized-thread-protection/design.md
.roo/specs/generalized-thread-protection/requirements.md
.roo/specs/pr13-code-review-fixes/design.md
.roo/specs/pr13-code-review-fixes/requirements.md
.roo/specs/pr13-code-review-fixes/tasks.md
Improve provider adapters to detect wrapped error payloads on HTTP 200 responses and surface them as ProviderApiError, plus expose shared error helpers.
  • Make BaseProvider.extractErrorMessage protected and add isWrappedError/throwWrappedError helpers for detecting root-level error objects in parsed JSON bodies.
  • Invoke wrapped error detection in chatCompletion and streamChatCompletion for OpenAI-compatible, Cloudflare, Cohere, and Google providers, ensuring wrapped 200 errors throw ProviderApiError before normalization.
  • Handle wrapped errors in streaming by checking the first parsed SSE chunk and aborting with throwWrappedError, while ignoring malformed chunks.
  • Set error.status from error.code when numeric, with a fallback and NaN guard, so downstream retry logic can classify errors (e.g., rate limits).
  • Capture and route provider/wrapped errors into existing proxy retry and cooldown logic without modifying router interfaces.
server/src/providers/base.ts
server/src/providers/openai-compat.ts
server/src/providers/cloudflare.ts
server/src/providers/cohere.ts
server/src/providers/google.ts
.roo/specs/wrapped-error-interception/design.md
.roo/specs/wrapped-error-interception/requirements.md
.roo/specs/wrapped-error-interception/tasks.md
Add recency-biased analytics and model pools, and expose pool metadata in the fallback API and UI grouped sections.
  • Change router analytics aggregation to use recency-weighted successes/total plus rawTotal for display, and ensure analytics cache is refreshed before computing scores.
  • Ensure balanced routing excludes LongCat and Owl Alpha by default but allows them when they are the sticky preferred model for a session.
  • Expose ModelPool enum (Fast/Balanced/Smart) in shared types and compute pool from platform/model in fallback route, adding pool/speedRank to the fallback API response with validation tests.
  • Refactor fallback page to group models by pool using new PoolSection/PoolBadge components and show pool-specific titles while keeping sorting and toggling intact.
  • Simplify router tests, ensure enabled/invalid keys are handled correctly, and assert routed API key content.
server/src/services/router.ts
server/src/routes/fallback.ts
server/src/__tests__/services/router.test.ts
server/src/__tests__/routes/fallback.test.ts
shared/types.ts
client/src/pages/FallbackPage.tsx
client/src/components/pool-section.tsx
client/src/components/pool-badge.tsx
.roo/specs/recency-biased-thompson-sampling/design.md
.roo/specs/recency-biased-thompson-sampling/requirements.md
.roo/specs/recency-biased-thompson-sampling/tasks.md
Add and update design/spec/task documents and test scaffolding for PR13 fixes, thread protection, cooldowns, and SSE stall protection.
  • Clean up and reformat existing owl-alpha/longcat routing design docs (code fences, relative paths).
  • Add detailed requirements/design/tasks specs for SSE heartbeat/stall protection, transient model cooldown, generalized thread protection, and PR13 follow-up fixes.
  • Introduce dedicated test suites for stream heartbeat/stall behavior and transient cooldown behavior, including various edge cases and classification tests.
  • Track pnpm as the package manager in package.json and add .npmrc to align tooling.
.roo/specs/owl-alpha-longcat-model-routing/design.md
.roo/specs/owl-alpha-longcat-model-routing/requirements.md
.roo/specs/sse-stream-heartbeat-stall-protection/tasks.md
.roo/specs/transient-model-cooldown/tasks.md
.roo/specs/generalized-thread-protection/tasks.md
.roo/specs/pr13-code-review-fixes/tasks.md
.roo/specs/pr13-code-review-fixes/design.md
.roo/specs/pr13-code-review-fixes/requirements.md
server/src/__tests__/routes/stream-heartbeat-stall.test.ts
server/src/__tests__/routes/transient-cooldown.test.ts

@qodo-code-review


Review Summary by Qodo

Realtime Sticky Session Improvements: Thread Protection, Stream Stall Detection, Transient Cooldowns, and Wrapped Error Handling

✨ Enhancement 🐞 Bug fix 🧪 Tests


Walkthroughs

Description
  **Core Features:**
• Generalized thread protection system replacing hardcoded platform checks with configurable
  getProtectionLevel() and evaluateThreadProtection() functions supporting provider-ban,
  model-skip, and off modes
• Stream keepalive heartbeat (15s interval) and stall detection (45s threshold) to prevent hanging
  SSE streams with graceful termination
• Transient model cooldowns (15s global window) for 5xx errors shared across concurrent requests
  with sticky session override logic
• Wrapped error detection for HTTP 200 responses with root-level error field across all provider
  adapters (OpenAI-compat, Cohere, Cloudflare, Google)
• Recency-weighted analytics using 7-day decay function to prioritize recent performance data in
  routing decisions
  **Implementation Details:**
• New threadProtection.ts service module with environment-driven configuration via
  THREAD_PROTECTION_PLATFORMS
• Active request tracking to prevent concurrent sessions from overwhelming provider-ban platforms
• Fallback chain for model skipping instead of all enabled models
• Model pool classification (Fast, Balanced, Smart) in fallback API and UI with collapsible pool
  sections
• Updated balanced mode routing to allow preferred sticky models through exclusion filters
  **Testing & Documentation:**
• Comprehensive test suites for transient cooldowns (30+ cases), stream heartbeat/stall detection (5
  cases), and pool classification
• Design specifications for all major features with architecture diagrams and edge case analysis
• Requirements and task documentation for implementation tracking
• Bug fix documentation addressing 10 verified issues from code review
  **Configuration:**
• Added pnpm package manager specification (v11.1.3) with npm configuration file
• Protected error message extraction visibility for provider reuse
Diagram
flowchart LR
  A["Incoming Request"] --> B["Thread Protection Scanner"]
  B --> C["getProtectionLevel()"]
  C --> D["evaluateThreadProtection()"]
  D --> E["Action: Ban/Skip/Clear"]
  
  A --> F["Transient Cooldown Check"]
  F --> G["Skip Cooled Models"]
  G --> H["Route to Provider"]
  
  H --> I["Stream Keepalive"]
  I --> J["Heartbeat 15s"]
  J --> K["Stall Detection 45s"]
  K --> L["Graceful Termination"]
  
  H --> M["Wrapped Error Detection"]
  M --> N["HTTP 200 + error field"]
  N --> O["Register Cooldown"]
  O --> P["Retry with Next Model"]



File Changes

1. server/src/routes/proxy.ts ✨ Enhancement +333/-214

Generalized thread protection, stream stall detection, and transient cooldowns

• Refactored sticky session cooldown logic to use generalized getProtectionLevel() and
 evaluateThreadProtection() functions instead of hardcoded LongCat/Owl Alpha checks
• Added stream keepalive heartbeat and stall detection with configurable intervals to prevent
 hanging streams
• Implemented active request tracking to prevent concurrent sessions from overwhelming provider-ban
 platforms
• Added transient model cooldowns (15s global cooldown) for models returning 5xx errors, shared
 across all concurrent requests
• Replaced provider-specific error handling with unified evaluateThreadProtection() decision
 matrix for 5xx, truncation, and retryable errors
• Updated addProviderModelsToSkipModels() to use fallback chain instead of all enabled models

server/src/routes/proxy.ts


2. server/src/services/threadProtection.ts ✨ Enhancement +119/-0

New thread protection service with configurable platform policies

• New service module implementing configurable thread protection levels per platform (provider-ban,
 model-skip, off)
• Provides getProtectionLevel() to look up protection configuration and
 evaluateThreadProtection() to determine actions (ban provider, skip model, clear sticky)
• Parses THREAD_PROTECTION_PLATFORMS environment variable for runtime configuration with
 backward-compatible defaults
• Centralizes error response decision logic previously scattered across proxy.ts

server/src/services/threadProtection.ts
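
To make the configuration idea concrete, here is a minimal sketch of how a per-platform protection level might be parsed and looked up. The `platform:level` comma-separated format, the `parseProtectionConfig` helper, the explicit `config` parameter, and the `longcat` default used in the usage note are all assumptions for illustration. The PR text only names the three levels, `getProtectionLevel()`, and the `THREAD_PROTECTION_PLATFORMS` variable.

```typescript
type ProtectionLevel = "provider-ban" | "model-skip" | "off";

// Type guard for the three levels named in the PR summary.
function isProtectionLevel(value: string): value is ProtectionLevel {
  return value === "provider-ban" || value === "model-skip" || value === "off";
}

// Hypothetical parser: "platformA:provider-ban,platformB:model-skip".
// The real format of THREAD_PROTECTION_PLATFORMS is not shown in the PR text.
function parseProtectionConfig(
  raw: string | undefined,
  defaults: Map<string, ProtectionLevel>,
): Map<string, ProtectionLevel> {
  const config = new Map(defaults); // start from the backward-compatible defaults
  for (const entry of (raw ?? "").split(",")) {
    const parts = entry.split(":");
    const platform = (parts[0] ?? "").trim().toLowerCase();
    const level = (parts[1] ?? "").trim().toLowerCase();
    // Malformed or unknown entries are ignored rather than throwing.
    if (platform && isProtectionLevel(level)) {
      config.set(platform, level);
    }
  }
  return config;
}

// Assumption: platforms without an entry are treated as "off".
function getProtectionLevel(
  config: Map<string, ProtectionLevel>,
  platform: string,
): ProtectionLevel {
  return config.get(platform.toLowerCase()) ?? "off";
}
```

For example, calling `parseProtectionConfig("OpenRouter:model-skip", new Map([["longcat", "provider-ban"]]))` would keep the assumed `longcat` default and add a `model-skip` entry for `openrouter`.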


3. server/src/__tests__/routes/transient-cooldown.test.ts 🧪 Tests +415/-0

Test suite for transient model cooldown system

• Comprehensive test suite for transient model cooldown functionality with 6 test suites covering
 30+ test cases
• Tests cooldown map basics, injection/pruning logic, auto-recovery after expiry, sticky session
 override behavior
• Validates cooldown registration eligibility (5xx and connection failures only, not auth/rate-limit
 errors)
• Tests integration with addProviderModelsToSkipModels() for combined session-ban and
 global-cooldown scenarios

server/src/__tests__/routes/transient-cooldown.test.ts
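
The eligibility rule these tests describe (5xx and connection failures register a cooldown, auth and rate-limit errors do not) could be expressed roughly as follows. The `isCooldownEligible` name, the error shape, and the specific set of connection error codes are assumptions, not taken from the PR.

```typescript
// Hypothetical error shape: an HTTP status when the provider answered,
// or a Node-style error code when the connection itself failed.
interface UpstreamFailure {
  status?: number;
  code?: string;
}

// Assumed set of connection-level failures; the PR text does not list them.
const CONNECTION_ERROR_CODES = new Set(["ECONNRESET", "ECONNREFUSED", "ETIMEDOUT"]);

function isCooldownEligible(err: UpstreamFailure): boolean {
  if (typeof err.status === "number") {
    // Only server-side failures qualify; 401/403/429 and other 4xx do not.
    return err.status >= 500 && err.status <= 599;
  }
  return err.code !== undefined && CONNECTION_ERROR_CODES.has(err.code);
}
```

Keeping rate limits out of the cooldown path means a 429 stays with whatever retry or key-rotation logic already handles it, rather than suppressing the model for every concurrent request.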


4. server/src/__tests__/routes/stream-heartbeat-stall.test.ts 🧪 Tests +330/-0

Test suite for stream heartbeat and stall detection

• New test suite validating SSE stream heartbeat and stall protection with 5 test cases
• Tests keep-alive comment emission during idle periods, stream termination on stall detection,
 pre-stream stall handling
• Validates client disconnect cleanup and normal streaming operation with heartbeat enabled
• Uses configurable streamKeepaliveConfig for test-friendly timing

server/src/__tests__/routes/stream-heartbeat-stall.test.ts


5. server/src/services/router.ts ✨ Enhancement +21/-10

Recency-weighted analytics and sticky model routing exceptions

• Added recency weighting to analytics stats calculation using 7-day decay function to prioritize
 recent performance data
• Added rawTotal field to ModelStats to track unweighted request count for analytics reporting
• Updated getAnalyticsScores() to refresh stats cache and return rawTotal instead of weighted
 total
• Modified balanced mode routing to allow preferred sticky models through exclusion filters
 (LongCat, Owl Alpha)

server/src/services/router.ts


6. server/src/__tests__/services/router.test.ts 🧪 Tests +2/-27

Minor router test cleanup and imports

• Removed test for invalid key status (no longer relevant)
• Added import for refreshStatsCache and getAnalyticsScores functions
• Simplified test setup by removing redundant comments

server/src/__tests__/services/router.test.ts


7. server/src/__tests__/routes/provider-session-ban.test.ts 🧪 Tests +14/-14

Update balanced mode session tests for new key behavior

• Updated balanced mode tests to reflect new behavior where getSessionKey() returns real hash
 instead of empty string
• Changed test expectations for banPlatformFromSession() and setStickyModel() to expect entries
 in balanced mode
• Updated test descriptions to clarify that balanced mode now uses real session keys

server/src/__tests__/routes/provider-session-ban.test.ts


8. server/src/providers/base.ts ✨ Enhancement +30/-1

Add wrapped error detection and handling to base provider

• Changed extractErrorMessage() visibility from private to protected for reuse in error
 handling
• Added isWrappedError() predicate to detect root-level error field in JSON responses (HTTP 200
 with error payload)
• Added throwWrappedError() helper to construct and throw ProviderApiError from wrapped error
 payloads with proper status code extraction

server/src/providers/base.ts
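
As an illustration of the two helpers, the sketch below detects a root-level `error` field in an already-parsed body and converts it into a thrown error with a derived status. The `ProviderApiError` class here is a local stand-in, and `deriveStatus`, the 502 fallback, and the default message are assumptions. The PR text only states that `error.code` is used when numeric, with a fallback and a NaN guard.

```typescript
// Local stand-in for the project's ProviderApiError (real shape not shown in the PR text).
class ProviderApiError extends Error {
  readonly status: number;
  constructor(message: string, status: number) {
    super(message);
    this.name = "ProviderApiError";
    this.status = status;
  }
}

type WrappedErrorBody = { error: string | { message?: string; code?: unknown } };

const FALLBACK_STATUS = 502; // assumed fallback; the actual value is not stated

// True when a parsed JSON body carries a root-level error (string or object).
function isWrappedError(body: unknown): body is WrappedErrorBody {
  if (typeof body !== "object" || body === null) return false;
  const err = (body as Record<string, unknown>).error;
  return typeof err === "string" || (typeof err === "object" && err !== null);
}

// Uses error.code only if it is a plausible HTTP status; guards against NaN.
function deriveStatus(code: unknown): number {
  const n = typeof code === "number" ? code : typeof code === "string" ? Number(code) : NaN;
  return Number.isInteger(n) && n >= 100 && n <= 599 ? n : FALLBACK_STATUS;
}

function throwWrappedError(body: WrappedErrorBody): never {
  const err = body.error;
  if (typeof err === "string") throw new ProviderApiError(err, FALLBACK_STATUS);
  throw new ProviderApiError(
    err.message ?? "Upstream returned an error payload",
    deriveStatus(err.code),
  );
}
```

With a status attached, a wrapped `{ "error": { "code": 429 } }` returned under HTTP 200 can be classified by downstream retry logic the same way as a real 429 response.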


9. server/src/providers/openai-compat.ts ✨ Enhancement +14/-1

Add wrapped error detection to OpenAI-compatible provider

• Added wrapped error detection in chatCompletion() after JSON parsing
• Added wrapped error detection in streamChatCompletion() for first SSE chunk before yielding
• Improved error handling to skip malformed chunks and detect wrapped errors in streaming responses

server/src/providers/openai-compat.ts


10. server/src/providers/cohere.ts ✨ Enhancement +12/-1

Add wrapped error detection to Cohere provider

• Added wrapped error detection in chatCompletion() after JSON parsing
• Added wrapped error detection in streamChatCompletion() for SSE chunks before yielding

server/src/providers/cohere.ts


11. server/src/providers/cloudflare.ts ✨ Enhancement +12/-1

Add wrapped error detection to Cloudflare provider

• Added wrapped error detection in chatCompletion() after JSON parsing
• Added wrapped error detection in streamChatCompletion() for SSE chunks before yielding

server/src/providers/cloudflare.ts


12. server/src/providers/google.ts ✨ Enhancement +10/-0

Add wrapped error detection to Google provider

• Added wrapped error detection in chatCompletion() after JSON parsing and before candidate access
• Added wrapped error detection in streamChatCompletion() after parsing each Gemini response chunk

server/src/providers/google.ts


13. server/src/routes/fallback.ts ✨ Enhancement +10/-0

Add model pool classification to fallback API

• Added getModelPool() function to classify models into Smart or Balanced pools based on
 platform/model ID
• Added pool field to fallback API response indicating which pool each model belongs to
• Imported ModelPool type from shared types

server/src/routes/fallback.ts


14. server/src/__tests__/routes/fallback.test.ts 🧪 Tests +11/-0

Add pool property validation tests to fallback API

• Added test to verify fallback API response includes pool property
• Added test to validate all returned pool values are valid ModelPool enum values
• Imported ModelPool type for test validation

server/src/__tests__/routes/fallback.test.ts


15. server/src/__tests__/routes/proxy-tools.test.ts 🧪 Tests +4/-3

Minor updates to proxy tools tests for transient cooldown support

server/src/__tests__/routes/proxy-tools.test.ts


16. shared/types.ts ✨ Enhancement +8/-0

Add ModelPool enum type to shared types

• Added ModelPool constant object with three pool types: Fast, Balanced, Smart
• Added ModelPool type alias for the union of pool type values

shared/types.ts
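
The "constant object plus type alias" pattern described here usually looks like the sketch below. The lowercase string values are an assumption inferred from the fallback page description (fast, balanced, smart), and `isModelPool` is a hypothetical guard of the kind the fallback API tests might use.

```typescript
// Const object acting as an enum; `as const` keeps the literal value types.
const ModelPool = {
  Fast: "fast",
  Balanced: "balanced",
  Smart: "smart",
} as const;

// Same-named type alias: the union "fast" | "balanced" | "smart".
type ModelPool = (typeof ModelPool)[keyof typeof ModelPool];

// Hypothetical runtime guard for validating untyped API responses.
function isModelPool(value: unknown): value is ModelPool {
  return (
    value === ModelPool.Fast ||
    value === ModelPool.Balanced ||
    value === ModelPool.Smart
  );
}
```

Unlike a TypeScript `enum`, this form produces a plain object, so the same definition can be shared between server and client without extra compiler settings.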


17. .roo/specs/sse-stream-heartbeat-stall-protection/design.md 📝 Documentation +330/-0

Design specification for stream heartbeat and stall protection

• Comprehensive design document for stream heartbeat and stall detection feature
• Includes architecture diagrams, implementation details, edge case handling, and file modification
 guide
• Documents interaction with existing code paths and Responses API streams

.roo/specs/sse-stream-heartbeat-stall-protection/design.md


18. .roo/specs/wrapped-error-interception/design.md 📝 Documentation +337/-0

Design specification for wrapped error interception

• Comprehensive design document for wrapped error payload detection on HTTP 200 responses
• Includes architecture overview, component changes for each provider, error detection flow, and
 edge case analysis
• Documents wrapped error formats and integration with existing retry/cooldown logic

.roo/specs/wrapped-error-interception/design.md


19. .roo/specs/transient-model-cooldown/requirements.md 📝 Documentation +38/-0

Requirements specification for transient model cooldowns

• Requirements document for shared temporary cooldowns to mitigate concurrent failure impact
• Defines problem statement, 6 key requirements, scope, and acceptance criteria
• Specifies 15-second cooldown window and integration with existing routing logic

.roo/specs/transient-model-cooldown/requirements.md


20. package.json ⚙️ Configuration changes +1/-0

Specify pnpm package manager version

• Added packageManager field specifying pnpm@11.1.3 as the required package manager

package.json


21. .npmrc ⚙️ Configuration changes +4/-0

Add pnpm configuration file

• New npm configuration file with pnpm-specific settings
• Enables shamefully-hoist, disables strict peer dependencies, enables auto-install-peers

.npmrc


22. .roo/specs/pr13-code-review-fixes/requirements.md 📝 Documentation +268/-0

PR #13 Code Review Bugs Documentation and Fix Plan

• Comprehensive documentation of 10 verified bugs found in PR #13, organized by severity (P0
 critical, P1 behavioral, P2 code quality)
• Detailed problem statements, code examples, and impact analysis for each bug including SQL
 parenthesis mismatch, wrapped error swallowing, NaN validation, hardcoded platform references, stall
 detection, and cooldown guard issues
• Acceptance criteria and priority matrix provided for tracking fixes

.roo/specs/pr13-code-review-fixes/requirements.md


23. .roo/specs/transient-model-cooldown/design.md Design +197/-0

Transient Model Cooldown Circuit Breaker Design

• Architecture for shared in-memory circuit breaker using module-level transientModelCooldowns Map
 to track temporary model failures across concurrent requests
• Integration points for cooldown injection, sticky session override, failure registration, and
 mid-stream error handling
• Error classification matrix distinguishing between 5xx/connection failures (trigger cooldown) vs
 rate limits/auth/client errors (do not trigger)
• Test strategy and risk mitigation for preventing all-models-on-cooldown scenarios

.roo/specs/transient-model-cooldown/design.md
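
A minimal sketch of the shared cooldown map follows, covering registration, injection into a request's skip set, and pruning of expired entries. The function names, the numeric model ID type, and the injectable `now` parameter are assumptions. The sticky-session override is left out because the PR text does not spell out how it behaves.

```typescript
const TRANSIENT_COOLDOWN_MS = 15000; // 15-second window per the requirements spec

// Module-level map shared by all concurrent requests: model ID -> expiry (epoch ms).
const transientModelCooldowns = new Map<number, number>();

// Called when a model fails with a cooldown-eligible error.
function registerCooldown(modelDbId: number, now: number = Date.now()): void {
  transientModelCooldowns.set(modelDbId, now + TRANSIENT_COOLDOWN_MS);
}

// Called before routing: expired entries are pruned, live ones are skipped.
function injectCooldowns(skipModels: Set<number>, now: number = Date.now()): void {
  transientModelCooldowns.forEach((expiry, modelDbId) => {
    if (expiry <= now) {
      transientModelCooldowns.delete(modelDbId); // auto-recovery after expiry
    } else {
      skipModels.add(modelDbId);
    }
  });
}
```

Pruning on read keeps the map small without a background timer. A real implementation would also need the guard the design mentions against every routable model being on cooldown at once.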


24. .roo/specs/recency-biased-thompson-sampling/design.md Design +238/-0

Recency-Biased Thompson Sampling Time-Decay Architecture

• Time-decay weighting mechanism for analytics aggregation using linear decay formula over 7-day
 window
• SQL CTE-based query with MIN(1.0, MAX(0.0, ...)) bounds to protect against clock drift
• Beta parameter safety guards using Math.max(0.1, ...) to prevent non-positive values
• Dashboard display updates showing weighted success rates alongside raw request counts

.roo/specs/recency-biased-thompson-sampling/design.md
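
The decay and guard logic can be sketched in plain TypeScript as below. This mirrors the described SQL clamp rather than reproducing the actual query. The exact weight formula (`1 - age / window`), the record shape, and the way the Beta parameters are formed from the weighted counts are assumptions.

```typescript
const ANALYTICS_WINDOW_DAYS = 7;
const MS_PER_DAY = 86400000;

// Linear decay from 1 (now) to 0 (window edge), clamped to [0, 1] so that
// timestamps slightly in the future (clock drift) cannot exceed weight 1.
function recencyWeight(requestTimeMs: number, nowMs: number): number {
  const ageDays = (nowMs - requestTimeMs) / MS_PER_DAY;
  return Math.min(1, Math.max(0, 1 - ageDays / ANALYTICS_WINDOW_DAYS));
}

interface RequestRecord { timeMs: number; success: boolean; }
interface ModelStats { successes: number; total: number; rawTotal: number; }

// Weighted counts drive routing; rawTotal stays unweighted for display.
function aggregate(records: readonly RequestRecord[], nowMs: number): ModelStats {
  const stats: ModelStats = { successes: 0, total: 0, rawTotal: 0 };
  for (const r of records) {
    const w = recencyWeight(r.timeMs, nowMs);
    stats.total += w;
    if (r.success) stats.successes += w;
    stats.rawTotal += 1;
  }
  return stats;
}

// Beta parameters must stay positive even when weighted counts are fractional or zero.
function betaParams(stats: ModelStats): { alpha: number; beta: number } {
  return {
    alpha: Math.max(0.1, stats.successes),
    beta: Math.max(0.1, stats.total - stats.successes),
  };
}
```

Under these assumptions a failure from 3.5 days ago counts half as much as one from a moment ago, which is what makes the sampler react faster to a fresh outage.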


25. .roo/specs/pr13-code-review-fixes/design.md Design +210/-0

PR #13 Code Review Fixes Design and Implementation

• Design decisions for fixing 10 identified bugs with specific code patterns and examples
• Detailed fixes for SQL parenthesis, wrapped error propagation, NaN validation, and hardcoded
 platform references
• Stall upstream abort mechanism using AbortController and cooldown guard model set correction
• Risk assessment matrix mapping each fix to risk level and mitigation strategy

.roo/specs/pr13-code-review-fixes/design.md


26. .roo/specs/sse-stream-heartbeat-stall-protection/requirements.md Requirements +132/-0

SSE Stream Heartbeat and Stall Protection Requirements

• SSE keep-alive heartbeat mechanism (15-second interval) to prevent intermediate proxy idle
 timeouts
• Stream stall detection (45-second threshold) with graceful termination and structured error frames
• Client-disconnect cleanup and heartbeat write failure handling with idempotent cleanup routine
• Pre-stream and mid-stream stall behavior differentiation with appropriate error responses

.roo/specs/sse-stream-heartbeat-stall-protection/requirements.md
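
The pre-stream versus mid-stream distinction can be captured as a pure decision function that a keepalive timer would call on each tick. The `onKeepaliveTick` name, the `TickAction` shape, the comment text, and the error payload layout are assumptions. The 45-second threshold, the 504 status for pre-stream stalls, and the `stream_timeout` error type come from the PR text.

```typescript
const MAX_STREAM_STALL_MS = 45000; // stall threshold per the requirements

type TickAction =
  | { type: "heartbeat"; frame: string } // idle but healthy: send an SSE comment
  | { type: "timeout"; frame: string }   // mid-stream stall: send error frame, then end
  | { type: "retry"; status: number };   // pre-stream stall: surface an error so routing can retry

function onKeepaliveTick(
  lastDataAt: number,
  now: number,
  streamStarted: boolean,
): TickAction {
  if (now - lastDataAt < MAX_STREAM_STALL_MS) {
    // SSE lines starting with ":" are comments and are ignored by clients.
    return { type: "heartbeat", frame: ": keepalive\n\n" };
  }
  if (!streamStarted) {
    // Nothing has been sent to the client yet, so another model can still be tried.
    return { type: "retry", status: 504 };
  }
  const payload = { error: { type: "stream_timeout", message: "Upstream stream stalled" } };
  return { type: "timeout", frame: `data: ${JSON.stringify(payload)}\n\n` };
}
```

In a real handler this would presumably be driven by a `setInterval` at the 15-second keepalive interval, with the timer cleared by the same idempotent cleanup routine that handles client disconnects.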


27. .roo/specs/generalized-thread-protection/requirements.md Requirements +109/-0

Generalized Thread Protection Scanner Requirements

• Problem statement addressing 6+ hardcoded longcat platform checks scattered across proxy.ts
• User stories for configurable thread protection via environment variables and platform-agnostic
 rules engine
• Acceptance criteria requiring zero hardcoded platform names and unified
 evaluateThreadProtection() decision point
• Technical requirements for rules engine API, configuration format, and proxy refactoring with
 migration plan

.roo/specs/generalized-thread-protection/requirements.md


28. client/src/pages/FallbackPage.tsx ✨ Enhancement +49/-30

Fallback Page Pool-Based Model Grouping UI

• Added PoolSection component import and PoolType type import for pool-based grouping
• Added pool field to FallbackEntry interface to support pool categorization
• Refactored model display to group entries by pool (fast, balanced, smart) with collapsible
 sections
• Pool groups filtered to show only non-empty pools with descriptive titles

client/src/pages/FallbackPage.tsx
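
The grouping step behind the collapsible sections might look like the following, independent of any React rendering. The `groupByPool` helper, the `modelId` field, and the section titles are assumptions. The PR text only states that entries carry a `pool` field and that empty pools are hidden.

```typescript
type PoolType = "fast" | "balanced" | "smart";

// Reduced entry shape for illustration; the real FallbackEntry has more fields.
interface FallbackEntry {
  modelId: string;
  pool: PoolType;
}

const POOL_ORDER: readonly PoolType[] = ["fast", "balanced", "smart"];
const POOL_TITLES: Record<PoolType, string> = {
  fast: "Fast",
  balanced: "Balanced",
  smart: "Smart",
};

// Groups entries by pool in a fixed display order and drops empty pools.
// Entry order within each pool is preserved, so existing sorting still applies.
function groupByPool(entries: readonly FallbackEntry[]) {
  return POOL_ORDER
    .map((pool) => ({
      pool,
      title: POOL_TITLES[pool],
      entries: entries.filter((e) => e.pool === pool),
    }))
    .filter((group) => group.entries.length > 0);
}
```

Each resulting group would then map to one section component on the page.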


29. .roo/specs/generalized-thread-protection/design.md Design +152/-0

Generalized Thread Protection Scanner Architecture

• Architecture overview showing thread protection scanner replacing hardcoded platform checks with
 dynamic rules engine
• Protection rules matrix defining behavior for provider-ban, model-skip, and off levels
 across error types
• Scanner API with ErrorContext and ThreadProtectionAction interfaces for decision matrix
 implementation
• Integration points replacing 6 hardcoded longcat blocks and sticky cooldown generalization

.roo/specs/generalized-thread-protection/design.md
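
To show the shape of a single decision point, here is a sketch of an `evaluateThreadProtection()` style function. The interface fields are modelled on the names in the PR text (`banProvider`, `skipModel`, clearing sticky), but the individual cell values of the matrix below are placeholders, not the PR's actual rules. This sketch also takes the resolved level directly instead of a platform name, and omits the `midStream` flag.

```typescript
type ProtectionLevel = "provider-ban" | "model-skip" | "off";
type ErrorKind = "5xx" | "truncation" | "retryable";

interface ErrorContext {
  level: ProtectionLevel; // assumption: the caller has already resolved the platform's level
  kind: ErrorKind;
}

interface ThreadProtectionAction {
  banProvider: boolean; // ban the whole platform for this session
  skipModel: boolean;   // skip only the failing model on retry
  clearSticky: boolean; // drop the session's sticky model preference
}

// Placeholder decision matrix: one function instead of scattered platform checks.
function evaluateThreadProtection(ctx: ErrorContext): ThreadProtectionAction {
  if (ctx.level === "off") {
    return { banProvider: false, skipModel: false, clearSticky: false };
  }
  if (ctx.level === "model-skip") {
    // Assumed: plain retryable errors keep the sticky preference.
    return { banProvider: false, skipModel: true, clearSticky: ctx.kind !== "retryable" };
  }
  // "provider-ban": the strongest response, applied to every error kind here.
  return { banProvider: true, skipModel: true, clearSticky: true };
}
```

The point of the pattern is that the proxy only reads the returned action flags, so adding a platform becomes a configuration change rather than a new branch in `proxy.ts`.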


30. .roo/specs/wrapped-error-interception/tasks.md Tasks +69/-0

Wrapped Error Interception Implementation Tasks

• Implementation tasks for adding isWrappedError() and throwWrappedError() methods to
 BaseProvider
• Tasks for integrating wrapped error checks in all four provider implementations (OpenAI-compat,
 Cohere, Cloudflare, Google)
• Visibility change for extractErrorMessage() from private to protected for reuse
• TypeScript compilation and test verification tasks

.roo/specs/wrapped-error-interception/tasks.md


31. .roo/specs/wrapped-error-interception/requirements.md Requirements +53/-0

Wrapped Error Payloads on HTTP 200 Responses Requirements

• Critical edge case handling for upstream providers returning error payloads with HTTP 200 status
• Detection layer for root-level error field in JSON responses before normalization
• Functional requirements for all provider adapters to inspect parsed JSON and throw
 ProviderApiError
• Non-functional requirements for backward compatibility, minimal performance impact, and existing
 retry loop integration

.roo/specs/wrapped-error-interception/requirements.md


32. .roo/specs/recency-biased-thompson-sampling/tasks.md Tasks +17/-0

Recency-Biased Thompson Sampling Implementation Tasks

• Task breakdown for implementing time-decay aggregation including ANALYTICS_WINDOW_DAYS constant
 and ModelStats interface extension
• SQL query rewrite with CTE-based weighted aggregation and Math.max(0.1, ...) guards in scoring
 functions
• Dashboard display updates to show rawTotal instead of weighted totals
• Test cases for outage sensitivity, safe fractional evaluation, and clock drift safety

.roo/specs/recency-biased-thompson-sampling/tasks.md


33. .roo/specs/recency-biased-thompson-sampling/requirements.md Requirements +76/-0

Recency-Biased Thompson Sampling Requirements

• Linear time-decay weighting formula for historical request aggregation over 7-day window
• Backward compatibility requirements with Beta sampling using Math.max(0.1, ...) guards
• Zero-extension portability using standard SQLite julianday() function
• Test cases for outage sensitivity and safe fractional evaluation with edge case risk mitigation
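
The requirements mention a linear time-decay over a 7-day window but this summary does not give the formula. The sketch below assumes the weight is `1 - age / window`, clamped to `[0, 1]`, and that Beta parameters are taken as weighted successes and weighted failures with the `Math.max(0.1, ...)` guard. `RequestRecord`, `linearWeight`, `aggregate`, and `betaParams` are hypothetical names, and the aggregation is shown in TypeScript rather than the SQL the spec describes.

```typescript
const ANALYTICS_WINDOW_DAYS = 7;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical input row; callers are assumed to pre-filter to the window.
interface RequestRecord {
  timestampMs: number;
  success: boolean;
}

interface ModelStats {
  weightedSuccess: number;
  weightedTotal: number;
  rawTotal: number; // unweighted count, kept for dashboard display
}

// Assumed formula: weight falls linearly from 1 (now) to 0 (window edge).
function linearWeight(timestampMs: number, nowMs: number): number {
  const ageDays = (nowMs - timestampMs) / MS_PER_DAY;
  const weight = 1 - ageDays / ANALYTICS_WINDOW_DAYS;
  // Clamp so future timestamps (clock drift) never exceed 1
  // and stale rows never go negative.
  return Math.min(1, Math.max(0, weight));
}

function aggregate(records: RequestRecord[], nowMs: number): ModelStats {
  const stats: ModelStats = { weightedSuccess: 0, weightedTotal: 0, rawTotal: 0 };
  for (const record of records) {
    const weight = linearWeight(record.timestampMs, nowMs);
    stats.rawTotal += 1;
    stats.weightedTotal += weight;
    if (record.success) stats.weightedSuccess += weight;
  }
  return stats;
}

// Guards keep both Beta parameters strictly positive for fractional counts.
function betaParams(stats: ModelStats): { alpha: number; beta: number } {
  return {
    alpha: Math.max(0.1, stats.weightedSuccess),
    beta: Math.max(0.1, stats.weightedTotal - stats.weightedSuccess),
  };
}
```

With this weighting, a failure from today counts fully while one from several days ago counts only partially, which is what makes the sampler react quickly to a fresh outage.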

.roo/specs/recency-biased-thompson-sampling/requirements.md


34. .roo/specs/sse-stream-heartbeat-stall-protection/tasks.md Tasks +20/-0

SSE Stream Heartbeat and Stall Protection Tasks

• Task list for adding KEEPALIVE_INTERVAL_MS and MAX_STREAM_STALL_MS constants
• Implementation of heartbeat interval with stall detection logic and cleanupStream() function
• Pre-stream and mid-stream stall handling with appropriate error frames and response termination
• Unit tests for heartbeat emission, stall detection, client-disconnect cleanup, and write failure
 handling

.roo/specs/sse-stream-heartbeat-stall-protection/tasks.md


35. .roo/specs/transient-model-cooldown/tasks.md Tasks +16/-0

Transient Model Cooldown Implementation Tasks

• Implementation tasks for declaring transientModelCooldowns Map and cooldown constant at module
 level
• Pre-routing cooldown injection with expired entry pruning and sticky session override logic
• Global cooldown registration on 5xx and connection failures in retry loop and mid-stream error
 handlers
• Unit test coverage for cooldown injection, registration, sticky override, and auto-recovery
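
The tasks above can be sketched as follows. This is an illustration under stated assumptions, not the code in `proxy.ts`: the cooldown duration, the key shape, and the helper names (`isCooldownEligible`, `registerCooldown`, `modelsToSkip`) are hypothetical, and the sticky override is assumed to mean that a session's sticky model is exempt from the skip list.

```typescript
// Hypothetical duration; the PR names a cooldown constant but not its value.
const TRANSIENT_COOLDOWN_MS = 60000;

// Model key -> expiry timestamp (ms since epoch). The key shape is an assumption.
const transientModelCooldowns = new Map<string, number>();

// 5xx responses and connection failures (no HTTP status) are treated as transient.
function isCooldownEligible(status: number | undefined): boolean {
  return status === undefined || (status >= 500 && status < 600);
}

function registerCooldown(modelKey: string, status: number | undefined, nowMs: number): void {
  if (isCooldownEligible(status)) {
    transientModelCooldowns.set(modelKey, nowMs + TRANSIENT_COOLDOWN_MS);
  }
}

// Builds the pre-routing skip set, pruning expired entries along the way.
function modelsToSkip(nowMs: number, stickyModelKey?: string): Set<string> {
  const skip = new Set<string>();
  for (const [modelKey, expiresAt] of transientModelCooldowns) {
    if (expiresAt <= nowMs) {
      // Expired: remove it so the model recovers automatically.
      transientModelCooldowns.delete(modelKey);
      continue;
    }
    // Assumed sticky override: the session's sticky model is not skipped.
    if (modelKey === stickyModelKey) continue;
    skip.add(modelKey);
  }
  return skip;
}
```

Keeping eligibility in one function also gives a single place to decide when a model enters cooldown, rather than spreading the condition across several error branches.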

.roo/specs/transient-model-cooldown/tasks.md


36. client/src/components/pool-section.tsx ✨ Enhancement +41/-0

Pool Section Collapsible Component

• New collapsible section component for grouping models by pool type (fast, balanced, smart)
• Expandable/collapsible UI with keyboard accessibility (Enter/Space keys) and ARIA labels
• Renders pool badge and title with visual indicator (▼/▶) for expanded/collapsed state

client/src/components/pool-section.tsx


37. .roo/specs/owl-alpha-longcat-model-routing/design.md Formatting +49/-49

Owl Alpha LongCat Model Routing Design Formatting

• Fixed markdown code block formatting from bare backticks to text language specification
• Corrected ASCII diagram indentation and alignment for better readability
• Maintained all architectural content describing smart preference flow, sticky cooldown, and error
 handling

.roo/specs/owl-alpha-longcat-model-routing/design.md


38. client/src/components/pool-badge.tsx ✨ Enhancement +16/-0

Pool Badge Component with Color Coding

• New badge component for displaying pool type (fast, balanced, smart) with color-coded styling
• Exports PoolType type definition for use across components
• Provides dark mode support with appropriate color schemes for each pool type

client/src/components/pool-badge.tsx


39. .roo/specs/pr13-code-review-fixes/tasks.md Tasks +14/-0

PR #13 Code Review Fixes Task Checklist

• Task checklist for fixing 10 identified bugs with completion status indicators
• Tasks organized by bug ID (BUG-01 through BUG-10) with specific file locations and fix
 descriptions
• Tracks completion status with checkboxes for SQL fix, wrapped error propagation, NaN guard,
 hardcoded refs, stall abort, cooldown guard, debug script removal, spec completion, test SQL, and
 semicolon removal

.roo/specs/pr13-code-review-fixes/tasks.md


40. .roo/specs/generalized-thread-protection/tasks.md Tasks +12/-0

Generalized Thread Protection Implementation Tasks

• Implementation tasks for renaming LONGCAT_STICKY_COOLDOWN_MS to THREAD_COOLDOWN_MS with
 reference updates
• Tasks for removing hardcoded LongCat and Owl Alpha cooldown blocks and inserting generalized
 thread protection scanner
• Execution order verification for skipModels pipeline and test file creation
• Manual smoke test task for concurrent request thread protection validation

.roo/specs/generalized-thread-protection/tasks.md


41. .roo/specs/owl-alpha-longcat-model-routing/requirements.md 📝 Documentation +5/-5

Owl Alpha LongCat Model Routing Requirements Path Fixes

• Fixed relative file path references from ../ to ../../../ to correctly point to server source
 files
• Updated all four dependency links to use correct path depth for router.ts, proxy.ts, and
 db/index.ts

.roo/specs/owl-alpha-longcat-model-routing/requirements.md


42. AGENTS.md Additional files +0/-0

...

AGENTS.md


@qodo-code-review

qodo-code-review Bot commented Jun 5, 2026

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (0)


Action required

1. Timer throw crashes process 🐞 Bug ☼ Reliability
Description
In server/src/routes/proxy.ts, the pre-stream stall path throws from inside the setInterval
keepalive callback, which is outside the request handler’s try/catch and can surface as an uncaught
exception (crashing the server) instead of returning a 504 or retrying fallback.
Code

server/src/routes/proxy.ts[R1343-1375]

+          const keepaliveTimer = setInterval(() => {
+            if (stalled) {
+              clearInterval(keepaliveTimer);
+              return;
+            }
+            const elapsed = Date.now() - lastChunkTime;
+            if (elapsed >= streamKeepaliveConfig.MAX_STREAM_STALL_MS) {
+              stalled = true;
+              cleanup();
+              if (streamStarted) {
+                const payload = { error: { message: 'Stream stalled: no data received within timeout', type: 'stream_timeout' } };
+                try {
+                  if (responseStreamContext) {
+                    writeResponseStreamEvent(res, {
+                      type: 'response.failed',
+                      response: {
+                        id: responseStreamContext.responseId,
+                        status: 'failed',
+                        error: payload.error,
+                      },
+                    });
+                  } else {
+                    res.write(`data: ${JSON.stringify(payload)}\n\n`);
+                    res.write('data: [DONE]\n\n');
+                  }
+                  res.end();
+                } catch { /* socket gone */ }
+              } else {
+                // Pre-stream stall: throw so the outer catch can retry fallback models
+                throw Object.assign(
+                  new Error(`Stream timed out: no data received from provider ${route.displayName}`),
+                  { status: 504 }
+                );
Evidence
The code explicitly throws inside setInterval() when !streamStarted, but the only surrounding
try/finally is around the async generator iteration; the timer callback runs on a separate call
stack and won’t be caught there. The newly added test expects a clean 504 response for this
scenario, which this implementation cannot reliably produce (it will instead surface as an uncaught
exception).

server/src/routes/proxy.ts[1343-1416]
server/src/__tests__/routes/stream-heartbeat-stall.test.ts[178-210]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`handleChatCompletion()` throws an error from inside the `setInterval()` keepalive callback when a stream stalls before the first chunk. That throw is not catchable by the surrounding request/stream try/catch and can crash the Node process.

### Issue Context
The intention (also asserted by tests) is to return HTTP 504 on pre-stream stall (no SSE headers sent yet) or to allow the outer retry loop to fall back.

### Fix Focus Areas
- server/src/routes/proxy.ts[1343-1416]

### Implementation notes
- Do **not** `throw` inside the timer callback.
- Instead, set a `stallError` variable (or resolve/reject a Promise) and stop the generator (`cleanup()`), then after the `for await` loop ends, `throw stallError` from the main async function flow so the existing outer `catch`/retry logic can handle it.
- Ensure the pre-stream stall path results in `status=504` and `error.type='stream_timeout'` as the test expects.
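
One way to follow these notes is to have the timer reject a promise that the main loop awaits, so the stall surfaces on the async call stack where the outer `catch` can see it. The sketch below is illustrative only: `makeStallError` and `consumeWithStallGuard` are hypothetical names, and it races each `gen.next()` against the stall timer rather than relying on `gen.return()` to unblock a generator that may itself be hung on an upstream read.

```typescript
// Hypothetical helpers: names and shapes are illustrative, not taken from proxy.ts.
interface StallError extends Error {
  status: number;
  type: string;
}

function makeStallError(provider: string): StallError {
  return Object.assign(
    new Error(`Stream timed out: no data received from provider ${provider}`),
    { status: 504, type: 'stream_timeout' },
  );
}

async function consumeWithStallGuard<T>(
  gen: AsyncGenerator<T>,
  onChunk: (chunk: T) => void,
  maxStallMs: number,
  provider: string,
): Promise<void> {
  while (true) {
    // The timer rejects a promise instead of throwing, so the failure is
    // observed by the `await` below, on the main async call stack.
    let rejectStall: (err: StallError) => void = () => {};
    const stall = new Promise<never>((_resolve, reject) => {
      rejectStall = reject;
    });
    const timer = setTimeout(() => rejectStall(makeStallError(provider)), maxStallMs);

    try {
      const result = await Promise.race([gen.next(), stall]);
      if (result.done) return;
      onChunk(result.value);
    } catch (err) {
      // Best-effort shutdown of the upstream generator; not awaited,
      // because a hung generator may never settle.
      void gen.return(undefined).catch(() => {});
      throw err;
    } finally {
      clearTimeout(timer);
    }
  }
}
```

A caller would wrap `consumeWithStallGuard(...)` in its existing `try/catch`; whether the caught error becomes a 504 response or an in-stream error frame would still depend on whether the stream has already started.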



Remediation recommended

2. ActiveRequests stale entries 🐞 Bug ☼ Reliability
Description
activeRequests stores per-request objects, but cleanup deletes only the first matching entry and
breaks; concurrent requests for the same session/platform/model can leave stale entries that keep
provider-ban platforms excluded until the 10-minute TTL cleanup runs.
Code

server/src/routes/proxy.ts[R1629-1636]

+        } finally {
+          // Ensure the session is deregistered immediately on end/abort/fail
+          if (sessionKey) {
+            for (const active of activeRequests) {
+              if (active.sessionKey === sessionKey && active.platform === route.platform && active.modelId === route.modelId) {
+                activeRequests.delete(active);
+                break;
+              }
Evidence
activeRequests is a Set of object literals and is explicitly described as supporting concurrent
requests; however, the cleanup logic deletes only one matching object (with break), so additional
concurrent entries remain and can incorrectly influence routing safeguards.

server/src/routes/proxy.ts[22-25]
server/src/routes/proxy.ts[1321-1327]
server/src/routes/proxy.ts[1629-1636]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`activeRequests` is a `Set` of newly-created objects. On cleanup, the code searches the set and deletes only the first matching entry, which can leak additional entries if multiple concurrent requests exist with the same `sessionKey/platform/modelId`.

### Issue Context
The code comment says the Set is used to “allow concurrent requests from the same session”. If concurrency is allowed, cleanup must remove the specific entry added by that request (or decrement a reference count).

### Fix Focus Areas
- server/src/routes/proxy.ts[1320-1328]
- server/src/routes/proxy.ts[1629-1638]
- server/src/routes/proxy.ts[1643-1651]

### Implementation notes
Prefer one of:
1) Store the created object in a local `const active = { ... }` and remove it via `activeRequests.delete(active)` in `finally` (no iteration, no ambiguity).
2) Replace the Set with a `Map<string, number>` keyed by `sessionKey|platform|modelId` and increment/decrement counts.
3) If keeping the current structure, delete *all* matches (remove the `break`).
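
Option 1 can be sketched as below. This is a hedged illustration rather than the actual `proxy.ts` code: `ActiveRequest` and `trackRequest` are hypothetical names, and only the three fields compared in the original cleanup loop are modeled.

```typescript
// Hypothetical entry shape, based on the fields compared in the cleanup loop.
interface ActiveRequest {
  sessionKey: string;
  platform: string;
  modelId: string;
}

const activeRequests = new Set<ActiveRequest>();

// Registers one request and returns a release function bound to that exact
// object, so cleanup never has to search the Set or guess which entry to drop.
function trackRequest(entry: ActiveRequest): () => void {
  const active: ActiveRequest = { ...entry };
  activeRequests.add(active);
  return () => {
    // Deleting by reference removes only this request's entry; calling the
    // release function twice is harmless.
    activeRequests.delete(active);
  };
}
```

In the handler, the release function would be obtained before streaming starts and called in the `finally` block, which keeps concurrent requests for the same session, platform, and model independent of each other.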


@sourcery-ai sourcery-ai Bot left a comment

Hey - I've found 4 issues, and left some high level feedback:

  • In the new stream stall handler, the pre-stream timeout path throws an error directly from inside the setInterval callback, which won’t be caught by the outer try/catch and can surface as an unhandled exception; consider signaling the outer flow (e.g., via a flag or abort controller) instead of throwing inside the timer so the retry logic can handle the timeout deterministically.
  • The transient model cooldown handling in the non-stream error path has overlapping branches (ban-eligible 5xx vs isTransientCooldownEligible) that both register cooldowns on non-retryable errors; it would be clearer and less error-prone to consolidate this into a single decision path so it’s obvious exactly when a model enters cooldown and you avoid duplicated set/log behavior.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In the new stream stall handler, the pre-stream timeout path throws an error directly from inside the `setInterval` callback, which won’t be caught by the outer `try/catch` and can surface as an unhandled exception; consider signaling the outer flow (e.g., via a flag or abort controller) instead of throwing inside the timer so the retry logic can handle the timeout deterministically.
- The transient model cooldown handling in the non-stream error path has overlapping branches (ban-eligible 5xx vs `isTransientCooldownEligible`) that both register cooldowns on non-retryable errors; it would be clearer and less error-prone to consolidate this into a single decision path so it’s obvious exactly when a model enters cooldown and you avoid duplicated `set`/log behavior.

## Individual Comments

### Comment 1
<location path="server/src/routes/proxy.ts" line_range="1338-1343" />
<code_context>
+          let lastChunkTime = Date.now();
+          let stalled = false;
+
+          const cleanup = () => {
+            clearInterval(keepaliveTimer);
+            try { gen.return(undefined); } catch { /* already closed */ }
+          };
+
+          const keepaliveTimer = setInterval(() => {
+            if (stalled) {
+              clearInterval(keepaliveTimer);
</code_context>
<issue_to_address>
**issue (bug_risk):** Avoid referencing keepaliveTimer before initialization and throwing inside the interval callback

Two issues to address:

1) `cleanup` closes over `keepaliveTimer` before it’s initialized with `const`. If `cleanup` runs early (e.g. `req.on('close')` fires immediately), `clearInterval(keepaliveTimer)` will hit the temporal dead zone and throw a `ReferenceError`. Declare `let keepaliveTimer: NodeJS.Timeout | undefined` before `cleanup`, assign it after, and guard `clearInterval` with `if (keepaliveTimer)`.

2) Throwing from inside the `setInterval` callback won’t be caught by the outer `try/catch` around the streaming loop and can crash the process. Instead, set a flag and have the main loop handle the error, or call a rejection/abort handler from the timer callback rather than throwing directly.
</issue_to_address>

### Comment 2
<location path="server/src/providers/openai-compat.ts" line_range="115-116" />
<code_context>

     const decoder = new TextDecoder();
     let buffer = '';
+    let hasYielded = false;

     while (true) {
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Wrapped-error detection is inconsistent between providers and depends on hasYielded flag

For OpenAI-compatible streaming you only call `isWrappedError(parsed)` before the first yielded chunk (`!hasYielded`), while Cloudflare/Cohere/Google check every chunk. This means wrapped error payloads later in the stream won’t be detected. Consider either checking `isWrappedError` on every chunk for consistency, or explicitly documenting that only first-chunk errors are treated as wrapped and confirming upstream behavior matches that assumption.

Suggested implementation:

```typescript
        let parsed: ChatCompletionChunk;
        try {
          parsed = JSON.parse(data) as ChatCompletionChunk;
        } catch {
          // Skip malformed chunks; `continue` keeps `parsed` from being read unassigned
          continue;
        }

        // Detect wrapped errors consistently on every chunk
        if (this.isWrappedError(parsed)) {
          this.throwWrappedError(parsed);
        }

```

I assumed that the existing code only called `this.isWrappedError(parsed)` conditionally, something like `if (!hasYielded && this.isWrappedError(parsed))`. If that condition still exists elsewhere in the file, you should remove the `!hasYielded &&` part so that `isWrappedError` is checked unconditionally:

- Replace `if (!hasYielded && this.isWrappedError(parsed)) {` with `if (this.isWrappedError(parsed)) {`.

If `hasYielded` is no longer used for anything else after this change, you should also remove the `let hasYielded = false;` declaration to avoid an unused variable.
</issue_to_address>

### Comment 3
<location path="server/src/__tests__/routes/stream-heartbeat-stall.test.ts" line_range="34-43" />
<code_context>
+describe('SSE stream heartbeat and stall protection', () => {
</code_context>
<issue_to_address>
**suggestion (testing):** Time-based stream heartbeat tests are at risk of flakiness; consider using fake timers.

These tests cover key behavior but depend on real `setTimeout` delays and tweaked global config, which can be flaky in CI and slow.

Consider:
- Using `vi.useFakeTimers()` / `vi.setSystemTime()` and advancing timers instead of real waits.
- Driving heartbeat and stall detection via `vi.advanceTimersByTime()` so you can assert exact timeout/error points.
- Keeping at most one end-to-end test with real timing, and moving other timing-sensitive checks to a fully fake-timer setup.

That should keep coverage of the heartbeat/stall logic while making the suite faster and more reliable.
</issue_to_address>

### Comment 4
<location path="server/src/__tests__/routes/transient-cooldown.test.ts" line_range="316-325" />
<code_context>
+  describe('Cooldown registration: only 5xx and connection failures trigger cooldown', () => {
</code_context>
<issue_to_address>
**suggestion (testing):** Cooldown registration tests re-encode implementation conditions rather than exercising the actual routing paths.

These tests largely restate the implementation’s boolean conditions instead of exercising the real code that mutates `transientModelCooldowns`, making them fragile and tightly coupled to current logic.

Prefer tests that:
- Invoke the actual error-handling path (or a thin wrapper) with simulated statuses/errors.
- Assert on `transientModelCooldowns` contents for cases like 5xx, 429, 401, 404, and undefined status.

That way the tests validate observable behavior and remain stable even if the internal conditions or eligible statuses change.

Suggested implementation:

```typescript
  // ---------- Test Suite 5: Cooldown registration Error Classification ----------
  describe('Cooldown registration: only 5xx and connection failures trigger cooldown', () => {
    beforeEach(() => {
      // Ensure we start from a clean cooldown state for each test
      transientModelCooldowns.clear();
    });

    function invokeCooldownErrorPath(options: {
      status?: number;
      error?: Error;
      modelId?: string;
    }) {
      /**
       * Thin wrapper around the real error-handling / routing code that is
       * responsible for registering cooldowns for transient models.
       *
       * This MUST call into the same path the router uses when a transient
       * model request fails (e.g. something like `handleTransientModelError`),
       * so that these tests exercise observable behavior rather than
       * re-encoding implementation details.
       */
      return handleTransientModelErrorForTest(options);
    }

    it('registers a cooldown for 5xx upstream errors (500-504)', () => {
      const eligibleStatuses = [500, 502, 503, 504];

      for (const status of eligibleStatuses) {
        transientModelCooldowns.clear();

        invokeCooldownErrorPath({
          status,
          error: new Error(`upstream ${status}`),
          modelId: 'test-model',
        });

        // Assert based on observable cooldown state rather than status checks
        expect(transientModelCooldowns.has('test-model')).toBe(true);
      }
    });

    it('does not register a cooldown for 429 rate limit errors', () => {
      invokeCooldownErrorPath({
        status: 429,
        error: new Error('rate limited'),
        modelId: 'test-model',
      });

      expect(transientModelCooldowns.has('test-model')).toBe(false);
    });

    it('does not register a cooldown for 401 unauthorized errors', () => {
      invokeCooldownErrorPath({
        status: 401,
        error: new Error('unauthorized'),
        modelId: 'test-model',
      });

      expect(transientModelCooldowns.has('test-model')).toBe(false);
    });

    it('does not register a cooldown for 404 not found errors', () => {
      invokeCooldownErrorPath({
        status: 404,
        error: new Error('not found'),
        modelId: 'test-model',
      });

      expect(transientModelCooldowns.has('test-model')).toBe(false);
    });

    it('registers a cooldown when there is a connection failure (no status)', () => {
      invokeCooldownErrorPath({
        status: undefined,
        error: new Error('ECONNRESET'),
        modelId: 'test-model',
      });

      expect(transientModelCooldowns.has('test-model')).toBe(true);
    });

```

1. Implement `handleTransientModelErrorForTest` in this test file (or import it) so that it calls the **same** error-handling path the router uses to register cooldowns for transient models. For example, it might delegate to something like `handleTransientModelError({ status, error, modelId })` exported from the route module.
2. Ensure `transientModelCooldowns` is imported/accessible in this test file and supports `.clear()` and `.has(modelId)` (e.g. a `Map` or similar). If the underlying structure differs (e.g. a `Map` keyed by provider+model, or a plain object), adjust the assertions to check the appropriate key and API.
3. Remove or update any remaining tests inside this `describe` block that still restate implementation conditions (e.g. any leftover `it('429 rate limit is NOT eligible...` that only checks booleans) so that all tests in this suite go through `invokeCooldownErrorPath`.
4. If your production code uses a different identifier than `'test-model'` (e.g. includes provider or route info), update the `modelId` and corresponding `has(...)` checks to match the real key shape used in `transientModelCooldowns`.
</issue_to_address>

Comment on lines +1338 to +1343
          const cleanup = () => {
            clearInterval(keepaliveTimer);
            try { gen.return(undefined); } catch { /* already closed */ }
          };

          const keepaliveTimer = setInterval(() => {

issue (bug_risk): Avoid referencing keepaliveTimer before initialization and throwing inside the interval callback

Two issues to address:

  1. cleanup closes over keepaliveTimer before it’s initialized with const. If cleanup runs early (e.g. req.on('close') fires immediately), clearInterval(keepaliveTimer) will hit the temporal dead zone and throw a ReferenceError. Declare let keepaliveTimer: NodeJS.Timeout | undefined before cleanup, assign it after, and guard clearInterval with if (keepaliveTimer).

  2. Throwing from inside the setInterval callback won’t be caught by the outer try/catch around the streaming loop and can crash the process. Instead, set a flag and have the main loop handle the error, or call a rejection/abort handler from the timer callback rather than throwing directly.
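
The first point can be sketched as follows. This is a minimal illustration, assuming a small factory around the timer; `createKeepalive` and its `start`/`cleanup` methods are hypothetical names, and `ReturnType<typeof setInterval>` is used in place of `NodeJS.Timeout` so the snippet does not depend on Node typings.

```typescript
// Hypothetical factory; the real code declares these inline in the handler.
function createKeepalive(
  onTick: () => void,
  intervalMs: number,
): { start: () => void; cleanup: () => void } {
  // Declared with `let` before `cleanup`, so the closure never reads an
  // uninitialized `const` binding, no matter when `cleanup` runs.
  let keepaliveTimer: ReturnType<typeof setInterval> | undefined;

  const cleanup = (): void => {
    if (keepaliveTimer !== undefined) {
      clearInterval(keepaliveTimer);
      keepaliveTimer = undefined;
    }
  };

  const start = (): void => {
    cleanup(); // avoid leaking an earlier interval if start is called twice
    keepaliveTimer = setInterval(onTick, intervalMs);
  };

  return { start, cleanup };
}
```

With this shape, a `req.on('close')` handler can call `cleanup()` at any point, including before the interval has been created, without throwing.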

Comment on lines 115 to +116
let buffer = '';
let hasYielded = false;

suggestion (bug_risk): Wrapped-error detection is inconsistent between providers and depends on hasYielded flag

For OpenAI-compatible streaming you only call isWrappedError(parsed) before the first yielded chunk (!hasYielded), while Cloudflare/Cohere/Google check every chunk. This means wrapped error payloads later in the stream won’t be detected. Consider either checking isWrappedError on every chunk for consistency, or explicitly documenting that only first-chunk errors are treated as wrapped and confirming upstream behavior matches that assumption.

Suggested implementation:

        let parsed: ChatCompletionChunk;
        try {
          parsed = JSON.parse(data) as ChatCompletionChunk;
        } catch {
          // Skip malformed chunks; `continue` keeps `parsed` from being read unassigned
          continue;
        }

        // Detect wrapped errors consistently on every chunk
        if (this.isWrappedError(parsed)) {
          this.throwWrappedError(parsed);
        }

I assumed that the existing code only called this.isWrappedError(parsed) conditionally, something like if (!hasYielded && this.isWrappedError(parsed)). If that condition still exists elsewhere in the file, you should remove the !hasYielded && part so that isWrappedError is checked unconditionally:

  • Replace if (!hasYielded && this.isWrappedError(parsed)) { with if (this.isWrappedError(parsed)) {.

If hasYielded is no longer used for anything else after this change, you should also remove the let hasYielded = false; declaration to avoid an unused variable.

Comment on lines +34 to +43
describe('SSE stream heartbeat and stall protection', () => {
  let app: Express;
  let origKeepaliveInterval: number;
  let origMaxStall: number;

  beforeAll(() => {
    process.env.ENCRYPTION_KEY = '0'.repeat(64);
    process.env.ADMIN_DASHBOARD_KEY = 'test-admin-key-that-is-long-enough';
    process.env.NODE_ENV = 'test';
    initDb(':memory:');

suggestion (testing): Time-based stream heartbeat tests are at risk of flakiness; consider using fake timers.

These tests cover key behavior but depend on real setTimeout delays and tweaked global config, which can be flaky in CI and slow.

Consider:

  • Using vi.useFakeTimers() / vi.setSystemTime() and advancing timers instead of real waits.
  • Driving heartbeat and stall detection via vi.advanceTimersByTime() so you can assert exact timeout/error points.
  • Keeping at most one end-to-end test with real timing, and moving other timing-sensitive checks to a fully fake-timer setup.

That should keep coverage of the heartbeat/stall logic while making the suite faster and more reliable.

Comment on lines +316 to +325
describe('Cooldown registration: only 5xx and connection failures trigger cooldown', () => {
  it('5xx status codes (500-504) are eligible for cooldown registration', () => {
    // Simulate the condition: (errStatus >= 500 && errStatus < 600)
    const eligibleStatuses = [500, 502, 503, 504];
    for (const status of eligibleStatuses) {
      const condition = status !== undefined && status >= 500 && status < 600;
      expect(condition).toBe(true);
    }
  });

suggestion (testing): Cooldown registration tests re-encode implementation conditions rather than exercising the actual routing paths.

These tests largely restate the implementation’s boolean conditions instead of exercising the real code that mutates transientModelCooldowns, making them fragile and tightly coupled to current logic.

Prefer tests that:

  • Invoke the actual error-handling path (or a thin wrapper) with simulated statuses/errors.
  • Assert on transientModelCooldowns contents for cases like 5xx, 429, 401, 404, and undefined status.

That way the tests validate observable behavior and remain stable even if the internal conditions or eligible statuses change.

Suggested implementation:

  // ---------- Test Suite 5: Cooldown registration Error Classification ----------
  describe('Cooldown registration: only 5xx and connection failures trigger cooldown', () => {
    beforeEach(() => {
      // Ensure we start from a clean cooldown state for each test
      transientModelCooldowns.clear();
    });

    function invokeCooldownErrorPath(options: {
      status?: number;
      error?: Error;
      modelId?: string;
    }) {
      /**
       * Thin wrapper around the real error-handling / routing code that is
       * responsible for registering cooldowns for transient models.
       *
       * This MUST call into the same path the router uses when a transient
       * model request fails (e.g. something like `handleTransientModelError`),
       * so that these tests exercise observable behavior rather than
       * re-encoding implementation details.
       */
      return handleTransientModelErrorForTest(options);
    }

    it('registers a cooldown for 5xx upstream errors (500-504)', () => {
      const eligibleStatuses = [500, 502, 503, 504];

      for (const status of eligibleStatuses) {
        transientModelCooldowns.clear();

        invokeCooldownErrorPath({
          status,
          error: new Error(`upstream ${status}`),
          modelId: 'test-model',
        });

        // Assert based on observable cooldown state rather than status checks
        expect(transientModelCooldowns.has('test-model')).toBe(true);
      }
    });

    it('does not register a cooldown for 429 rate limit errors', () => {
      invokeCooldownErrorPath({
        status: 429,
        error: new Error('rate limited'),
        modelId: 'test-model',
      });

      expect(transientModelCooldowns.has('test-model')).toBe(false);
    });

    it('does not register a cooldown for 401 unauthorized errors', () => {
      invokeCooldownErrorPath({
        status: 401,
        error: new Error('unauthorized'),
        modelId: 'test-model',
      });

      expect(transientModelCooldowns.has('test-model')).toBe(false);
    });

    it('does not register a cooldown for 404 not found errors', () => {
      invokeCooldownErrorPath({
        status: 404,
        error: new Error('not found'),
        modelId: 'test-model',
      });

      expect(transientModelCooldowns.has('test-model')).toBe(false);
    });

    it('registers a cooldown when there is a connection failure (no status)', () => {
      invokeCooldownErrorPath({
        status: undefined,
        error: new Error('ECONNRESET'),
        modelId: 'test-model',
      });

      expect(transientModelCooldowns.has('test-model')).toBe(true);
    });
  1. Implement handleTransientModelErrorForTest in this test file (or import it) so that it calls the same error-handling path the router uses to register cooldowns for transient models. For example, it might delegate to something like handleTransientModelError({ status, error, modelId }) exported from the route module.
  2. Ensure transientModelCooldowns is imported/accessible in this test file and supports .clear() and .has(modelId) (e.g. a Map or similar). If the underlying structure differs (e.g. a Map keyed by provider+model, or a plain object), adjust the assertions to check the appropriate key and API.
  3. Remove or update any remaining tests inside this describe block that still restate implementation conditions (e.g. any leftover it('429 rate limit is NOT eligible... that only checks booleans) so that all tests in this suite go through invokeCooldownErrorPath.
  4. If your production code uses a different identifier than 'test-model' (e.g. includes provider or route info), update the modelId and corresponding has(...) checks to match the real key shape used in transientModelCooldowns.

Comment on lines +1343 to +1375
const keepaliveTimer = setInterval(() => {
  if (stalled) {
    clearInterval(keepaliveTimer);
    return;
  }
  const elapsed = Date.now() - lastChunkTime;
  if (elapsed >= streamKeepaliveConfig.MAX_STREAM_STALL_MS) {
    stalled = true;
    cleanup();
    if (streamStarted) {
      const payload = { error: { message: 'Stream stalled: no data received within timeout', type: 'stream_timeout' } };
      try {
        if (responseStreamContext) {
          writeResponseStreamEvent(res, {
            type: 'response.failed',
            response: {
              id: responseStreamContext.responseId,
              status: 'failed',
              error: payload.error,
            },
          });
        } else {
          res.write(`data: ${JSON.stringify(payload)}\n\n`);
          res.write('data: [DONE]\n\n');
        }
        res.end();
      } catch { /* socket gone */ }
    } else {
      // Pre-stream stall: throw so the outer catch can retry fallback models
      throw Object.assign(
        new Error(`Stream timed out: no data received from provider ${route.displayName}`),
        { status: 504 }
      );

Action required

1. Timer throw crashes process 🐞 Bug ☼ Reliability

In server/src/routes/proxy.ts, the pre-stream stall path throws from inside the setInterval
keepalive callback, which is outside the request handler’s try/catch and can surface as an uncaught
exception (crashing the server) instead of returning a 504 or retrying fallback.
Agent Prompt
### Issue description
`handleChatCompletion()` throws an error from inside the `setInterval()` keepalive callback when a stream stalls before the first chunk. That throw is not catchable by the surrounding request/stream try/catch and can crash the Node process.

### Issue Context
The intention (also asserted by tests) is to return HTTP 504 on pre-stream stall (no SSE headers sent yet) or to allow the outer retry loop to fall back.

### Fix Focus Areas
- server/src/routes/proxy.ts[1343-1416]

### Implementation notes
- Do **not** `throw` inside the timer callback.
- Instead, set a `stallError` variable (or resolve/reject a Promise) and stop the generator (`cleanup()`), then after the `for await` loop ends, `throw stallError` from the main async function flow so the existing outer `catch`/retry logic can handle it.
- Ensure the pre-stream stall path results in `status=504` and `error.type='stream_timeout'` as the test expects.
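A minimal sketch of the approach described in the notes above, under stated assumptions: the timer never throws, it only reports the stall through a promise, and the error is thrown from the main async flow where a caller's `try`/`catch` can see it. The function `consumeWithStallGuard` and its parameters are hypothetical and are not part of `proxy.ts`; only the `504` status and the `stream_timeout` type come from the review.

```typescript
// Hypothetical helper, not the code in proxy.ts.
type StallError = Error & { status: number; type: string };

async function consumeWithStallGuard<T>(
  source: AsyncIterable<T>,
  maxStallMs: number,
  checkEveryMs: number,
  onChunk: (chunk: T) => void,
): Promise<void> {
  const iterator = source[Symbol.asyncIterator]();
  let lastChunkTime = Date.now();

  // The timer reports a stall by resolving this promise with the error object.
  let reportStall: (err: StallError) => void = () => {};
  const stalled = new Promise<StallError>((resolve) => {
    reportStall = resolve;
  });

  const timer = setInterval(() => {
    if (Date.now() - lastChunkTime >= maxStallMs) {
      clearInterval(timer);
      // No throw here: an exception raised in a timer callback escapes every try/catch.
      reportStall(
        Object.assign(new Error('Stream timed out: no data received'), {
          status: 504,
          type: 'stream_timeout',
        }),
      );
    }
  }, checkEveryMs);

  try {
    while (true) {
      const next = await Promise.race([iterator.next(), stalled]);
      if (next instanceof Error) throw next; // thrown in the async flow, so callers can catch it
      if (next.done) return;
      lastChunkTime = Date.now();
      onChunk(next.value);
    }
  } finally {
    clearInterval(timer);
    // Ask the source to release resources; not awaited, because a hung source may never settle.
    void iterator.return?.().catch(() => {});
  }
}
```

Racing each `next()` against the stall promise lets the loop exit even when the upstream never yields, which is the case a plain `for await` cannot escape on its own.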

Comment on lines +1629 to +1636
} finally {
  // Ensure the session is deregistered immediately on end/abort/fail
  if (sessionKey) {
    for (const active of activeRequests) {
      if (active.sessionKey === sessionKey && active.platform === route.platform && active.modelId === route.modelId) {
        activeRequests.delete(active);
        break;
      }

Remediation recommended

2. activeRequests stale entries 🐞 Bug ☼ Reliability

activeRequests stores per-request objects, but cleanup deletes only the first matching entry and
breaks; concurrent requests for the same session/platform/model can leave stale entries that keep
provider-ban platforms excluded until the 10-minute TTL cleanup runs.
Agent Prompt
### Issue description
`activeRequests` is a `Set` of newly-created objects. On cleanup, the code searches the set and deletes only the first matching entry, which can leak additional entries if multiple concurrent requests exist with the same `sessionKey/platform/modelId`.

### Issue Context
The code comment says the Set is used to “allow concurrent requests from the same session”. If concurrency is allowed, cleanup must remove the specific entry added by that request (or decrement a reference count).

### Fix Focus Areas
- server/src/routes/proxy.ts[1320-1328]
- server/src/routes/proxy.ts[1629-1638]
- server/src/routes/proxy.ts[1643-1651]

### Implementation notes
Prefer one of:
1) Store the created object in a local `const active = { ... }` and remove it via `activeRequests.delete(active)` in `finally` (no iteration, no ambiguity).
2) Replace the Set with a `Map<string, number>` keyed by `sessionKey|platform|modelId` and increment/decrement counts.
3) If keeping the current structure, delete *all* matches (remove the `break`).
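Option 1 could look roughly like the sketch below. The wrapper `withActiveRequest` and the `startedAt` field are hypothetical additions for illustration; the other field names mirror the snippet under review, and the real code in `proxy.ts` may be structured differently.

```typescript
// Hypothetical shape; sessionKey/platform/modelId mirror the reviewed snippet.
interface ActiveRequest {
  sessionKey: string;
  platform: string;
  modelId: string;
  startedAt: number; // assumed field, e.g. for a TTL sweep
}

const activeRequests = new Set<ActiveRequest>();

async function withActiveRequest<T>(
  key: Omit<ActiveRequest, 'startedAt'>,
  run: () => Promise<T>,
): Promise<T> {
  // Keep the exact object that was added so cleanup removes this request and no other.
  const active: ActiveRequest = { ...key, startedAt: Date.now() };
  activeRequests.add(active);
  try {
    return await run();
  } finally {
    // Deletion is by object identity: no iteration, no `break`, no stale siblings.
    activeRequests.delete(active);
  }
}
```

Since a `Set` compares objects by reference, two concurrent requests with identical `sessionKey`, `platform`, and `modelId` each hold and remove their own entry, so nothing is left behind for the TTL cleanup.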

@vi70x3 vi70x3 closed this Jun 5, 2026
vi70x4 pushed a commit that referenced this pull request Jun 5, 2026
…ts concurrency, wrapped-error consistency, heartbeat fake timers, cooldown test accuracy