Feat/realtime sticky #16

Closed
vi70x3 wants to merge 11 commits into main from feat/realtime-sticky

vi70x3 commented Jun 5, 2026

Collaborator

Summary by Sourcery

Introduce provider-agnostic thread protection, transient model cooldowns, and SSE heartbeat/stall safeguards while refining routing logic and surfacing model pools in the fallback UI.

New Features:

  • Add generalized thread protection service and configuration to control provider-ban vs model-skip behavior per platform.
  • Introduce transient model cooldowns shared across requests to temporarily skip unstable models after 5xx or connection failures.
  • Add SSE stream heartbeat and stall detection to keep connections alive and abort stalled streams with structured errors.
  • Expose model pool information (fast, balanced, smart) in the fallback API and group models by pool in the admin UI.

Bug Fixes:

  • Ensure wrapped error payloads returned with HTTP 200 from providers are detected and surfaced as ProviderApiError instead of causing silent failures.
  • Fix router analytics query to use recency-weighted stats and correct raw vs weighted totals, avoiding miscomputed routing scores.
  • Correct balanced-mode behavior so sticky sessions and provider bans only apply to smart mode, preventing unintended pinning in balanced routing.
  • Prevent key selection from using disabled or invalid API keys and ensure capacity checks respect cooldowns and rate limits in routing.
  • Avoid resource leaks by cleaning up stream timers and deregistering active requests on all streaming termination paths.

Enhancements:

  • Refine routing heuristics to exclude LongCat and Owl Alpha from balanced auto-routing while preferring them in smart mode when capacity allows.
  • Unify error handling so truncation and mid-stream 5xx/retryable errors apply model-level or provider-level bans based on thread protection rules instead of hardcoded platform checks.
  • Improve analytics by weighting recent requests more heavily in Thompson sampling and exposing raw counts alongside weighted stats.
  • Tighten provider selection by using fallback_config-linked models when skipping providers, aligning skip lists with actual fallback chains.
  • Add new tests and design specs covering thread protection, transient cooldowns, heartbeat/stall behavior, and routing edge cases to guard against regressions.

Tests:

  • Add comprehensive tests for provider session bans, transient model cooldown behavior, and balanced vs smart sticky session semantics.
  • Add streaming tests to verify SSE keep-alive comments, stall timeouts, pre-stream 504 handling, and cleanup on client disconnect.
  • Extend router tests to cover recency-biased analytics, key filtering, and correct routing decisions when multiple platforms and keys are present.
  • Add fallback API tests to validate model pool enums and ensure pool metadata is returned correctly.

Summary by CodeRabbit

  • New Features

    • Added UI organization for models grouped by pool type (Fast, Balanced, Smart) with expandable/collapsible sections.
    • Implemented SSE stream keepalive heartbeats and stall detection to prevent hanging connections.
    • Added automatic model cooldown handling for transient failures.
  • Bug Fixes

    • Fixed handling of wrapped error payloads from providers returned with HTTP 200 responses.
    • Improved stream timeout detection and recovery.
  • Documentation

    • Added specifications for model routing modes, thread protection rules, and cooldown behavior.

vi70x3 added 11 commits June 2, 2026 15:12
- Change activeRequests from Map to Set to allow concurrent requests from same session
- Add stale active request cleanup with 10-minute TTL
- Cache owl-alpha model ID to avoid repeated DB lookups
- Fix active request iteration to use Set-compatible syntax
- Remove package-lock.json (npm lockfile)
- Add packageManager field to package.json
- Create .npmrc with pnpm configuration
BUG-05: Abort upstream provider stream on stall detection by breaking
the for-await loop and calling gen.return() when the keepalive timer
detects MAX_STREAM_STALL_MS has elapsed without data.

BUG-06: Fix cooldown guard to use the actual routable fallback chain
(fallback_config JOIN models) instead of all enabled models, ensuring
transient cooldowns only skip models that would actually be routed to.

BUG-10: Remove double semicolon in proxy.ts.

Also adds SSE keep-alive comments during idle periods, transient model
cooldown injection before retry loops, and LongCat sticky session
cooldown support in balanced routing mode.
…, TTL refresh, collapsible pools, doc paths, cleanup

sourcery-ai Bot commented Jun 5, 2026

Reviewer's Guide

Implements generalized provider/thread protection and streaming robustness: adds transient model cooldowns and active-request safeguards, introduces SSE stream heartbeat and stall detection, refines routing analytics with recency-weighted stats and balanced/smart pool separation, centralizes wrapped-error handling in providers, and updates UI and tests to reflect new routing behavior and pools.

Sequence diagram for SSE streaming with heartbeat, stall detection, and active-request tracking

sequenceDiagram
  actor Client
  participant Proxy as proxy.handleChatCompletion
  participant Provider as provider.streamChatCompletion

  Client->>Proxy: handleChatCompletion
  Proxy->>Proxy: routeRequest
  Proxy->>Proxy: activeRequests.add
  Proxy->>Provider: streamChatCompletion

  loop [SSE chunks]
    Provider-->>Proxy: streamChatCompletion (chunk)
    Proxy->>Proxy: writeResponseStreamStart
    Proxy-->>Client: writeResponseStreamEvent
  end

  opt [keepalive timer]
    Proxy-->>Client: res.write(: keep-alive)
  end

  alt [stream stalled before first chunk]
    Proxy->>Proxy: streamAborted = true
    Proxy-->>Client: res.status(504).json
    Proxy->>Proxy: logRequest
  else [mid-stream 5xx or truncation]
    Proxy->>Proxy: getErrorStatus
    Proxy->>Proxy: isBanEligibleStatus
    Proxy->>Proxy: skipModels.add
    Proxy->>Proxy: transientModelCooldowns.set
  end

  Proxy->>Proxy: activeRequests.delete
  Proxy-->>Client: res.end
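
The keepalive and stall branches of the diagram above can be sketched as a small timer-driven state machine. This is a minimal sketch under stated assumptions, not the code in proxy.ts: `StreamWatchdog`, `TickAction`, and `reactionFor` are hypothetical names, and the clock is passed in explicitly so the logic can be exercised without real timers.

```typescript
// Hypothetical sketch: these names are illustrative and do not come from proxy.ts.
type TickAction = "none" | "keepalive" | "stall";

class StreamWatchdog {
  private lastChunkAt: number; // last time the upstream produced data
  private lastWriteAt: number; // last time anything was written to the client
  private readonly keepaliveMs: number;
  private readonly maxStallMs: number;

  constructor(keepaliveMs: number, maxStallMs: number, startedAt: number) {
    this.keepaliveMs = keepaliveMs;
    this.maxStallMs = maxStallMs;
    this.lastChunkAt = startedAt;
    this.lastWriteAt = startedAt;
  }

  // Call whenever a real chunk is forwarded to the client.
  onChunk(now: number): void {
    this.lastChunkAt = now;
    this.lastWriteAt = now;
  }

  // Call from a periodic timer; the caller acts on the returned action.
  tick(now: number): TickAction {
    // A stall is measured against upstream data only, so keep-alive
    // comments never postpone the stall deadline.
    if (now - this.lastChunkAt >= this.maxStallMs) return "stall";
    if (now - this.lastWriteAt >= this.keepaliveMs) {
      this.lastWriteAt = now;
      return "keepalive";
    }
    return "none";
  }
}

// Assumed mapping from action to response, following the two "alt" branches above.
function reactionFor(action: TickAction, firstChunkSent: boolean): string {
  if (action === "keepalive") return "sse-comment"; // e.g. a ": keep-alive" comment line
  if (action === "stall") return firstChunkSent ? "error-event-then-done" : "http-504";
  return "noop";
}
```

With the 15 s interval and 45 s stall limit quoted in the Qodo summary, a watchdog created as `new StreamWatchdog(15000, 45000, Date.now())` would ask for a keep-alive comment after 15 s of silence and report a stall once 45 s pass without an upstream chunk.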

Flow diagram for routing modes, pools, and key capacity

flowchart LR
  A[routeRequest] --> B{routingMode}

  B -- balanced --> C[filter chain\nEXCLUDED_FROM_BALANCED]
  C --> D[filteredChain]

  B -- smart --> E[compute effectiveScore]
  E --> F[sorted]
  F --> G{hasValidKeys\nlongcat}
  G -- yes --> H[move longcat entries to front]
  H --> I
  G -- no --> I[keep order]
  I --> J{hasValidKeys\nowl-alpha}
  J -- yes --> K[move owl-alpha to front]
  J -- no --> L[use existing order]

  subgraph Analytics
    M[refreshStatsCache]
    M --> N[ModelStats\nsuccesses,total,rawTotal]
  end

  subgraph Pools
    O[getModelPool]
    O --> P[ModelPool.Balanced]
    O --> Q[ModelPool.Smart]
  end
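
The routing branches in the flowchart above can be approximated as a pure ordering function. This is a sketch under stated assumptions: the `ChainEntry` shape, the `isLongCat` / `isOwlAlpha` matching rules, and the `hasValidKeys` callback signature are hypothetical, and only the order of operations (filter for balanced; sort, then move LongCat, then move Owl Alpha for smart) is taken from the diagram.

```typescript
// Hypothetical sketch of the balanced/smart ordering; field names are assumptions.
type RoutingMode = "balanced" | "smart";

interface ChainEntry {
  modelDbId: number;
  platform: string;
  modelId: string;
  effectiveScore: number;
}

// Assumed matching rules for the two special-cased models.
const isLongCat = (e: ChainEntry): boolean => e.platform === "longcat";
const isOwlAlpha = (e: ChainEntry): boolean => e.modelId.indexOf("owl-alpha") !== -1;

// Moves matching entries to the front while keeping relative order elsewhere.
function moveToFront(chain: ChainEntry[], pred: (e: ChainEntry) => boolean): ChainEntry[] {
  return chain.filter(pred).concat(chain.filter((e) => !pred(e)));
}

function orderChain(
  chain: ChainEntry[],
  mode: RoutingMode,
  hasValidKeys: (e: ChainEntry) => boolean,
  preferredModelDbId?: number,
): ChainEntry[] {
  if (mode === "balanced") {
    // Excluded models stay reachable only when explicitly preferred.
    return chain.filter(
      (e) => e.modelDbId === preferredModelDbId || !(isLongCat(e) || isOwlAlpha(e)),
    );
  }
  let sorted = chain.slice().sort((a, b) => b.effectiveScore - a.effectiveScore);
  if (sorted.some((e) => isLongCat(e) && hasValidKeys(e))) {
    sorted = moveToFront(sorted, isLongCat);
  }
  // Owl Alpha is moved last, so it ends up ahead of LongCat, as in the diagram.
  if (sorted.some((e) => isOwlAlpha(e) && hasValidKeys(e))) {
    sorted = moveToFront(sorted, isOwlAlpha);
  }
  return sorted;
}
```

Keeping the function free of I/O means the balanced exclusion and the smart preference can be unit-tested with plain arrays, which is roughly what the router tests listed later appear to do.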

File-Level Changes

Change | Details | Files
Add transient model cooldowns and active-request safeguards to avoid reusing recently failing or overloaded models across concurrent sessions.
  • Introduce a module-level transientModelCooldowns map and TRANSIENT_COOLDOWN_MS to track short-lived model bans after 5xx/connection errors.
  • Inject transient cooldowns into per-request skipModels and prune expired entries before routing, including sticky override when preferredModel is cooled down.
  • Register cooldowns on 5xx/connection failures in both streaming and non-streaming paths, and log affected models for observability.
  • Add an activeRequests set and safeguards to avoid routing new sessions to provider-ban platforms currently in use by another session.
server/src/routes/proxy.ts
server/src/__tests__/routes/transient-cooldown.test.ts
Add SSE stream heartbeat and stall protection to streaming responses so clients and upstreams don’t hang indefinitely.
  • Introduce streamKeepaliveConfig with configurable KEEPALIVE_INTERVAL_MS and MAX_STREAM_STALL_MS and use it to send periodic SSE keep-alive comments.
  • Track active streaming sessions with last-chunk timestamps and a heartbeat timer that detects stalls and emits structured stream_timeout errors.
  • Handle pre-stream stalls by returning a 504 error to trigger retry/fallback, and mid-stream stalls by gracefully ending the SSE with an error event and [DONE].
  • Ensure activeRequests and heartbeat timers are cleaned up on completion, errors, or client disconnect, with dedicated tests for heartbeat, stalls, and disconnects.
server/src/routes/proxy.ts
server/src/__tests__/routes/stream-heartbeat-stall.test.ts
Refine routing analytics and model pools to bias toward recent performance and distinct balanced/smart pools, and wire this into routing and the admin UI.
  • Change stats aggregation in router to use recency-weighted totals/successes while tracking raw totals separately, and refresh cache before analytics exports.
  • Exclude LongCat and Owl Alpha from balanced routing except when explicitly preferred, and add smart-mode preferences for them when keys have capacity via a hasValidKeys helper.
  • Introduce ModelPool enum and assign pools (Fast/Balanced/Smart) in fallback API; group fallback entries by pool in the client with PoolBadge/PoolSection components.
  • Adjust tests to validate pool values, routing behavior, and that disabled/invalid keys are skipped correctly with decrypted apiKey values.
server/src/services/router.ts
server/src/routes/fallback.ts
server/src/__tests__/services/router.test.ts
server/src/__tests__/routes/fallback.test.ts
client/src/pages/FallbackPage.tsx
client/src/components/pool-badge.tsx
client/src/components/pool-section.tsx
shared/types.ts
Centralize wrapped-error detection for providers that return error payloads with HTTP 200, ensuring errors propagate through existing retry logic.
  • Promote BaseProvider.extractErrorMessage to protected, and add isWrappedError and throwWrappedError helpers to detect root-level error fields and throw ProviderApiError with sane status codes.
  • Invoke wrapped-error detection in OpenAI-compatible, Cloudflare, Cohere, and Google providers for both chatCompletion and streamChatCompletion, guarding streamed SSE parsing to throw before yielding error chunks.
  • Add design documentation for wrapped-error interception to specify architecture, edge cases, and files to modify.
  • Ensure that first streamed chunk detection treats wrapped errors as failures so they do not emit partial responses.
server/src/providers/base.ts
server/src/providers/openai-compat.ts
server/src/providers/cloudflare.ts
server/src/providers/cohere.ts
server/src/providers/google.ts
.roo/specs/wrapped-error-interception/design.md
.roo/specs/wrapped-error-interception/requirements.md
.roo/specs/wrapped-error-interception/tasks.md
Disable sticky sessions for balanced mode and generalize thread protection configuration for provider-ban vs model-skip behavior.
  • Change getSessionKey behavior so balanced routing mode uses real keys and relies on tests to ensure sticky operations behave correctly per mode.
  • Add generalized thread protection config via getProtectionLevel and integrate it into proxy error-handling paths for retryable and mid-stream errors.
  • Adjust provider-session-ban tests to use smart routing mode by default and add a new suite that verifies balanced mode behavior for sticky-related operations.
  • Document generalized thread protection requirements and design, including tasks and code review fixes specs.
server/src/routes/proxy.ts
server/src/services/threadProtection.ts
server/src/__tests__/routes/provider-session-ban.test.ts
.roo/specs/generalized-thread-protection/design.md
.roo/specs/generalized-thread-protection/requirements.md
.roo/specs/generalized-thread-protection/tasks.md
.roo/specs/pr13-code-review-fixes/design.md
.roo/specs/pr13-code-review-fixes/requirements.md
.roo/specs/pr13-code-review-fixes/tasks.md
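
The transient cooldown row at the top of this table (register on 5xx/connection failure, prune expired entries, inject the rest into `skipModels`, override a cooled-down sticky preference) can be illustrated roughly as follows. The map and constant names come from the description above; the numeric model-ID key type, the helper names `isCooldownEligible`, `registerCooldown`, and `injectCooldowns`, and the use of `undefined` to represent a connection failure are assumptions made for this sketch.

```typescript
// Sketch only: helper names and the key type are assumptions, not the PR's code.
const TRANSIENT_COOLDOWN_MS = 15000; // 15 s window, as quoted in the Qodo summary

// model DB id -> expiry timestamp (ms)
const transientModelCooldowns = new Map<number, number>();

// Only 5xx responses and connection failures (modelled here as undefined) qualify;
// 4xx and 429 do not.
function isCooldownEligible(status: number | undefined): boolean {
  return status === undefined || (status >= 500 && status <= 599);
}

function registerCooldown(modelDbId: number, status: number | undefined, now: number): void {
  if (isCooldownEligible(status)) {
    transientModelCooldowns.set(modelDbId, now + TRANSIENT_COOLDOWN_MS);
  }
}

// Prunes expired entries, adds live ones to the per-request skip set, and
// returns the sticky preference only if it is not currently cooled down.
function injectCooldowns(
  skipModels: Set<number>,
  now: number,
  preferredModelDbId?: number,
): number | undefined {
  const expired: number[] = [];
  transientModelCooldowns.forEach((expiresAt, id) => {
    if (expiresAt <= now) expired.push(id);
    else skipModels.add(id);
  });
  expired.forEach((id) => transientModelCooldowns.delete(id));
  if (preferredModelDbId !== undefined && skipModels.has(preferredModelDbId)) {
    return undefined;
  }
  return preferredModelDbId;
}
```

Because the map is module-level, a failure observed by one request is visible to every concurrent request until the entry expires, which is the sharing behavior the summary describes.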

coderabbitai Bot commented Jun 5, 2026

📝 Walkthrough

This PR introduces a comprehensive set of features for LLM routing and error handling: generalized thread protection rules, wrapped error detection across providers, model-level routing with balanced/smart modes, transient failure cooldowns, sticky session behavior separation, SSE stream keepalive and stall protection, recency-biased analytics, and frontend pool display grouping. The changes span specs, backend services, providers, routes, tests, and UI components.

Changes

Routing, Error Handling, Session Management, and Streaming

Layer / File(s) | Summary
Specifications, Configuration, and Package Setup
.npmrc, package.json, .roo/specs/*
Comprehensive markdown specs document feature requirements, design, and tasks for wrapped errors, thread protection generalization, Owl Alpha/LongCat model routing, transient cooldowns, sticky session disabling, SSE streaming heartbeat/stall protection, recency-biased Thompson sampling, and PR #13 code review fixes. pnpm 11.1.3 is pinned via packageManager.
Shared Type Definitions and ModelPool
shared/types.ts
ModelPool constant and derived type union define three pool options (fast, balanced, smart) used throughout routing decisions and UI display.
Provider Wrapped Error Detection
server/src/providers/base.ts, cloudflare.ts, cohere.ts, google.ts, openai-compat.ts
BaseProvider adds isWrappedError() and throwWrappedError() helpers to detect and throw on HTTP 200 responses containing root-level error payloads; extractErrorMessage() visibility changed to protected. Four providers integrate wrapped-error checks after JSON parsing in both non-streaming and streaming paths, throwing ProviderApiError before candidate/chunk normalization.
Generalized Thread Protection Service
server/src/services/threadProtection.ts
New service exports configurable per-platform protection levels (provider-ban, model-skip, off), parses THREAD_PROTECTION_PLATFORMS env var with backward-compatible LongCat defaults, and provides evaluateThreadProtection(ctx) decision engine that maps error kinds and mid-stream state into action booleans (ban provider, skip model, clear sticky pin) with reason strings.
Router Stats Caching and Model-Level Routing
server/src/services/router.ts
ModelStats gains rawTotal for unweighted counts. refreshStatsCache() uses SQL CTE for recency-weighted time-decay aggregation. routeRequest() filters balanced-mode exclusions (LongCat, Owl Alpha) except when pinned via preferredModelDbId. New hasValidKeys() helper validates keys for smart-mode preference ordering. getAnalyticsScores() refreshes cache and reports raw totals for dashboard display.
Fallback Route with Pool Categorization
server/src/routes/fallback.ts
New getModelPool() helper assigns LongCat and Owl Alpha to Smart pool, all others to Balanced. Fallback API response objects include computed pool field.
Proxy Setup: Session Tracking and Transient Cooldowns
server/src/routes/proxy.ts (constants and setup)
Exports streamKeepaliveConfig (heartbeat/stall durations), transientModelCooldowns map (model ID → expiry timestamp), and activeRequests session tracking. Imports getProtectionLevel for thread protection decisions.
Proxy Sticky Session and Transient Cooldown Logic
server/src/routes/proxy.ts (sticky/cooldown handling)
getStickyModel() uses strict undefined check on session key for balanced-mode separation. Provider-ban sticky cooldown logic replaces LongCat-only checks. Transient cooldowns are registered on 5xx/connection failures and injected into skipModels during pre-routing, with sticky preference cleared when on cooldown.
Proxy Streaming: Keepalive and Stall Protection
server/src/routes/proxy.ts (streaming refactor)
Streaming loop registers session as active, implements interval-driven keep-alive comments with stall detection, handles pre-headers 504 timeout vs mid-stream error frame + [DONE], and clears heartbeat interval on client disconnect or stall.
Proxy Model-Level Error Handling and Cleanup
server/src/routes/proxy.ts (error handling)
Truncation and mid-stream errors apply model-level skipping via skipModels.add(modelDbId) instead of platform banning. Protection-level-driven logic chooses provider ban vs model skip. Sticky preference is cleared when pinned to failing platform. finally block deregisters active session entries for cleanup.
Frontend: Pool Badge and Section Components
client/src/components/pool-badge.tsx, pool-section.tsx
New PoolBadge component renders styled badges for pool type. PoolSection wraps content with collapsible header, pool badge, and title.
Fallback Page Pool Grouping and Display
client/src/pages/FallbackPage.tsx
Imports PoolSection and PoolType. Extends FallbackEntry with pool field. Groups models by pool type (poolOrder, poolTitles) and renders separate collapsible sections per pool, preserving existing sort/toggle within each group.
Fallback Route Tests: Pool Field Validation
server/src/__tests__/routes/fallback.test.ts
Adds ModelPool import and validates fallback responses include speedRank and pool fields. New test asserts all entries have valid pool enum values.
Session Ban Tests: Smart Mode and Balanced Separation
server/src/__tests__/routes/provider-session-ban.test.ts
Updates ban tests to use smart mode for session keys throughout unit and integration suites. Adds comprehensive balanced-mode coverage verifying sticky operations skip or use different session key hashing, including getSessionKey real-key assertion, getStickyModel undefined return, and balanced-mode ban/pin entry creation.
Streaming Heartbeat and Stall Protection Tests
server/src/__tests__/routes/stream-heartbeat-stall.test.ts
New test module covering SSE keep-alive comment emission during idle gaps, mid-stream stall with partial content delivery, pre-stream 504 timeout, client disconnect cleanup, and fast streaming with heartbeat enabled.
Transient Cooldown Tests: Map Operations and Integration
server/src/__tests__/routes/transient-cooldown.test.ts
New test suite verifies map basics, pruning/injection, auto-recovery on expiry, sticky overrides, status code classification (5xx eligible, 4xx/429 ineligible), and integration with provider session bans via skipModels merging.
Proxy Tools and Router Tests: Casing and Setup Updates
server/src/__tests__/routes/proxy-tools.test.ts, server/src/__tests__/services/router.test.ts
Proxy-tools test clears transientModelCooldowns and updates cooldown log assertion casing ([Sticky] longcat cooldown active). Router test imports refreshStatsCache/getAnalyticsScores and simplifies setup; multi-platform key test validates either platform is routed, and skip-disabled-keys test asserts returned API key.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • vi70x3/freellmapi#9: Implements Owl Alpha + LongCat model-level routing with balanced-mode exclusions and smart-mode preferences, directly overlapping the main PR's model routing implementation.
  • vi70x3/freellmapi#8: Updates sticky-session behavior in proxy.ts for LongCat cooldown handling, which intersects the main PR's sticky-session disabling and transient cooldown logic.
  • vi70x3/freellmapi#2: Extends sticky-session machinery to store keyId via setStickyModel, which relates to the main PR's sticky-session guard changes and how balanced mode short-circuits sticky operations.

Poem

🐰 A rabbit's ode to routing grace:

Through wrapped errors caught mid-race,
Sticky threads now know their place,
Owl Alpha soars, LongCat dreams—
Balanced mode splits the seams.
Heartbeats pulse where stalls once dwelled,
And pools of wisdom, now compelled! 🌟

@qodo-code-review

Review Summary by Qodo

Realtime sticky sessions with heartbeat protection, transient cooldowns, and generalized thread protection

✨ Enhancement 🧪 Tests 📝 Documentation

Walkthroughs

Description
• **Realtime sticky sessions with heartbeat protection**: Implemented SSE stream keep-alive
  heartbeat (15s interval) and stall detection (45s timeout) to prevent hanging streams and
  intermediate proxy timeouts, with automatic recovery and client disconnect cleanup
• **Transient model cooldowns**: Added shared in-memory circuit breaker for models experiencing 5xx
  or connection failures with 15-second global cooldown window visible across all concurrent requests,
  overriding sticky session preferences when active
• **Configurable thread protection levels**: Generalized LongCat-specific provider banning logic to
  support environment-variable-configurable protection levels (provider-ban, model-skip, off)
  for different platforms, eliminating hardcoded platform checks
• **Wrapped error payload detection**: Implemented detection and handling of error payloads returned
  with HTTP 200 status codes across all provider types (OpenAI, Cohere, Cloudflare, Google) with
  proper ProviderApiError propagation in both streaming and non-streaming paths
• **Recency-weighted analytics**: Added 7-day linear time-decay weighting to Thompson Sampling
  router to prioritize recent model performance data, with NaN safety guards and backward-compatible
  dashboard reporting
• **Model pool classification and routing**: Introduced ModelPool enum (Fast, Balanced, Smart)
  with balanced mode exclusions for LongCat and Owl Alpha models, only reachable via explicit request
  or smart mode, plus Owl Alpha smart-mode preference logic
• **Disabled sticky sessions on balanced endpoint**: Single-point guard via getSessionKey()
  returning empty string cascades through all sticky functions as no-ops for balanced mode while
  keeping smart mode active
• **Enhanced error handling**: Improved truncation detection across all providers, fixed session key
  comparison from falsy check to explicit undefined check, and added active request tracking to
  prevent concurrent session overload
• **Comprehensive test coverage**: Added test suites for transient cooldowns, stream heartbeat/stall
  protection, balanced mode sticky disabling, and model pool validation; updated existing tests for
  new routing modes
• **UI enhancements**: Added pool-based model grouping to fallback page with new PoolBadge and
  PoolSection components for visual pool organization
• **Configuration and documentation**: Added pnpm package manager specification, comprehensive
  design documents for all major features, and implementation task tracking
Diagram
flowchart LR
  A["Request arrives"] --> B["Check transient cooldowns"]
  B --> C["Get session key<br/>balanced vs smart"]
  C --> D{Routing mode}
  D -->|Smart| E["Apply sticky preference<br/>+ Owl Alpha logic"]
  D -->|Balanced| F["Free routing<br/>no sticky"]
  E --> G["Route to model"]
  F --> G
  G --> H["Stream response"]
  H --> I["Emit heartbeat<br/>every 15s"]
  I --> J{Stall detected<br/>45s timeout?}
  J -->|Yes| K["Send error<br/>cleanup stream"]
  J -->|No| L["Continue streaming"]
  K --> M["Clear sticky if pinned"]
  L --> N["On 5xx error"]
  N --> O["Register 15s<br/>transient cooldown"]
  O --> P["Evaluate thread<br/>protection level"]
  P --> Q{Protection level}
  Q -->|provider-ban| R["Ban provider"]
  Q -->|model-skip| S["Skip model"]
  Q -->|off| T["No action"]

File Changes

1. server/src/routes/proxy.ts ✨ Enhancement +280/-135

Realtime sticky sessions with heartbeat protection and transient cooldowns

• Generalized LongCat-specific provider banning logic to support configurable protection levels via
 getProtectionLevel() from new threadProtection.ts service
• Added stream keepalive heartbeat and stall detection to prevent hanging streams, with configurable
 intervals and automatic timeout recovery
• Implemented transient model cooldowns (15s) for 5xx errors to temporarily skip problematic models
 across all sessions
• Added active request tracking to prevent concurrent sessions from overwhelming provider-ban
 platforms
• Enhanced error handling to clear sticky preferences when pinned models fail, and improved
 truncation detection across all providers
• Fixed session key comparison from falsy check to explicit undefined check for proper
 empty-string handling

server/src/routes/proxy.ts


2. server/src/services/threadProtection.ts ✨ Enhancement +119/-0

Configurable thread protection levels for platform-specific error handling

• New service module defining configurable protection levels (provider-ban, model-skip, off)
 for different platforms
• Parses THREAD_PROTECTION_PLATFORMS environment variable to customize per-platform error handling
 behavior
• Provides getProtectionLevel() and evaluateThreadProtection() functions to determine
 appropriate session ban vs model-skip actions
• Defaults to LongCat as provider-ban for backward compatibility, all others as model-skip

server/src/services/threadProtection.ts
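
A per-platform protection lookup along the lines described above might look like the sketch below. The format of `THREAD_PROTECTION_PLATFORMS` is not given in this thread, so the comma-separated `platform:level` syntax is purely an assumption, as are the helper names `parseProtectionConfig` and `lookupLevel`; only the three level names and the defaults (LongCat as provider-ban, everything else as model-skip) come from the summary.

```typescript
// Sketch under assumptions: the "platform:level,platform:level" format is hypothetical.
type ProtectionLevel = "provider-ban" | "model-skip" | "off";

function parseProtectionConfig(raw: string | undefined): Map<string, ProtectionLevel> {
  const levels = new Map<string, ProtectionLevel>();
  levels.set("longcat", "provider-ban"); // backward-compatible default
  if (!raw) return levels;
  raw.split(",").forEach((pair) => {
    const parts = pair.split(":");
    const platform = (parts[0] || "").trim().toLowerCase();
    const level = (parts[1] || "").trim();
    // Unknown levels are ignored rather than treated as errors.
    if (platform && (level === "provider-ban" || level === "model-skip" || level === "off")) {
      levels.set(platform, level);
    }
  });
  return levels;
}

// Platforms without an explicit entry fall back to model-skip.
function lookupLevel(levels: Map<string, ProtectionLevel>, platform: string): ProtectionLevel {
  return levels.get(platform.toLowerCase()) || "model-skip";
}
```

The raw string is taken as a parameter instead of being read from the environment inside the function, so a caller could pass `process.env.THREAD_PROTECTION_PLATFORMS` in production and a literal in tests.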


3. server/src/services/router.ts ✨ Enhancement +90/-32

Recency-weighted analytics and balanced mode model exclusions

• Added recency-weighted analytics scoring using 7-day decay function to prioritize recent model
 performance
• Implemented balanced mode exclusions for LongCat and Owl Alpha models, only reachable via explicit
 request or smart mode
• Added Owl Alpha smart-mode preference logic alongside existing LongCat preference
• Extracted hasValidKeys() helper to check key capacity for rate-limit and token validation
• Fixed getAnalyticsScores() to use unweighted rawTotal for reporting while maintaining weighted
 calculations internally

server/src/services/router.ts
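
To make the recency weighting concrete, here is a minimal in-memory sketch of a 7-day linear decay feeding weighted and raw totals. The PR reportedly does this in SQL, so everything below (the `RequestRecord` shape, `recencyWeight`, `aggregate`, and the `posteriorMean` helper with its Beta(successes + 1, failures + 1) prior) is an illustrative assumption, not the router's actual query.

```typescript
// Illustrative sketch: the real aggregation is described as a SQL query.
const DECAY_WINDOW_MS = 7 * 24 * 60 * 60 * 1000; // 7 days

interface RequestRecord {
  success: boolean;
  timestampMs: number;
}

interface ModelStats {
  successes: number; // recency-weighted
  total: number; // recency-weighted
  rawTotal: number; // unweighted count, for dashboard display
}

// Linear decay from 1 (now) to 0 (7 days old); non-finite ages weigh nothing.
function recencyWeight(ageMs: number): number {
  if (!isFinite(ageMs)) return 0;
  return Math.min(1, Math.max(0, 1 - ageMs / DECAY_WINDOW_MS));
}

function aggregate(records: RequestRecord[], nowMs: number): ModelStats {
  const stats: ModelStats = { successes: 0, total: 0, rawTotal: 0 };
  records.forEach((r) => {
    const w = recencyWeight(nowMs - r.timestampMs);
    stats.total += w;
    if (r.success) stats.successes += w;
    stats.rawTotal += 1;
  });
  return stats;
}

// Mean of Beta(successes + 1, failures + 1), the distribution a Thompson
// sampler would draw from under a uniform prior (an assumption here).
function posteriorMean(stats: ModelStats): number {
  return (stats.successes + 1) / (stats.total + 2);
}
```

Separating `rawTotal` from the weighted `total` is what lets a dashboard keep showing real request counts while routing scores lean on recent behavior.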


4. server/src/providers/base.ts ✨ Enhancement +30/-1

Error detection and handling for wrapped error payloads

server/src/providers/base.ts


5. server/src/providers/openai-compat.ts ✨ Enhancement +14/-1

Wrapped error detection for OpenAI-compatible providers

• Added wrapped error detection in non-streaming response path after JSON parsing
• Added wrapped error detection in streaming path with hasYielded flag to catch errors in first
 chunk
• Improved malformed chunk handling with explicit continue statement

server/src/providers/openai-compat.ts
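
The wrapped-error check these provider entries refer to can be sketched as a guard that runs right after JSON parsing. This is a hedged illustration: the `ProviderApiError` class below is a stand-in whose constructor signature is assumed, `throwIfWrappedError` is a hypothetical name, and the fallback status of 502 plus the use of a numeric `error.code` are assumptions about what "sane status codes" means.

```typescript
// Stand-in error type; the real ProviderApiError signature is not shown in this thread.
class ProviderApiError extends Error {
  status: number;
  constructor(message: string, status: number) {
    super(message);
    this.name = "ProviderApiError";
    this.status = status;
    Object.setPrototypeOf(this, ProviderApiError.prototype); // keeps instanceof working on older targets
  }
}

// True when the parsed body carries a root-level `error` string or object.
function isWrappedError(body: unknown): body is { error: unknown } {
  if (typeof body !== "object" || body === null) return false;
  const err = (body as { error?: unknown }).error;
  return (typeof err === "string" && err.length > 0) || (typeof err === "object" && err !== null);
}

// Throws before any chunk or candidate is normalized, so retry logic sees a failure.
function throwIfWrappedError(body: unknown): void {
  if (!isWrappedError(body)) return;
  const err = body.error;
  let message = "Provider returned an error payload with HTTP 200";
  let status = 502; // assumed default when the payload carries no usable code
  if (typeof err === "string") {
    message = err;
  } else if (typeof err === "object" && err !== null) {
    const e = err as { message?: unknown; code?: unknown };
    if (typeof e.message === "string") message = e.message;
    if (typeof e.code === "number" && e.code >= 400 && e.code <= 599) status = e.code;
  }
  throw new ProviderApiError(message, status);
}
```

In a streaming parser the same call would sit between `JSON.parse` of an SSE data line and the `yield`, so an error arriving as the first chunk surfaces as an exception instead of a partial response.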


6. server/src/providers/cohere.ts ✨ Enhancement +12/-1

Wrapped error detection for Cohere provider

• Added wrapped error detection in non-streaming response path
• Added wrapped error detection in streaming path with proper error handling before yielding
• Improved malformed chunk handling consistency

server/src/providers/cohere.ts


7. server/src/providers/cloudflare.ts ✨ Enhancement +12/-1

Wrapped error detection for Cloudflare provider

• Added wrapped error detection in non-streaming response path
• Added wrapped error detection in streaming path with proper error handling
• Improved malformed chunk handling consistency

server/src/providers/cloudflare.ts


8. server/src/providers/google.ts ✨ Enhancement +10/-0

Wrapped error detection for Google provider

• Added wrapped error detection in non-streaming response path after JSON parsing
• Added wrapped error detection in streaming path to catch errors in chunk responses

server/src/providers/google.ts


9. server/src/routes/fallback.ts ✨ Enhancement +10/-0

Model pool classification for balanced/smart routing

• Added getModelPool() function to classify models into Fast/Balanced/Smart pools based on
 platform and model ID
• LongCat and Owl Alpha models assigned to Smart pool, all others to Balanced pool
• Updated fallback API response to include pool field for each model entry

server/src/routes/fallback.ts
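
The pool classification and the client-side grouping it enables could be sketched as below. The `ModelPool` constant-plus-derived-union pattern and the LongCat/Owl Alpha to Smart assignment follow the summaries in this thread; the exact matching rules inside `getModelPool` and the `groupByPool` helper are assumptions for illustration.

```typescript
// Constant object plus derived union type, as the shared types are described.
const ModelPool = { Fast: "fast", Balanced: "balanced", Smart: "smart" } as const;
type ModelPool = (typeof ModelPool)[keyof typeof ModelPool];

// Assumed matching rules: LongCat by platform, Owl Alpha by model ID substring.
function getModelPool(platform: string, modelId: string): ModelPool {
  if (platform === "longcat" || modelId.indexOf("owl-alpha") !== -1) {
    return ModelPool.Smart;
  }
  return ModelPool.Balanced;
}

// Hypothetical helper mirroring the per-pool sections on the fallback page;
// entry order inside each pool is preserved.
function groupByPool<T extends { pool: ModelPool }>(entries: T[]): Record<ModelPool, T[]> {
  const groups: Record<ModelPool, T[]> = { fast: [], balanced: [], smart: [] };
  entries.forEach((e) => groups[e.pool].push(e));
  return groups;
}
```

Note that, as described, nothing is assigned to the Fast pool by this rule, so a Fast section would stay empty unless other code populates it.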


10. shared/types.ts ✨ Enhancement +8/-0

Model pool type definitions

• Added ModelPool enum with Fast, Balanced, and Smart values
• Exported ModelPool type for use in API responses and routing logic

shared/types.ts


11. server/src/__tests__/routes/provider-session-ban.test.ts 🧪 Tests +92/-48

Tests for balanced mode sticky session disabling

• Updated all test cases to use smart routing mode instead of balanced for sticky session
 testing
• Added new test suite for balanced mode verifying that sticky operations are disabled (no entries
 created, no bans tracked)
• Updated truncation detection test to use cut off instead of conflict in response pattern

server/src/__tests__/routes/provider-session-ban.test.ts


12. server/src/__tests__/routes/transient-cooldown.test.ts 🧪 Tests +415/-0

Transient model cooldown functionality tests

• New comprehensive test suite for transient model cooldown functionality
• Tests cooldown map operations, expiry pruning, sticky preference override, and integration with
 session bans
• Validates that only 5xx and connection failures trigger cooldowns, not 4xx errors
• Tests auto-recovery after cooldown expiration

server/src/__tests__/routes/transient-cooldown.test.ts


13. server/src/__tests__/routes/stream-heartbeat-stall.test.ts 🧪 Tests +329/-0

Stream heartbeat and stall protection tests

• New test suite for SSE stream heartbeat and stall protection
• Tests keep-alive comment emission during idle periods, stall detection with stream_timeout error,
 and pre-stream stall 504 response
• Tests client disconnect cleanup and normal streaming with heartbeat enabled
• Validates configurable keepalive intervals and stall thresholds

server/src/__tests__/routes/stream-heartbeat-stall.test.ts


14. server/src/__tests__/routes/proxy-tools.test.ts 🧪 Tests +4/-3

Transient cooldown map cleanup in proxy tests

• Updated LongCat sticky session cooldown test to clear transient cooldown map in beforeEach
• Updated expected log message from LongCat cooldown active to longcat cooldown active
 (lowercase platform name)

server/src/__tests__/routes/proxy-tools.test.ts


15. server/src/__tests__/routes/fallback.test.ts 🧪 Tests +11/-0

Model pool property validation tests

• Added test to verify pool property exists in fallback API response
• Added test to validate that all returned pool values are valid ModelPool enum values

server/src/__tests__/routes/fallback.test.ts


16. server/src/__tests__/services/router.test.ts 🧪 Tests +2/-27

Router test cleanup and imports

• Updated imports to include refreshStatsCache and getAnalyticsScores
• Removed unnecessary comment about fallback order reset
• Updated test to verify apiKey field in routing result

server/src/__tests__/services/router.test.ts


17. .roo/specs/sse-stream-heartbeat-stall-protection/design.md 📝 Documentation +330/-0

Design documentation for stream heartbeat stall protection

• New design document describing SSE stream heartbeat and stall protection architecture
• Details stream lifecycle with heartbeat intervals and stall detection thresholds
• Explains implementation of cleanup routines, pre-stream vs mid-stream stall handling, and
 interaction with existing error paths
• Includes edge cases, file modifications, and mermaid flowchart diagrams

.roo/specs/sse-stream-heartbeat-stall-protection/design.md


18. .roo/specs/disable-sticky-on-auto/design.md 📝 Documentation +97/-0

Design documentation for balanced mode sticky session disabling

• New design document explaining single-point guard approach to disable sticky sessions in balanced
 mode
• Details how getSessionKey() returning empty string cascades through all sticky functions as
 no-ops
• Includes flow diagram, edge cases, and risk analysis for the implementation

.roo/specs/disable-sticky-on-auto/design.md


19. .npmrc Configuration +4/-0

pnpm package manager configuration

• New pnpm configuration file with shamefully-hoist, strict-peer-dependencies, and
 auto-install-peers settings

.npmrc


20. .roo/specs/wrapped-error-interception/design.md 📝 Documentation +337/-0

Design for HTTP 200 wrapped error payload detection

• Comprehensive design document for detecting and handling wrapped error payloads returned with HTTP
 200 status codes
• Introduces isWrappedError() predicate and throwWrappedError() helper methods on BaseProvider
 class
• Details implementation across the provider types (OpenAI, Cohere, Cloudflare, Google) with
 specific code patterns for both streaming and non-streaming paths
• Includes error classification matrix, edge case analysis, and wrapped error format examples

.roo/specs/wrapped-error-interception/design.md


21. .roo/specs/pr13-code-review-fixes/requirements.md 🐞 Bug fix +268/-0

PR #13 code review bug verification and fix plan

• Documents 10 verified bugs from PR #13 code review across three priority tiers (P0, P1, P2)
• Critical bugs include SQL parenthesis mismatch, wrapped error swallowing in streaming, and NaN
 validation issues
• High-priority issues cover hardcoded platform references, stall detection, and cooldown guard
 logic
• Provides detailed impact analysis and acceptance criteria for each bug

.roo/specs/pr13-code-review-fixes/requirements.md


22. .roo/specs/transient-model-cooldown/design.md ✨ Enhancement +197/-0

Design for shared transient model failure cooldowns

• Introduces shared in-memory circuit breaker for transient model failures across concurrent
 requests
• Defines transientModelCooldowns Map structure with 15-second cooldown window for 5xx and
 connection errors
• Details integration points: pre-routing injection, sticky session override, cooldown registration,
 and mid-stream error handling
• Includes error classification matrix distinguishing between model-level and key-level cooldowns

.roo/specs/transient-model-cooldown/design.md


23. .roo/specs/recency-biased-thompson-sampling/design.md ✨ Enhancement +238/-0

Design for recency-weighted Thompson sampling analytics

• Replaces flat request counting with linear time-decay weighted aggregation in Thompson sampling
 router
• Implements SQL CTE with MIN(1.0, MAX(0.0, 1.0 - age_in_days / 7.0)) recency weight function
• Extends ModelStats interface to include both weighted and raw request counts for dashboard
 transparency
• Adds Math.max(0.1, ...) safety guards to prevent NaN in beta parameter calculations

.roo/specs/recency-biased-thompson-sampling/design.md


24. .roo/specs/pr13-code-review-fixes/design.md 📝 Documentation +210/-0

Implementation design for PR #13 code review fixes

• Provides implementation design for all 10 bugs identified in PR #13 requirements
• Details fixes for SQL parenthesis, wrapped error propagation, NaN validation, and hardcoded
 platform references
• Outlines stall detection abort mechanism, cooldown guard model set correction, and cleanup tasks
• Includes risk assessment and data flow diagrams for each fix category

.roo/specs/pr13-code-review-fixes/design.md


25. .roo/specs/sse-stream-heartbeat-stall-protection/requirements.md ✨ Enhancement +132/-0

Requirements for SSE heartbeat and stall detection

• Specifies SSE keep-alive heartbeat mechanism (15-second interval) to prevent intermediate proxy
 idle timeouts
• Defines stall detection timeout (45 seconds) with graceful stream termination and error signaling
• Details client-disconnect cleanup, heartbeat write failure handling, and pre-stream heartbeat
 behavior
• Includes constants configuration and non-functional requirements for backward compatibility

.roo/specs/sse-stream-heartbeat-stall-protection/requirements.md


26. .roo/specs/owl-alpha-longcat-model-routing/design.md ✨ Enhancement +184/-0

Design for Owl Alpha and LongCat model-level routing

• Defines model-level routing strategy for Owl Alpha and LongCat with balanced exclusion and smart
 preference
• Specifies sticky session cooldown override and error handling with model-level (not
 provider-level) banning
• Includes data flow diagrams for smart preference and sticky cooldown mechanisms
• Documents key design decisions on model-level vs provider-level banning and reusable key
 validation helpers

.roo/specs/owl-alpha-longcat-model-routing/design.md


27. .roo/specs/generalized-thread-protection/requirements.md ✨ Enhancement +109/-0

Requirements for generalized thread protection scanner

• Addresses hardcoded platform-specific logic scattered across proxy.ts by introducing generalized
 thread protection rules engine
• Defines user stories for environment-variable configuration, dynamic platform addition, and
 uniform error handling
• Specifies acceptance criteria eliminating all hardcoded platform checks and centralizing decisions
 through evaluateThreadProtection()
• Includes technical requirements for rules engine API, configuration format, and migration plan

.roo/specs/generalized-thread-protection/requirements.md


28. .roo/specs/owl-alpha-longcat-model-routing/tasks.md 📝 Documentation +116/-0

Implementation tasks for Owl Alpha and LongCat routing

• Breaks down implementation into three phases: router changes, proxy changes, and testing
• Phase 1 adds balanced mode exclusion constants and Owl Alpha smart preference logic
• Phase 2 implements sticky cooldown checks and model-level banning for both LongCat and Owl Alpha
 across multiple error scenarios
• Phase 3 defines test cases for balanced exclusion, smart preference, sticky cooldown, and
 model-level banning

.roo/specs/owl-alpha-longcat-model-routing/tasks.md


29. client/src/pages/FallbackPage.tsx ✨ Enhancement +49/-30

Add pool-based model grouping to fallback page UI

• Adds PoolSection component import and PoolType type for organizing models by routing pool
• Extends FallbackEntry interface with pool field to track model pool assignment
• Refactors table rendering to group models by pool (fast, balanced, smart) with descriptive section
 titles
• Maintains existing sort and filter functionality while adding visual pool-based organization

client/src/pages/FallbackPage.tsx


30. .roo/specs/owl-alpha-longcat-model-routing/requirements.md ✨ Enhancement +126/-0

Requirements for Owl Alpha and LongCat model routing

• Specifies requirements for treating Owl Alpha identically to LongCat with model-level (not
 provider-level) banning
• Defines exclusion from balanced auto routing and preference in smart auto routing when valid keys
 exist
• Details sticky session cooldown protection and model-level banning for 5xx, truncation, and
 retryable errors
• Includes acceptance criteria for balanced exclusion, smart preference, cooldown, and valid key
 checking

.roo/specs/owl-alpha-longcat-model-routing/requirements.md


31. .roo/specs/generalized-thread-protection/design.md ✨ Enhancement +152/-0

Design for generalized thread protection scanner

• Introduces dynamic thread protection scanner module replacing hardcoded longcat platform checks
• Defines ThreadProtectionAction interface with banProvider, skipModel, and
 clearStickyIfPinned flags
• Specifies decision matrix mapping protection levels (provider-ban, model-skip, off) to error
 contexts (5xx, truncation, retryable)
• Details integration points in proxy.ts for 6 hardcoded blocks and sticky cooldown generalization

.roo/specs/generalized-thread-protection/design.md


32. .roo/specs/wrapped-error-interception/tasks.md 📝 Documentation +66/-0

Implementation tasks for wrapped error interception

• Lists 13 implementation steps for wrapped error detection across all provider types
• Steps 1-3 add core methods to BaseProvider and change extractErrorMessage() visibility
• Steps 4-11 add wrapped-error checks in provider implementations (OpenAI, Cohere, Cloudflare,
 Google)
• Steps 12-13 include TypeScript compilation and test verification

.roo/specs/wrapped-error-interception/tasks.md


33. .roo/specs/generalized-thread-protection/tasks.md 📝 Documentation +12/-0

Implementation tasks for generalized thread protection

• Defines 8 implementation tasks for generalizing thread protection logic
• Tasks 1-3 rename the cooldown constant and remove the hardcoded LongCat/Owl Alpha blocks
• Task 4 inserts generalized scanner with activeCooldownModels collection and exhaustion
 protection
• Tasks 5-8 cover execution order verification, test creation, regression testing, and smoke testing

.roo/specs/generalized-thread-protection/tasks.md


34. package.json ⚙️ Configuration changes +1/-0

Specify pnpm package manager version

• Adds packageManager field specifying pnpm@11.1.3 as the required package manager
• Ensures consistent dependency management across development environments

package.json


35. .roo/specs/wrapped-error-interception/requirements.md 📝 Documentation +53/-0

Specification for wrapped error payload detection on HTTP 200

• Introduces specification for detecting and handling error payloads wrapped in HTTP 200 responses
 from upstream LLM providers
• Defines functional requirements (FR-1 through FR-8) for inspecting JSON bodies for root-level
 error fields across all provider adapters
• Specifies error handling behavior for both non-streaming and streaming modes, with proper
 ProviderApiError propagation
• Outlines non-functional requirements including backward compatibility, minimal performance impact,
 and integration with existing retry mechanisms

.roo/specs/wrapped-error-interception/requirements.md


36. .roo/specs/recency-biased-thompson-sampling/requirements.md 📝 Documentation +76/-0

Time-decay weighting for Thompson Sampling router analytics

• Defines requirements for implementing time-decay weighting in Thompson Sampling router to
 prioritize recent request data
• Specifies linear decay formula using julianday() SQL function with bounds protection against
 clock drift
• Details backward compatibility requirements with Beta distribution sampler using `Math.max(0.1,
 ...)` guards
• Includes test cases for outage sensitivity and fractional evaluation safety

.roo/specs/recency-biased-thompson-sampling/requirements.md


37. .roo/specs/recency-biased-thompson-sampling/tasks.md 📝 Documentation +17/-0

Implementation task breakdown for recency-biased analytics

• Breaks down implementation into 13 tasks for adding time-decay weighting to router statistics
• Tasks T1-T4 marked complete: adding constants, extending ModelStats interface, rewriting SQL
 query, updating cache population
• Tasks T5-T13 pending: adding safety guards in scoring functions, updating dashboard display,
 writing tests, running test suite

.roo/specs/recency-biased-thompson-sampling/tasks.md


38. .roo/specs/transient-model-cooldown/requirements.md 📝 Documentation +38/-0

Shared temporary cooldowns for transient failure mitigation

• Specifies shared, temporary cooldown mechanism for models experiencing transient failures (5xx or
 connection timeouts)
• Defines 15-second global cooldown window visible to all concurrent requests to reduce unnecessary
 upstream traffic
• Requires sticky session precedence override when preferred model is on cooldown
• Details auto-expiry mechanism and integration with existing routing logic

.roo/specs/transient-model-cooldown/requirements.md


39. .roo/specs/transient-model-cooldown/tasks.md 📝 Documentation +16/-0

Implementation tasks for transient model cooldown feature

• Outlines 8 implementation tasks for adding transient model cooldowns to proxy routing
• Tasks include declaring module-level state, pre-routing cooldown injection, sticky session
 override logic, and cooldown registration on failures
• Specifies test coverage for cooldown injection, expiry pruning, and auto-recovery mechanisms

.roo/specs/transient-model-cooldown/tasks.md


40. .roo/specs/disable-sticky-on-auto/requirements.md 📝 Documentation +44/-0

Disable sticky sessions on balanced routing endpoint

• Specifies disabling sticky session functionality on the balanced/auto endpoint while keeping it
 active on smart/auto-smart
• Defines 6 requirements covering no sticky model/key pinning and no session-level platform bans for
 balanced mode
• Clarifies that per-request retry skip logic remains unchanged for both modes
• Outlines backward compatibility requirements for existing tests

.roo/specs/disable-sticky-on-auto/requirements.md


41. .roo/specs/disable-sticky-on-auto/tasks.md 📝 Documentation +16/-0

Implementation tasks for disabling sticky on auto endpoint

• Lists 4 implementation tasks for disabling sticky sessions on balanced endpoint
• Tasks T1-T3 marked complete: modifying getSessionKey(), adding balanced-mode tests, running
 existing test suite
• Task T4 pending: manual smoke test to verify balanced mode uses free routing

.roo/specs/disable-sticky-on-auto/tasks.md


42. .roo/specs/sse-stream-heartbeat-stall-protection/tasks.md 📝 Documentation +20/-0

SSE stream heartbeat and stall protection implementation

• Defines 11 implementation tasks for adding heartbeat and stall detection to SSE streaming
 responses
• Specifies constants (KEEPALIVE_INTERVAL_MS = 15000, MAX_STREAM_STALL_MS = 45000) and state
 variables for stream monitoring
• Details heartbeat emission logic, stall detection paths (pre-stream and mid-stream), and cleanup
 mechanisms
• Includes 5 unit tests covering heartbeat emission, stall termination, pre-stream 504 fallback,
 client disconnect cleanup, and write failures

.roo/specs/sse-stream-heartbeat-stall-protection/tasks.md


43. client/src/components/pool-badge.tsx ✨ Enhancement +16/-0

New pool badge component for UI display

• Creates new PoolBadge component for displaying routing pool type badges (fast, balanced, smart)
• Defines PoolType type and poolStyles configuration with Tailwind CSS classes for each pool
 variant
• Renders inline badge with pool-specific colors and labels

client/src/components/pool-badge.tsx


44. client/src/components/pool-section.tsx ✨ Enhancement +29/-0

New collapsible pool section component

• Creates new PoolSection component for collapsible pool sections in UI
• Implements expandable/collapsible behavior with toggle arrow indicator
• Integrates PoolBadge component and accepts pool type, title, and children content

client/src/components/pool-section.tsx


45. .roo/specs/pr13-code-review-fixes/tasks.md 📝 Documentation +14/-0

Code review fixes and bug tracking for PR #13

• Lists 10 bug fixes and issues identified in PR #13 code review
• Tasks BUG-01 through BUG-04, BUG-07 through BUG-09 pending: SQL fixes, error propagation, NaN
 guards, hardcoded references, debug script cleanup
• Tasks BUG-05, BUG-06, BUG-10 marked complete: stream abort on stall, cooldown guard fix, double
 semicolon removal

.roo/specs/pr13-code-review-fixes/tasks.md


46. AGENTS.md Additional files +0/-0

...

AGENTS.md
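
For readers skimming the walkthrough, the recency weight quoted in item 23 can be worked through with a small sketch. Only the clamp formula and the `Math.max(0.1, ...)` floor come from the summary above; `DECAY_WINDOW_DAYS`, `WeightedOutcome`, and `betaParams` are hypothetical names for illustration and may not match the router code.

```typescript
// Hypothetical constant: the 7-day window is taken from the formula in item 23.
const DECAY_WINDOW_DAYS = 7.0;

// Linear time-decay weight, clamped to [0, 1] so that negative ages
// (for example, from clock drift) cannot produce a weight above 1.
function recencyWeight(ageInDays: number): number {
  return Math.min(1.0, Math.max(0.0, 1.0 - ageInDays / DECAY_WINDOW_DAYS));
}

// Hypothetical shape of one logged request outcome.
interface WeightedOutcome {
  ageInDays: number;
  success: boolean;
}

// Hypothetical helper: sums weighted successes and failures, then floors
// both at 0.1 so a Beta sampler never receives a zero parameter.
function betaParams(outcomes: WeightedOutcome[]): { alpha: number; beta: number } {
  let successes = 0;
  let failures = 0;
  for (const o of outcomes) {
    const w = recencyWeight(o.ageInDays);
    if (o.success) successes += w;
    else failures += w;
  }
  return { alpha: Math.max(0.1, successes), beta: Math.max(0.1, failures) };
}
```

Under this sketch a request from today counts fully, one from 3.5 days ago counts half, and anything 7 or more days old contributes nothing.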



@qodo-code-review

qodo-code-review Bot commented Jun 5, 2026


Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (1)

Context used
✅ Compliance rules (platform): 6 rules



Action required

1. transientModelCooldowns before keys exhausted 📘 Rule violation ≡ Correctness
Description
The proxy applies a model-level penalty (skipModels and a transient cooldown) immediately after a
single request failure, without first attempting other enabled keys for the same platform/model.
This can incorrectly penalize a model even when another key could succeed, violating the requirement
to apply model-level penalties only after all keys are exhausted.
Code

server/src/routes/proxy.ts[R1671-1679]

+      // 5xx failure detection — all providers: model-level ban + transient cooldown
      const errStatus = getErrorStatus(err);
+      const isTransientCooldownEligible = (errStatus !== undefined && errStatus >= 500 && errStatus < 600) || errStatus === undefined;
      if (errStatus && isBanEligibleStatus(errStatus)) {
-        if (route.platform === 'longcat') {
-          console.warn(`[Proxy] 5xx from LongCat — excluding entire LongCat provider for session`);
-          banPlatformFromSession(normalizedMessages, routingMode, 'longcat', route.modelDbId);
-          addProviderModelsToSkipModels(skipModels, 'longcat');
-          // Clear sticky if pinned to LongCat
-          if (preferredModel) {
-            const db = getDb();
-            const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined;
-            if (prefRow?.platform === 'longcat') {
-              preferredModel = undefined;
-              preferredKeyId = undefined;
-            }
+        // Register transient cooldown for any 5xx ban-eligible error
+        transientModelCooldowns.set(route.modelDbId, Date.now() + TRANSIENT_COOLDOWN_MS);
+        console.warn(`[Proxy] 5xx from ${route.platform}/${route.modelId} — skipping model for session`);
+        skipModels.add(route.modelDbId);
+        // Clear sticky if pinned to this platform
Evidence
PR Compliance ID 876934 requires model-level penalties only after all keys for that model are
exhausted. In handleChatCompletion(), the new logic sets a transient cooldown and adds the model
to skipModels on a single 5xx/connection error, which is a model-level penalty applied before
other keys could be tried.

Rule 876934: Trigger model-level bandit penalty only after all keys for the model are exhausted
server/src/routes/proxy.ts[1671-1679]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`handleChatCompletion()` applies model-level skipping/cooldowns (`skipModels.add(route.modelDbId)` and `transientModelCooldowns.set(route.modelDbId, ...)`) after a single failed attempt. This violates the requirement that model-level penalties occur only after all keys for that model are exhausted.

## Issue Context
The retry loop already tracks per-key failures via `skipKeys`, which implies the system can retry the same model with another key. However, `skipModels`/`transientModelCooldowns` prevent routing back to the same model even if other enabled keys exist.

## Fix Focus Areas
- server/src/routes/proxy.ts[1671-1708]
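
One possible shape for the fix, as a minimal sketch rather than the actual `proxy.ts` change: record the failure against the key first, and apply the model-level penalty only once every enabled key for that model has failed. `PenaltyState`, `recordKeyFailure`, and the `enabledKeyIds` parameter are hypothetical; `skipKeys`, `skipModels`, `transientModelCooldowns`, and `TRANSIENT_COOLDOWN_MS` mirror names in the excerpt, and the 15-second value is taken from the design summary.

```typescript
// Hypothetical sketch; names mirror the excerpt but this is not the proxy.ts code.
const TRANSIENT_COOLDOWN_MS = 15_000; // 15-second window per the design summary

interface PenaltyState {
  skipKeys: Set<number>;                        // keys that already failed in this request
  skipModels: Set<number>;                      // models excluded from further routing
  transientModelCooldowns: Map<number, number>; // modelDbId -> expiry timestamp (ms)
}

// Records a 5xx/connection failure for one key. Returns true only when the
// model-level penalty was applied, i.e. when all enabled keys are exhausted.
// An empty enabledKeyIds list counts as exhausted (nothing left to try).
function recordKeyFailure(
  state: PenaltyState,
  modelDbId: number,
  failedKeyId: number,
  enabledKeyIds: number[],
  now: number = Date.now(),
): boolean {
  state.skipKeys.add(failedKeyId);
  const exhausted = enabledKeyIds.every((id) => state.skipKeys.has(id));
  if (exhausted) {
    state.skipModels.add(modelDbId);
    state.transientModelCooldowns.set(modelDbId, now + TRANSIENT_COOLDOWN_MS);
  }
  return exhausted;
}
```

With two enabled keys, the first failure only adds to `skipKeys`, so the retry loop can still route back to the same model with the other key. The second failure is what triggers the skip and the cooldown.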



2. Missing SSE content-type 🐞 Bug ≡ Correctness
Description
handleChatCompletion() writes SSE data: frames for standard chat-completions streaming without
setting Content-Type: text/event-stream (and related SSE headers) on the normal path, only doing
so in the no-chunks fallback. This can cause clients/proxies to buffer or mis-parse the stream and
breaks SSE expectations.
Code

server/src/routes/proxy.ts[R1375-1393]

+              if (!streamStarted) {
+                ttfbMs = Date.now() - start;
+                res.setHeader('X-Routed-Via', `${route.platform}/${route.modelId}`);
+                if (attempt > 0) res.setHeader('X-Fallback-Attempts', String(attempt));
+                if (responseStreamContext) {
+                  writeResponseStreamStart(res, responseStreamContext, route.modelId);
+                }
+                streamStarted = true;
+              }
+              const deltaToolCalls = chunk.choices[0]?.delta?.tool_calls ?? [];
+              if (deltaToolCalls.length > 0) sawToolCalls = true;
              if (responseStreamContext) {
-                writeResponseStreamStart(res, responseStreamContext, route.modelId);
+                totalOutputTokens += writeResponseStreamChunk(res, responseStreamContext, chunk);
+              } else {
+                const text = chunk.choices[0]?.delta?.content ?? '';
+                if (text) streamedText += text;
+                totalOutputTokens += Math.ceil(text.length / 4);
+                res.write(`data: ${JSON.stringify(chunk)}\n\n`);
              }
-              streamStarted = true;
Evidence
On the first streamed chunk, the code sets routing headers and then immediately writes SSE frames,
but never sets Content-Type unless the upstream yields zero chunks (the fallback branch).

server/src/routes/proxy.ts[1375-1393]
server/src/routes/proxy.ts[1427-1434]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
In `server/src/routes/proxy.ts`, the streaming path for non-`responseStreamContext` writes SSE frames (`res.write('data: ...\n\n')`) but does not set `Content-Type: text/event-stream` (and other common SSE headers) before writing.

### Issue Context
SSE responses should set `Content-Type: text/event-stream` early (before the first `res.write`) so that clients and intermediaries treat the response as a stream.

### Fix Focus Areas
- Add SSE headers when streaming starts (before first `res.write` in the `!responseStreamContext` path)
- Keep existing `writeResponseStreamStart(...)` behavior intact for the Responses API path

### Fix Focus Areas (code pointers)
- server/src/routes/proxy.ts[1375-1393]
- server/src/routes/proxy.ts[1427-1434]
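
A minimal sketch of the header fix, assuming a response object with Express-like `headersSent`, `setHeader`, and `write`. The `SseWritable` interface, the `ensureSseHeaders` and `writeSseData` helpers, and the `RecordingResponse` stand-in are hypothetical; `Cache-Control: no-cache` and `Connection: keep-alive` are commonly paired with SSE but are an assumption here, not something the review specifies.

```typescript
// Hypothetical minimal surface of the response object used by the sketch.
interface SseWritable {
  headersSent: boolean;
  setHeader(name: string, value: string): void;
  write(chunk: string): boolean;
}

// Sets SSE headers once, before the first body write.
function ensureSseHeaders(res: SseWritable): void {
  if (res.headersSent) return; // headers can no longer be changed
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');   // assumption: common SSE companion header
  res.setHeader('Connection', 'keep-alive');    // assumption: common SSE companion header
}

// Writes one SSE data frame, making sure the headers went out first.
function writeSseData(res: SseWritable, payload: unknown): void {
  ensureSseHeaders(res);
  res.write(`data: ${JSON.stringify(payload)}\n\n`);
}

// In-memory stand-in for a real response, used only to exercise the helpers.
class RecordingResponse implements SseWritable {
  headersSent = false;
  headers: Record<string, string> = {};
  body: string[] = [];
  setHeader(name: string, value: string): void {
    this.headers[name.toLowerCase()] = value;
  }
  write(chunk: string): boolean {
    this.headersSent = true; // real servers flush headers on the first write
    this.body.push(chunk);
    return true;
  }
}
```

Calling `writeSseData` for every chunk keeps the header logic in one place, so the normal path and the no-chunks fallback cannot drift apart.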



3. Keepalive test mismatch 🐞 Bug ≡ Correctness
Description
The new keepalive unit test expects : keep-alive during a 300ms delay before the first provider
chunk, but the implementation only writes keepalive comments when streamStarted is true (which
only happens after the first chunk arrives). This makes the test fail deterministically (or at best
be flaky) and does not validate the intended heartbeat behavior.
Code

server/src/__tests__/routes/stream-heartbeat-stall.test.ts[R113-124]

+    const { status, raw } = await request(app, 'POST', '/v1/chat/completions', {
+      messages: [{ role: 'user', content: 'Test heartbeat' }],
+      stream: true,
+    });
+
+    expect(status).toBe(200);
+    // Should contain the actual content
+    expect(raw).toContain('hello');
+    expect(raw).toContain('world');
+    // Should contain at least one keep-alive comment during the 300ms idle period
+    expect(raw).toContain(': keep-alive');
+  });
Evidence
The proxy only writes keepalive comments when streamStarted is true, but streamStarted becomes
true only when the first chunk is processed; meanwhile the test’s delay is entirely before the first
chunk, so the assertion cannot be satisfied.

server/src/routes/proxy.ts[1366-1383]
server/src/__tests__/routes/stream-heartbeat-stall.test.ts[113-124]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The keepalive test asserts that `: keep-alive` appears during the pre-first-chunk idle window, but the proxy implementation intentionally skips keepalive writes until after the stream has started (after the first chunk / SSE headers).

### Issue Context
`streamStarted` is set only inside the first-chunk handler, and keepalive writes are gated on `streamStarted`, so no keepalive can be emitted before the first chunk.

### Fix Focus Areas
- Update the test to create an idle period *after* the first chunk (e.g., emit chunk #1 immediately to start the stream, then delay 300ms before chunk #2) and assert keepalive appears during that gap
- Alternatively, remove/adjust the assertion to match the intended design (keepalive only after stream start)

### Fix Focus Areas (code pointers)
- server/src/__tests__/routes/stream-heartbeat-stall.test.ts[113-124]
- server/src/routes/proxy.ts[1366-1383]
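
The gating can be illustrated with a deterministic, timer-free simulation. This is a sketch of the logic only, not the proxy or the test harness: `TimedChunk` and `simulateStream` are hypothetical, chunks are assumed sorted by arrival time, and a chunk arriving exactly on a tick is assumed to be handled before that tick.

```typescript
// Hypothetical model of a provider chunk arriving at a given time offset.
interface TimedChunk {
  atMs: number;
  text: string;
}

// Replays chunk arrivals against keepalive ticks and returns what would be
// written to the client. Keepalive comments are gated on streamStarted,
// which only flips once the first chunk has been processed.
function simulateStream(chunks: TimedChunk[], keepaliveMs: number): string[] {
  const written: string[] = [];
  let streamStarted = false;
  let lastChunkTime = 0;
  let next = 0;
  const endMs = chunks.length > 0 ? chunks[chunks.length - 1].atMs : 0;
  // The first tick fires one interval after start, like setInterval.
  for (let t = keepaliveMs; t <= endMs; t += keepaliveMs) {
    while (next < chunks.length && chunks[next].atMs <= t) {
      written.push(chunks[next].text);
      lastChunkTime = chunks[next].atMs;
      streamStarted = true;
      next++;
    }
    if (streamStarted && t - lastChunkTime >= keepaliveMs) {
      written.push(': keep-alive');
    }
  }
  // Deliver any chunks that arrived after the last tick.
  while (next < chunks.length) {
    written.push(chunks[next].text);
    next++;
  }
  return written;
}

// Delay before the first chunk (the current test shape): no keepalive is possible.
const preFirstChunk = simulateStream(
  [{ atMs: 300, text: 'hello' }, { atMs: 300, text: 'world' }],
  100,
);

// Delay after the first chunk (the suggested test shape): keepalives fill the gap.
const afterFirstChunk = simulateStream(
  [{ atMs: 0, text: 'hello' }, { atMs: 300, text: 'world' }],
  100,
);
```

In this model `preFirstChunk` contains only the two content chunks, while `afterFirstChunk` has keepalive comments between `hello` and `world`, which is the gap the restructured test would assert on.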




@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces several major enhancements to the routing and streaming stability of the proxy, including grouping models into fast, balanced, and smart pools on the frontend, excluding LongCat and Owl Alpha from balanced auto-routing, implementing SSE stream heartbeats and stall protection, introducing transient model cooldowns for concurrent failure mitigation, and detecting wrapped error payloads on HTTP 200 responses. Feedback from the review highlights critical issues in proxy.ts: first, changing !key to key === undefined in getStickyModel can lead to session collisions on empty string keys; second, a potential resource and timer leak exists if a client disconnects during a blocked stream loop, which can be resolved with a req.on('close') cleanup listener; and third, an early return on pre-stream stalls bypasses the fallback retry loop, where throwing a 504 error instead would allow proper fallback routing.


 function getStickyModel(messages: ChatMessage[], routingMode: RoutingMode): number | undefined {
   const key = getSessionKey(messages, routingMode);
-  if (!key) return undefined;
+  if (key === undefined) return undefined;


high

Changing !key to key === undefined allows empty string keys ('') to be treated as valid session keys. If getSessionKey returns '' (which happens when there is no user message or content is not a string), the code will attempt to look up '' in stickySessionMap. This can cause session collisions where multiple requests with empty keys share the same sticky session. Revert this to !key to safely handle both undefined and empty strings.

Suggested change
-  if (key === undefined) return undefined;
+  if (!key) return undefined;
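
The difference between the two guards can be shown with a small stand-alone sketch. The map below is a stand-in for `stickySessionMap`, and `lookupLoose` / `lookupStrict` are hypothetical names for the two variants of the check; the stored value `42` is an arbitrary model id.

```typescript
// Stand-in for the sticky session store: session key -> model id.
const stickySessions = new Map<string, number>();

// Variant under review: only undefined is rejected, so '' is looked up.
function lookupLoose(key: string | undefined): number | undefined {
  if (key === undefined) return undefined;
  return stickySessions.get(key);
}

// Suggested variant: both undefined and '' short-circuit to "no sticky model".
function lookupStrict(key: string | undefined): number | undefined {
  if (!key) return undefined;
  return stickySessions.get(key);
}

// Simulate an earlier request that was stored under an empty session key.
stickySessions.set('', 42);

const looseResult = lookupLoose('');   // finds the unrelated entry stored under ''
const strictResult = lookupStrict(''); // treats the empty key as "no session"
```

With the loose guard, every request whose session key is empty shares the same entry. The strict guard keeps the empty key as a no-op, which also matches the disable-sticky-on-auto design where `getSessionKey()` returns an empty string to switch sticky behavior off.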

Comment on lines +1334 to 1400
+          let lastChunkTime = Date.now();
+          let stalled = false;
+          const keepaliveTimer = setInterval(() => {
+            if (stalled) {
+              clearInterval(keepaliveTimer);
+              return;
+            }
+            const elapsed = Date.now() - lastChunkTime;
+            if (elapsed >= streamKeepaliveConfig.MAX_STREAM_STALL_MS) {
+              stalled = true;
+              clearInterval(keepaliveTimer);
+              if (streamStarted) {
+                const payload = { error: { message: 'Stream stalled: no data received within timeout', type: 'stream_timeout' } };
+                try {
+                  if (responseStreamContext) {
+                    writeResponseStreamEvent(res, {
+                      type: 'response.failed',
+                      response: {
+                        id: responseStreamContext.responseId,
+                        status: 'failed',
+                        error: payload.error,
+                      },
+                    });
+                  } else {
+                    res.write(`data: ${JSON.stringify(payload)}\n\n`);
+                    res.write('data: [DONE]\n\n');
+                  }
+                  res.end();
+                } catch { /* socket gone */ }
+              }
+              return;
+            }
+            if (streamStarted && elapsed >= streamKeepaliveConfig.KEEPALIVE_INTERVAL_MS) {
+              try { res.write(': keep-alive\n\n'); } catch { /* socket gone */ }
+            }
+          }, streamKeepaliveConfig.KEEPALIVE_INTERVAL_MS);
+
+          try {
+            for await (const chunk of gen) {
+              if (stalled) break;
+              lastChunkTime = Date.now();
+              if (!streamStarted) {
+                ttfbMs = Date.now() - start;
+                res.setHeader('X-Routed-Via', `${route.platform}/${route.modelId}`);
+                if (attempt > 0) res.setHeader('X-Fallback-Attempts', String(attempt));
+                if (responseStreamContext) {
+                  writeResponseStreamStart(res, responseStreamContext, route.modelId);
+                }
+                streamStarted = true;
+              }
+              const deltaToolCalls = chunk.choices[0]?.delta?.tool_calls ?? [];
+              if (deltaToolCalls.length > 0) sawToolCalls = true;
               if (responseStreamContext) {
-                writeResponseStreamStart(res, responseStreamContext, route.modelId);
+                totalOutputTokens += writeResponseStreamChunk(res, responseStreamContext, chunk);
+              } else {
+                const text = chunk.choices[0]?.delta?.content ?? '';
+                if (text) streamedText += text;
+                totalOutputTokens += Math.ceil(text.length / 4);
+                res.write(`data: ${JSON.stringify(chunk)}\n\n`);
               }
-              streamStarted = true;
             }
-            const deltaToolCalls = chunk.choices[0]?.delta?.tool_calls ?? [];
-            if (deltaToolCalls.length > 0) sawToolCalls = true;
-            if (responseStreamContext) {
-              totalOutputTokens += writeResponseStreamChunk(res, responseStreamContext, chunk);
-            } else {
-              const text = chunk.choices[0]?.delta?.content ?? '';
-              if (text) streamedText += text;
-              totalOutputTokens += Math.ceil(text.length / 4);
-              res.write(`data: ${JSON.stringify(chunk)}\n\n`);
+          } finally {
+            clearInterval(keepaliveTimer);
+            if (stalled) {
+              try { gen.return(undefined); } catch { /* already closed */ }
             }
           }


high

There is a potential resource and timer leak here. If the client disconnects prematurely while the for await loop is suspended waiting for the next chunk from the upstream provider, the finally block will never be entered because the loop is blocked. This leaves the keepaliveTimer running indefinitely. Additionally, when a stall is detected, the generator is not aborted immediately.

To fix this, define a cleanup helper that clears the interval and calls gen.return(), and register it as a req.on('close') listener. This ensures all resources are immediately freed when the client disconnects or a stall occurs.

          let lastChunkTime = Date.now();
          let stalled = false;

          const keepaliveTimer = setInterval(() => {
            if (stalled) {
              clearInterval(keepaliveTimer);
              return;
            }
            const elapsed = Date.now() - lastChunkTime;
            if (elapsed >= streamKeepaliveConfig.MAX_STREAM_STALL_MS) {
              stalled = true;
              cleanup();
              if (streamStarted) {
                const payload = { error: { message: 'Stream stalled: no data received within timeout', type: 'stream_timeout' } };
                try {
                  if (responseStreamContext) {
                    writeResponseStreamEvent(res, {
                      type: 'response.failed',
                      response: {
                        id: responseStreamContext.responseId,
                        status: 'failed',
                        error: payload.error,
                      },
                    });
                  } else {
                    res.write(`data: ${JSON.stringify(payload)}\n\n`);
                    res.write('data: [DONE]\n\n');
                  }
                  res.end();
                } catch { /* socket gone */ }
              }
              return;
            }
            if (streamStarted && elapsed >= streamKeepaliveConfig.KEEPALIVE_INTERVAL_MS) {
              try { res.write(': keep-alive\n\n'); } catch { /* socket gone */ }
            }
          }, streamKeepaliveConfig.KEEPALIVE_INTERVAL_MS);

          const cleanup = () => {
            clearInterval(keepaliveTimer);
            try { gen.return(undefined); } catch { /* already closed */ }
          };

          req.on('close', cleanup);

          try {
            for await (const chunk of gen) {
              if (stalled) break;
              lastChunkTime = Date.now();
              if (!streamStarted) {
                ttfbMs = Date.now() - start;
                res.setHeader('X-Routed-Via', `${route.platform}/${route.modelId}`);
                if (attempt > 0) res.setHeader('X-Fallback-Attempts', String(attempt));
                if (responseStreamContext) {
                  writeResponseStreamStart(res, responseStreamContext, route.modelId);
                }
                streamStarted = true;
              }
              const deltaToolCalls = chunk.choices[0]?.delta?.tool_calls ?? [];
              if (deltaToolCalls.length > 0) sawToolCalls = true;
              if (responseStreamContext) {
                totalOutputTokens += writeResponseStreamChunk(res, responseStreamContext, chunk);
              } else {
                const text = chunk.choices[0]?.delta?.content ?? '';
                if (text) streamedText += text;
                totalOutputTokens += Math.ceil(text.length / 4);
                res.write(`data: ${JSON.stringify(chunk)}\n\n`);
              }
            }
          } finally {
            req.off('close', cleanup);
            cleanup();
          }

Comment on lines +1414 to +1425
if (stalled && !streamStarted) {
// Pre-stream stall: no headers sent yet, return 504 so the retry loop can try another model
streamAborted = true;
res.status(504).json({
error: {
message: 'Stream timed out: no data received from provider',
type: 'stream_timeout',
},
});
logRequest(route.platform, route.modelId, 'error', estimatedInputTokens, totalOutputTokens, Date.now() - start, ttfbMs, 'Pre-stream stall timeout');
return;
}


high

This early return contradicts the design specification and breaks the fallback/retry mechanism. If a pre-stream stall occurs (i.e., streamStarted is false), returning directly from handleChatCompletion terminates the request immediately and sends a 504 to the client, bypassing any remaining fallback attempts in the retry loop. Throwing a 504 error instead allows the outer catch block to handle the failure, register the transient cooldown for the stalled model, and proceed to retry with the next available model in the fallback chain.

          if (stalled && !streamStarted) {
            // Pre-stream stall: throw 504 error so the retry loop can try another model
            throw Object.assign(
              new Error(`Stream timed out: no data received from provider ${route.displayName}`),
              { status: 504 }
            );
          }
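
A minimal sketch of why throwing helps here, assuming the retry loop catches errors that carry a numeric `status` and treats 5xx or status-less failures as transient. `runWithFallback`, the `Route` shape, and the 60-second cooldown value are hypothetical and only illustrate the control flow; the real loop in `proxy.ts` does considerably more.

```typescript
// Hypothetical shapes for illustration; not the actual types in proxy.ts.
interface Route {
  modelDbId: number;
  displayName: string;
}

const TRANSIENT_COOLDOWN_MS = 60_000; // assumed value
const transientModelCooldowns = new Map<number, number>();

// Reads a numeric `status` property off an unknown error, if present.
function getErrorStatus(err: unknown): number | undefined {
  if (typeof err === 'object' && err !== null && 'status' in err) {
    const status = (err as { status: unknown }).status;
    return typeof status === 'number' ? status : undefined;
  }
  return undefined;
}

// Tries each route in order. A thrown 5xx (or status-less) error registers a
// cooldown for that model and moves on; any other error is rethrown at once.
async function runWithFallback<T>(
  routes: Route[],
  attempt: (route: Route) => Promise<T>,
): Promise<T> {
  let lastError: unknown;
  for (const route of routes) {
    try {
      return await attempt(route);
    } catch (err) {
      lastError = err;
      const status = getErrorStatus(err);
      const transient = status === undefined || (status >= 500 && status < 600);
      if (!transient) throw err;
      transientModelCooldowns.set(route.modelDbId, Date.now() + TRANSIENT_COOLDOWN_MS);
    }
  }
  throw lastError ?? new Error('No routes available');
}
```

With this shape, a pre-stream stall that throws `{ status: 504 }` falls through to the next route, whereas an early `return` after `res.status(504).json(...)` would end the request before any fallback is tried.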

@sourcery-ai sourcery-ai Bot left a comment


Hey - I've found 2 issues, and left some high level feedback:

  • The log message [Proxy] transient cooldowns active for model IDs: [...] is built from the full skipModels set, so it will also include session bans and other skips; consider logging only the IDs actually in transientModelCooldowns to avoid confusing cooldown diagnostics.
  • The new PoolSection UI uses plain ▼/▶ glyphs without any aria attributes; adding aria-expanded and a button/role="button" would make these collapsible sections accessible to screen readers and keyboard users.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The log message `[Proxy] transient cooldowns active for model IDs: [...]` is built from the full `skipModels` set, so it will also include session bans and other skips; consider logging only the IDs actually in `transientModelCooldowns` to avoid confusing cooldown diagnostics.
- The new `PoolSection` UI uses plain ▼/▶ glyphs without any aria attributes; adding `aria-expanded` and a `button`/`role="button"` would make these collapsible sections accessible to screen readers and keyboard users.

## Individual Comments

### Comment 1
<location path="server/src/routes/proxy.ts" line_range="1316-1325" />
<code_context>
+              lastChunkTime = Date.now();
+              if (!streamStarted) {
+                ttfbMs = Date.now() - start;
+                res.setHeader('X-Routed-Via', `${route.platform}/${route.modelId}`);
+                if (attempt > 0) res.setHeader('X-Fallback-Attempts', String(attempt));
+                if (responseStreamContext) {
+                  writeResponseStreamStart(res, responseStreamContext, route.modelId);
+                }
</code_context>
<issue_to_address>
**issue (bug_risk):** SSE response headers (Content-Type/Connection/etc.) are no longer set for streaming responses.

The previous streaming path explicitly set `Content-Type: text/event-stream`, `Cache-Control: no-cache`, and `Connection: keep-alive` before emitting SSE data. In this new path, only `X-Routed-Via`/`X-Fallback-Attempts` are added, so SSE-specific headers may never be sent, leading some clients or intermediaries to mis-handle the stream.

Unless these headers are guaranteed to be set earlier in the lifecycle, please restore them at the point `streamStarted` becomes true, for both `responseStreamContext` and the plain SSE path.
</issue_to_address>

### Comment 2
<location path="server/src/routes/proxy.ts" line_range="1254-1267" />
<code_context>
+      // Simulate the pre-routing injection logic
+      const skipModels = new Set<number>();
+      const now = Date.now();
+      for (const [id, exp] of transientModelCooldowns) {
+        if (now > exp) {
+          transientModelCooldowns.delete(id);
</code_context>
<issue_to_address>
**suggestion:** Logging of transient cooldowns conflates all skip reasons, not just cooldowns.

This log line is built from the full `skipModels` set, which may include IDs skipped for non-cooldown reasons (session bans, truncation, etc.), so the message doesn’t actually reflect only transient cooldowns.

Consider tracking the IDs added by this block (e.g., a local `cooldownIds` set) and logging those, or emitting a separate log before merging into `skipModels` to keep the signal clear for debugging.

```suggestion
  // Inject transient model cooldowns into skipModels
  {
    const now = Date.now();
    const cooldownIds = new Set<number>();

    for (const [id, exp] of transientModelCooldowns) {
      if (now > exp) {
        transientModelCooldowns.delete(id);
      } else {
        skipModels.add(id);
        cooldownIds.add(id);
      }
    }

    if (cooldownIds.size > 0) {
      console.log(`[Proxy] transient cooldowns active for model IDs: [${Array.from(cooldownIds).join(',')}]`);
    }
  }
```
</issue_to_address>


Comment on lines 1316 to +1325
let streamStarted = false;
let ttfbMs: number | null = null;
try {
// Register the session as active
if (sessionKey) {
activeRequests.add({
sessionKey,
platform: route.platform,
modelId: route.modelId,
startTime: Date.now()


issue (bug_risk): SSE response headers (Content-Type/Connection/etc.) are no longer set for streaming responses.

The previous streaming path explicitly set Content-Type: text/event-stream, Cache-Control: no-cache, and Connection: keep-alive before emitting SSE data. In this new path, only X-Routed-Via/X-Fallback-Attempts are added, so SSE-specific headers may never be sent, leading some clients or intermediaries to mis-handle the stream.

Unless these headers are guaranteed to be set earlier in the lifecycle, please restore them at the point streamStarted becomes true, for both responseStreamContext and the plain SSE path.
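
One way to restore the headers, sketched under the assumption that they are not already set earlier in the request lifecycle. `applySseHeaders` and the `HeaderWritable` interface are hypothetical; the interface is a structural stand-in so the sketch does not depend on Express typings.

```typescript
// Minimal structural type covering the parts of a response object used here.
interface HeaderWritable {
  headersSent: boolean;
  setHeader(name: string, value: string): unknown;
}

// Sets the usual SSE headers once, before the first write.
// Returns false when headers have already been flushed and nothing was changed.
function applySseHeaders(res: HeaderWritable): boolean {
  if (res.headersSent) return false;
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');
  return true;
}
```

Calling this inside the `if (!streamStarted)` branch, next to the `X-Routed-Via` header, would cover both the `responseStreamContext` path and the plain SSE path, provided `writeResponseStreamStart` does not already set the same headers.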

Comment on lines +1254 to +1267
// Inject transient model cooldowns into skipModels
{
const now = Date.now();
for (const [id, exp] of transientModelCooldowns) {
if (now > exp) {
transientModelCooldowns.delete(id);
} else {
skipModels.add(id);
}
}
if (skipModels.size > 0) {
console.log(`[Proxy] transient cooldowns active for model IDs: [${Array.from(skipModels).join(',')}]`);
}
}


suggestion: Logging of transient cooldowns conflates all skip reasons, not just cooldowns.

This log line is built from the full skipModels set, which may include IDs skipped for non-cooldown reasons (session bans, truncation, etc.), so the message doesn’t actually reflect only transient cooldowns.

Consider tracking the IDs added by this block (e.g., a local cooldownIds set) and logging those, or emitting a separate log before merging into skipModels to keep the signal clear for debugging.

Suggested change
-  // Inject transient model cooldowns into skipModels
-  {
-    const now = Date.now();
-    for (const [id, exp] of transientModelCooldowns) {
-      if (now > exp) {
-        transientModelCooldowns.delete(id);
-      } else {
-        skipModels.add(id);
-      }
-    }
-    if (skipModels.size > 0) {
-      console.log(`[Proxy] transient cooldowns active for model IDs: [${Array.from(skipModels).join(',')}]`);
-    }
-  }
+  // Inject transient model cooldowns into skipModels
+  {
+    const now = Date.now();
+    const cooldownIds = new Set<number>();
+    for (const [id, exp] of transientModelCooldowns) {
+      if (now > exp) {
+        transientModelCooldowns.delete(id);
+      } else {
+        skipModels.add(id);
+        cooldownIds.add(id);
+      }
+    }
+    if (cooldownIds.size > 0) {
+      console.log(`[Proxy] transient cooldowns active for model IDs: [${Array.from(cooldownIds).join(',')}]`);
+    }
+  }

Comment on lines +1671 to +1679
// 5xx failure detection — all providers: model-level ban + transient cooldown
const errStatus = getErrorStatus(err);
const isTransientCooldownEligible = (errStatus !== undefined && errStatus >= 500 && errStatus < 600) || errStatus === undefined;
if (errStatus && isBanEligibleStatus(errStatus)) {
if (route.platform === 'longcat') {
console.warn(`[Proxy] 5xx from LongCat — excluding entire LongCat provider for session`);
banPlatformFromSession(normalizedMessages, routingMode, 'longcat', route.modelDbId);
addProviderModelsToSkipModels(skipModels, 'longcat');
// Clear sticky if pinned to LongCat
if (preferredModel) {
const db = getDb();
const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined;
if (prefRow?.platform === 'longcat') {
preferredModel = undefined;
preferredKeyId = undefined;
}
// Register transient cooldown for any 5xx ban-eligible error
transientModelCooldowns.set(route.modelDbId, Date.now() + TRANSIENT_COOLDOWN_MS);
console.warn(`[Proxy] 5xx from ${route.platform}/${route.modelId} — skipping model for session`);
skipModels.add(route.modelDbId);
// Clear sticky if pinned to this platform


Action required

1. transientModelCooldowns applied before keys exhausted 📘 Rule violation ≡ Correctness

The proxy applies a model-level penalty (skipModels and a transient cooldown) immediately after a
single request failure, without first attempting other enabled keys for the same platform/model.
This can incorrectly penalize a model even when another key could succeed, violating the requirement
to apply model-level penalties only after all keys are exhausted.
Agent Prompt
## Issue description
`handleChatCompletion()` applies model-level skipping/cooldowns (`skipModels.add(route.modelDbId)` and `transientModelCooldowns.set(route.modelDbId, ...)`) after a single failed attempt. This violates the requirement that model-level penalties occur only after all keys for that model are exhausted.

## Issue Context
The retry loop already tracks per-key failures via `skipKeys`, which implies the system can retry the same model with another key. However, `skipModels`/`transientModelCooldowns` prevent routing back to the same model even if other enabled keys exist.

## Fix Focus Areas
- server/src/routes/proxy.ts[1671-1708]
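
A minimal sketch of the ordering described above, assuming `skipKeys` holds numeric key IDs and that the list of enabled keys for the failing model is available at the point of failure. `shouldPenalizeModel` and `recordKeyFailure` are hypothetical helpers, not functions in `proxy.ts`.

```typescript
// True only when every enabled key for the model has already failed.
// A model with no enabled keys counts as exhausted.
function shouldPenalizeModel(
  enabledKeyIds: readonly number[],
  skipKeys: ReadonlySet<number>,
): boolean {
  return enabledKeyIds.every((id) => skipKeys.has(id));
}

// Records a key failure first, and escalates to a model-level skip only
// once no untried key remains. Returns whether the model was penalized.
function recordKeyFailure(
  modelDbId: number,
  failedKeyId: number,
  enabledKeyIds: readonly number[],
  skipKeys: Set<number>,
  skipModels: Set<number>,
): boolean {
  skipKeys.add(failedKeyId);
  if (shouldPenalizeModel(enabledKeyIds, skipKeys)) {
    skipModels.add(modelDbId);
    return true;
  }
  return false;
}
```

The transient cooldown would be registered at the same point as `skipModels.add(...)`, so a single bad key does not put the whole model on cooldown for other requests.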


Comment on lines +1375 to 1393
if (!streamStarted) {
ttfbMs = Date.now() - start;
res.setHeader('X-Routed-Via', `${route.platform}/${route.modelId}`);
if (attempt > 0) res.setHeader('X-Fallback-Attempts', String(attempt));
if (responseStreamContext) {
writeResponseStreamStart(res, responseStreamContext, route.modelId);
}
streamStarted = true;
}
const deltaToolCalls = chunk.choices[0]?.delta?.tool_calls ?? [];
if (deltaToolCalls.length > 0) sawToolCalls = true;
if (responseStreamContext) {
writeResponseStreamStart(res, responseStreamContext, route.modelId);
totalOutputTokens += writeResponseStreamChunk(res, responseStreamContext, chunk);
} else {
const text = chunk.choices[0]?.delta?.content ?? '';
if (text) streamedText += text;
totalOutputTokens += Math.ceil(text.length / 4);
res.write(`data: ${JSON.stringify(chunk)}\n\n`);
}


Action required

2. Missing SSE Content-Type 🐞 Bug ≡ Correctness

handleChatCompletion() writes SSE data: frames for standard chat-completions streaming without
setting Content-Type: text/event-stream (and related SSE headers) on the normal path, only doing
so in the no-chunks fallback. This can cause clients/proxies to buffer or mis-parse the stream and
breaks SSE expectations.
Agent Prompt
### Issue description
In `server/src/routes/proxy.ts`, the streaming path for non-`responseStreamContext` writes SSE frames (`res.write('data: ...\n\n')`) but does not set `Content-Type: text/event-stream` (and other common SSE headers) before writing.

### Issue Context
SSE responses should set `Content-Type: text/event-stream` early (before the first `res.write`) so that clients and intermediaries treat the response as a stream.

### Fix Focus Areas
- Add SSE headers when streaming starts (before first `res.write` in the `!responseStreamContext` path)
- Keep existing `writeResponseStreamStart(...)` behavior intact for the Responses API path

### Fix Focus Areas (code pointers)
- server/src/routes/proxy.ts[1375-1393]
- server/src/routes/proxy.ts[1427-1434]


Comment on lines +113 to +124
const { status, raw } = await request(app, 'POST', '/v1/chat/completions', {
messages: [{ role: 'user', content: 'Test heartbeat' }],
stream: true,
});

expect(status).toBe(200);
// Should contain the actual content
expect(raw).toContain('hello');
expect(raw).toContain('world');
// Should contain at least one keep-alive comment during the 300ms idle period
expect(raw).toContain(': keep-alive');
});


Action required

3. Keepalive test mismatch 🐞 Bug ≡ Correctness

The new keepalive unit test expects : keep-alive during a 300ms delay before the first provider
chunk, but the implementation only writes keepalive comments when streamStarted is true (which
only happens after the first chunk arrives). This makes the test fail deterministically (or at best
be flaky) and does not validate the intended heartbeat behavior.
Agent Prompt
### Issue description
The keepalive test asserts that `: keep-alive` appears during the pre-first-chunk idle window, but the proxy implementation intentionally skips keepalive writes until after the stream has started (after the first chunk / SSE headers).

### Issue Context
`streamStarted` is set only inside the first-chunk handler, and keepalive writes are gated on `streamStarted`, so no keepalive can be emitted before the first chunk.

### Fix Focus Areas
- Update the test to create an idle period *after* the first chunk (e.g., emit chunk #1 immediately to start the stream, then delay 300ms before chunk #2) and assert keepalive appears during that gap
- Alternatively, remove/adjust the assertion to match the intended design (keepalive only after stream start)

### Fix Focus Areas (code pointers)
- server/src/__tests__/routes/stream-heartbeat-stall.test.ts[113-124]
- server/src/routes/proxy.ts[1366-1383]
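
A sketch of the first option: a mock provider stream that emits the first chunk immediately and only then goes idle. `chunksWithGap` and `sleep` are hypothetical test helpers, and the chunk shape is reduced to the `choices[0].delta.content` field the proxy reads.

```typescript
// Resolves after roughly `ms` milliseconds.
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

interface MockChunk {
  choices: Array<{ delta: { content: string } }>;
}

// Yields the first text right away so the stream counts as started,
// then waits `gapMs` once before yielding the remaining texts.
async function* chunksWithGap(
  texts: readonly string[],
  gapMs: number,
): AsyncGenerator<MockChunk> {
  for (let i = 0; i < texts.length; i++) {
    if (i === 1) await sleep(gapMs); // idle window sits after chunk #1
    yield { choices: [{ delta: { content: texts[i] } }] };
  }
}
```

Wiring `chunksWithGap(['hello', 'world'], 300)` into the mocked provider would place the 300ms idle period after `streamStarted` is set, which is the only window in which the current implementation writes `: keep-alive`.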


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (10)
.roo/specs/generalized-thread-protection/requirements.md (1)

64-66: 💤 Low value

Consider adding language specifier to fenced code block.

The environment variable format example would render better with a language identifier (e.g., ```bash or ```shell).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/generalized-thread-protection/requirements.md around lines 64 -
66, The fenced code block showing THREAD_PROTECTION_PLATFORMS should include a
shell language identifier so syntax/highlighting renders correctly; update the
block that contains
THREAD_PROTECTION_PLATFORMS="longcat:provider-ban,owl-alpha:provider-ban,groq:model-skip"
to use a language tag such as ```bash or ```shell at the opening fence (keeping
the variable name THREAD_PROTECTION_PLATFORMS and its value unchanged).
.roo/specs/generalized-thread-protection/design.md (1)

43-45: 💤 Low value

Consider adding language specifier to fenced code block.

The environment variable example would render better with a language identifier (e.g., ```bash or ```shell).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/generalized-thread-protection/design.md around lines 43 - 45, The
fenced code block showing the environment variable THREAD_PROTECTION_PLATFORMS
should include a shell language specifier (e.g., ```bash or ```shell) so the
example renders with proper syntax highlighting; update the block containing
THREAD_PROTECTION_PLATFORMS="longcat:provider-ban,groq:model-skip" to use a
language identifier like ```bash before the line and close with ``` after.
.roo/specs/disable-sticky-on-auto/design.md (1)

11-17: 💤 Low value

Consider adding language specifier to fenced code block.

The code block would render better with a language identifier (e.g., ```typescript or ```javascript).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/disable-sticky-on-auto/design.md around lines 11 - 17, Add a
language identifier to the fenced code block so syntax highlighting works (e.g.,
change ``` to ```typescript or ```javascript) for the snippet containing
stickyOp, getSessionKey, and stickySessionMap; update the triple backtick
opening fence that precedes the function example to include the chosen language
specifier and leave the closing fence unchanged.
.roo/specs/sse-stream-heartbeat-stall-protection/design.md (1)

240-279: 💤 Low value

Consider condensing the design evolution section.

Lines 241-279 show the iterative thinking process ("Wait — this needs more thought", "Better approach", "Final approach"). While this reasoning is valuable during design, the final spec might be clearer if condensed to just the final approach with a brief note about the key decision (pre-stream stall throws 504 for retry, mid-stream stall writes error frame and closes).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/sse-stream-heartbeat-stall-protection/design.md around lines 240
- 279, Replace the iterative "design evolution" prose in the
sse-stream-heartbeat-stall-protection section (the paragraphs around the
MAX_STREAM_STALL_MS / stall handling logic) with just the final approach: state
succinctly that when now - lastChunkTimestamp > MAX_STREAM_STALL_MS and
streamStarted is false the handler should throw a retryable 504 error to let the
outer retry/502 logic run, and when streamStarted is true the handler should
write an error frame via writeResponseStreamEvent or plain res.write and then
res.end(); remove the "Wait —", "Better approach" and similar intermediate notes
and keep a one-line rationale that pre-stream stalls are retryable while
mid-stream stalls are terminal (reference symbols: streamStarted,
MAX_STREAM_STALL_MS, route.displayName, responseStreamContext,
writeResponseStreamEvent, res.end()).
server/src/providers/base.ts (1)

135-151: 💤 Low value

Consider extracting rawCode only when needed.

Lines 142-143 extract and parse rawCode unconditionally, even when errPayload is a string (where .code would be undefined). While the logic is safe because line 145 guards the usage, extracting inside the conditional would be clearer:

   protected throwWrappedError(body: unknown): void {
     const obj = body as Record<string, unknown>;
     const errPayload = obj.error;
     const message = this.extractErrorMessage(body, 'Unknown wrapped error');
     const error = new Error(
       `${this.name} API error (wrapped in 200): ${message}`,
     ) as ProviderApiError;
-    const rawCode = (errPayload as Record<string, unknown>).code;
-    const parsedCode = typeof rawCode === 'number' ? rawCode : Number(rawCode);
-    error.status =
-      typeof errPayload === 'object' && errPayload !== null && 'code' in (errPayload as Record<string, unknown>)
-        ? (Number.isFinite(parsedCode) ? parsedCode : 200)
-        : 200;
+    if (typeof errPayload === 'object' && errPayload !== null && 'code' in errPayload) {
+      const rawCode = (errPayload as Record<string, unknown>).code;
+      const parsedCode = typeof rawCode === 'number' ? rawCode : Number(rawCode);
+      error.status = Number.isFinite(parsedCode) ? parsedCode : 200;
+    } else {
+      error.status = 200;
+    }
     error.provider = this.name;
     error.responseBody = body;
     throw error;
   }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/providers/base.ts` around lines 135 - 151, In throwWrappedError,
avoid unconditionally reading and parsing rawCode—move the extraction and
parsedCode logic inside the conditional that checks errPayload is an object and
has 'code' (the block that sets error.status) so you only access (errPayload as
Record<string, unknown>).code when it's present; compute parsedCode there, use
Number.isFinite(parsedCode) to decide the status, and fallback to 200 as
currently done, keeping error.provider and error.responseBody assignment and the
thrown ProviderApiError unchanged.
client/src/components/pool-badge.tsx (1)

1-1: ⚡ Quick win

Consider importing PoolType from shared types instead of duplicating the definition.

Defining PoolType locally as a string union creates potential for drift if the server-side ModelPool enum changes. Since the review stack context mentions "Shared ModelPool Types," consider importing or deriving this type from @freellmapi/shared/types to ensure the client and server stay in sync.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/src/components/pool-badge.tsx` at line 1, The local PoolType union
should be replaced with the shared type to avoid drift; import the shared
ModelPool/PoolType from the central types package (e.g., from
"`@freellmapi/shared/types`") instead of declaring export type PoolType = 'fast' |
'balanced' | 'smart'; update any references in pool-badge.tsx to use the
imported symbol (PoolType or ModelPool) so client and server share a single
source of truth.
server/src/__tests__/routes/fallback.test.ts (1)

56-62: ⚡ Quick win

Consider testing that expected pools are actually assigned, not just valid.

This test validates that returned pool values are members of the ModelPool enum, but it doesn't verify that the pools you expect to see (e.g., Smart for LongCat, Balanced for others) are actually assigned. If getModelPool() incorrectly returned Balanced for every model, this test would still pass.

🧪 Suggested additional assertion
  it('GET /api/fallback pool values are valid ModelPool enum values', async () => {
    const { body } = await request(app, 'GET', '/api/fallback');
    const validPools = [ModelPool.Fast, ModelPool.Balanced, ModelPool.Smart];
    for (const entry of body) {
      expect(validPools).toContain(entry.pool);
    }
+   // Verify specific expected pool assignments
+   const longcatEntry = body.find((e: any) => e.platform === 'longcat');
+   if (longcatEntry) {
+     expect(longcatEntry.pool).toBe(ModelPool.Smart);
+   }
  });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/__tests__/routes/fallback.test.ts` around lines 56 - 62, Update
the test so it asserts not only that each entry.pool is a valid ModelPool but
also that specific models get the expected pool assignments: call out
getModelPool()/the /api/fallback response and build an expectedPools mapping
(e.g., "LongCat" => ModelPool.Smart, other known model names =>
ModelPool.Balanced) and for each response entry assert entry.pool ===
expectedPools[entry.model] (or equivalent per-model assertions) in addition to
the existing validPools check; reference the response body variable used in the
test and the ModelPool enum to locate where to add these assertions.
client/src/pages/FallbackPage.tsx (1)

67-72: 💤 Low value

The 'fast' pool is defined but never populated by the server.

poolOrder and poolTitles include 'fast', but getModelPool() in the fallback route only returns 'smart' or 'balanced'. While line 300 filters out empty groups (preventing a visual bug), the 'fast' entries in these objects are dead code unless future models will be classified as Fast.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/src/pages/FallbackPage.tsx` around lines 67 - 72, The client defines a
'fast' pool in poolOrder and poolTitles (types PoolType, symbols poolOrder and
poolTitles) but the server-side getModelPool only returns 'smart' or 'balanced',
so 'fast' is dead code; either remove 'fast' from poolOrder and poolTitles (and
update PoolType accordingly) or update getModelPool to classify some models as
'fast' so the client and server agree—pick the approach consistent with product
intent and keep the symbol names poolOrder, poolTitles, and PoolType in sync
with getModelPool.
server/src/routes/proxy.ts (1)

22-24: ⚖️ Poor tradeoff

Consider using a Map instead of Set for more efficient lookups.

The activeRequests Set stores objects and cleanup logic iterates the entire Set to find matching entries (lines 1604-1609, 1657-1662, 1238-1241). Since Sets use reference equality and the code never stores the original reference, each cleanup is O(n).

A Map<string, { platform, modelId, startTime }> keyed by sessionKey (or a composite key) would enable O(1) lookups and deletions.

♻️ Alternative structure
-const activeRequests = new Set<{ sessionKey: string; platform: string; modelId: string; startTime: number }>();
+const activeRequests = new Map<string, { platform: string; modelId: string; startTime: number }>();

Then update registration (line 1321) and cleanup (lines 1604-1609, etc.) to use Map methods.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/routes/proxy.ts` around lines 22 - 24, Replace the Set named
activeRequests with a Map keyed by sessionKey (or a composite key like
`${sessionKey}:${platform}:${modelId}`) to enable O(1) lookups and deletions;
change its type to Map<string, { platform: string; modelId: string; startTime:
number }>, update the registration logic that currently adds to activeRequests
(the block that creates the active entry) to use activeRequests.set(key, {
platform, modelId, startTime }), and replace all cleanup/lookup code that
currently iterates the Set (the blocks that search for matching entries and
remove them) to use activeRequests.get(key)/activeRequests.has(key) and
activeRequests.delete(key) instead; ensure any composite-key construction is
consistent across where entries are added, checked, and removed.
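
A minimal sketch of the Map-based registry, keyed by `sessionKey` alone. `registerActive` and `deregisterActive` are hypothetical names, and the sketch assumes at most one in-flight request per session; if concurrent requests can share a session, a composite key would be needed so entries do not overwrite each other.

```typescript
interface ActiveRequest {
  platform: string;
  modelId: string;
  startTime: number;
}

const activeRequests = new Map<string, ActiveRequest>();

// Registers (or replaces) the in-flight request for a session.
function registerActive(sessionKey: string, platform: string, modelId: string): void {
  activeRequests.set(sessionKey, { platform, modelId, startTime: Date.now() });
}

// Removes the entry in O(1); returns false when nothing was registered.
function deregisterActive(sessionKey: string): boolean {
  return activeRequests.delete(sessionKey);
}
```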
server/src/services/threadProtection.ts (1)

41-51: ⚡ Quick win

Consider logging invalid protection levels during config parsing.

The parsing loop silently skips invalid protection levels (lines 48-50). If a user sets THREAD_PROTECTION_PLATFORMS=groq:ban-provider (typo: should be provider-ban), it's silently ignored and falls back to the default model-skip. This could lead to unexpected runtime behavior that's difficult to debug.

🔍 Proposed enhancement to add warning logs
       const [platform, level] = trimmed.split(':');
       if (!platform || !level) continue;
       const normalizedLevel = level.trim().toLowerCase();
       if (normalizedLevel === 'provider-ban' || normalizedLevel === 'model-skip' || normalizedLevel === 'off') {
         map.set(platform.trim().toLowerCase(), normalizedLevel as ProtectionLevel);
+      } else {
+        console.warn(`[ThreadProtection] Invalid protection level "${level}" for platform "${platform}" — valid values: provider-ban, model-skip, off`);
       }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/services/threadProtection.ts` around lines 41 - 51, The parsing
loop in threadProtection.ts silently skips invalid protection levels; update the
loop that processes raw.split(',') (where variables raw, pair, trimmed,
platform, level, normalizedLevel, and map are used) to emit a warning when an
unrecognized normalizedLevel is encountered (i.e., not 'provider-ban',
'model-skip', or 'off'); log the platform and the provided level (and the raw
pair) so users can see the typo/misconfiguration, then continue to skip adding
it to map. Use the same logger used elsewhere in this module (or fallback to
console.warn if none is available) and keep the existing behavior of ignoring
invalid entries.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.roo/specs/wrapped-error-interception/tasks.md:
- Around line 38-58: Add a clarifying note to the streaming-provider checklist
entries for CohereProvider.streamChatCompletion,
CloudflareProvider.streamChatCompletion, and GoogleProvider.streamChatCompletion
stating that the wrapped-error check (i.e., parsing the first chunk into a
variable and calling this.isWrappedError(...) / this.throwWrappedError(...))
must only be applied to the very first parsed SSE chunk before any chunk is
forwarded to the client; place this note alongside the existing instructions for
each function so implementers know to check the first payload only to avoid
aborting mid-stream after partial content has been sent.

In `@server/src/routes/proxy.ts`:
- Line 53: The current check uses `if (key === undefined)` but `getSessionKey()`
returns an empty string when no key is derived, so replace the condition to
explicitly guard empty-string (and still allow undefined) — e.g., change the
check around the `key` variable in the proxy handler to `if (key === '' || key
=== undefined) return undefined;` so the code never calls
`stickySessionMap.get('')`; refer to `getSessionKey()` and the use of
`stickySessionMap.get(key)` to locate where to change this.

---

Nitpick comments:
In @.roo/specs/disable-sticky-on-auto/design.md:
- Around line 11-17: Add a language identifier to the fenced code block so
syntax highlighting works (e.g., change ``` to ```typescript or ```javascript)
for the snippet containing stickyOp, getSessionKey, and stickySessionMap; update
the triple backtick opening fence that precedes the function example to include
the chosen language specifier and leave the closing fence unchanged.

In @.roo/specs/generalized-thread-protection/design.md:
- Around line 43-45: The fenced code block showing the environment variable
THREAD_PROTECTION_PLATFORMS should include a shell language specifier (e.g.,
```bash or ```shell) so the example renders with proper syntax highlighting;
update the block containing
THREAD_PROTECTION_PLATFORMS="longcat:provider-ban,groq:model-skip" to use a
language identifier like ```bash before the line and close with ``` after.

In @.roo/specs/generalized-thread-protection/requirements.md:
- Around line 64-66: The fenced code block showing THREAD_PROTECTION_PLATFORMS
should include a shell language identifier so syntax/highlighting renders
correctly; update the block that contains
THREAD_PROTECTION_PLATFORMS="longcat:provider-ban,owl-alpha:provider-ban,groq:model-skip"
to use a language tag such as ```bash or ```shell at the opening fence (keeping
the variable name THREAD_PROTECTION_PLATFORMS and its value unchanged).

In @.roo/specs/sse-stream-heartbeat-stall-protection/design.md:
- Around line 240-279: Replace the iterative "design evolution" prose in the
sse-stream-heartbeat-stall-protection section (the paragraphs around the
MAX_STREAM_STALL_MS / stall handling logic) with just the final approach: state
succinctly that when now - lastChunkTimestamp > MAX_STREAM_STALL_MS and
streamStarted is false the handler should throw a retryable 504 error to let the
outer retry/502 logic run, and when streamStarted is true the handler should
write an error frame via writeResponseStreamEvent or plain res.write and then
res.end(); remove the "Wait —", "Better approach" and similar intermediate notes
and keep a one-line rationale that pre-stream stalls are retryable while
mid-stream stalls are terminal (reference symbols: streamStarted,
MAX_STREAM_STALL_MS, route.displayName, responseStreamContext,
writeResponseStreamEvent, res.end()).

In `@client/src/components/pool-badge.tsx`:
- Line 1: The local PoolType union should be replaced with the shared type to
avoid drift; import the shared ModelPool/PoolType from the central types package
(e.g., from "`@freellmapi/shared/types`") instead of declaring export type
PoolType = 'fast' | 'balanced' | 'smart'; update any references in
pool-badge.tsx to use the imported symbol (PoolType or ModelPool) so client and
server share a single source of truth.

In `@client/src/pages/FallbackPage.tsx`:
- Around line 67-72: The client defines a 'fast' pool in poolOrder and
poolTitles (types PoolType, symbols poolOrder and poolTitles) but the
server-side getModelPool only returns 'smart' or 'balanced', so 'fast' is dead
code; either remove 'fast' from poolOrder and poolTitles (and update PoolType
accordingly) or update getModelPool to classify some models as 'fast' so the
client and server agree—pick the approach consistent with product intent and
keep the symbol names poolOrder, poolTitles, and PoolType in sync with
getModelPool.

In `@server/src/__tests__/routes/fallback.test.ts`:
- Around line 56-62: Update the test so it asserts not only that each entry.pool
is a valid ModelPool but also that specific models get the expected pool
assignments: call out getModelPool()/the /api/fallback response and build an
expectedPools mapping (e.g., "LongCat" => ModelPool.Smart, other known model
names => ModelPool.Balanced) and for each response entry assert entry.pool ===
expectedPools[entry.model] (or equivalent per-model assertions) in addition to
the existing validPools check; reference the response body variable used in the
test and the ModelPool enum to locate where to add these assertions.

In `@server/src/providers/base.ts`:
- Around line 135-151: In throwWrappedError, avoid unconditionally reading and
parsing rawCode—move the extraction and parsedCode logic inside the conditional
that checks errPayload is an object and has 'code' (the block that sets
error.status) so you only access (errPayload as Record<string, unknown>).code
when it's present; compute parsedCode there, use Number.isFinite(parsedCode) to
decide the status, and fallback to 200 as currently done, keeping error.provider
and error.responseBody assignment and the thrown ProviderApiError unchanged.

In `@server/src/routes/proxy.ts`:
- Around line 22-24: Replace the Set named activeRequests with a Map keyed by
sessionKey (or a composite key like `${sessionKey}:${platform}:${modelId}`) to
enable O(1) lookups and deletions; change its type to Map<string, { platform:
string; modelId: string; startTime: number }>, update the registration logic
that currently adds to activeRequests (the block that creates the active entry)
to use activeRequests.set(key, { platform, modelId, startTime }), and replace
all cleanup/lookup code that currently iterates the Set (the blocks that search
for matching entries and remove them) to use
activeRequests.get(key)/activeRequests.has(key) and activeRequests.delete(key)
instead; ensure any composite-key construction is consistent across where
entries are added, checked, and removed.

In `@server/src/services/threadProtection.ts`:
- Around line 41-51: The parsing loop in threadProtection.ts silently skips
invalid protection levels; update the loop that processes raw.split(',') (where
variables raw, pair, trimmed, platform, level, normalizedLevel, and map are
used) to emit a warning when an unrecognized normalizedLevel is encountered
(i.e., not 'provider-ban', 'model-skip', or 'off'); log the platform and the
provided level (and the raw pair) so users can see the typo/misconfiguration,
then continue to skip adding it to map. Use the same logger used elsewhere in
this module (or fallback to console.warn if none is available) and keep the
existing behavior of ignoring invalid entries.
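
The `activeRequests` nitpick for `server/src/routes/proxy.ts` above could look roughly like the sketch below. The helper names `activeKey`, `registerActive`, and `deregisterActive` are hypothetical, as is the choice of the composite key format; only the `Map` value shape comes from the comment.

```typescript
interface ActiveRequest {
  platform: string;
  modelId: string;
  startTime: number;
}

const activeRequests = new Map<string, ActiveRequest>();

// Build the key in exactly one place so add, check, and remove always agree.
function activeKey(sessionKey: string, platform: string, modelId: string): string {
  return `${sessionKey}:${platform}:${modelId}`;
}

// Registers a request and returns the key the caller must use for cleanup.
function registerActive(
  sessionKey: string,
  platform: string,
  modelId: string,
  now: number = Date.now(),
): string {
  const key = activeKey(sessionKey, platform, modelId);
  activeRequests.set(key, { platform, modelId, startTime: now });
  return key;
}

// O(1) removal; returns false if the entry was already gone.
function deregisterActive(key: string): boolean {
  return activeRequests.delete(key);
}
```

One trade-off to check before adopting this: a `Set` of objects can hold two concurrent requests for the same session, platform, and model, whereas a `Map` keyed this way keeps only the latest one. If such overlap is possible, the key may need a per-request component.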
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 2916449e-b59f-44a9-8e06-5a905a00cdc4

📥 Commits

Reviewing files that changed from the base of the PR and between 233e031 and f1a0d76.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (46)
  • .npmrc
  • .roo/specs/disable-sticky-on-auto/design.md
  • .roo/specs/disable-sticky-on-auto/requirements.md
  • .roo/specs/disable-sticky-on-auto/tasks.md
  • .roo/specs/generalized-thread-protection/design.md
  • .roo/specs/generalized-thread-protection/requirements.md
  • .roo/specs/generalized-thread-protection/tasks.md
  • .roo/specs/owl-alpha-longcat-model-routing/design.md
  • .roo/specs/owl-alpha-longcat-model-routing/requirements.md
  • .roo/specs/owl-alpha-longcat-model-routing/tasks.md
  • .roo/specs/pr13-code-review-fixes/design.md
  • .roo/specs/pr13-code-review-fixes/requirements.md
  • .roo/specs/pr13-code-review-fixes/tasks.md
  • .roo/specs/recency-biased-thompson-sampling/design.md
  • .roo/specs/recency-biased-thompson-sampling/requirements.md
  • .roo/specs/recency-biased-thompson-sampling/tasks.md
  • .roo/specs/sse-stream-heartbeat-stall-protection/design.md
  • .roo/specs/sse-stream-heartbeat-stall-protection/requirements.md
  • .roo/specs/sse-stream-heartbeat-stall-protection/tasks.md
  • .roo/specs/transient-model-cooldown/design.md
  • .roo/specs/transient-model-cooldown/requirements.md
  • .roo/specs/transient-model-cooldown/tasks.md
  • .roo/specs/wrapped-error-interception/design.md
  • .roo/specs/wrapped-error-interception/requirements.md
  • .roo/specs/wrapped-error-interception/tasks.md
  • AGENTS.md
  • client/src/components/pool-badge.tsx
  • client/src/components/pool-section.tsx
  • client/src/pages/FallbackPage.tsx
  • package.json
  • server/src/__tests__/routes/fallback.test.ts
  • server/src/__tests__/routes/provider-session-ban.test.ts
  • server/src/__tests__/routes/proxy-tools.test.ts
  • server/src/__tests__/routes/stream-heartbeat-stall.test.ts
  • server/src/__tests__/routes/transient-cooldown.test.ts
  • server/src/__tests__/services/router.test.ts
  • server/src/providers/base.ts
  • server/src/providers/cloudflare.ts
  • server/src/providers/cohere.ts
  • server/src/providers/google.ts
  • server/src/providers/openai-compat.ts
  • server/src/routes/fallback.ts
  • server/src/routes/proxy.ts
  • server/src/services/router.ts
  • server/src/services/threadProtection.ts
  • shared/types.ts

Comment on lines +38 to +58
- [x] 7. Add wrapped-error check in `CohereProvider.streamChatCompletion()` in `server/src/providers/cohere.ts`
  - Inside the `try` block at line 110, after `JSON.parse(data)` succeeds:
  - Insert: `if (this.isWrappedError(parsed)) { this.throwWrappedError(parsed); }`
  - Note: assign the result of `JSON.parse` to a variable first, then check, then yield

- [x] 8. Add wrapped-error check in `CloudflareProvider.chatCompletion()` in `server/src/providers/cloudflare.ts`
  - After line 62 (`const data = await res.json() as ChatCompletionResponse;`), before line 63 (`data._routed_via = ...`):
  - Insert: `if (this.isWrappedError(data)) { this.throwWrappedError(data); }`

- [x] 9. Add wrapped-error check in `CloudflareProvider.streamChatCompletion()` in `server/src/providers/cloudflare.ts`
  - Inside the `try` block at line 119, after `JSON.parse(data)` succeeds:
  - Insert: `if (this.isWrappedError(parsed)) { this.throwWrappedError(parsed); }`
  - Note: assign the result of `JSON.parse` to a variable first, then check, then yield

- [x] 10. Add wrapped-error check in `GoogleProvider.chatCompletion()` in `server/src/providers/google.ts`
  - After line 246 (`const data = await res.json() as GeminiResponse;`), before line 247 (`const candidate = data.candidates?.[0];`):
  - Insert: `if (this.isWrappedError(data)) { this.throwWrappedError(data); }`

- [x] 11. Add wrapped-error check in `GoogleProvider.streamChatCompletion()` in `server/src/providers/google.ts`
  - After line 354 (`chunk = JSON.parse(raw) as GeminiResponse;`), before line 358 (`const candidate = chunk.candidates?.[0];`):
  - Insert: `if (this.isWrappedError(chunk)) { this.throwWrappedError(chunk); }`

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Harmonize streaming error detection guidance across all providers.

Step 5 (line 32) includes an important note about checking wrapped errors only on the first parsed payload before any chunk is forwarded. This aligns with FR-6's requirement that the check applies to "the first SSE chunk." However, Steps 7 (Cohere), 9 (Cloudflare), and 11 (Google) lack this same guidance.

All streaming methods should include the same note to ensure uniform behavior and prevent mid-stream aborts after partial content has been sent to clients.

📝 Suggested addition for Steps 7, 9, and 11

Add the following note to Step 7 (after line 41), Step 9 (after line 50), and Step 11 (after line 58):

   - Insert: `if (this.isWrappedError(parsed)) { this.throwWrappedError(parsed); }`
   - Note: assign the result of `JSON.parse` to a variable first, then check, then yield
+  - Only throw wrapped SSE errors before any chunk has been forwarded (first parsed payload only). Track whether any chunk has been yielded and skip the check after the first yield.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/wrapped-error-interception/tasks.md around lines 38 - 58, Add a
clarifying note to the streaming-provider checklist entries for
CohereProvider.streamChatCompletion, CloudflareProvider.streamChatCompletion,
and GoogleProvider.streamChatCompletion stating that the wrapped-error check
(i.e., parsing the first chunk into a variable and calling
this.isWrappedError(...) / this.throwWrappedError(...)) must only be applied to
the very first parsed SSE chunk before any chunk is forwarded to the client;
place this note alongside the existing instructions for each function so
implementers know to check the first payload only to avoid aborting mid-stream
after partial content has been sent.
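
A minimal sketch of the first-payload-only check described above. It is written as a synchronous generator over already-extracted SSE `data:` strings to keep it self-contained; the real provider methods are streaming and use `this.isWrappedError(...)` / `this.throwWrappedError(...)`, whose actual logic is not shown in this review. The `isWrappedError` body, the `forwardChunks` name, and the `[DONE]` sentinel handling here are assumptions.

```typescript
// Assumed detection rule: an object payload carrying an `error` field.
function isWrappedError(payload: unknown): boolean {
  return typeof payload === 'object' && payload !== null && 'error' in payload;
}

// Hypothetical generator standing in for a provider's streamChatCompletion loop.
function* forwardChunks(dataLines: Iterable<string>): Generator<unknown> {
  let yieldedAny = false;
  for (const data of dataLines) {
    if (data === '[DONE]') return;
    // Assign the parse result to a variable first, then check, then yield.
    const parsed: unknown = JSON.parse(data);
    // Only the first parsed payload may abort the stream. Once a chunk has
    // been forwarded, the client already holds partial content.
    if (!yieldedAny && isWrappedError(parsed)) {
      throw new Error(`Wrapped provider error in first SSE chunk: ${data}`);
    }
    yieldedAny = true;
    yield parsed;
  }
}
```

Note that under this rule an error-shaped payload arriving after the first yield is passed through rather than thrown, which is what "skip the check after the first yield" implies; whether mid-stream errors need separate handling is a question for the stall and retry logic elsewhere in the PR.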

 function getStickyModel(messages: ChatMessage[], routingMode: RoutingMode): number | undefined {
   const key = getSessionKey(messages, routingMode);
-  if (!key) return undefined;
+  if (key === undefined) return undefined;

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical: Incorrect type check causes logic error.

Line 53 checks if (key === undefined), but getSessionKey() (lines 41-49) returns an empty string '' when no session key can be derived, not undefined. The condition therefore never triggers for empty strings, so the function proceeds with an empty key and may call stickySessionMap.get('').

🐛 Proposed fix
-  if (key === undefined) return undefined;
+  if (!key) return undefined;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/routes/proxy.ts` at line 53, The current check uses `if (key ===
undefined)` but `getSessionKey()` returns an empty string when no key is
derived, so replace the condition to explicitly guard empty-string (and still
allow undefined) — e.g., change the check around the `key` variable in the proxy
handler to `if (key === '' || key === undefined) return undefined;` so the code
never calls `stickySessionMap.get('')`; refer to `getSessionKey()` and the use
of `stickySessionMap.get(key)` to locate where to change this.
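
To make the difference between the two guards concrete, here is a small sketch. The two function names and the standalone `stickySessionMap` are stand-ins for illustration, not the actual code in `proxy.ts`; the entry stored under `''` is contrived to show what a lookup with an empty key would hit.

```typescript
// Hypothetical stand-in for the real sticky-session map (session key -> model id).
const stickySessionMap = new Map<string, number>();

// Guard as currently written in the PR: only `undefined` is rejected.
function getStickyModelStrict(key: string | undefined): number | undefined {
  if (key === undefined) return undefined; // '' slips through
  return stickySessionMap.get(key);
}

// Guard from the proposed fix: rejects both `undefined` and ''.
function getStickyModelFalsy(key: string | undefined): number | undefined {
  if (!key) return undefined;
  return stickySessionMap.get(key);
}

// Contrived entry under the empty key to show the leak.
stickySessionMap.set('', 7);

const strictResult = getStickyModelStrict(''); // 7: the empty key is looked up
const falsyResult = getStickyModelFalsy('');   // undefined: the empty key is rejected
```

The `!key` form and the explicit `key === '' || key === undefined` form from the prompt above behave the same for a `string | undefined` key, so either satisfies the finding.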

@vi70x3 vi70x3 closed this Jun 5, 2026
@vi70x3 vi70x3 deleted the feat/realtime-sticky branch June 7, 2026 00:30