Fix/realtime sticky by vi70x3 · Pull Request #18 · animaios/api-llm-localhost

vi70x3 · 2026-06-05T17:21:13Z

Summary by Sourcery

Refine routing resilience and admin UX by introducing generalized thread protection, transient model cooldowns, and SSE stream heartbeat/stall handling, while surfacing model pools in the fallback dashboard and tightening analytics and error handling across providers.

New Features:

Expose model pool metadata via the fallback API and group models into Fast/Balanced/Smart pools with collapsible sections in the admin fallback page.
Introduce a generalized thread protection service that centralizes provider/model ban decisions based on error context and configurable protection levels.
Add a shared transient model cooldown mechanism so recent 5xx failures temporarily suppress problematic models across concurrent requests.
Implement SSE stream keepalive heartbeats and stall detection to protect long-running streams from upstream hangs and intermediary timeouts.
Support detection and propagation of wrapped error payloads returned with HTTP 200 across OpenAI-compatible, Cohere, Cloudflare, and Google providers.

Bug Fixes:

Correct and harden recency-weighted analytics aggregation for routing decisions to avoid miscomputed statistics.
Ensure balanced-mode routing respects sticky preferences by allowing preferred models from otherwise excluded platforms.
Fix tests and specs that referenced outdated paths, behaviors, or expectations around sticky sessions, routing, and fallback metadata.

Enhancements:

Bias Thompson-sampling-based routing toward recent behavior using time-decayed success/total counts while keeping raw counts for observability.
Track active requests per session/platform and apply safeguards for provider-ban platforms to prevent concurrent overload from multiple sessions.
Normalize provider error handling so 5xx, truncation, and retryable errors flow through the thread protection rules instead of hardcoded LongCat/Owl Alpha logic.
Export and reuse base provider helpers for richer error classification, including wrapped error parsing and safer status code derivation.
Improve fallback API validation and tests to assert that pool values conform to the shared ModelPool enum.

Documentation:

Document the designs and requirements for generalized thread protection, transient cooldowns, recency-biased Thompson sampling, SSE stall protection, and wrapped error interception in the .roo spec suite.

Tests:

Add comprehensive tests for transient model cooldown behavior, including registration, pruning, sticky override, and integration with provider bans.
Add SSE streaming tests that cover heartbeats, stall timeouts, pre-stream stall retries, client disconnect cleanup, and normal streaming with keepalive enabled.
Extend routing, provider-ban, and fallback API test coverage to reflect balanced-mode stickiness, updated analytics fields, and model pool metadata.

…alized thread protection scanner

… longcat branches

…tracking for LongCat and Owl Alpha

- Change activeRequests from Map to Set to allow concurrent requests from the same session
- Add stale active request cleanup with a 10-minute TTL
- Cache the owl-alpha model ID to avoid repeated DB lookups
- Fix active request iteration to use Set-compatible syntax

- Remove package-lock.json (npm lockfile)
- Add packageManager field to package.json
- Create .npmrc with pnpm configuration

- BUG-05: Abort the upstream provider stream on stall detection by breaking the for-await loop and calling gen.return() when the keepalive timer detects that MAX_STREAM_STALL_MS has elapsed without data.
- BUG-06: Fix the cooldown guard to use the actual routable fallback chain (fallback_config JOIN models) instead of all enabled models, so transient cooldowns only skip models that would actually be routed to.
- BUG-10: Remove a double semicolon in proxy.ts.
- Also adds SSE keep-alive comments during idle periods, transient model cooldown injection before retry loops, and LongCat sticky session cooldown support in balanced routing mode.

…, TTL refresh, collapsible pools, doc paths, cleanup

… pre-stream, cooldown gating, timer cleanup, a11y, log clarity

… pre-stream, cooldown gating, timer cleanup, a11y, log clarity, test fixes

mergeguards · 2026-06-05T17:21:17Z

MergeGuard — Free plan allows 1 active repository. Upgrade to protect more repositories.

sourcery-ai · 2026-06-05T17:21:21Z

Reviewer's Guide

Introduces generalized provider/thread protection for streaming and non-streaming chat completions, adds transient model cooldowns and SSE heartbeat/stall protection on the server, and restructures fallback routing analytics and UI by adding model pools with grouping, while also improving wrapped-error handling and fixing several design/spec/test issues.

Sequence diagram for SSE streaming with stall protection and thread protection

```mermaid
sequenceDiagram
  actor Client
  participant Proxy as handleChatCompletion
  participant Provider as route.provider
  participant ThreadProtection as evaluateThreadProtection

  Client->>Proxy: handleChatCompletion
  Proxy->>Provider: streamChatCompletion(apiKey, messages, modelId, options)
  Proxy->>Proxy: activeRequests.add
  Proxy->>Proxy: setInterval(keepaliveTimer)

  loop for each chunk
    Provider-->>Proxy: ChatCompletionChunk
    Proxy->>Proxy: writeResponseStreamChunk / res.write
  end

  alt [stream stalls]
    Proxy->>Proxy: cleanup
    alt [streamStarted]
      Proxy-->>Client: writeResponseStreamEvent / res.write timeout
      Proxy-->>Client: res.end
    else [pre-stream stall]
      Proxy->>Proxy: throw Error(status=504)
    end
  else [mid-stream 5xx]
    Proxy->>ThreadProtection: evaluateThreadProtection({ platform, kind: '5xx', midStream: true })
    ThreadProtection-->>Proxy: ThreadProtectionAction
    alt [action.banProvider]
      Proxy->>Proxy: banPlatformFromSession
      Proxy->>Proxy: addProviderModelsToSkipModels
    end
    alt [action.skipModel]
      Proxy->>Proxy: skipModels.add(route.modelDbId)
    end
    Proxy->>Proxy: transientModelCooldowns.set(route.modelDbId, expiry)
  end

  Proxy->>Proxy: activeRequests.delete(sessionKey, platform, modelId)
  Proxy-->>Client: stream completes / response
```

File-Level Changes

| Change | Details | Files |
| --- | --- | --- |
| Generalized provider-ban/thread protection, transient model cooldowns, active-request safeguards, and SSE heartbeat/stall handling in chat completion proxy routing. | Import and use threadProtection service (getProtectionLevel, evaluateThreadProtection) to replace hardcoded LongCat/Owl Alpha error-handling branches. Introduce activeRequests tracking per session/platform/model and use it to exclude provider-ban platforms when another session is actively streaming from them. Add transientModelCooldowns map and associated TRANSIENT_COOLDOWN_MS, inject cooled-down models into skipModels, and register cooldowns on mid-stream and pre-stream 5xx/connection failures. Implement streamKeepaliveConfig with keepalive/stall intervals, plus SSE heartbeat comments and stall timeout handling that either retries pre-stream or emits structured stream_timeout errors mid-stream. Ensure active requests and stream generators are cleaned up on completion, error, or client disconnect to avoid leaks. Adjust provider-ban sticky cooldown to rely on provider protection level instead of specific platforms, and make balanced mode respect sticky preferred models for excluded platforms. Update and extend tests for provider session bans and proxy tools to reflect the new balanced-mode sticky behavior and transient cooldown logic. Document thread protection, transient cooldowns, stall protection, and PR13 fix plan in new design/requirements specs. | `server/src/routes/proxy.ts` `server/src/services/threadProtection.ts` `server/src/__tests__/routes/proxy-tools.test.ts` `server/src/__tests__/routes/provider-session-ban.test.ts` `server/src/__tests__/routes/stream-heartbeat-stall.test.ts` `.roo/specs/sse-stream-heartbeat-stall-protection/design.md` `.roo/specs/sse-stream-heartbeat-stall-protection/requirements.md` `.roo/specs/transient-model-cooldown/design.md` `.roo/specs/transient-model-cooldown/requirements.md` `.roo/specs/generalized-thread-protection/design.md` `.roo/specs/generalized-thread-protection/requirements.md` `.roo/specs/pr13-code-review-fixes/design.md` `.roo/specs/pr13-code-review-fixes/requirements.md` `.roo/specs/pr13-code-review-fixes/tasks.md` |
| Improve provider adapters to detect wrapped error payloads on HTTP 200 responses and surface them as ProviderApiError, plus expose shared error helpers. | Make BaseProvider.extractErrorMessage protected and add isWrappedError/throwWrappedError helpers for detecting root-level error objects in parsed JSON bodies. Invoke wrapped error detection in chatCompletion and streamChatCompletion for OpenAI-compatible, Cloudflare, Cohere, and Google providers, ensuring wrapped 200 errors throw ProviderApiError before normalization. Handle wrapped errors in streaming by checking the first parsed SSE chunk and aborting with throwWrappedError, while ignoring malformed chunks. Set error.status from error.code when numeric, with a fallback and NaN guard, so downstream retry logic can classify errors (e.g., rate limits). Capture and route provider/wrapped errors into existing proxy retry and cooldown logic without modifying router interfaces. | `server/src/providers/base.ts` `server/src/providers/openai-compat.ts` `server/src/providers/cloudflare.ts` `server/src/providers/cohere.ts` `server/src/providers/google.ts` `.roo/specs/wrapped-error-interception/design.md` `.roo/specs/wrapped-error-interception/requirements.md` `.roo/specs/wrapped-error-interception/tasks.md` |
| Add recency-biased analytics and model pools, and expose pool metadata in the fallback API and UI grouped sections. | Change router analytics aggregation to use recency-weighted successes/total plus rawTotal for display, and ensure analytics cache is refreshed before computing scores. Ensure balanced routing excludes LongCat and Owl Alpha by default but allows them when they are the sticky preferred model for a session. Expose ModelPool enum (Fast/Balanced/Smart) in shared types and compute pool from platform/model in fallback route, adding pool/speedRank to the fallback API response with validation tests. Refactor fallback page to group models by pool using new PoolSection/PoolBadge components and show pool-specific titles while keeping sorting and toggling intact. Simplify router tests, ensure enabled/invalid keys are handled correctly, and assert routed API key content. | `server/src/services/router.ts` `server/src/routes/fallback.ts` `server/src/__tests__/services/router.test.ts` `server/src/__tests__/routes/fallback.test.ts` `shared/types.ts` `client/src/pages/FallbackPage.tsx` `client/src/components/pool-section.tsx` `client/src/components/pool-badge.tsx` `.roo/specs/recency-biased-thompson-sampling/design.md` `.roo/specs/recency-biased-thompson-sampling/requirements.md` `.roo/specs/recency-biased-thompson-sampling/tasks.md` |
| Add and update design/spec/task documents and test scaffolding for PR13 fixes, thread protection, cooldowns, and SSE stall protection. | Clean up and reformat existing owl-alpha/longcat routing design docs (code fences, relative paths). Add detailed requirements/design/tasks specs for SSE heartbeat/stall protection, transient model cooldown, generalized thread protection, and PR13 follow-up fixes. Introduce dedicated test suites for stream heartbeat/stall behavior and transient cooldown behavior, including various edge cases and classification tests. Track pnpm as the package manager in package.json and add .npmrc to align tooling. | `.roo/specs/owl-alpha-longcat-model-routing/design.md` `.roo/specs/owl-alpha-longcat-model-routing/requirements.md` `.roo/specs/sse-stream-heartbeat-stall-protection/tasks.md` `.roo/specs/transient-model-cooldown/tasks.md` `.roo/specs/generalized-thread-protection/tasks.md` `.roo/specs/pr13-code-review-fixes/tasks.md` `.roo/specs/pr13-code-review-fixes/design.md` `.roo/specs/pr13-code-review-fixes/requirements.md` `server/src/__tests__/routes/stream-heartbeat-stall.test.ts` `server/src/__tests__/routes/transient-cooldown.test.ts` |


qodo-code-review · 2026-06-05T17:22:00Z

Review Summary by Qodo

Realtime Sticky Session Improvements: Thread Protection, Stream Stall Detection, Transient Cooldowns, and Wrapped Error Handling

✨ Enhancement 🐞 Bug fix 🧪 Tests

Walkthroughs

Description

**Core Features:**
• Generalized thread protection system replacing hardcoded platform checks with configurable
  getProtectionLevel() and evaluateThreadProtection() functions supporting provider-ban,
  model-skip, and off modes
• Stream keepalive heartbeat (15s interval) and stall detection (45s threshold) to prevent hanging
  SSE streams with graceful termination
• Transient model cooldowns (15s global window) for 5xx errors shared across concurrent requests
  with sticky session override logic
• Wrapped error detection for HTTP 200 responses with root-level error field across all provider
  adapters (OpenAI-compat, Cohere, Cloudflare, Google)
• Recency-weighted analytics using 7-day decay function to prioritize recent performance data in
  routing decisions

**Implementation Details:**
• New threadProtection.ts service module with environment-driven configuration via
  THREAD_PROTECTION_PLATFORMS
• Active request tracking to prevent concurrent sessions from overwhelming provider-ban platforms
• addProviderModelsToSkipModels() now skips models from the routable fallback chain instead of all enabled models
• Model pool classification (Fast, Balanced, Smart) in fallback API and UI with collapsible pool
  sections
• Updated balanced mode routing to allow preferred sticky models through exclusion filters

**Testing & Documentation:**
• Comprehensive test suites for transient cooldowns (30+ cases), stream heartbeat/stall detection (5
  cases), and pool classification
• Design specifications for all major features with architecture diagrams and edge case analysis
• Requirements and task documentation for implementation tracking
• Bug fix documentation addressing 10 verified issues from code review

**Configuration:**
• Added pnpm package manager specification (v11.1.3) and an .npmrc configuration file
• Changed extractErrorMessage() visibility from private to protected so provider adapters can reuse it

Diagram

flowchart LR
  A["Incoming Request"] --> B["Thread Protection Scanner"]
  B --> C["getProtectionLevel()"]
  C --> D["evaluateThreadProtection()"]
  D --> E["Action: Ban/Skip/Clear"]
  
  A --> F["Transient Cooldown Check"]
  F --> G["Skip Cooled Models"]
  G --> H["Route to Provider"]
  
  H --> I["Stream Keepalive"]
  I --> J["Heartbeat 15s"]
  J --> K["Stall Detection 45s"]
  K --> L["Graceful Termination"]
  
  H --> M["Wrapped Error Detection"]
  M --> N["HTTP 200 + error field"]
  N --> O["Register Cooldown"]
  O --> P["Retry with Next Model"]

File Changes

1. server/src/routes/proxy.ts ✨ Enhancement +333/-214

Generalized thread protection, stream stall detection, and transient cooldowns

• Refactored sticky session cooldown logic to use generalized getProtectionLevel() and
 evaluateThreadProtection() functions instead of hardcoded LongCat/Owl Alpha checks
• Added stream keepalive heartbeat and stall detection with configurable intervals to prevent
 hanging streams
• Implemented active request tracking to prevent concurrent sessions from overwhelming provider-ban
 platforms
• Added transient model cooldowns (15s global cooldown) for models returning 5xx errors, shared
 across all concurrent requests
• Replaced provider-specific error handling with unified evaluateThreadProtection() decision
 matrix for 5xx, truncation, and retryable errors
• Updated addProviderModelsToSkipModels() to use fallback chain instead of all enabled models

server/src/routes/proxy.ts
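A minimal sketch of per-session/platform/model active-request tracking with a stale-entry TTL, assuming the Set-of-records shape implied by the commit notes (Map changed to Set, 10-minute TTL). The names `ActiveRequest`, `ActiveRequestTracker`, and `isPlatformBusyElsewhere` are hypothetical and are not the identifiers used in proxy.ts.

```typescript
// Hypothetical record shape; proxy.ts may store different fields.
interface ActiveRequest {
  sessionKey: string;
  platform: string;
  modelId: string;
  startedAt: number; // epoch ms
}

// Assumed TTL, taken from the "10-minute TTL" commit note.
const STALE_ACTIVE_REQUEST_MS = 10 * 60 * 1000;

class ActiveRequestTracker {
  // A Set of records (not a Map keyed by session) so one session can
  // hold several concurrent requests at once.
  private readonly active = new Set<ActiveRequest>();

  add(sessionKey: string, platform: string, modelId: string, now = Date.now()): ActiveRequest {
    const entry: ActiveRequest = { sessionKey, platform, modelId, startedAt: now };
    this.active.add(entry);
    return entry;
  }

  // Callers delete the exact record they added, so parallel requests
  // from the same session do not remove each other's entries.
  delete(entry: ActiveRequest): void {
    this.active.delete(entry);
  }

  // Drop entries whose cleanup never ran (e.g. a crashed handler).
  pruneStale(now = Date.now()): void {
    for (const entry of this.active) {
      if (now - entry.startedAt > STALE_ACTIVE_REQUEST_MS) this.active.delete(entry);
    }
  }

  // True when a different session is currently using this platform; a
  // provider-ban platform could then be excluded for the caller.
  isPlatformBusyElsewhere(platform: string, sessionKey: string, now = Date.now()): boolean {
    this.pruneStale(now);
    for (const entry of this.active) {
      if (entry.platform === platform && entry.sessionKey !== sessionKey) return true;
    }
    return false;
  }

  get size(): number {
    return this.active.size;
  }
}
```

Deleting from a `Set` while iterating it with `for...of` is well-defined in JavaScript, which keeps `pruneStale` a single pass.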

2. server/src/services/threadProtection.ts ✨ Enhancement +119/-0

New thread protection service with configurable platform policies

• New service module implementing configurable thread protection levels per platform (provider-ban,
 model-skip, off)
• Provides getProtectionLevel() to look up protection configuration and
 evaluateThreadProtection() to determine actions (ban provider, skip model, clear sticky)
• Parses THREAD_PROTECTION_PLATFORMS environment variable for runtime configuration with
 backward-compatible defaults
• Centralizes error response decision logic previously scattered across proxy.ts

server/src/services/threadProtection.ts
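A sketch of how `getProtectionLevel()` and `evaluateThreadProtection()` could fit together. It assumes a comma-separated `platform:level` format for `THREAD_PROTECTION_PLATFORMS`, a hypothetical default of `longcat` at `provider-ban`, and a simplified decision matrix (provider-ban escalates to banning the provider only for 5xx, truncation, or mid-stream errors). None of these details are confirmed by the PR text; the real threadProtection.ts may differ.

```typescript
type ProtectionLevel = "provider-ban" | "model-skip" | "off";
type ErrorKind = "5xx" | "truncation" | "retryable";

interface ErrorContext {
  platform: string;
  kind: ErrorKind;
  midStream: boolean;
}

interface ThreadProtectionAction {
  banProvider: boolean;
  skipModel: boolean;
  clearSticky: boolean;
}

// Hypothetical backward-compatible default.
const DEFAULT_LEVELS: Record<string, ProtectionLevel> = { longcat: "provider-ban" };

// Assumed env format: "longcat:provider-ban,other:model-skip".
function parseProtectionConfig(raw: string | undefined): Map<string, ProtectionLevel> {
  const levels = new Map<string, ProtectionLevel>(Object.entries(DEFAULT_LEVELS));
  if (!raw) return levels;
  for (const pair of raw.split(",")) {
    const [platform, level] = pair.split(":").map((s) => s.trim());
    // Unknown levels are ignored rather than guessed.
    if (platform && (level === "provider-ban" || level === "model-skip" || level === "off")) {
      levels.set(platform, level);
    }
  }
  return levels;
}

function getProtectionLevel(platform: string, config: Map<string, ProtectionLevel>): ProtectionLevel {
  return config.get(platform) ?? "off";
}

// Simplified decision matrix (assumption): "off" does nothing, "model-skip"
// drops only the failing model, "provider-ban" bans the whole provider for
// the session when the error is severe.
function evaluateThreadProtection(
  ctx: ErrorContext,
  config: Map<string, ProtectionLevel>,
): ThreadProtectionAction {
  const level = getProtectionLevel(ctx.platform, config);
  if (level === "off") return { banProvider: false, skipModel: false, clearSticky: false };
  if (level === "model-skip") return { banProvider: false, skipModel: true, clearSticky: true };
  const severe = ctx.kind === "5xx" || ctx.kind === "truncation" || ctx.midStream;
  return { banProvider: severe, skipModel: true, clearSticky: true };
}
```

Keeping the decision in one pure function is what lets proxy.ts drop its per-platform branches: the caller only executes the returned flags.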

3. server/src/__tests__/routes/transient-cooldown.test.ts 🧪 Tests +415/-0

Test suite for transient model cooldown system

• Comprehensive test suite for transient model cooldown functionality with 6 test suites covering
 30+ test cases
• Tests cooldown map basics, injection/pruning logic, auto-recovery after expiry, sticky session
 override behavior
• Validates cooldown registration eligibility (5xx and connection failures only, not auth/rate-limit
 errors)
• Tests integration with addProviderModelsToSkipModels() for combined session-ban and
 global-cooldown scenarios

server/src/__tests__/routes/transient-cooldown.test.ts
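The eligibility rule these tests check (5xx and connection failures register a cooldown; auth, rate-limit, and other client errors do not) fits in a small predicate. The `RoutingError` shape and the list of connection error codes below are assumptions for illustration; the codes themselves are standard Node.js socket/DNS error codes.

```typescript
// Hypothetical error shape; the repo's ProviderApiError may differ.
interface RoutingError {
  status?: number;
  code?: string; // e.g. Node.js socket error codes
}

// Assumed set of low-level failures that mean "provider unreachable".
const CONNECTION_ERROR_CODES = new Set(["ECONNRESET", "ECONNREFUSED", "ETIMEDOUT", "EAI_AGAIN"]);

function shouldRegisterTransientCooldown(err: RoutingError): boolean {
  if (typeof err.status === "number") {
    // 5xx means the upstream failed. 401/403/429 and other 4xx are about the
    // caller or its quota, so they must not cool the model down for everyone.
    return err.status >= 500 && err.status <= 599;
  }
  return err.code !== undefined && CONNECTION_ERROR_CODES.has(err.code);
}
```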


4. server/src/__tests__/routes/stream-heartbeat-stall.test.ts 🧪 Tests +330/-0

Test suite for stream heartbeat and stall detection

• New test suite validating SSE stream heartbeat and stall protection with 5 test cases
• Tests keep-alive comment emission during idle periods, stream termination on stall detection,
 pre-stream stall handling
• Validates client disconnect cleanup and normal streaming operation with heartbeat enabled
• Uses configurable streamKeepaliveConfig for test-friendly timing

server/src/__tests__/routes/stream-heartbeat-stall.test.ts

5. server/src/services/router.ts ✨ Enhancement +21/-10

Recency-weighted analytics and sticky model routing exceptions

• Added recency weighting to analytics stats calculation using 7-day decay function to prioritize
 recent performance data
• Added rawTotal field to ModelStats to track unweighted request count for analytics reporting
• Updated getAnalyticsScores() to refresh stats cache and return rawTotal instead of weighted
 total
• Modified balanced mode routing to allow preferred sticky models through exclusion filters
 (LongCat, Owl Alpha)

server/src/services/router.ts
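The balanced-mode change amounts to one extra condition in the exclusion filter: a platform excluded by default is still allowed when the candidate is the session's sticky preferred model. A sketch, with a hypothetical `Candidate` shape and assumed platform identifiers in `BALANCED_EXCLUDED_PLATFORMS`:

```typescript
interface Candidate {
  modelDbId: number;
  platform: string;
}

// Assumed identifiers for the platforms balanced mode skips by default.
const BALANCED_EXCLUDED_PLATFORMS = new Set(["longcat", "owl-alpha"]);

function filterBalancedCandidates(candidates: Candidate[], preferredModelDbId: number | null): Candidate[] {
  return candidates.filter(
    (c) =>
      !BALANCED_EXCLUDED_PLATFORMS.has(c.platform) ||
      // Sticky exception: keep the session on its preferred model even if
      // its platform is otherwise excluded in balanced mode.
      c.modelDbId === preferredModelDbId,
  );
}
```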

6. server/src/__tests__/services/router.test.ts 🧪 Tests +2/-27

Minor router test cleanup and imports

• Removed test for invalid key status (no longer relevant)
• Added import for refreshStatsCache and getAnalyticsScores functions
• Simplified test setup by removing redundant comments

server/src/__tests__/services/router.test.ts

7. server/src/__tests__/routes/provider-session-ban.test.ts 🧪 Tests +14/-14

Update balanced mode session tests for new key behavior

• Updated balanced mode tests to reflect the new behavior where getSessionKey() returns a real hash
 instead of an empty string
• Changed test expectations for banPlatformFromSession() and setStickyModel() to expect entries
 in balanced mode
• Updated test descriptions to clarify that balanced mode now uses real session keys

server/src/__tests__/routes/provider-session-ban.test.ts

8. server/src/providers/base.ts ✨ Enhancement +30/-1

Add wrapped error detection and handling to base provider

• Changed extractErrorMessage() visibility from private to protected for reuse in error
 handling
• Added isWrappedError() predicate to detect root-level error field in JSON responses (HTTP 200
 with error payload)
• Added throwWrappedError() helper to construct and throw ProviderApiError from wrapped error
 payloads with proper status code extraction

server/src/providers/base.ts
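A sketch of the wrapped-error check, assuming the payload carries a root-level `error` object with optional `message` and `code` fields. The `ProviderApiError` class here is a stand-in (the repo's constructor may differ), numeric strings are treated as numeric codes, and the 502 fallback status is an assumption.

```typescript
// Stand-in for the repo's ProviderApiError; constructor shape is assumed.
class ProviderApiError extends Error {
  readonly status: number;
  constructor(message: string, status: number) {
    super(message);
    this.name = "ProviderApiError";
    this.status = status;
  }
}

interface WrappedErrorBody {
  error: { message?: unknown; code?: unknown };
}

// An HTTP 200 body like {"error": {...}} is treated as a failure.
function isWrappedError(body: unknown): body is WrappedErrorBody {
  return (
    typeof body === "object" &&
    body !== null &&
    "error" in body &&
    typeof (body as { error: unknown }).error === "object" &&
    (body as { error: unknown }).error !== null
  );
}

// Assumed fallback when the payload has no usable numeric code.
const WRAPPED_ERROR_FALLBACK_STATUS = 502;

function throwWrappedError(body: WrappedErrorBody): never {
  const { message, code } = body.error;
  const parsed = typeof code === "number" ? code : Number(code);
  // NaN guard plus range check so retry logic always sees a real HTTP status.
  const status =
    Number.isInteger(parsed) && parsed >= 400 && parsed <= 599 ? parsed : WRAPPED_ERROR_FALLBACK_STATUS;
  const text =
    typeof message === "string" && message.length > 0
      ? message
      : "Upstream returned an error payload with HTTP 200";
  throw new ProviderApiError(text, status);
}
```

Mapping a numeric `code` such as 429 onto `status` is what lets the existing retry loop treat the wrapped error as a rate limit rather than a generic failure.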

9. server/src/providers/openai-compat.ts ✨ Enhancement +14/-1

Add wrapped error detection to OpenAI-compatible provider

• Added wrapped error detection in chatCompletion() after JSON parsing
• Added wrapped error detection in streamChatCompletion() for first SSE chunk before yielding
• Improved error handling to skip malformed chunks and detect wrapped errors in streaming responses

server/src/providers/openai-compat.ts

10. server/src/providers/cohere.ts ✨ Enhancement +12/-1

Add wrapped error detection to Cohere provider

• Added wrapped error detection in chatCompletion() after JSON parsing
• Added wrapped error detection in streamChatCompletion() for SSE chunks before yielding

server/src/providers/cohere.ts

11. server/src/providers/cloudflare.ts ✨ Enhancement +12/-1

Add wrapped error detection to Cloudflare provider

• Added wrapped error detection in chatCompletion() after JSON parsing
• Added wrapped error detection in streamChatCompletion() for SSE chunks before yielding

server/src/providers/cloudflare.ts

12. server/src/providers/google.ts ✨ Enhancement +10/-0

Add wrapped error detection to Google provider

• Added wrapped error detection in chatCompletion() after JSON parsing and before candidate access
• Added wrapped error detection in streamChatCompletion() after parsing each Gemini response chunk

server/src/providers/google.ts

13. server/src/routes/fallback.ts ✨ Enhancement +10/-0

Add model pool classification to fallback API

• Added getModelPool() function to classify models into Smart or Balanced pools based on
 platform/model ID
• Added pool field to fallback API response indicating which pool each model belongs to
• Imported ModelPool type from shared types

server/src/routes/fallback.ts

14. server/src/__tests__/routes/fallback.test.ts 🧪 Tests +11/-0

Add pool property validation tests to fallback API

• Added test to verify fallback API response includes pool property
• Added test to validate all returned pool values are valid ModelPool enum values
• Imported ModelPool type for test validation

server/src/__tests__/routes/fallback.test.ts

15. server/src/__tests__/routes/proxy-tools.test.ts 🧪 Tests +4/-3

Minor updates to proxy tools tests for transient cooldown support

server/src/__tests__/routes/proxy-tools.test.ts

16. shared/types.ts ✨ Enhancement +8/-0

Add ModelPool enum type to shared types

• Added ModelPool constant object with three pool types: Fast, Balanced, Smart
• Added ModelPool type alias for the union of pool type values

shared/types.ts
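The description (a constant object plus a type alias over its values) matches the common TypeScript "const object as enum" pattern. A sketch follows; the string values `fast`/`balanced`/`smart` are assumed from the lowercase pool names mentioned for FallbackPage.tsx, and `isModelPool` is a hypothetical helper of the kind the fallback tests' enum validation could use.

```typescript
const ModelPool = {
  Fast: "fast",
  Balanced: "balanced",
  Smart: "smart",
} as const;

// Union of the values: "fast" | "balanced" | "smart".
type ModelPool = (typeof ModelPool)[keyof typeof ModelPool];

const MODEL_POOL_VALUES: readonly string[] = Object.values(ModelPool);

// Runtime guard, e.g. for asserting an API response only contains valid pools.
function isModelPool(value: unknown): value is ModelPool {
  return typeof value === "string" && MODEL_POOL_VALUES.includes(value);
}
```

Unlike a TypeScript `enum`, this form emits a plain object, so the same definition can be shared by server and client code without extra runtime helpers.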

17. .roo/specs/sse-stream-heartbeat-stall-protection/design.md 📝 Documentation +330/-0

Design specification for stream heartbeat and stall protection

• Comprehensive design document for stream heartbeat and stall detection feature
• Includes architecture diagrams, implementation details, edge case handling, and file modification
 guide
• Documents interaction with existing code paths and Responses API streams

.roo/specs/sse-stream-heartbeat-stall-protection/design.md

18. .roo/specs/wrapped-error-interception/design.md 📝 Documentation +337/-0

Design specification for wrapped error interception

• Comprehensive design document for wrapped error payload detection on HTTP 200 responses
• Includes architecture overview, component changes for each provider, error detection flow, and
 edge case analysis
• Documents wrapped error formats and integration with existing retry/cooldown logic

.roo/specs/wrapped-error-interception/design.md

19. .roo/specs/transient-model-cooldown/requirements.md 📝 Documentation +38/-0

Requirements specification for transient model cooldowns

• Requirements document for shared temporary cooldowns to mitigate concurrent failure impact
• Defines problem statement, 6 key requirements, scope, and acceptance criteria
• Specifies 15-second cooldown window and integration with existing routing logic

.roo/specs/transient-model-cooldown/requirements.md

20. package.json ⚙️ Configuration changes +1/-0

Specify pnpm package manager version

• Added packageManager field specifying pnpm@11.1.3 as the required package manager

package.json

21. .npmrc ⚙️ Configuration changes +4/-0

Add pnpm configuration file

• New npm configuration file with pnpm-specific settings
• Enables shamefully-hoist, disables strict peer dependencies, enables auto-install-peers

.npmrc

22. .roo/specs/pr13-code-review-fixes/requirements.md 📝 Documentation +268/-0

PR #13 Code Review Bugs Documentation and Fix Plan

• Comprehensive documentation of 10 verified bugs found in PR #13, organized by severity (P0
 critical, P1 behavioral, P2 code quality)
• Detailed problem statements, code examples, and impact analysis for each bug including SQL
 parenthesis mismatch, wrapped error swallowing, NaN validation, hardcoded platform references, stall
 detection, and cooldown guard issues
• Acceptance criteria and priority matrix provided for tracking fixes

.roo/specs/pr13-code-review-fixes/requirements.md

23. .roo/specs/transient-model-cooldown/design.md Design +197/-0

Transient Model Cooldown Circuit Breaker Design

• Architecture for shared in-memory circuit breaker using module-level transientModelCooldowns Map
 to track temporary model failures across concurrent requests
• Integration points for cooldown injection, sticky session override, failure registration, and
 mid-stream error handling
• Error classification matrix distinguishing between 5xx/connection failures (trigger cooldown) vs
 rate limits/auth/client errors (do not trigger)
• Test strategy and risk mitigation for preventing all-models-on-cooldown scenarios

.roo/specs/transient-model-cooldown/design.md
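A minimal sketch of the module-level circuit breaker described above: register on qualifying failures, prune expired entries, and inject the rest into a request's `skipModels`. The 15-second window comes from the requirements entry. The function names and numeric model IDs are assumptions, and `exemptModelId` reflects only one possible reading of the "sticky session override" (the session's sticky model is not skipped).

```typescript
// From the requirements: a 15-second shared cooldown window.
const TRANSIENT_COOLDOWN_MS = 15_000;

// modelDbId -> expiry timestamp (epoch ms), shared by all in-flight requests.
const transientModelCooldowns = new Map<number, number>();

function registerTransientCooldown(modelDbId: number, now = Date.now()): void {
  transientModelCooldowns.set(modelDbId, now + TRANSIENT_COOLDOWN_MS);
}

// Prune expired entries (auto-recovery) and add still-cooling models to this
// request's skip set. `exemptModelId` is a hypothetical sticky-override hook.
function injectCooldownsIntoSkipModels(
  skipModels: Set<number>,
  now = Date.now(),
  exemptModelId: number | null = null,
): void {
  for (const [modelDbId, expiresAt] of transientModelCooldowns) {
    if (expiresAt <= now) {
      transientModelCooldowns.delete(modelDbId);
    } else if (modelDbId !== exemptModelId) {
      skipModels.add(modelDbId);
    }
  }
}
```

This sketch does not cover the all-models-on-cooldown case the design lists as a risk; a real caller would need a fallback for when the skip set empties the routable chain.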

24. .roo/specs/recency-biased-thompson-sampling/design.md Design +238/-0

Recency-Biased Thompson Sampling Time-Decay Architecture

• Time-decay weighting mechanism for analytics aggregation using linear decay formula over 7-day
 window
• SQL CTE-based query with MIN(1.0, MAX(0.0, ...)) bounds to protect against clock drift
• Beta parameter safety guards using Math.max(0.1, ...) to prevent non-positive values
• Dashboard display updates showing weighted success rates alongside raw request counts

.roo/specs/recency-biased-thompson-sampling/design.md
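The linear decay and the Beta guard can be written out directly. This sketch assumes a request of age *d* days gets weight clamp(1 − d/7, 0, 1), so future timestamps from clock drift clamp to 1, and that the Beta parameters are the weighted successes and failures passed through `Math.max(0.1, ...)` with no extra prior. The design computes the weights in SQL; this TypeScript version only illustrates the arithmetic.

```typescript
const ANALYTICS_WINDOW_DAYS = 7;
const MS_PER_DAY = 86_400_000;

interface RequestRecord {
  createdAt: number; // epoch ms
  success: boolean;
}

interface WeightedStats {
  successes: number; // recency-weighted
  total: number; // recency-weighted
  rawTotal: number; // unweighted count, for display
}

// Linear decay clamped to [0, 1], mirroring MIN(1.0, MAX(0.0, ...)) in SQL.
function recencyWeight(createdAt: number, now: number): number {
  const ageDays = (now - createdAt) / MS_PER_DAY;
  return Math.min(1, Math.max(0, 1 - ageDays / ANALYTICS_WINDOW_DAYS));
}

function aggregate(records: RequestRecord[], now: number): WeightedStats {
  let successes = 0;
  let total = 0;
  for (const r of records) {
    const w = recencyWeight(r.createdAt, now);
    total += w;
    if (r.success) successes += w;
  }
  return { successes, total, rawTotal: records.length };
}

// Beta(alpha, beta) parameters for Thompson sampling, guarded so an empty or
// fully decayed history never yields a non-positive parameter.
function betaParams(stats: WeightedStats): { alpha: number; beta: number } {
  return {
    alpha: Math.max(0.1, stats.successes),
    beta: Math.max(0.1, stats.total - stats.successes),
  };
}
```

A request from 3.5 days ago counts half as much as one from today, so a recent outage moves the sampled success rate much faster than under raw counts, while `rawTotal` still reports the true volume.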

25. .roo/specs/pr13-code-review-fixes/design.md Design +210/-0

PR #13 Code Review Fixes Design and Implementation

• Design decisions for fixing 10 identified bugs with specific code patterns and examples
• Detailed fixes for SQL parenthesis, wrapped error propagation, NaN validation, and hardcoded
 platform references
• Stall upstream abort mechanism using AbortController and cooldown guard model set correction
• Risk assessment matrix mapping each fix to risk level and mitigation strategy

.roo/specs/pr13-code-review-fixes/design.md

26. .roo/specs/sse-stream-heartbeat-stall-protection/requirements.md Requirements +132/-0

SSE Stream Heartbeat and Stall Protection Requirements

• SSE keep-alive heartbeat mechanism (15-second interval) to prevent intermediate proxy idle
 timeouts
• Stream stall detection (45-second threshold) with graceful termination and structured error frames
• Client-disconnect cleanup and heartbeat write failure handling with idempotent cleanup routine
• Pre-stream and mid-stream stall behavior differentiation with appropriate error responses

.roo/specs/sse-stream-heartbeat-stall-protection/requirements.md
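The heartbeat/stall split can be modeled as a small watchdog driven by a periodic check, which keeps the timing logic testable without real timers. This is a sketch only: the 15 s and 45 s values come from the requirements above, while `StreamWatchdog`, its `check()` results, and the injected clock are hypothetical rather than the proxy's `streamKeepaliveConfig` code. SSE lines beginning with a colon are comments that clients ignore, which makes them safe heartbeats.

```typescript
interface KeepaliveConfig {
  keepaliveIntervalMs: number;
  maxStallMs: number;
}

const DEFAULT_KEEPALIVE: KeepaliveConfig = { keepaliveIntervalMs: 15_000, maxStallMs: 45_000 };

// SSE comment frame; EventSource clients ignore lines starting with ":".
const KEEPALIVE_FRAME = ": keep-alive\n\n";

type WatchdogAction = "none" | "heartbeat" | "stall";

class StreamWatchdog {
  private readonly config: KeepaliveConfig;
  private lastDataAt: number; // last chunk received from upstream
  private lastWriteAt: number; // last bytes written to the client
  private stalled = false;

  constructor(now: number, config: KeepaliveConfig = DEFAULT_KEEPALIVE) {
    this.config = config;
    this.lastDataAt = now;
    this.lastWriteAt = now;
  }

  // Call whenever an upstream chunk is forwarded to the client.
  onData(now: number): void {
    this.lastDataAt = now;
    this.lastWriteAt = now;
  }

  // Call from a periodic timer. Stall takes priority over heartbeat and is
  // reported only once, so the caller's cleanup runs exactly once.
  check(now: number): WatchdogAction {
    if (this.stalled) return "none";
    if (now - this.lastDataAt >= this.config.maxStallMs) {
      this.stalled = true;
      return "stall";
    }
    if (now - this.lastWriteAt >= this.config.keepaliveIntervalMs) {
      this.lastWriteAt = now; // the caller writes KEEPALIVE_FRAME
      return "heartbeat";
    }
    return "none";
  }
}
```

On `"stall"`, a caller following the requirements would clear its timer, abort the upstream generator, and then either write a structured timeout error frame (mid-stream) or throw a 504 so the retry loop can try another model (pre-stream).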

27. .roo/specs/generalized-thread-protection/requirements.md Requirements +109/-0

Generalized Thread Protection Scanner Requirements

• Problem statement addressing 6+ hardcoded longcat platform checks scattered across proxy.ts
• User stories for configurable thread protection via environment variables and platform-agnostic
 rules engine
• Acceptance criteria requiring zero hardcoded platform names and unified
 evaluateThreadProtection() decision point
• Technical requirements for rules engine API, configuration format, and proxy refactoring with
 migration plan

.roo/specs/generalized-thread-protection/requirements.md

28. client/src/pages/FallbackPage.tsx ✨ Enhancement +49/-30

Fallback Page Pool-Based Model Grouping UI

• Added PoolSection component import and PoolType type import for pool-based grouping
• Added pool field to FallbackEntry interface to support pool categorization
• Refactored model display to group entries by pool (fast, balanced, smart) with collapsible
 sections
• Pool groups filtered to show only non-empty pools with descriptive titles

client/src/pages/FallbackPage.tsx
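Grouping by pool and hiding empty groups, as described for FallbackPage.tsx, can be done with a fixed pool order so sections always render in the same sequence. The `FallbackEntry` fields other than `pool`, the section titles, and the helper name `groupByPool` are hypothetical.

```typescript
type PoolType = "fast" | "balanced" | "smart";

interface FallbackEntry {
  modelId: string;
  pool: PoolType;
  priority: number;
}

interface PoolGroup {
  pool: PoolType;
  title: string;
  entries: FallbackEntry[];
}

// Hypothetical display order and titles.
const POOL_ORDER: { pool: PoolType; title: string }[] = [
  { pool: "fast", title: "Fast Pool" },
  { pool: "balanced", title: "Balanced Pool" },
  { pool: "smart", title: "Smart Pool" },
];

// Preserves the incoming (already sorted) order within each pool and drops
// pools that have no entries.
function groupByPool(entries: FallbackEntry[]): PoolGroup[] {
  return POOL_ORDER.map(({ pool, title }) => ({
    pool,
    title,
    entries: entries.filter((e) => e.pool === pool),
  })).filter((group) => group.entries.length > 0);
}
```

Because `filter` keeps relative order, the page's existing sort still applies inside each collapsible section.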

29. .roo/specs/generalized-thread-protection/design.md Design +152/-0

Generalized Thread Protection Scanner Architecture

• Architecture overview showing thread protection scanner replacing hardcoded platform checks with
 dynamic rules engine
• Protection rules matrix defining behavior for provider-ban, model-skip, and off levels
 across error types
• Scanner API with ErrorContext and ThreadProtectionAction interfaces for decision matrix
 implementation
• Integration points replacing 6 hardcoded longcat blocks and sticky cooldown generalization

.roo/specs/generalized-thread-protection/design.md

30. .roo/specs/wrapped-error-interception/tasks.md Tasks +69/-0

Wrapped Error Interception Implementation Tasks

• Implementation tasks for adding isWrappedError() and throwWrappedError() methods to
 BaseProvider
• Tasks for integrating wrapped error checks in all four provider implementations (OpenAI-compat,
 Cohere, Cloudflare, Google)
• Visibility change for extractErrorMessage() from private to protected for reuse
• TypeScript compilation and test verification tasks

.roo/specs/wrapped-error-interception/tasks.md

31. .roo/specs/wrapped-error-interception/requirements.md Requirements +53/-0

Wrapped Error Payloads on HTTP 200 Responses Requirements

• Critical edge case handling for upstream providers returning error payloads with HTTP 200 status
• Detection layer for root-level error field in JSON responses before normalization
• Functional requirements for all provider adapters to inspect parsed JSON and throw
 ProviderApiError
• Non-functional requirements for backward compatibility, minimal performance impact, and existing
 retry loop integration

.roo/specs/wrapped-error-interception/requirements.md

32. .roo/specs/recency-biased-thompson-sampling/tasks.md Tasks +17/-0

Recency-Biased Thompson Sampling Implementation Tasks

• Task breakdown for implementing time-decay aggregation including ANALYTICS_WINDOW_DAYS constant
 and ModelStats interface extension
• SQL query rewrite with CTE-based weighted aggregation and Math.max(0.1, ...) guards in scoring
 functions
• Dashboard display updates to show rawTotal instead of weighted totals
• Test cases for outage sensitivity, safe fractional evaluation, and clock drift safety

.roo/specs/recency-biased-thompson-sampling/tasks.md

33. .roo/specs/recency-biased-thompson-sampling/requirements.md Requirements +76/-0

Recency-Biased Thompson Sampling Requirements

• Linear time-decay weighting formula for historical request aggregation over a 7-day window
• Backward compatibility requirements with Beta sampling using Math.max(0.1, ...) guards
• Zero-extension portability using standard SQLite julianday() function
• Test cases for outage sensitivity and safe fractional evaluation with edge case risk mitigation
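The requirements only state "linear decay over a 7-day window", so the sketch below assumes weight = 1 − age/7 (clamped to [0, 1]) and applies the `Math.max(0.1, ...)` guard directly to the weighted counts; whether the project adds a prior such as +1 is not shown. In SQLite, `ageDays` could come from `julianday('now') - julianday(created_at)` (column name assumed). The code mirrors that aggregation in TypeScript for clarity.

```typescript
export const ANALYTICS_WINDOW_DAYS = 7;

// One historical request; ageDays is fractional days since the request.
export interface RequestRecord {
  success: boolean;
  ageDays: number;
}

export interface ModelStats {
  successes: number; // recency-weighted
  total: number;     // recency-weighted
  rawTotal: number;  // unweighted count, shown on the dashboard
}

// Linear decay: weight 1 for "now", falling to 0 at the window edge.
// Negative ages (clock drift) are clamped so they never exceed weight 1.
export function recencyWeight(ageDays: number, windowDays = ANALYTICS_WINDOW_DAYS): number {
  const age = Math.max(0, ageDays);
  return Math.max(0, 1 - age / windowDays);
}

export function aggregateStats(records: RequestRecord[], windowDays = ANALYTICS_WINDOW_DAYS): ModelStats {
  const stats: ModelStats = { successes: 0, total: 0, rawTotal: 0 };
  for (const r of records) {
    if (r.ageDays >= windowDays) continue; // outside the analytics window
    const w = recencyWeight(r.ageDays, windowDays);
    stats.total += w;
    if (r.success) stats.successes += w;
    stats.rawTotal += 1;
  }
  return stats;
}

// Beta parameters must be strictly positive; weighted counts can be fractional or 0.
export function betaParams(stats: ModelStats): { alpha: number; beta: number } {
  return {
    alpha: Math.max(0.1, stats.successes),
    beta: Math.max(0.1, stats.total - stats.successes),
  };
}
```

A failure from today counts twice as much as one from 3.5 days ago, which is what makes the sampler react quickly to a fresh outage.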

.roo/specs/recency-biased-thompson-sampling/requirements.md

34. .roo/specs/sse-stream-heartbeat-stall-protection/tasks.md Tasks +20/-0

SSE Stream Heartbeat and Stall Protection Tasks

• Task list for adding KEEPALIVE_INTERVAL_MS and MAX_STREAM_STALL_MS constants
• Implementation of heartbeat interval with stall detection logic and cleanupStream() function
• Pre-stream and mid-stream stall handling with appropriate error frames and response termination
• Unit tests for heartbeat emission, stall detection, client-disconnect cleanup, and write failure
 handling
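One way to structure the timer callback, sketched under assumptions: the per-tick decision is pulled into a pure `classifyTick` function (a hypothetical name) so heartbeat and stall behavior can be unit-tested without real waits. The `: keepalive` frame text is assumed; SSE lines starting with `:` are comments that clients ignore, which is why they work as heartbeats.

```typescript
export interface StreamKeepaliveConfig {
  KEEPALIVE_INTERVAL_MS: number; // period of the setInterval that calls classifyTick
  MAX_STREAM_STALL_MS: number;   // silence after which the stream counts as stalled
}

export type TickAction = 'heartbeat' | 'wait' | 'stall-pre-stream' | 'stall-mid-stream';

// SSE comment frame: ignored by clients, but keeps proxies from idling out the connection.
export const SSE_KEEPALIVE_FRAME = ': keepalive\n\n';

export function classifyTick(
  nowMs: number,
  lastChunkMs: number,
  streamStarted: boolean,
  cfg: StreamKeepaliveConfig,
): TickAction {
  const elapsed = Math.max(0, nowMs - lastChunkMs);
  if (elapsed >= cfg.MAX_STREAM_STALL_MS) {
    return streamStarted ? 'stall-mid-stream' : 'stall-pre-stream';
  }
  // Before headers are sent, writing a heartbeat would commit a 200 response
  // and rule out a clean 504, so only heartbeat once streaming has begun.
  return streamStarted ? 'heartbeat' : 'wait';
}
```

The interval callback then only writes `SSE_KEEPALIVE_FRAME` on `'heartbeat'` and signals the main flow on either stall action.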

.roo/specs/sse-stream-heartbeat-stall-protection/tasks.md

35. .roo/specs/transient-model-cooldown/tasks.md Tasks +16/-0

Transient Model Cooldown Implementation Tasks

• Implementation tasks for declaring transientModelCooldowns Map and cooldown constant at module
 level
• Pre-routing cooldown injection with expired entry pruning and sticky session override logic
• Global cooldown registration on 5xx and connection failures in retry loop and mid-stream error
 handlers
• Unit test coverage for cooldown injection, registration, sticky override, and auto-recovery
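A minimal sketch of the module-level cooldown map, assuming it is keyed by model ID and stores an expiry timestamp. The 60-second value, the function names, and the choice to exempt the session's sticky model are all assumptions. Routing every registration through one `isCooldownEligible` check is also one way to address the overlapping-branches feedback further down, and it gives tests a real code path to call.

```typescript
// Hypothetical value; the spec only mentions "a cooldown constant".
export const TRANSIENT_COOLDOWN_MS = 60_000;

// modelId -> epoch ms at which the cooldown expires; shared across concurrent requests.
export const transientModelCooldowns = new Map<string, number>();

// Single decision path: any 5xx, or a connection failure with no HTTP status.
export function isCooldownEligible(status: number | undefined, connectionFailure: boolean): boolean {
  if (status === undefined) return connectionFailure;
  return status >= 500 && status < 600;
}

export function registerTransientCooldown(
  modelId: string,
  status: number | undefined,
  connectionFailure: boolean,
  now: number = Date.now(),
  cooldowns: Map<string, number> = transientModelCooldowns,
): boolean {
  if (!isCooldownEligible(status, connectionFailure)) return false;
  cooldowns.set(modelId, now + TRANSIENT_COOLDOWN_MS);
  return true;
}

// Pre-routing: prune expired entries (auto-recovery) and return models to skip.
// Assumption: the session's sticky model is exempt from the cooldown.
export function modelsInCooldown(
  now: number = Date.now(),
  stickyModelId?: string,
  cooldowns: Map<string, number> = transientModelCooldowns,
): Set<string> {
  const skip = new Set<string>();
  for (const [modelId, expiresAt] of cooldowns) {
    if (expiresAt <= now) {
      cooldowns.delete(modelId); // deleting the current entry during for...of is safe
      continue;
    }
    if (modelId !== stickyModelId) skip.add(modelId);
  }
  return skip;
}
```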

.roo/specs/transient-model-cooldown/tasks.md

36. client/src/components/pool-section.tsx ✨ Enhancement +41/-0

Pool Section Collapsible Component

• New collapsible section component for grouping models by pool type (fast, balanced, smart)
• Expandable/collapsible UI with keyboard accessibility (Enter/Space keys) and ARIA labels
• Renders pool badge and title with visual indicator (▼/▶) for expanded/collapsed state
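The component's internals are not shown, so the framework-free sketch below only illustrates the two pieces of logic it implies: bucketing fallback models by pool and recognizing the toggle keys. The `FallbackModel` shape and its `pool` field are assumptions about the fallback API's pool metadata.

```typescript
export type PoolType = 'fast' | 'balanced' | 'smart';

// Assumed shape of a model entry returned by the fallback API.
export interface FallbackModel {
  modelId: string;
  pool: PoolType;
}

// Display order for the collapsible sections.
export const POOL_ORDER: readonly PoolType[] = ['fast', 'balanced', 'smart'];

// Buckets models per pool, preserving the API's ordering within each pool.
export function groupByPool(models: readonly FallbackModel[]): Record<PoolType, FallbackModel[]> {
  const groups: Record<PoolType, FallbackModel[]> = { fast: [], balanced: [], smart: [] };
  for (const m of models) groups[m.pool].push(m);
  return groups;
}

// KeyboardEvent.key values that toggle a collapsible header (Space is reported as ' ').
export function isToggleKey(key: string): boolean {
  return key === 'Enter' || key === ' ';
}
```

In a key handler, calling `preventDefault()` on Space also stops the page from scrolling when the header toggles.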

client/src/components/pool-section.tsx

37. .roo/specs/owl-alpha-longcat-model-routing/design.md Formatting +49/-49

Owl Alpha LongCat Model Routing Design Formatting

• Added a text language tag to previously bare markdown code fences
• Corrected ASCII diagram indentation and alignment for better readability
• Maintained all architectural content describing smart preference flow, sticky cooldown, and error
 handling

.roo/specs/owl-alpha-longcat-model-routing/design.md

38. client/src/components/pool-badge.tsx ✨ Enhancement +16/-0

Pool Badge Component with Color Coding

• New badge component for displaying pool type (fast, balanced, smart) with color-coded styling
• Exports PoolType type definition for use across components
• Provides dark mode support with appropriate color schemes for each pool type

client/src/components/pool-badge.tsx

39. .roo/specs/pr13-code-review-fixes/tasks.md Tasks +14/-0

PR #13 Code Review Fixes Task Checklist

• Task checklist for fixing 10 identified bugs with completion status indicators
• Tasks organized by bug ID (BUG-01 through BUG-10) with specific file locations and fix
 descriptions
• Tracks completion status with checkboxes for SQL fix, wrapped error propagation, NaN guard,
 hardcoded refs, stall abort, cooldown guard, debug script removal, spec completion, test SQL, and
 semicolon removal

.roo/specs/pr13-code-review-fixes/tasks.md

40. .roo/specs/generalized-thread-protection/tasks.md Tasks +12/-0

Generalized Thread Protection Implementation Tasks

• Implementation tasks for renaming LONGCAT_STICKY_COOLDOWN_MS to THREAD_COOLDOWN_MS with
 reference updates
• Tasks for removing hardcoded LongCat and Owl Alpha cooldown blocks and inserting generalized
 thread protection scanner
• Execution order verification for skipModels pipeline and test file creation
• Manual smoke test task for concurrent request thread protection validation

.roo/specs/generalized-thread-protection/tasks.md

41. .roo/specs/owl-alpha-longcat-model-routing/requirements.md 📝 Documentation +5/-5

Owl Alpha LongCat Model Routing Requirements Path Fixes

• Fixed relative file path references from ../ to ../../../ to correctly point to server source
 files
• Updated all four dependency links to use correct path depth for router.ts, proxy.ts, and
 db/index.ts

.roo/specs/owl-alpha-longcat-model-routing/requirements.md

42. AGENTS.md Additional files +0/-0

...

AGENTS.md

qodo-code-review · 2026-06-05T17:22:02Z

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (0)

1. Timer throw crashes process 🐞 Bug ☼ Reliability

Description

In server/src/routes/proxy.ts, the pre-stream stall path throws from inside the setInterval
keepalive callback, which is outside the request handler’s try/catch and can surface as an uncaught
exception (crashing the server) instead of returning a 504 or retrying fallback.

Code

server/src/routes/proxy.ts[R1343-1375]

+          const keepaliveTimer = setInterval(() => {
+            if (stalled) {
+              clearInterval(keepaliveTimer);
+              return;
+            }
+            const elapsed = Date.now() - lastChunkTime;
+            if (elapsed >= streamKeepaliveConfig.MAX_STREAM_STALL_MS) {
+              stalled = true;
+              cleanup();
+              if (streamStarted) {
+                const payload = { error: { message: 'Stream stalled: no data received within timeout', type: 'stream_timeout' } };
+                try {
+                  if (responseStreamContext) {
+                    writeResponseStreamEvent(res, {
+                      type: 'response.failed',
+                      response: {
+                        id: responseStreamContext.responseId,
+                        status: 'failed',
+                        error: payload.error,
+                      },
+                    });
+                  } else {
+                    res.write(`data: ${JSON.stringify(payload)}\n\n`);
+                    res.write('data: [DONE]\n\n');
+                  }
+                  res.end();
+                } catch { /* socket gone */ }
+              } else {
+                // Pre-stream stall: throw so the outer catch can retry fallback models
+                throw Object.assign(
+                  new Error(`Stream timed out: no data received from provider ${route.displayName}`),
+                  { status: 504 }
+                );

Evidence

The code explicitly throws inside setInterval() when !streamStarted, but the only surrounding
try/finally is around the async generator iteration; the timer callback runs on a separate call
stack and won’t be caught there. The newly added test expects a clean 504 response for this
scenario, which this implementation cannot reliably produce (it will instead surface as an uncaught
exception).

server/src/routes/proxy.ts[1343-1416]
server/src/__tests__/routes/stream-heartbeat-stall.test.ts[178-210]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`handleChatCompletion()` throws an error from inside the `setInterval()` keepalive callback when a stream stalls before the first chunk. That throw is not catchable by the surrounding request/stream try/catch and can crash the Node process.

### Issue Context
The intention (also asserted by tests) is to return HTTP 504 on pre-stream stall (no SSE headers sent yet) or to allow the outer retry loop to fall back.

### Fix Focus Areas
- server/src/routes/proxy.ts[1343-1416]

### Implementation notes
- Do **not** `throw` inside the timer callback.
- Instead, set a `stallError` variable (or resolve/reject a Promise) and stop the generator (`cleanup()`), then after the `for await` loop ends, `throw stallError` from the main async function flow so the existing outer `catch`/retry logic can handle it.
- Ensure the pre-stream stall path results in `status=504` and `error.type='stream_timeout'` as the test expects.
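A sketch of the "reject a Promise" option, using hypothetical names (`consumeWithStallGuard`, `StreamStallError`): each `next()` is raced against a per-chunk deadline, so the timeout becomes a rejection in the handler's own async flow. It is preferred here over a plain `stallError` flag because, for an async generator, `gen.return()` is queued behind a pending `next()`; if the upstream read never resolves, a `for await` loop guarded only by a flag may never reach the post-loop `throw`.

```typescript
// Error shape the outer retry loop can recognise; names are illustrative.
export class StreamStallError extends Error {
  readonly status = 504;
  readonly type = 'stream_timeout';
  constructor(message: string) {
    super(message);
    this.name = 'StreamStallError';
  }
}

// Consumes `source`, rejecting with StreamStallError if no chunk arrives within
// maxStallMs. The rejection is catchable by the caller, unlike a throw inside
// a setInterval callback.
export async function consumeWithStallGuard<T>(
  source: AsyncIterable<T>,
  onChunk: (chunk: T) => void,
  maxStallMs: number,
): Promise<void> {
  const it = source[Symbol.asyncIterator]();
  let timer: ReturnType<typeof setTimeout> | undefined;

  // Fresh deadline per chunk; it only rejects if never cleared.
  const deadline = () =>
    new Promise<never>((_, reject) => {
      timer = setTimeout(
        () => reject(new StreamStallError(`Stream timed out: no data for ${maxStallMs} ms`)),
        maxStallMs,
      );
    });

  try {
    while (true) {
      const result = await Promise.race([it.next(), deadline()]);
      clearTimeout(timer);
      if (result.done) return;
      onChunk(result.value);
    }
  } catch (err) {
    clearTimeout(timer);
    // Ask the source to close without awaiting: a hung upstream may never settle it.
    void it.return?.()?.catch(() => undefined);
    throw err;
  }
}
```

The caller would then branch on whether SSE headers were sent: if not, rethrow so the retry loop can fall back (504, `stream_timeout`); if so, write the `stream_timeout` error frame and end the response.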


2. ActiveRequests stale entries 🐞 Bug ☼ Reliability

Description

activeRequests stores per-request objects, but cleanup deletes only the first matching entry and
breaks; concurrent requests for the same session/platform/model can leave stale entries that keep
provider-ban platforms excluded until the 10-minute TTL cleanup runs.

Code

server/src/routes/proxy.ts[R1629-1636]

+        } finally {
+          // Ensure the session is deregistered immediately on end/abort/fail
+          if (sessionKey) {
+            for (const active of activeRequests) {
+              if (active.sessionKey === sessionKey && active.platform === route.platform && active.modelId === route.modelId) {
+                activeRequests.delete(active);
+                break;
+              }

Evidence

activeRequests is a Set of object literals and is explicitly described as supporting concurrent
requests; however, the cleanup logic deletes only one matching object (with break), so additional
concurrent entries remain and can incorrectly influence routing safeguards.

server/src/routes/proxy.ts[22-25]
server/src/routes/proxy.ts[1321-1327]
server/src/routes/proxy.ts[1629-1636]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`activeRequests` is a `Set` of newly-created objects. On cleanup, the code searches the set and deletes only the first matching entry, which can leak additional entries if multiple concurrent requests exist with the same `sessionKey/platform/modelId`.

### Issue Context
The code comment says the Set is used to “allow concurrent requests from the same session”. If concurrency is allowed, cleanup must remove the specific entry added by that request (or decrement a reference count).

### Fix Focus Areas
- server/src/routes/proxy.ts[1320-1328]
- server/src/routes/proxy.ts[1629-1638]
- server/src/routes/proxy.ts[1643-1651]

### Implementation notes
Prefer one of:
1) Store the created object in a local `const active = { ... }` and remove it via `activeRequests.delete(active)` in `finally` (no iteration, no ambiguity).
2) Replace the Set with a `Map<string, number>` keyed by `sessionKey|platform|modelId` and increment/decrement counts.
3) If keeping the current structure, delete *all* matches (remove the `break`).
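A sketch combining options 1 and 2: a reference-counted `Map` plus an idempotent release closure returned at registration, so `finally` never has to search. The names are hypothetical, and `platformBusyForOthers` assumes the provider-ban safeguard means "another session already has a request in flight on this platform".

```typescript
export interface ActiveEntry {
  sessionKey: string;
  platform: string;
  modelId: string;
  count: number;
}

// JSON-encoding the tuple avoids collisions a plain "a|b|c" join could have.
const keyOf = (sessionKey: string, platform: string, modelId: string): string =>
  JSON.stringify([sessionKey, platform, modelId]);

export const activeRequests = new Map<string, ActiveEntry>();

// Registers one in-flight request and returns an idempotent release for `finally`.
export function acquireActive(
  sessionKey: string,
  platform: string,
  modelId: string,
  registry: Map<string, ActiveEntry> = activeRequests,
): () => void {
  const key = keyOf(sessionKey, platform, modelId);
  const entry = registry.get(key) ?? { sessionKey, platform, modelId, count: 0 };
  entry.count += 1;
  registry.set(key, entry);

  let released = false;
  return () => {
    if (released) return; // a second call must not decrement another request's slot
    released = true;
    const current = registry.get(key);
    if (!current) return;
    current.count -= 1;
    if (current.count <= 0) registry.delete(key);
  };
}

// True if any other session has an in-flight request on this platform.
export function platformBusyForOthers(
  platform: string,
  sessionKey: string,
  registry: Map<string, ActiveEntry> = activeRequests,
): boolean {
  for (const entry of registry.values()) {
    if (entry.platform === platform && entry.sessionKey !== sessionKey) return true;
  }
  return false;
}
```

Because each request holds its own release function, concurrent requests with the same session/platform/model can no longer leave stale entries waiting for the 10-minute TTL.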


mergeguards · 2026-06-05T17:24:40Z

MergeGuard — Free plan allows 1 active repository. Upgrade to protect more repositories.

sourcery-ai

Hey - I've found 4 issues, and left some high-level feedback:

In the new stream stall handler, the pre-stream timeout path throws an error directly from inside the setInterval callback, which won’t be caught by the outer try/catch and can surface as an unhandled exception; consider signaling the outer flow (e.g., via a flag or abort controller) instead of throwing inside the timer so the retry logic can handle the timeout deterministically.
The transient model cooldown handling in the non-stream error path has overlapping branches (ban-eligible 5xx vs isTransientCooldownEligible) that both register cooldowns on non-retryable errors; it would be clearer and less error-prone to consolidate this into a single decision path so it’s obvious exactly when a model enters cooldown and you avoid duplicated set/log behavior.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- In the new stream stall handler, the pre-stream timeout path throws an error directly from inside the `setInterval` callback, which won’t be caught by the outer `try/catch` and can surface as an unhandled exception; consider signaling the outer flow (e.g., via a flag or abort controller) instead of throwing inside the timer so the retry logic can handle the timeout deterministically.
- The transient model cooldown handling in the non-stream error path has overlapping branches (ban-eligible 5xx vs `isTransientCooldownEligible`) that both register cooldowns on non-retryable errors; it would be clearer and less error-prone to consolidate this into a single decision path so it’s obvious exactly when a model enters cooldown and you avoid duplicated `set`/log behavior.

## Individual Comments

### Comment 1
<location path="server/src/routes/proxy.ts" line_range="1338-1343" />
<code_context>
+          let lastChunkTime = Date.now();
+          let stalled = false;
+
+          const cleanup = () => {
+            clearInterval(keepaliveTimer);
+            try { gen.return(undefined); } catch { /* already closed */ }
+          };
+
+          const keepaliveTimer = setInterval(() => {
+            if (stalled) {
+              clearInterval(keepaliveTimer);
</code_context>
<issue_to_address>
**issue (bug_risk):** Avoid referencing keepaliveTimer before initialization and throwing inside the interval callback

Two issues to address:

1) `cleanup` closes over `keepaliveTimer` before it’s initialized with `const`. If `cleanup` runs early (e.g. `req.on('close')` fires immediately), `clearInterval(keepaliveTimer)` will hit the temporal dead zone and throw a `ReferenceError`. Declare `let keepaliveTimer: NodeJS.Timeout | undefined` before `cleanup`, assign it after, and guard `clearInterval` with `if (keepaliveTimer)`.

2) Throwing from inside the `setInterval` callback won’t be caught by the outer `try/catch` around the streaming loop and can crash the process. Instead, set a flag and have the main loop handle the error, or call a rejection/abort handler from the timer callback rather than throwing directly.
</issue_to_address>
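A minimal sketch of the declaration order suggested in Comment 1, using a hypothetical `createKeepalive` helper rather than the actual handler code: the timer handle is declared with `let` before `cleanup`, and `cleanup` guards on it, so calling it at any point is safe.

```typescript
// Declaring the handle before `cleanup` means cleanup can run at any time
// (for example from a req.on('close') listener) without a TDZ ReferenceError.
export function createKeepalive(onTick: () => void, intervalMs: number) {
  let keepaliveTimer: ReturnType<typeof setInterval> | undefined;

  const cleanup = (): void => {
    if (keepaliveTimer !== undefined) {
      clearInterval(keepaliveTimer);
      keepaliveTimer = undefined;
    }
  };

  const start = (): void => {
    if (keepaliveTimer === undefined) keepaliveTimer = setInterval(onTick, intervalMs);
  };

  const isRunning = (): boolean => keepaliveTimer !== undefined;

  return { start, cleanup, isRunning };
}
```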

### Comment 2
<location path="server/src/providers/openai-compat.ts" line_range="115-116" />
<code_context>

     const decoder = new TextDecoder();
     let buffer = '';
+    let hasYielded = false;

     while (true) {
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Wrapped-error detection is inconsistent between providers and depends on hasYielded flag

For OpenAI-compatible streaming you only call `isWrappedError(parsed)` before the first yielded chunk (`!hasYielded`), while Cloudflare/Cohere/Google check every chunk. This means wrapped error payloads later in the stream won’t be detected. Consider either checking `isWrappedError` on every chunk for consistency, or explicitly documenting that only first-chunk errors are treated as wrapped and confirming upstream behavior matches that assumption.

Suggested implementation:

```typescript
        let parsed: ChatCompletionChunk;
        try {
          parsed = JSON.parse(data) as ChatCompletionChunk;
        } catch {
          // Skip malformed chunks; without `continue`, `parsed` would be unassigned below
          continue;
        }

        // Detect wrapped errors consistently on every chunk
        if (this.isWrappedError(parsed)) {
          this.throwWrappedError(parsed);
        }

```

I assumed that the existing code only called `this.isWrappedError(parsed)` conditionally, something like `if (!hasYielded && this.isWrappedError(parsed))`. If that condition still exists elsewhere in the file, you should remove the `!hasYielded &&` part so that `isWrappedError` is checked unconditionally:

- Replace `if (!hasYielded && this.isWrappedError(parsed)) {` with `if (this.isWrappedError(parsed)) {`.

If `hasYielded` is no longer used for anything else after this change, you should also remove the `let hasYielded = false;` declaration to avoid an unused variable.
</issue_to_address>

### Comment 3
<location path="server/src/__tests__/routes/stream-heartbeat-stall.test.ts" line_range="34-43" />
<code_context>
+describe('SSE stream heartbeat and stall protection', () => {
</code_context>
<issue_to_address>
**suggestion (testing):** Time-based stream heartbeat tests are at risk of flakiness; consider using fake timers.

These tests cover key behavior but depend on real `setTimeout` delays and tweaked global config, which can be flaky in CI and slow.

Consider:
- Using `vi.useFakeTimers()` / `vi.setSystemTime()` and advancing timers instead of real waits.
- Driving heartbeat and stall detection via `vi.advanceTimersByTime()` so you can assert exact timeout/error points.
- Keeping at most one end-to-end test with real timing, and moving other timing-sensitive checks to a fully fake-timer setup.

That should keep coverage of the heartbeat/stall logic while making the suite faster and more reliable.
</issue_to_address>

### Comment 4
<location path="server/src/__tests__/routes/transient-cooldown.test.ts" line_range="316-325" />
<code_context>
+  describe('Cooldown registration: only 5xx and connection failures trigger cooldown', () => {
</code_context>
<issue_to_address>
**suggestion (testing):** Cooldown registration tests re-encode implementation conditions rather than exercising the actual routing paths.

These tests largely restate the implementation’s boolean conditions instead of exercising the real code that mutates `transientModelCooldowns`, making them fragile and tightly coupled to current logic.

Prefer tests that:
- Invoke the actual error-handling path (or a thin wrapper) with simulated statuses/errors.
- Assert on `transientModelCooldowns` contents for cases like 5xx, 429, 401, 404, and undefined status.

That way the tests validate observable behavior and remain stable even if the internal conditions or eligible statuses change.

Suggested implementation:

```typescript
  // ---------- Test Suite 5: Cooldown registration Error Classification ----------
  describe('Cooldown registration: only 5xx and connection failures trigger cooldown', () => {
    beforeEach(() => {
      // Ensure we start from a clean cooldown state for each test
      transientModelCooldowns.clear();
    });

    function invokeCooldownErrorPath(options: {
      status?: number;
      error?: Error;
      modelId?: string;
    }) {
      /**
       * Thin wrapper around the real error-handling / routing code that is
       * responsible for registering cooldowns for transient models.
       *
       * This MUST call into the same path the router uses when a transient
       * model request fails (e.g. something like `handleTransientModelError`),
       * so that these tests exercise observable behavior rather than
       * re-encoding implementation details.
       */
      return handleTransientModelErrorForTest(options);
    }

    it('registers a cooldown for 5xx upstream errors (500-504)', () => {
      const eligibleStatuses = [500, 502, 503, 504];

      for (const status of eligibleStatuses) {
        transientModelCooldowns.clear();

        invokeCooldownErrorPath({
          status,
          error: new Error(`upstream ${status}`),
          modelId: 'test-model',
        });

        // Assert based on observable cooldown state rather than status checks
        expect(transientModelCooldowns.has('test-model')).toBe(true);
      }
    });

    it('does not register a cooldown for 429 rate limit errors', () => {
      invokeCooldownErrorPath({
        status: 429,
        error: new Error('rate limited'),
        modelId: 'test-model',
      });

      expect(transientModelCooldowns.has('test-model')).toBe(false);
    });

    it('does not register a cooldown for 401 unauthorized errors', () => {
      invokeCooldownErrorPath({
        status: 401,
        error: new Error('unauthorized'),
        modelId: 'test-model',
      });

      expect(transientModelCooldowns.has('test-model')).toBe(false);
    });

    it('does not register a cooldown for 404 not found errors', () => {
      invokeCooldownErrorPath({
        status: 404,
        error: new Error('not found'),
        modelId: 'test-model',
      });

      expect(transientModelCooldowns.has('test-model')).toBe(false);
    });

    it('registers a cooldown when there is a connection failure (no status)', () => {
      invokeCooldownErrorPath({
        status: undefined,
        error: new Error('ECONNRESET'),
        modelId: 'test-model',
      });

      expect(transientModelCooldowns.has('test-model')).toBe(true);
    });

```

1. Implement `handleTransientModelErrorForTest` in this test file (or import it) so that it calls the **same** error-handling path the router uses to register cooldowns for transient models. For example, it might delegate to something like `handleTransientModelError({ status, error, modelId })` exported from the route module.
2. Ensure `transientModelCooldowns` is imported/accessible in this test file and supports `.clear()` and `.has(modelId)` (e.g. a `Map` or similar). If the underlying structure differs (e.g. a `Map` keyed by provider+model, or a plain object), adjust the assertions to check the appropriate key and API.
3. Remove or update any remaining tests inside this `describe` block that still restate implementation conditions (e.g. any leftover `it('429 rate limit is NOT eligible...` that only checks booleans) so that all tests in this suite go through `invokeCooldownErrorPath`.
4. If your production code uses a different identifier than `'test-model'` (e.g. includes provider or route info), update the `modelId` and corresponding `has(...)` checks to match the real key shape used in `transientModelCooldowns`.
</issue_to_address>


sourcery-ai · 2026-06-05T17:25:12Z

+          const cleanup = () => {
+            clearInterval(keepaliveTimer);
+            try { gen.return(undefined); } catch { /* already closed */ }
+          };
+
+          const keepaliveTimer = setInterval(() => {


issue (bug_risk): Avoid referencing keepaliveTimer before initialization and throwing inside the interval callback

Two issues to address:

cleanup closes over keepaliveTimer before it’s initialized with const. If cleanup runs early (e.g. req.on('close') fires immediately), clearInterval(keepaliveTimer) will hit the temporal dead zone and throw a ReferenceError. Declare let keepaliveTimer: NodeJS.Timeout | undefined before cleanup, assign it after, and guard clearInterval with if (keepaliveTimer).

Throwing from inside the setInterval callback won’t be caught by the outer try/catch around the streaming loop and can crash the process. Instead, set a flag and have the main loop handle the error, or call a rejection/abort handler from the timer callback rather than throwing directly.

sourcery-ai · 2026-06-05T17:25:12Z

    let buffer = '';
+    let hasYielded = false;


suggestion (bug_risk): Wrapped-error detection is inconsistent between providers and depends on hasYielded flag

For OpenAI-compatible streaming you only call isWrappedError(parsed) before the first yielded chunk (!hasYielded), while Cloudflare/Cohere/Google check every chunk. This means wrapped error payloads later in the stream won’t be detected. Consider either checking isWrappedError on every chunk for consistency, or explicitly documenting that only first-chunk errors are treated as wrapped and confirming upstream behavior matches that assumption.


sourcery-ai · 2026-06-05T17:25:12Z

+describe('SSE stream heartbeat and stall protection', () => {
+  let app: Express;
+  let origKeepaliveInterval: number;
+  let origMaxStall: number;
+
+  beforeAll(() => {
+    process.env.ENCRYPTION_KEY = '0'.repeat(64);
+    process.env.ADMIN_DASHBOARD_KEY = 'test-admin-key-that-is-long-enough';
+    process.env.NODE_ENV = 'test';
+    initDb(':memory:');


suggestion (testing): Time-based stream heartbeat tests are at risk of flakiness; consider using fake timers.

These tests cover key behavior but depend on real setTimeout delays and tweaked global config, which can be flaky in CI and slow.

Consider:

Using vi.useFakeTimers() / vi.setSystemTime() and advancing timers instead of real waits.

Driving heartbeat and stall detection via vi.advanceTimersByTime() so you can assert exact timeout/error points.

Keeping at most one end-to-end test with real timing, and moving other timing-sensitive checks to a fully fake-timer setup.

That should keep coverage of the heartbeat/stall logic while making the suite faster and more reliable.

sourcery-ai · 2026-06-05T17:25:12Z

+  describe('Cooldown registration: only 5xx and connection failures trigger cooldown', () => {
+    it('5xx status codes (500-504) are eligible for cooldown registration', () => {
+      // Simulate the condition: (errStatus >= 500 && errStatus < 600)
+      const eligibleStatuses = [500, 502, 503, 504];
+      for (const status of eligibleStatuses) {
+        const condition = status !== undefined && status >= 500 && status < 600;
+        expect(condition).toBe(true);
+      }
+    });
+


suggestion (testing): Cooldown registration tests re-encode implementation conditions rather than exercising the actual routing paths.

These tests largely restate the implementation’s boolean conditions instead of exercising the real code that mutates transientModelCooldowns, making them fragile and tightly coupled to current logic.

Prefer tests that:

Invoke the actual error-handling path (or a thin wrapper) with simulated statuses/errors.

Assert on transientModelCooldowns contents for cases like 5xx, 429, 401, 404, and undefined status.

That way the tests validate observable behavior and remain stable even if the internal conditions or eligible statuses change.


qodo-code-review · 2026-06-05T17:29:30Z

+          const keepaliveTimer = setInterval(() => {
+            if (stalled) {
+              clearInterval(keepaliveTimer);
+              return;
+            }
+            const elapsed = Date.now() - lastChunkTime;
+            if (elapsed >= streamKeepaliveConfig.MAX_STREAM_STALL_MS) {
+              stalled = true;
+              cleanup();
+              if (streamStarted) {
+                const payload = { error: { message: 'Stream stalled: no data received within timeout', type: 'stream_timeout' } };
+                try {
+                  if (responseStreamContext) {
+                    writeResponseStreamEvent(res, {
+                      type: 'response.failed',
+                      response: {
+                        id: responseStreamContext.responseId,
+                        status: 'failed',
+                        error: payload.error,
+                      },
+                    });
+                  } else {
+                    res.write(`data: ${JSON.stringify(payload)}\n\n`);
+                    res.write('data: [DONE]\n\n');
+                  }
+                  res.end();
+                } catch { /* socket gone */ }
+              } else {
+                // Pre-stream stall: throw so the outer catch can retry fallback models
+                throw Object.assign(
+                  new Error(`Stream timed out: no data received from provider ${route.displayName}`),
+                  { status: 504 }
+                );


1. Timer throw crashes process 🐞 Bug ☼ Reliability

In server/src/routes/proxy.ts, the pre-stream stall path throws from inside the setInterval keepalive callback, which is outside the request handler’s try/catch and can surface as an uncaught exception (crashing the server) instead of returning a 504 or retrying fallback.

Agent Prompt

### Issue description `handleChatCompletion()` throws an error from inside the `setInterval()` keepalive callback when a stream stalls before the first chunk. That throw is not catchable by the surrounding request/stream try/catch and can crash the Node process. ### Issue Context The intention (also asserted by tests) is to return HTTP 504 on pre-stream stall (no SSE headers sent yet) or to allow the outer retry loop to fall back. ### Fix Focus Areas - server/src/routes/proxy.ts[1343-1416] ### Implementation notes - Do **not** `throw` inside the timer callback. - Instead, set a `stallError` variable (or resolve/reject a Promise) and stop the generator (`cleanup()`), then after the `for await` loop ends, `throw stallError` from the main async function flow so the existing outer `catch`/retry logic can handle it. - Ensure the pre-stream stall path results in `status=504` and `error.type='stream_timeout'` as the test expects.

qodo-code-review · 2026-06-05T17:29:30Z

+        } finally {
+          // Ensure the session is deregistered immediately on end/abort/fail
+          if (sessionKey) {
+            for (const active of activeRequests) {
+              if (active.sessionKey === sessionKey && active.platform === route.platform && active.modelId === route.modelId) {
+                activeRequests.delete(active);
+                break;
+              }


2. activeRequests stale entries 🐞 Bug ☼ Reliability

activeRequests stores per-request objects, but cleanup deletes only the first matching entry and breaks; concurrent requests for the same session/platform/model can leave stale entries that keep provider-ban platforms excluded until the 10-minute TTL cleanup runs.

Agent Prompt

### Issue description

`activeRequests` is a `Set` of newly-created objects. On cleanup, the code searches the set and deletes only the first matching entry, which can leak additional entries if multiple concurrent requests exist with the same `sessionKey/platform/modelId`.

### Issue Context

The code comment says the Set is used to “allow concurrent requests from the same session”. If concurrency is allowed, cleanup must remove the specific entry added by that request (or decrement a reference count).

### Fix Focus Areas

- server/src/routes/proxy.ts[1320-1328]
- server/src/routes/proxy.ts[1629-1638]
- server/src/routes/proxy.ts[1643-1651]

### Implementation notes

Prefer one of:

1) Store the created object in a local `const active = { ... }` and remove it via `activeRequests.delete(active)` in `finally` (no iteration, no ambiguity).
2) Replace the Set with a `Map<string, number>` keyed by `sessionKey|platform|modelId` and increment/decrement counts.
3) If keeping the current structure, delete *all* matches (remove the `break`).
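A minimal sketch of option 2, assuming a single Node process. `ActiveRequestTracker`, `acquire()`, `count()` and `busyElsewhere()` are hypothetical names; `busyElsewhere()` is only an assumed shape for the provider-ban concurrency check. Each request calls `acquire()` once and invokes the returned release function in its `finally` block, so two concurrent requests with the same session/platform/model each remove exactly their own registration.

```typescript
// Hypothetical sketch: ActiveRequestTracker and its methods are illustrative,
// not the API in server/src/routes/proxy.ts.

interface ActiveKey {
  sessionKey: string;
  platform: string;
  modelId: string;
}

class ActiveRequestTracker {
  // Encoded key -> original key fields plus the number of in-flight requests.
  private readonly entries = new Map<string, { key: ActiveKey; n: number }>();

  private static idOf(key: ActiveKey): string {
    // JSON-encoding the tuple keeps keys unambiguous even if a field contains '|'.
    return JSON.stringify([key.sessionKey, key.platform, key.modelId]);
  }

  /** Registers one in-flight request and returns an idempotent release function. */
  acquire(key: ActiveKey): () => void {
    const id = ActiveRequestTracker.idOf(key);
    const entry = this.entries.get(id) ?? { key: { ...key }, n: 0 };
    entry.n++;
    this.entries.set(id, entry);

    let released = false;
    return () => {
      if (released) return; // calling release twice (e.g. catch + finally) is a no-op
      released = true;
      const current = this.entries.get(id);
      if (!current) return;
      current.n--;
      if (current.n <= 0) this.entries.delete(id); // no stale entry once the count hits zero
    };
  }

  /** Number of in-flight requests for exactly this session/platform/model. */
  count(key: ActiveKey): number {
    return this.entries.get(ActiveRequestTracker.idOf(key))?.n ?? 0;
  }

  /** True if a session other than `sessionKey` has a request in flight on `platform`. */
  busyElsewhere(platform: string, sessionKey: string): boolean {
    for (const { key } of this.entries.values()) {
      if (key.platform === platform && key.sessionKey !== sessionKey) return true;
    }
    return false;
  }
}
```

The key is a JSON-encoded tuple rather than the `sessionKey|platform|modelId` string from the prompt, so a `|` inside a session key cannot make two different keys collide. Option 1 (keeping a reference to the exact object and calling `activeRequests.delete(active)`) is shorter; the counter additionally makes a double release harmless and removes the need for a TTL sweep to correct leaked entries.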

vi70x3 added 13 commits June 5, 2026 14:26

feat(providers): intercept wrapped error payloads on HTTP 200 responses

97035ea

feat(proxy): replace hardcoded LongCat/Owl Alpha cooldowns with generalized thread protection scanner

2022f94

chore: temporary commit before switching branch

332e93c

feat(thread-protection): implement rules engine and replace hardcoded longcat branches

1d75b00

refactor(proxy): replace flat cooldown with real-time active request tracking for LongCat and Owl Alpha

2bda726

chore: standardize on pnpm as package manager

4829f97

- Remove package-lock.json (npm lockfile)
- Add packageManager field to package.json
- Create .npmrc with pnpm configuration

chore: rename CLAUE.md to AGENTS.md

62961e6

chore: cleanup repo

5883f78

fix: address PR review comments — stream abort guard, SSE error scope, TTL refresh, collapsible pools, doc paths, cleanup

57c6d6e

fix: address PR #16 review — sticky key check, SSE headers, keepalive pre-stream, cooldown gating, timer cleanup, a11y, log clarity

b5d26eb

fix: address PR #16 review — sticky key check, SSE headers, keepalive pre-stream, cooldown gating, timer cleanup, a11y, log clarity, test fixes

dbda0e3

ci: switch from npm to pnpm in GitHub Actions workflow

549942e

sourcery-ai Bot reviewed Jun 5, 2026

qodo-code-review Bot reviewed Jun 5, 2026

vi70x3 closed this Jun 5, 2026

vi70x4 pushed a commit that referenced this pull request Jun 5, 2026

fix: address PR #18 review — pre-stream stall gen.throw, activeRequests concurrency, wrapped-error consistency, heartbeat fake timers, cooldown test accuracy

ed5d381

vi70x4 mentioned this pull request Jun 5, 2026

fix realtime sticky sessions protection x2 #19

Merged

vi70x3 commented Jun 5, 2026 • edited by sourcery-ai Bot

Summary by Sourcery

mergeguards Bot commented Jun 5, 2026

sourcery-ai Bot commented Jun 5, 2026 • edited

Reviewer's Guide

Sequence diagram for SSE streaming with stall protection and thread protection

qodo-code-review Bot commented Jun 5, 2026

Review Summary by Qodo

qodo-code-review Bot commented Jun 5, 2026 • edited

Code Review by Qodo

mergeguards Bot commented Jun 5, 2026

sourcery-ai Bot left a comment

sourcery-ai Bot Jun 5, 2026

sourcery-ai Bot Jun 5, 2026

sourcery-ai Bot Jun 5, 2026

sourcery-ai Bot Jun 5, 2026

qodo-code-review Bot Jun 5, 2026

qodo-code-review Bot Jun 5, 2026