fix realtime sticky sessions protection x2#19
Conversation
…alized thread protection scanner
… longcat branches
…tracking for LongCat and Owl Alpha
- Change activeRequests from Map to Set to allow concurrent requests from same session - Add stale active request cleanup with 10-minute TTL - Cache owl-alpha model ID to avoid repeated DB lookups - Fix active request iteration to use Set-compatible syntax
- Remove package-lock.json (npm lockfile) - Add packageManager field to package.json - Create .npmrc with pnpm configuration
BUG-05: Abort upstream provider stream on stall detection by breaking the for-await loop and calling gen.return() when the keepalive timer detects MAX_STREAM_STALL_MS has elapsed without data. BUG-06: Fix cooldown guard to use the actual routable fallback chain (fallback_config JOIN models) instead of all enabled models, ensuring transient cooldowns only skip models that would actually be routed to. BUG-10: Remove double semicolon in proxy.ts. Also adds SSE keep-alive comments during idle periods, transient model cooldown injection before retry loops, and LongCat sticky session cooldown support in balanced routing mode.
…, TTL refresh, collapsible pools, doc paths, cleanup
… pre-stream, cooldown gating, timer cleanup, a11y, log clarity
… pre-stream, cooldown gating, timer cleanup, a11y, log clarity, test fixes
…ts concurrency, wrapped-error consistency, heartbeat fake timers, cooldown test accuracy
|
MergeGuard — Free plan allows 1 active repository. Upgrade to protect more repositories. |
Reviewer's GuideRefactors proxy sticky-session and thread protection logic into a generalized rules engine, introduces shared transient model cooldowns and active-request tracking for provider-ban platforms, adds SSE stream heartbeat/stall protection, and surfaces model pools in the fallback UI while standardizing pnpm usage in CI. Sequence diagram for proxy thread protection, active requests, and transient cooldownssequenceDiagram
actor Client
participant Proxy as proxy.handleChatCompletion
participant Router as routeRequest
participant Provider as route.provider
participant ThreadProtection as evaluateThreadProtection
participant ActiveRequests as activeRequests
participant Cooldowns as transientModelCooldowns
Client->>Proxy: POST /proxy (chat request)
Proxy->>Router: routeRequest(messages, routingMode, skipModels, skipKeys)
Router-->>Proxy: route
alt stream=true
Proxy->>ActiveRequests: add({sessionKey, platform, modelId, startTime})
Proxy->>Provider: streamChatCompletion(apiKey, messages, modelId, options)
loop streaming chunks
Provider-->>Proxy: ChatCompletionChunk
Proxy->>Proxy: writeResponseStreamStart / writeResponseStreamChunk
end
Proxy->>ActiveRequests: delete({sessionKey, platform, modelId})
else stream stalled (no chunks)
Proxy->>Proxy: streamKeepaliveConfig / keepaliveTimer
Proxy->>Proxy: [MAX_STREAM_STALL_MS exceeded]
Proxy->>Proxy: writeResponseStreamEvent OR res.write error
Proxy->>ActiveRequests: delete({sessionKey, platform, modelId})
end
alt provider returns 5xx / truncation / retryable error
Proxy->>ThreadProtection: evaluateThreadProtection({platform, kind, midStream, modelDbId, error})
ThreadProtection-->>Proxy: ThreadProtectionAction
alt action.banProvider
Proxy->>Proxy: banPlatformFromSession(messages, routingMode, platform, modelDbId)
Proxy->>Proxy: addProviderModelsToSkipModels(skipModels, platform)
end
alt action.skipModel
Proxy->>Proxy: skipModels.add(modelDbId)
end
alt !isRetryableError(err) and isTransientCooldownEligible
Proxy->>Cooldowns: set(modelDbId, now + TRANSIENT_COOLDOWN_MS)
end
end
Note over Proxy,Cooldowns: On next attempts, proxy injects transientModelCooldowns into skipModels before calling routeRequest again
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
|
|
Overall Grade |
Security Reliability Complexity Hygiene |
Code Review Summary
| Analyzer | Status | Updated (UTC) | Details |
|---|---|---|---|
| JavaScript | Jun 5, 2026 7:37p.m. | Review ↗ |
Important
AI Review is run only on demand for your team. We're only showing results of static analysis review right now. To trigger AI Review, comment @deepsourcebot review on this thread.
Review Summary by QodoGeneralized thread protection with real-time tracking, stream heartbeat, transient cooldowns, and wrapped error interception
WalkthroughsDescription**Core Features:** • Implemented generalized thread protection rules engine (threadProtection.ts) replacing hardcoded LongCat/Owl Alpha logic with configurable protection levels (provider-ban, model-skip, off) • Added real-time active request tracking via activeRequests Set to prevent concurrent sessions from overwhelming provider-ban platforms • Introduced SSE stream heartbeat (15s interval) and stall detection (60s timeout) with graceful error recovery and client-disconnect cleanup • Implemented transient model cooldowns (15s) for non-retryable errors, automatically injected into skipModels to temporarily exclude problematic models during outages **Provider Enhancements:** • Added wrapped error detection and handling across all providers (OpenAI, Cohere, Cloudflare, Google) to intercept error payloads returned with HTTP 200 status • Made extractErrorMessage() protected in BaseProvider for subclass access • Fixed malformed chunk handling to continue instead of silently skipping **Analytics & Routing:** • Implemented recency-weighted analytics using 7-day decay formula for more responsive model selection • Updated balanced mode to support real session hashing instead of empty strings • Added model pool classification (Fast, Balanced, Smart) to fallback API and UI **Testing & Documentation:** • Added comprehensive test suites for transient cooldowns (539 lines), stream heartbeat/stall detection (347 lines), and balanced mode behavior • Created design specifications for all major features with architecture diagrams and implementation guidance • Fixed test expectations for balanced mode session key behavior and mock fetch handling **Infrastructure:** • Migrated from npm to pnpm as package manager across CI/CD, configuration, and workspace management • Updated GitHub Actions workflow to use pnpm with frozen lockfile for reproducible builds • Added .npmrc configuration with pnpm-specific settings **Documentation Fixes:** • Corrected relative file path references in specification documents • Added PR #13 code review bug verification and fix tracking Diagramflowchart LR
A["HTTP 200<br/>Wrapped Errors"] -->|"isWrappedError()"| B["BaseProvider<br/>Detection"]
B -->|"throwWrappedError()"| C["ProviderApiError"]
D["Concurrent<br/>Requests"] -->|"Active Request<br/>Tracking"| E["ThreadProtection<br/>Rules Engine"]
E -->|"Provider-ban<br/>Model-skip<br/>Off"| F["Protection<br/>Decision"]
G["Delayed<br/>Chunks"] -->|"15s Heartbeat<br/>60s Stall"| H["SSE Stream<br/>Protection"]
H -->|"Keep-alive<br/>Comments"| I["Graceful<br/>Recovery"]
J["5xx/Connection<br/>Errors"] -->|"15s Cooldown"| K["Transient<br/>Cooldowns"]
K -->|"skipModels<br/>Injection"| L["Model<br/>Exclusion"]
M["Historical<br/>Stats"] -->|"7-day Decay<br/>Weighting"| N["Recency-biased<br/>Analytics"]
N -->|"Responsive<br/>Selection"| O["Router<br/>Optimization"]
File Changes1. server/src/routes/proxy.ts
|
Code Review by Qodo
1. Premature keepalive writes
|
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- In the SSE keepalive / stall code, the pre-stream stall path currently throws from inside the
setIntervalcallback, which won’t be caught by the surroundingtry/catchand can surface as an unhandled exception; instead, consider signalling the stall via shared state (e.g. setting a flag or storing an error) and handling the retry/504 logic in the main async function, rather than throwing directly from the timer callback. - The active-request tracking and transient cooldown logic now walks the entire
activeRequests/transientModelCooldownsmaps on every request; if you expect many concurrent sessions, you might want to index these by a composite key (e.g.${sessionKey}:${platform}:${modelId}) or cache per-platform model lists to avoid repeated O(n) scans in hot paths.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In the SSE keepalive / stall code, the pre-stream stall path currently throws from inside the `setInterval` callback, which won’t be caught by the surrounding `try`/`catch` and can surface as an unhandled exception; instead, consider signalling the stall via shared state (e.g. setting a flag or storing an error) and handling the retry/504 logic in the main async function, rather than throwing directly from the timer callback.
- The active-request tracking and transient cooldown logic now walks the entire `activeRequests`/`transientModelCooldowns` maps on every request; if you expect many concurrent sessions, you might want to index these by a composite key (e.g. `${sessionKey}:${platform}:${modelId}`) or cache per-platform model lists to avoid repeated O(n) scans in hot paths.
## Individual Comments
### Comment 1
<location path="client/src/components/pool-section.tsx" line_range="16" />
<code_context>
+}) {
+ const [isExpanded, setIsExpanded] = useState(true);
+
+ const handleKeyDown = (e: React.KeyboardEvent) => {
+ if (e.key === 'Enter' || e.key === ' ') {
+ e.preventDefault();
</code_context>
<issue_to_address>
**issue:** Using React.KeyboardEvent requires importing the React type, which currently isn't in scope.
In this file only `useState` and `type ReactNode` are imported, so `React` isn’t defined as a namespace. To fix this, either import the event type directly:
```ts
import { useState, type ReactNode, type KeyboardEvent } from 'react';
const handleKeyDown = (e: KeyboardEvent) => { ... };
```
or add a React namespace type import if you prefer `React.*`:
```ts
import * as React from 'react';
```
and keep `React.KeyboardEvent`.
</issue_to_address>
### Comment 2
<location path="server/src/providers/openai-compat.ts" line_range="115-119" />
<code_context>
const decoder = new TextDecoder();
let buffer = '';
+ let hasYielded = false;
</code_context>
<issue_to_address>
**question (bug_risk):** Wrapped-error detection only runs on the first parsed SSE data line; later error payloads may be treated as partial chunks.
Because `hasYielded` is set after the first successful event, only that event is ever checked with `isWrappedError(parsed)`. Any later wrapped error will be treated as a normal chunk.
To support error payloads appearing later in the stream, consider either:
- Removing the `hasYielded` guard and relying only on `isWrappedError`, or
- Limiting the guard to some early window (e.g., until the first non-empty `choices`).
If you keep the current behavior, it would help to document that only “error as first event” is supported.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| }) { | ||
| const [isExpanded, setIsExpanded] = useState(true); | ||
|
|
||
| const handleKeyDown = (e: React.KeyboardEvent) => { |
There was a problem hiding this comment.
issue: Using React.KeyboardEvent requires importing the React type, which currently isn't in scope.
In this file only useState and type ReactNode are imported, so React isn’t defined as a namespace. To fix this, either import the event type directly:
import { useState, type ReactNode, type KeyboardEvent } from 'react';
const handleKeyDown = (e: KeyboardEvent) => { ... };or add a React namespace type import if you prefer React.*:
import * as React from 'react';and keep React.KeyboardEvent.
| let buffer = ''; | ||
| let hasYielded = false; | ||
|
|
||
| while (true) { | ||
| const { done, value } = await reader.read(); |
There was a problem hiding this comment.
question (bug_risk): Wrapped-error detection only runs on the first parsed SSE data line; later error payloads may be treated as partial chunks.
Because hasYielded is set after the first successful event, only that event is ever checked with isWrappedError(parsed). Any later wrapped error will be treated as a normal chunk.
To support error payloads appearing later in the stream, consider either:
- Removing the
hasYieldedguard and relying only onisWrappedError, or - Limiting the guard to some early window (e.g., until the first non-empty
choices).
If you keep the current behavior, it would help to document that only “error as first event” is supported.
| if (!stalled && elapsed >= streamKeepaliveConfig.KEEPALIVE_INTERVAL_MS) { | ||
| try { res.write(': keep-alive\n\n'); } catch { /* socket gone */ } | ||
| } |
There was a problem hiding this comment.
1. Premature keepalive writes 🐞 Bug ☼ Reliability
In handleChatCompletion() streaming mode, the keepalive timer writes : keep-alive to res even when streamStarted is still false, which can commit the HTTP response before SSE headers are set. This can produce malformed SSE (wrong/missing headers) and undermines the code’s stated “pre-stream errors stay retryable” behavior.
Agent Prompt
## Issue description
`proxy.ts` writes SSE keepalive comments before SSE headers are sent. This can commit the response prematurely, making later `res.setHeader(...)` calls invalid/ineffective and breaking pre-stream retry semantics.
## Issue Context
The design spec explicitly says keepalive comments should only be written after SSE headers are sent (`streamStarted === true`). The current implementation does not check `streamStarted`.
## Fix Focus Areas
- server/src/routes/proxy.ts[1335-1405]
- .roo/specs/sse-stream-heartbeat-stall-protection/design.md[110-129]
### Implementation direction
- Change the keepalive write to only occur when `streamStarted === true` (or `res.headersSent === true` after you’ve set SSE headers).
- Ensure no `res.write(...)` happens before the first time you set `Content-Type: text/event-stream`, `Cache-Control`, `Connection`, etc.
- If you still need pre-first-chunk liveness, choose one:
- set SSE headers immediately (accepting that pre-stream errors are no longer “retryable” via normal JSON responses), or
- don’t emit keepalives until after headers, relying on stall timeout to abort/retry.
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
Summary by Sourcery
Generalize and harden provider thread protection, streaming, and routing behavior while introducing transient model cooldowns and pool-aware fallback UI, along with CI migration to pnpm.
New Features:
Bug Fixes:
Enhancements:
Build:
CI:
Tests: