Feat/realtime sticky#16
Conversation
…alized thread protection scanner
… longcat branches
…tracking for LongCat and Owl Alpha
- Change activeRequests from Map to Set to allow concurrent requests from same session - Add stale active request cleanup with 10-minute TTL - Cache owl-alpha model ID to avoid repeated DB lookups - Fix active request iteration to use Set-compatible syntax
- Remove package-lock.json (npm lockfile) - Add packageManager field to package.json - Create .npmrc with pnpm configuration
BUG-05: Abort upstream provider stream on stall detection by breaking the for-await loop and calling gen.return() when the keepalive timer detects MAX_STREAM_STALL_MS has elapsed without data. BUG-06: Fix cooldown guard to use the actual routable fallback chain (fallback_config JOIN models) instead of all enabled models, ensuring transient cooldowns only skip models that would actually be routed to. BUG-10: Remove double semicolon in proxy.ts. Also adds SSE keep-alive comments during idle periods, transient model cooldown injection before retry loops, and LongCat sticky session cooldown support in balanced routing mode.
…, TTL refresh, collapsible pools, doc paths, cleanup
Reviewer's GuideImplements generalized provider/thread protection and streaming robustness: adds transient model cooldowns and active-request safeguards, introduces SSE stream heartbeat and stall detection, refines routing analytics with recency-weighted stats and balanced/smart pool separation, centralizes wrapped-error handling in providers, and updates UI and tests to reflect new routing behavior and pools. Sequence diagram for SSE streaming with heartbeat, stall detection, and active-request trackingsequenceDiagram
actor Client
participant Proxy as proxy.handleChatCompletion
participant Provider as provider.streamChatCompletion
Client->>Proxy: handleChatCompletion
Proxy->>Proxy: routeRequest
Proxy->>Proxy: activeRequests.add
Proxy->>Provider: streamChatCompletion
loop [SSE chunks]
Provider-->>Proxy: streamChatCompletion (chunk)
Proxy->>Proxy: writeResponseStreamStart
Proxy-->>Client: writeResponseStreamEvent
end
opt [keepalive timer]
Proxy-->>Client: res.write(: keep-alive)
end
alt [stream stalled before first chunk]
Proxy->>Proxy: streamAborted = true
Proxy-->>Client: res.status(504).json
Proxy->>Proxy: logRequest
else [mid-stream 5xx or truncation]
Proxy->>Proxy: getErrorStatus
Proxy->>Proxy: isBanEligibleStatus
Proxy->>Proxy: skipModels.add
Proxy->>Proxy: transientModelCooldowns.set
end
Proxy->>Proxy: activeRequests.delete
Proxy-->>Client: res.end
Flow diagram for routing modes, pools, and key capacityflowchart LR
A[routeRequest] --> B{routingMode}
B -- balanced --> C[filter chain\nEXCLUDED_FROM_BALANCED]
C --> D[filteredChain]
B -- smart --> E[compute effectiveScore]
E --> F[sorted]
F --> G{hasValidKeys\nlongcat}
G -- yes --> H[move longcat entries to front]
H --> I
G -- no --> I[keep order]
I --> J{hasValidKeys\nowl-alpha}
J -- yes --> K[move owl-alpha to front]
J -- no --> L[use existing order]
subgraph Analytics
M[refreshStatsCache]
M --> N[ModelStats\nsuccesses,total,rawTotal]
end
subgraph Pools
O[getModelPool]
O --> P[ModelPool.Balanced]
O --> Q[ModelPool.Smart]
end
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
📝 WalkthroughWalkthroughThis PR introduces a comprehensive set of features for LLM routing and error handling: generalized thread protection rules, wrapped error detection across providers, model-level routing with balanced/smart modes, transient failure cooldowns, sticky session behavior separation, SSE stream keepalive and stall protection, recency-biased analytics, and frontend pool display grouping. The changes span specs, backend services, providers, routes, tests, and UI components. ChangesRouting, Error Handling, Session Management, and Streaming
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
Poem
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
⚔️ Resolve merge conflicts
|
Review Summary by QodoRealtime sticky sessions with heartbeat protection, transient cooldowns, and generalized thread protection
WalkthroughsDescription• **Realtime sticky sessions with heartbeat protection**: Implemented SSE stream keep-alive heartbeat (15s interval) and stall detection (45s timeout) to prevent hanging streams and intermediate proxy timeouts, with automatic recovery and client disconnect cleanup • **Transient model cooldowns**: Added shared in-memory circuit breaker for models experiencing 5xx or connection failures with 15-second global cooldown window visible across all concurrent requests, overriding sticky session preferences when active • **Configurable thread protection levels**: Generalized LongCat-specific provider banning logic to support environment-variable-configurable protection levels (provider-ban, model-skip, off) for different platforms, eliminating hardcoded platform checks • **Wrapped error payload detection**: Implemented detection and handling of error payloads returned with HTTP 200 status codes across all provider types (OpenAI, Cohere, Cloudflare, Google) with proper ProviderApiError propagation in both streaming and non-streaming paths • **Recency-weighted analytics**: Added 7-day linear time-decay weighting to Thompson Sampling router to prioritize recent model performance data, with NaN safety guards and backward-compatible dashboard reporting • **Model pool classification and routing**: Introduced ModelPool enum (Fast, Balanced, Smart) with balanced mode exclusions for LongCat and Owl Alpha models, only reachable via explicit request or smart mode, plus Owl Alpha smart-mode preference logic • **Disabled sticky sessions on balanced endpoint**: Single-point guard via getSessionKey() returning empty string cascades through all sticky functions as no-ops for balanced mode while keeping smart mode active • **Enhanced error handling**: Improved truncation detection across all providers, fixed session key comparison from falsy check to explicit undefined check, and added active request tracking to prevent concurrent session overload • **Comprehensive test coverage**: Added test suites for transient cooldowns, stream heartbeat/stall protection, balanced mode sticky disabling, and model pool validation; updated existing tests for new routing modes • **UI enhancements**: Added pool-based model grouping to fallback page with new PoolBadge and PoolSection components for visual pool organization • **Configuration and documentation**: Added pnpm package manager specification, comprehensive design documents for all major features, and implementation task tracking Diagramflowchart LR
A["Request arrives"] --> B["Check transient cooldowns"]
B --> C["Get session key<br/>balanced vs smart"]
C --> D{Routing mode}
D -->|Smart| E["Apply sticky preference<br/>+ Owl Alpha logic"]
D -->|Balanced| F["Free routing<br/>no sticky"]
E --> G["Route to model"]
F --> G
G --> H["Stream response"]
H --> I["Emit heartbeat<br/>every 15s"]
I --> J{Stall detected<br/>45s timeout?}
J -->|Yes| K["Send error<br/>cleanup stream"]
J -->|No| L["Continue streaming"]
K --> M["Clear sticky if pinned"]
L --> N["On 5xx error"]
N --> O["Register 15s<br/>transient cooldown"]
O --> P["Evaluate thread<br/>protection level"]
P --> Q{Protection level}
Q -->|provider-ban| R["Ban provider"]
Q -->|model-skip| S["Skip model"]
Q -->|off| T["No action"]
File Changes1. server/src/routes/proxy.ts
|
Code Review by Qodo
Context used✅ Compliance rules (platform):
6 rules 1. transientModelCooldowns before keys exhausted
|
There was a problem hiding this comment.
Code Review
This pull request introduces several major enhancements to the routing and streaming stability of the proxy, including grouping models into fast, balanced, and smart pools on the frontend, excluding LongCat and Owl Alpha from balanced auto-routing, implementing SSE stream heartbeats and stall protection, introducing transient model cooldowns for concurrent failure mitigation, and detecting wrapped error payloads on HTTP 200 responses. Feedback from the review highlights critical issues in proxy.ts: first, changing !key to key === undefined in getStickyModel can lead to session collisions on empty string keys; second, a potential resource and timer leak exists if a client disconnects during a blocked stream loop, which can be resolved with a req.on('close') cleanup listener; and third, an early return on pre-stream stalls bypasses the fallback retry loop, where throwing a 504 error instead would allow proper fallback routing.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| function getStickyModel(messages: ChatMessage[], routingMode: RoutingMode): number | undefined { | ||
| const key = getSessionKey(messages, routingMode); | ||
| if (!key) return undefined; | ||
| if (key === undefined) return undefined; |
There was a problem hiding this comment.
Changing !key to key === undefined allows empty string keys ('') to be treated as valid session keys. If getSessionKey returns '' (which happens when there is no user message or content is not a string), the code will attempt to look up '' in stickySessionMap. This can cause session collisions where multiple requests with empty keys share the same sticky session. Revert this to !key to safely handle both undefined and empty strings.
| if (key === undefined) return undefined; | |
| if (!key) return undefined; |
| let lastChunkTime = Date.now(); | ||
| let stalled = false; | ||
| const keepaliveTimer = setInterval(() => { | ||
| if (stalled) { | ||
| clearInterval(keepaliveTimer); | ||
| return; | ||
| } | ||
| const elapsed = Date.now() - lastChunkTime; | ||
| if (elapsed >= streamKeepaliveConfig.MAX_STREAM_STALL_MS) { | ||
| stalled = true; | ||
| clearInterval(keepaliveTimer); | ||
| if (streamStarted) { | ||
| const payload = { error: { message: 'Stream stalled: no data received within timeout', type: 'stream_timeout' } }; | ||
| try { | ||
| if (responseStreamContext) { | ||
| writeResponseStreamEvent(res, { | ||
| type: 'response.failed', | ||
| response: { | ||
| id: responseStreamContext.responseId, | ||
| status: 'failed', | ||
| error: payload.error, | ||
| }, | ||
| }); | ||
| } else { | ||
| res.write(`data: ${JSON.stringify(payload)}\n\n`); | ||
| res.write('data: [DONE]\n\n'); | ||
| } | ||
| res.end(); | ||
| } catch { /* socket gone */ } | ||
| } | ||
| return; | ||
| } | ||
| if (streamStarted && elapsed >= streamKeepaliveConfig.KEEPALIVE_INTERVAL_MS) { | ||
| try { res.write(': keep-alive\n\n'); } catch { /* socket gone */ } | ||
| } | ||
| }, streamKeepaliveConfig.KEEPALIVE_INTERVAL_MS); | ||
|
|
||
| try { | ||
| for await (const chunk of gen) { | ||
| if (stalled) break; | ||
| lastChunkTime = Date.now(); | ||
| if (!streamStarted) { | ||
| ttfbMs = Date.now() - start; | ||
| res.setHeader('X-Routed-Via', `${route.platform}/${route.modelId}`); | ||
| if (attempt > 0) res.setHeader('X-Fallback-Attempts', String(attempt)); | ||
| if (responseStreamContext) { | ||
| writeResponseStreamStart(res, responseStreamContext, route.modelId); | ||
| } | ||
| streamStarted = true; | ||
| } | ||
| const deltaToolCalls = chunk.choices[0]?.delta?.tool_calls ?? []; | ||
| if (deltaToolCalls.length > 0) sawToolCalls = true; | ||
| if (responseStreamContext) { | ||
| writeResponseStreamStart(res, responseStreamContext, route.modelId); | ||
| totalOutputTokens += writeResponseStreamChunk(res, responseStreamContext, chunk); | ||
| } else { | ||
| const text = chunk.choices[0]?.delta?.content ?? ''; | ||
| if (text) streamedText += text; | ||
| totalOutputTokens += Math.ceil(text.length / 4); | ||
| res.write(`data: ${JSON.stringify(chunk)}\n\n`); | ||
| } | ||
| streamStarted = true; | ||
| } | ||
| const deltaToolCalls = chunk.choices[0]?.delta?.tool_calls ?? []; | ||
| if (deltaToolCalls.length > 0) sawToolCalls = true; | ||
| if (responseStreamContext) { | ||
| totalOutputTokens += writeResponseStreamChunk(res, responseStreamContext, chunk); | ||
| } else { | ||
| const text = chunk.choices[0]?.delta?.content ?? ''; | ||
| if (text) streamedText += text; | ||
| totalOutputTokens += Math.ceil(text.length / 4); | ||
| res.write(`data: ${JSON.stringify(chunk)}\n\n`); | ||
| } finally { | ||
| clearInterval(keepaliveTimer); | ||
| if (stalled) { | ||
| try { gen.return(undefined); } catch { /* already closed */ } | ||
| } | ||
| } |
There was a problem hiding this comment.
There is a potential resource and timer leak here. If the client disconnects prematurely while the for await loop is suspended waiting for the next chunk from the upstream provider, the finally block will never be entered because the loop is blocked. This leaves the keepaliveTimer running indefinitely. Additionally, when a stall is detected, the generator is not aborted immediately.
To fix this, define a cleanup helper that clears the interval and calls gen.return(), and register it as a req.on('close') listener. This ensures all resources are immediately freed when the client disconnects or a stall occurs.
let lastChunkTime = Date.now();
let stalled = false;
const keepaliveTimer = setInterval(() => {
if (stalled) {
clearInterval(keepaliveTimer);
return;
}
const elapsed = Date.now() - lastChunkTime;
if (elapsed >= streamKeepaliveConfig.MAX_STREAM_STALL_MS) {
stalled = true;
cleanup();
if (streamStarted) {
const payload = { error: { message: 'Stream stalled: no data received within timeout', type: 'stream_timeout' } };
try {
if (responseStreamContext) {
writeResponseStreamEvent(res, {
type: 'response.failed',
response: {
id: responseStreamContext.responseId,
status: 'failed',
error: payload.error,
},
});
} else {
res.write(`data: ${JSON.stringify(payload)}\n\n`);
res.write('data: [DONE]\n\n');
}
res.end();
} catch { /* socket gone */ }
}
return;
}
if (streamStarted && elapsed >= streamKeepaliveConfig.KEEPALIVE_INTERVAL_MS) {
try { res.write(': keep-alive\n\n'); } catch { /* socket gone */ }
}
}, streamKeepaliveConfig.KEEPALIVE_INTERVAL_MS);
const cleanup = () => {
clearInterval(keepaliveTimer);
try { gen.return(undefined); } catch { /* already closed */ }
};
req.on('close', cleanup);
try {
for await (const chunk of gen) {
if (stalled) break;
lastChunkTime = Date.now();
if (!streamStarted) {
ttfbMs = Date.now() - start;
res.setHeader('X-Routed-Via', `${route.platform}/${route.modelId}`);
if (attempt > 0) res.setHeader('X-Fallback-Attempts', String(attempt));
if (responseStreamContext) {
writeResponseStreamStart(res, responseStreamContext, route.modelId);
}
streamStarted = true;
}
const deltaToolCalls = chunk.choices[0]?.delta?.tool_calls ?? [];
if (deltaToolCalls.length > 0) sawToolCalls = true;
if (responseStreamContext) {
totalOutputTokens += writeResponseStreamChunk(res, responseStreamContext, chunk);
} else {
const text = chunk.choices[0]?.delta?.content ?? '';
if (text) streamedText += text;
totalOutputTokens += Math.ceil(text.length / 4);
res.write(`data: ${JSON.stringify(chunk)}
`);
}
}
} finally {
req.off('close', cleanup);
cleanup();
}| if (stalled && !streamStarted) { | ||
| // Pre-stream stall: no headers sent yet, return 504 so the retry loop can try another model | ||
| streamAborted = true; | ||
| res.status(504).json({ | ||
| error: { | ||
| message: 'Stream timed out: no data received from provider', | ||
| type: 'stream_timeout', | ||
| }, | ||
| }); | ||
| logRequest(route.platform, route.modelId, 'error', estimatedInputTokens, totalOutputTokens, Date.now() - start, ttfbMs, 'Pre-stream stall timeout'); | ||
| return; | ||
| } |
There was a problem hiding this comment.
This early return contradicts the design specification and breaks the fallback/retry mechanism. If a pre-stream stall occurs (i.e., streamStarted is false), returning directly from handleChatCompletion terminates the request immediately and sends a 504 to the client, bypassing any remaining fallback attempts in the retry loop. Throwing a 504 error instead allows the outer catch block to handle the failure, register the transient cooldown for the stalled model, and proceed to retry with the next available model in the fallback chain.
if (stalled && !streamStarted) {
// Pre-stream stall: throw 504 error so the retry loop can try another model
throw Object.assign(
new Error(`Stream timed out: no data received from provider ${route.displayName}`),
{ status: 504 }
);
}There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- The log message
[Proxy] transient cooldowns active for model IDs: [...]is built from the fullskipModelsset, so it will also include session bans and other skips; consider logging only the IDs actually intransientModelCooldownsto avoid confusing cooldown diagnostics. - The new
PoolSectionUI uses plain ▼/▶ glyphs without any aria attributes; addingaria-expandedand abutton/role="button"would make these collapsible sections accessible to screen readers and keyboard users.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The log message `[Proxy] transient cooldowns active for model IDs: [...]` is built from the full `skipModels` set, so it will also include session bans and other skips; consider logging only the IDs actually in `transientModelCooldowns` to avoid confusing cooldown diagnostics.
- The new `PoolSection` UI uses plain ▼/▶ glyphs without any aria attributes; adding `aria-expanded` and a `button`/`role="button"` would make these collapsible sections accessible to screen readers and keyboard users.
## Individual Comments
### Comment 1
<location path="server/src/routes/proxy.ts" line_range="1316-1325" />
<code_context>
+ lastChunkTime = Date.now();
+ if (!streamStarted) {
+ ttfbMs = Date.now() - start;
+ res.setHeader('X-Routed-Via', `${route.platform}/${route.modelId}`);
+ if (attempt > 0) res.setHeader('X-Fallback-Attempts', String(attempt));
+ if (responseStreamContext) {
+ writeResponseStreamStart(res, responseStreamContext, route.modelId);
+ }
</code_context>
<issue_to_address>
**issue (bug_risk):** SSE response headers (Content-Type/Connection/etc.) are no longer set for streaming responses.
The previous streaming path explicitly set `Content-Type: text/event-stream`, `Cache-Control: no-cache`, and `Connection: keep-alive` before emitting SSE data. In this new path, only `X-Routed-Via`/`X-Fallback-Attempts` are added, so SSE-specific headers may never be sent, leading some clients or intermediaries to mis-handle the stream.
Unless these headers are guaranteed to be set earlier in the lifecycle, please restore them at the point `streamStarted` becomes true, for both `responseStreamContext` and the plain SSE path.
</issue_to_address>
### Comment 2
<location path="server/src/routes/proxy.ts" line_range="1254-1267" />
<code_context>
+ // Simulate the pre-routing injection logic
+ const skipModels = new Set<number>();
+ const now = Date.now();
+ for (const [id, exp] of transientModelCooldowns) {
+ if (now > exp) {
+ transientModelCooldowns.delete(id);
</code_context>
<issue_to_address>
**suggestion:** Logging of transient cooldowns conflates all skip reasons, not just cooldowns.
This log line is built from the full `skipModels` set, which may include IDs skipped for non-cooldown reasons (session bans, truncation, etc.), so the message doesn’t actually reflect only transient cooldowns.
Consider tracking the IDs added by this block (e.g., a local `cooldownIds` set) and logging those, or emitting a separate log before merging into `skipModels` to keep the signal clear for debugging.
```suggestion
// Inject transient model cooldowns into skipModels
{
const now = Date.now();
const cooldownIds = new Set<number>();
for (const [id, exp] of transientModelCooldowns) {
if (now > exp) {
transientModelCooldowns.delete(id);
} else {
skipModels.add(id);
cooldownIds.add(id);
}
}
if (cooldownIds.size > 0) {
console.log(`[Proxy] transient cooldowns active for model IDs: [${Array.from(cooldownIds).join(',')}]`);
}
}
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| let streamStarted = false; | ||
| let ttfbMs: number | null = null; | ||
| try { | ||
| // Register the session as active | ||
| if (sessionKey) { | ||
| activeRequests.add({ | ||
| sessionKey, | ||
| platform: route.platform, | ||
| modelId: route.modelId, | ||
| startTime: Date.now() |
There was a problem hiding this comment.
issue (bug_risk): SSE response headers (Content-Type/Connection/etc.) are no longer set for streaming responses.
The previous streaming path explicitly set Content-Type: text/event-stream, Cache-Control: no-cache, and Connection: keep-alive before emitting SSE data. In this new path, only X-Routed-Via/X-Fallback-Attempts are added, so SSE-specific headers may never be sent, leading some clients or intermediaries to mis-handle the stream.
Unless these headers are guaranteed to be set earlier in the lifecycle, please restore them at the point streamStarted becomes true, for both responseStreamContext and the plain SSE path.
| // Inject transient model cooldowns into skipModels | ||
| { | ||
| const now = Date.now(); | ||
| for (const [id, exp] of transientModelCooldowns) { | ||
| if (now > exp) { | ||
| transientModelCooldowns.delete(id); | ||
| } else { | ||
| skipModels.add(id); | ||
| } | ||
| } | ||
| if (skipModels.size > 0) { | ||
| console.log(`[Proxy] transient cooldowns active for model IDs: [${Array.from(skipModels).join(',')}]`); | ||
| } | ||
| } |
There was a problem hiding this comment.
suggestion: Logging of transient cooldowns conflates all skip reasons, not just cooldowns.
This log line is built from the full skipModels set, which may include IDs skipped for non-cooldown reasons (session bans, truncation, etc.), so the message doesn’t actually reflect only transient cooldowns.
Consider tracking the IDs added by this block (e.g., a local cooldownIds set) and logging those, or emitting a separate log before merging into skipModels to keep the signal clear for debugging.
| // Inject transient model cooldowns into skipModels | |
| { | |
| const now = Date.now(); | |
| for (const [id, exp] of transientModelCooldowns) { | |
| if (now > exp) { | |
| transientModelCooldowns.delete(id); | |
| } else { | |
| skipModels.add(id); | |
| } | |
| } | |
| if (skipModels.size > 0) { | |
| console.log(`[Proxy] transient cooldowns active for model IDs: [${Array.from(skipModels).join(',')}]`); | |
| } | |
| } | |
| // Inject transient model cooldowns into skipModels | |
| { | |
| const now = Date.now(); | |
| const cooldownIds = new Set<number>(); | |
| for (const [id, exp] of transientModelCooldowns) { | |
| if (now > exp) { | |
| transientModelCooldowns.delete(id); | |
| } else { | |
| skipModels.add(id); | |
| cooldownIds.add(id); | |
| } | |
| } | |
| if (cooldownIds.size > 0) { | |
| console.log(`[Proxy] transient cooldowns active for model IDs: [${Array.from(cooldownIds).join(',')}]`); | |
| } | |
| } |
| // 5xx failure detection — all providers: model-level ban + transient cooldown | ||
| const errStatus = getErrorStatus(err); | ||
| const isTransientCooldownEligible = (errStatus !== undefined && errStatus >= 500 && errStatus < 600) || errStatus === undefined; | ||
| if (errStatus && isBanEligibleStatus(errStatus)) { | ||
| if (route.platform === 'longcat') { | ||
| console.warn(`[Proxy] 5xx from LongCat — excluding entire LongCat provider for session`); | ||
| banPlatformFromSession(normalizedMessages, routingMode, 'longcat', route.modelDbId); | ||
| addProviderModelsToSkipModels(skipModels, 'longcat'); | ||
| // Clear sticky if pinned to LongCat | ||
| if (preferredModel) { | ||
| const db = getDb(); | ||
| const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined; | ||
| if (prefRow?.platform === 'longcat') { | ||
| preferredModel = undefined; | ||
| preferredKeyId = undefined; | ||
| } | ||
| // Register transient cooldown for any 5xx ban-eligible error | ||
| transientModelCooldowns.set(route.modelDbId, Date.now() + TRANSIENT_COOLDOWN_MS); | ||
| console.warn(`[Proxy] 5xx from ${route.platform}/${route.modelId} — skipping model for session`); | ||
| skipModels.add(route.modelDbId); | ||
| // Clear sticky if pinned to this platform |
There was a problem hiding this comment.
1. transientmodelcooldowns before keys exhausted 📘 Rule violation ≡ Correctness
The proxy applies a model-level penalty (skipModels and a transient cooldown) immediately after a single request failure, without first attempting other enabled keys for the same platform/model. This can incorrectly penalize a model even when another key could succeed, violating the requirement to apply model-level penalties only after all keys are exhausted.
Agent Prompt
## Issue description
`handleChatCompletion()` applies model-level skipping/cooldowns (`skipModels.add(route.modelDbId)` and `transientModelCooldowns.set(route.modelDbId, ...)`) after a single failed attempt. This violates the requirement that model-level penalties occur only after all keys for that model are exhausted.
## Issue Context
The retry loop already tracks per-key failures via `skipKeys`, which implies the system can retry the same model with another key. However, `skipModels`/`transientModelCooldowns` prevent routing back to the same model even if other enabled keys exist.
## Fix Focus Areas
- server/src/routes/proxy.ts[1671-1708]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| if (!streamStarted) { | ||
| ttfbMs = Date.now() - start; | ||
| res.setHeader('X-Routed-Via', `${route.platform}/${route.modelId}`); | ||
| if (attempt > 0) res.setHeader('X-Fallback-Attempts', String(attempt)); | ||
| if (responseStreamContext) { | ||
| writeResponseStreamStart(res, responseStreamContext, route.modelId); | ||
| } | ||
| streamStarted = true; | ||
| } | ||
| const deltaToolCalls = chunk.choices[0]?.delta?.tool_calls ?? []; | ||
| if (deltaToolCalls.length > 0) sawToolCalls = true; | ||
| if (responseStreamContext) { | ||
| writeResponseStreamStart(res, responseStreamContext, route.modelId); | ||
| totalOutputTokens += writeResponseStreamChunk(res, responseStreamContext, chunk); | ||
| } else { | ||
| const text = chunk.choices[0]?.delta?.content ?? ''; | ||
| if (text) streamedText += text; | ||
| totalOutputTokens += Math.ceil(text.length / 4); | ||
| res.write(`data: ${JSON.stringify(chunk)}\n\n`); | ||
| } |
There was a problem hiding this comment.
2. Missing sse content-type 🐞 Bug ≡ Correctness
handleChatCompletion() writes SSE data: frames for standard chat-completions streaming without setting Content-Type: text/event-stream (and related SSE headers) on the normal path, only doing so in the no-chunks fallback. This can cause clients/proxies to buffer or mis-parse the stream and breaks SSE expectations.
Agent Prompt
### Issue description
In `server/src/routes/proxy.ts`, the streaming path for non-`responseStreamContext` writes SSE frames (`res.write('data: ...\n\n')`) but does not set `Content-Type: text/event-stream` (and other common SSE headers) before writing.
### Issue Context
SSE responses should set `Content-Type: text/event-stream` early (before the first `res.write`) so that clients and intermediaries treat the response as a stream.
### Fix Focus Areas
- Add SSE headers when streaming starts (before first `res.write` in the `!responseStreamContext` path)
- Keep existing `writeResponseStreamStart(...)` behavior intact for the Responses API path
### Fix Focus Areas (code pointers)
- server/src/routes/proxy.ts[1375-1393]
- server/src/routes/proxy.ts[1427-1434]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| const { status, raw } = await request(app, 'POST', '/v1/chat/completions', { | ||
| messages: [{ role: 'user', content: 'Test heartbeat' }], | ||
| stream: true, | ||
| }); | ||
|
|
||
| expect(status).toBe(200); | ||
| // Should contain the actual content | ||
| expect(raw).toContain('hello'); | ||
| expect(raw).toContain('world'); | ||
| // Should contain at least one keep-alive comment during the 300ms idle period | ||
| expect(raw).toContain(': keep-alive'); | ||
| }); |
There was a problem hiding this comment.
3. Keepalive test mismatch 🐞 Bug ≡ Correctness
The new keepalive unit test expects : keep-alive during a 300ms delay before the first provider chunk, but the implementation only writes keepalive comments when streamStarted is true (which only happens after the first chunk arrives). This makes the test fail deterministically (or at best be flaky) and does not validate the intended heartbeat behavior.
Agent Prompt
### Issue description
The keepalive test asserts that `: keep-alive` appears during the pre-first-chunk idle window, but the proxy implementation intentionally skips keepalive writes until after the stream has started (after the first chunk / SSE headers).
### Issue Context
`streamStarted` is set only inside the first-chunk handler, and keepalive writes are gated on `streamStarted`, so no keepalive can be emitted before the first chunk.
### Fix Focus Areas
- Update the test to create an idle period *after* the first chunk (e.g., emit chunk #1 immediately to start the stream, then delay 300ms before chunk #2) and assert keepalive appears during that gap
- Alternatively, remove/adjust the assertion to match the intended design (keepalive only after stream start)
### Fix Focus Areas (code pointers)
- server/src/__tests__/routes/stream-heartbeat-stall.test.ts[113-124]
- server/src/routes/proxy.ts[1366-1383]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (10)
.roo/specs/generalized-thread-protection/requirements.md (1)
64-66: 💤 Low valueConsider adding language specifier to fenced code block.
The environment variable format example would render better with a language identifier (e.g.,
```bashor```shell).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/generalized-thread-protection/requirements.md around lines 64 - 66, The fenced code block showing THREAD_PROTECTION_PLATFORMS should include a shell language identifier so syntax/highlighting renders correctly; update the block that contains THREAD_PROTECTION_PLATFORMS="longcat:provider-ban,owl-alpha:provider-ban,groq:model-skip" to use a language tag such as ```bash or ```shell at the opening fence (keeping the variable name THREAD_PROTECTION_PLATFORMS and its value unchanged)..roo/specs/generalized-thread-protection/design.md (1)
43-45: 💤 Low valueConsider adding language specifier to fenced code block.
The environment variable example would render better with a language identifier (e.g.,
```bashor```shell).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/generalized-thread-protection/design.md around lines 43 - 45, The fenced code block showing the environment variable THREAD_PROTECTION_PLATFORMS should include a shell language specifier (e.g., ```bash or ```shell) so the example renders with proper syntax highlighting; update the block containing THREAD_PROTECTION_PLATFORMS="longcat:provider-ban,groq:model-skip" to use a language identifier like ```bash before the line and close with ``` after..roo/specs/disable-sticky-on-auto/design.md (1)
11-17: 💤 Low valueConsider adding language specifier to fenced code block.
The code block would render better with a language identifier (e.g.,
```typescriptor```javascript).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/disable-sticky-on-auto/design.md around lines 11 - 17, Add a language identifier to the fenced code block so syntax highlighting works (e.g., change ``` to ```typescript or ```javascript) for the snippet containing stickyOp, getSessionKey, and stickySessionMap; update the triple backtick opening fence that precedes the function example to include the chosen language specifier and leave the closing fence unchanged..roo/specs/sse-stream-heartbeat-stall-protection/design.md (1)
240-279: 💤 Low valueConsider condensing the design evolution section.
Lines 241-279 show the iterative thinking process ("Wait — this needs more thought", "Better approach", "Final approach"). While this reasoning is valuable during design, the final spec might be clearer if condensed to just the final approach with a brief note about the key decision (pre-stream stall throws 504 for retry, mid-stream stall writes error frame and closes).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/sse-stream-heartbeat-stall-protection/design.md around lines 240 - 279, Replace the iterative "design evolution" prose in the sse-stream-heartbeat-stall-protection section (the paragraphs around the MAX_STREAM_STALL_MS / stall handling logic) with just the final approach: state succinctly that when now - lastChunkTimestamp > MAX_STREAM_STALL_MS and streamStarted is false the handler should throw a retryable 504 error to let the outer retry/502 logic run, and when streamStarted is true the handler should write an error frame via writeResponseStreamEvent or plain res.write and then res.end(); remove the "Wait —", "Better approach" and similar intermediate notes and keep a one-line rationale that pre-stream stalls are retryable while mid-stream stalls are terminal (reference symbols: streamStarted, MAX_STREAM_STALL_MS, route.displayName, responseStreamContext, writeResponseStreamEvent, res.end()).server/src/providers/base.ts (1)
135-151: 💤 Low valueConsider extracting
rawCodeonly when needed.Lines 142-143 extract and parse
rawCodeunconditionally, even whenerrPayloadis a string (where.codewould be undefined). While the logic is safe because line 145 guards the usage, extracting inside the conditional would be clearer:protected throwWrappedError(body: unknown): void { const obj = body as Record<string, unknown>; const errPayload = obj.error; const message = this.extractErrorMessage(body, 'Unknown wrapped error'); const error = new Error( `${this.name} API error (wrapped in 200): ${message}`, ) as ProviderApiError; - const rawCode = (errPayload as Record<string, unknown>).code; - const parsedCode = typeof rawCode === 'number' ? rawCode : Number(rawCode); - error.status = - typeof errPayload === 'object' && errPayload !== null && 'code' in (errPayload as Record<string, unknown>) - ? (Number.isFinite(parsedCode) ? parsedCode : 200) - : 200; + if (typeof errPayload === 'object' && errPayload !== null && 'code' in errPayload) { + const rawCode = (errPayload as Record<string, unknown>).code; + const parsedCode = typeof rawCode === 'number' ? rawCode : Number(rawCode); + error.status = Number.isFinite(parsedCode) ? parsedCode : 200; + } else { + error.status = 200; + } error.provider = this.name; error.responseBody = body; throw error; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/providers/base.ts` around lines 135 - 151, In throwWrappedError, avoid unconditionally reading and parsing rawCode—move the extraction and parsedCode logic inside the conditional that checks errPayload is an object and has 'code' (the block that sets error.status) so you only access (errPayload as Record<string, unknown>).code when it's present; compute parsedCode there, use Number.isFinite(parsedCode) to decide the status, and fallback to 200 as currently done, keeping error.provider and error.responseBody assignment and the thrown ProviderApiError unchanged.client/src/components/pool-badge.tsx (1)
1-1: ⚡ Quick winConsider importing
PoolTypefrom shared types instead of duplicating the definition.Defining
PoolTypelocally as a string union creates potential for drift if the server-sideModelPoolenum changes. Since the review stack context mentions "Shared ModelPool Types," consider importing or deriving this type from@freellmapi/shared/typesto ensure the client and server stay in sync.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/src/components/pool-badge.tsx` at line 1, The local PoolType union should be replaced with the shared type to avoid drift; import the shared ModelPool/PoolType from the central types package (e.g., from "`@freellmapi/shared/types`") instead of declaring export type PoolType = 'fast' | 'balanced' | 'smart'; update any references in pool-badge.tsx to use the imported symbol (PoolType or ModelPool) so client and server share a single source of truth.server/src/__tests__/routes/fallback.test.ts (1)
56-62: ⚡ Quick winConsider testing that expected pools are actually assigned, not just valid.
This test validates that returned pool values are members of the
ModelPoolenum, but it doesn't verify that the pools you expect to see (e.g.,Smartfor LongCat,Balancedfor others) are actually assigned. IfgetModelPool()incorrectly returnedBalancedfor every model, this test would still pass.🧪 Suggested additional assertion
it('GET /api/fallback pool values are valid ModelPool enum values', async () => { const { body } = await request(app, 'GET', '/api/fallback'); const validPools = [ModelPool.Fast, ModelPool.Balanced, ModelPool.Smart]; for (const entry of body) { expect(validPools).toContain(entry.pool); } + // Verify specific expected pool assignments + const longcatEntry = body.find((e: any) => e.platform === 'longcat'); + if (longcatEntry) { + expect(longcatEntry.pool).toBe(ModelPool.Smart); + } });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/__tests__/routes/fallback.test.ts` around lines 56 - 62, Update the test so it asserts not only that each entry.pool is a valid ModelPool but also that specific models get the expected pool assignments: call out getModelPool()/the /api/fallback response and build an expectedPools mapping (e.g., "LongCat" => ModelPool.Smart, other known model names => ModelPool.Balanced) and for each response entry assert entry.pool === expectedPools[entry.model] (or equivalent per-model assertions) in addition to the existing validPools check; reference the response body variable used in the test and the ModelPool enum to locate where to add these assertions.client/src/pages/FallbackPage.tsx (1)
67-72: 💤 Low valueThe
'fast'pool is defined but never populated by the server.
poolOrderandpoolTitlesinclude'fast', butgetModelPool()in the fallback route only returns'smart'or'balanced'. While line 300 filters out empty groups (preventing a visual bug), the'fast'entries in these objects are dead code unless future models will be classified as Fast.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/src/pages/FallbackPage.tsx` around lines 67 - 72, The client defines a 'fast' pool in poolOrder and poolTitles (types PoolType, symbols poolOrder and poolTitles) but the server-side getModelPool only returns 'smart' or 'balanced', so 'fast' is dead code; either remove 'fast' from poolOrder and poolTitles (and update PoolType accordingly) or update getModelPool to classify some models as 'fast' so the client and server agree—pick the approach consistent with product intent and keep the symbol names poolOrder, poolTitles, and PoolType in sync with getModelPool.server/src/routes/proxy.ts (1)
22-24: ⚖️ Poor tradeoffConsider using a Map instead of Set for more efficient lookups.
The
activeRequestsSet stores objects and cleanup logic iterates the entire Set to find matching entries (lines 1604-1609, 1657-1662, 1238-1241). Since Sets use reference equality and the code never stores the original reference, each cleanup is O(n).A
Map<string, { platform, modelId, startTime }>keyed bysessionKey(or a composite key) would enable O(1) lookups and deletions.♻️ Alternative structure
-const activeRequests = new Set<{ sessionKey: string; platform: string; modelId: string; startTime: number }>(); +const activeRequests = new Map<string, { platform: string; modelId: string; startTime: number }>();Then update registration (line 1321) and cleanup (lines 1604-1609, etc.) to use Map methods.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/routes/proxy.ts` around lines 22 - 24, Replace the Set named activeRequests with a Map keyed by sessionKey (or a composite key like `${sessionKey}:${platform}:${modelId}`) to enable O(1) lookups and deletions; change its type to Map<string, { platform: string; modelId: string; startTime: number }>, update the registration logic that currently adds to activeRequests (the block that creates the active entry) to use activeRequests.set(key, { platform, modelId, startTime }), and replace all cleanup/lookup code that currently iterates the Set (the blocks that search for matching entries and remove them) to use activeRequests.get(key)/activeRequests.has(key) and activeRequests.delete(key) instead; ensure any composite-key construction is consistent across where entries are added, checked, and removed.server/src/services/threadProtection.ts (1)
41-51: ⚡ Quick winConsider logging invalid protection levels during config parsing.
The parsing loop silently skips invalid protection levels (lines 48-50). If a user sets
THREAD_PROTECTION_PLATFORMS=groq:ban-provider(typo: should beprovider-ban), it's silently ignored and falls back to the defaultmodel-skip. This could lead to unexpected runtime behavior that's difficult to debug.🔍 Proposed enhancement to add warning logs
const [platform, level] = trimmed.split(':'); if (!platform || !level) continue; const normalizedLevel = level.trim().toLowerCase(); if (normalizedLevel === 'provider-ban' || normalizedLevel === 'model-skip' || normalizedLevel === 'off') { map.set(platform.trim().toLowerCase(), normalizedLevel as ProtectionLevel); + } else { + console.warn(`[ThreadProtection] Invalid protection level "${level}" for platform "${platform}" — valid values: provider-ban, model-skip, off`); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/services/threadProtection.ts` around lines 41 - 51, The parsing loop in threadProtection.ts silently skips invalid protection levels; update the loop that processes raw.split(',') (where variables raw, pair, trimmed, platform, level, normalizedLevel, and map are used) to emit a warning when an unrecognized normalizedLevel is encountered (i.e., not 'provider-ban', 'model-skip', or 'off'); log the platform and the provided level (and the raw pair) so users can see the typo/misconfiguration, then continue to skip adding it to map. Use the same logger used elsewhere in this module (or fallback to console.warn if none is available) and keep the existing behavior of ignoring invalid entries.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.roo/specs/wrapped-error-interception/tasks.md:
- Around line 38-58: Add a clarifying note to the streaming-provider checklist
entries for CohereProvider.streamChatCompletion,
CloudflareProvider.streamChatCompletion, and GoogleProvider.streamChatCompletion
stating that the wrapped-error check (i.e., parsing the first chunk into a
variable and calling this.isWrappedError(...) / this.throwWrappedError(...))
must only be applied to the very first parsed SSE chunk before any chunk is
forwarded to the client; place this note alongside the existing instructions for
each function so implementers know to check the first payload only to avoid
aborting mid-stream after partial content has been sent.
In `@server/src/routes/proxy.ts`:
- Line 53: The current check uses `if (key === undefined)` but `getSessionKey()`
returns an empty string when no key is derived, so replace the condition to
explicitly guard empty-string (and still allow undefined) — e.g., change the
check around the `key` variable in the proxy handler to `if (key === '' || key
=== undefined) return undefined;` so the code never calls
`stickySessionMap.get('')`; refer to `getSessionKey()` and the use of
`stickySessionMap.get(key)` to locate where to change this.
---
Nitpick comments:
In @.roo/specs/disable-sticky-on-auto/design.md:
- Around line 11-17: Add a language identifier to the fenced code block so
syntax highlighting works (e.g., change ``` to ```typescript or ```javascript)
for the snippet containing stickyOp, getSessionKey, and stickySessionMap; update
the triple backtick opening fence that precedes the function example to include
the chosen language specifier and leave the closing fence unchanged.
In @.roo/specs/generalized-thread-protection/design.md:
- Around line 43-45: The fenced code block showing the environment variable
THREAD_PROTECTION_PLATFORMS should include a shell language specifier (e.g.,
```bash or ```shell) so the example renders with proper syntax highlighting;
update the block containing
THREAD_PROTECTION_PLATFORMS="longcat:provider-ban,groq:model-skip" to use a
language identifier like ```bash before the line and close with ``` after.
In @.roo/specs/generalized-thread-protection/requirements.md:
- Around line 64-66: The fenced code block showing THREAD_PROTECTION_PLATFORMS
should include a shell language identifier so syntax/highlighting renders
correctly; update the block that contains
THREAD_PROTECTION_PLATFORMS="longcat:provider-ban,owl-alpha:provider-ban,groq:model-skip"
to use a language tag such as ```bash or ```shell at the opening fence (keeping
the variable name THREAD_PROTECTION_PLATFORMS and its value unchanged).
In @.roo/specs/sse-stream-heartbeat-stall-protection/design.md:
- Around line 240-279: Replace the iterative "design evolution" prose in the
sse-stream-heartbeat-stall-protection section (the paragraphs around the
MAX_STREAM_STALL_MS / stall handling logic) with just the final approach: state
succinctly that when now - lastChunkTimestamp > MAX_STREAM_STALL_MS and
streamStarted is false the handler should throw a retryable 504 error to let the
outer retry/502 logic run, and when streamStarted is true the handler should
write an error frame via writeResponseStreamEvent or plain res.write and then
res.end(); remove the "Wait —", "Better approach" and similar intermediate notes
and keep a one-line rationale that pre-stream stalls are retryable while
mid-stream stalls are terminal (reference symbols: streamStarted,
MAX_STREAM_STALL_MS, route.displayName, responseStreamContext,
writeResponseStreamEvent, res.end()).
In `@client/src/components/pool-badge.tsx`:
- Line 1: The local PoolType union should be replaced with the shared type to
avoid drift; import the shared ModelPool/PoolType from the central types package
(e.g., from "`@freellmapi/shared/types`") instead of declaring export type
PoolType = 'fast' | 'balanced' | 'smart'; update any references in
pool-badge.tsx to use the imported symbol (PoolType or ModelPool) so client and
server share a single source of truth.
In `@client/src/pages/FallbackPage.tsx`:
- Around line 67-72: The client defines a 'fast' pool in poolOrder and
poolTitles (types PoolType, symbols poolOrder and poolTitles) but the
server-side getModelPool only returns 'smart' or 'balanced', so 'fast' is dead
code; either remove 'fast' from poolOrder and poolTitles (and update PoolType
accordingly) or update getModelPool to classify some models as 'fast' so the
client and server agree—pick the approach consistent with product intent and
keep the symbol names poolOrder, poolTitles, and PoolType in sync with
getModelPool.
In `@server/src/__tests__/routes/fallback.test.ts`:
- Around line 56-62: Update the test so it asserts not only that each entry.pool
is a valid ModelPool but also that specific models get the expected pool
assignments: call out getModelPool()/the /api/fallback response and build an
expectedPools mapping (e.g., "LongCat" => ModelPool.Smart, other known model
names => ModelPool.Balanced) and for each response entry assert entry.pool ===
expectedPools[entry.model] (or equivalent per-model assertions) in addition to
the existing validPools check; reference the response body variable used in the
test and the ModelPool enum to locate where to add these assertions.
In `@server/src/providers/base.ts`:
- Around line 135-151: In throwWrappedError, avoid unconditionally reading and
parsing rawCode—move the extraction and parsedCode logic inside the conditional
that checks errPayload is an object and has 'code' (the block that sets
error.status) so you only access (errPayload as Record<string, unknown>).code
when it's present; compute parsedCode there, use Number.isFinite(parsedCode) to
decide the status, and fallback to 200 as currently done, keeping error.provider
and error.responseBody assignment and the thrown ProviderApiError unchanged.
In `@server/src/routes/proxy.ts`:
- Around line 22-24: Replace the Set named activeRequests with a Map keyed by
sessionKey (or a composite key like `${sessionKey}:${platform}:${modelId}`) to
enable O(1) lookups and deletions; change its type to Map<string, { platform:
string; modelId: string; startTime: number }>, update the registration logic
that currently adds to activeRequests (the block that creates the active entry)
to use activeRequests.set(key, { platform, modelId, startTime }), and replace
all cleanup/lookup code that currently iterates the Set (the blocks that search
for matching entries and remove them) to use
activeRequests.get(key)/activeRequests.has(key) and activeRequests.delete(key)
instead; ensure any composite-key construction is consistent across where
entries are added, checked, and removed.
In `@server/src/services/threadProtection.ts`:
- Around line 41-51: The parsing loop in threadProtection.ts silently skips
invalid protection levels; update the loop that processes raw.split(',') (where
variables raw, pair, trimmed, platform, level, normalizedLevel, and map are
used) to emit a warning when an unrecognized normalizedLevel is encountered
(i.e., not 'provider-ban', 'model-skip', or 'off'); log the platform and the
provided level (and the raw pair) so users can see the typo/misconfiguration,
then continue to skip adding it to map. Use the same logger used elsewhere in
this module (or fallback to console.warn if none is available) and keep the
existing behavior of ignoring invalid entries.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 2916449e-b59f-44a9-8e06-5a905a00cdc4
⛔ Files ignored due to path filters (1)
package-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (46)
.npmrc.roo/specs/disable-sticky-on-auto/design.md.roo/specs/disable-sticky-on-auto/requirements.md.roo/specs/disable-sticky-on-auto/tasks.md.roo/specs/generalized-thread-protection/design.md.roo/specs/generalized-thread-protection/requirements.md.roo/specs/generalized-thread-protection/tasks.md.roo/specs/owl-alpha-longcat-model-routing/design.md.roo/specs/owl-alpha-longcat-model-routing/requirements.md.roo/specs/owl-alpha-longcat-model-routing/tasks.md.roo/specs/pr13-code-review-fixes/design.md.roo/specs/pr13-code-review-fixes/requirements.md.roo/specs/pr13-code-review-fixes/tasks.md.roo/specs/recency-biased-thompson-sampling/design.md.roo/specs/recency-biased-thompson-sampling/requirements.md.roo/specs/recency-biased-thompson-sampling/tasks.md.roo/specs/sse-stream-heartbeat-stall-protection/design.md.roo/specs/sse-stream-heartbeat-stall-protection/requirements.md.roo/specs/sse-stream-heartbeat-stall-protection/tasks.md.roo/specs/transient-model-cooldown/design.md.roo/specs/transient-model-cooldown/requirements.md.roo/specs/transient-model-cooldown/tasks.md.roo/specs/wrapped-error-interception/design.md.roo/specs/wrapped-error-interception/requirements.md.roo/specs/wrapped-error-interception/tasks.mdAGENTS.mdclient/src/components/pool-badge.tsxclient/src/components/pool-section.tsxclient/src/pages/FallbackPage.tsxpackage.jsonserver/src/__tests__/routes/fallback.test.tsserver/src/__tests__/routes/provider-session-ban.test.tsserver/src/__tests__/routes/proxy-tools.test.tsserver/src/__tests__/routes/stream-heartbeat-stall.test.tsserver/src/__tests__/routes/transient-cooldown.test.tsserver/src/__tests__/services/router.test.tsserver/src/providers/base.tsserver/src/providers/cloudflare.tsserver/src/providers/cohere.tsserver/src/providers/google.tsserver/src/providers/openai-compat.tsserver/src/routes/fallback.tsserver/src/routes/proxy.tsserver/src/services/router.tsserver/src/services/threadProtection.tsshared/types.ts
| - [x] 7. Add wrapped-error check in `CohereProvider.streamChatCompletion()` in `server/src/providers/cohere.ts` | ||
| - Inside the `try` block at line 110, after `JSON.parse(data)` succeeds: | ||
| - Insert: `if (this.isWrappedError(parsed)) { this.throwWrappedError(parsed); }` | ||
| - Note: assign the result of `JSON.parse` to a variable first, then check, then yield | ||
|
|
||
| - [x] 8. Add wrapped-error check in `CloudflareProvider.chatCompletion()` in `server/src/providers/cloudflare.ts` | ||
| - After line 62 (`const data = await res.json() as ChatCompletionResponse;`), before line 63 (`data._routed_via = ...`): | ||
| - Insert: `if (this.isWrappedError(data)) { this.throwWrappedError(data); }` | ||
|
|
||
| - [x] 9. Add wrapped-error check in `CloudflareProvider.streamChatCompletion()` in `server/src/providers/cloudflare.ts` | ||
| - Inside the `try` block at line 119, after `JSON.parse(data)` succeeds: | ||
| - Insert: `if (this.isWrappedError(parsed)) { this.throwWrappedError(parsed); }` | ||
| - Note: assign the result of `JSON.parse` to a variable first, then check, then yield | ||
|
|
||
| - [x] 10. Add wrapped-error check in `GoogleProvider.chatCompletion()` in `server/src/providers/google.ts` | ||
| - After line 246 (`const data = await res.json() as GeminiResponse;`), before line 247 (`const candidate = data.candidates?.[0];`): | ||
| - Insert: `if (this.isWrappedError(data)) { this.throwWrappedError(data); }` | ||
|
|
||
| - [x] 11. Add wrapped-error check in `GoogleProvider.streamChatCompletion()` in `server/src/providers/google.ts` | ||
| - After line 354 (`chunk = JSON.parse(raw) as GeminiResponse;`), before line 358 (`const candidate = chunk.candidates?.[0];`): | ||
| - Insert: `if (this.isWrappedError(chunk)) { this.throwWrappedError(chunk); }` |
There was a problem hiding this comment.
Harmonize streaming error detection guidance across all providers.
Step 5 (line 32) includes an important note about checking wrapped errors only on the first parsed payload before any chunk is forwarded. This aligns with FR-6's requirement that the check applies to "the first SSE chunk." However, Steps 7 (Cohere), 9 (Cloudflare), and 11 (Google) lack this same guidance.
All streaming methods should include the same note to ensure uniform behavior and prevent mid-stream aborts after partial content has been sent to clients.
📝 Suggested addition for Steps 7, 9, and 11
Add the following note to Step 7 (after line 41), Step 9 (after line 50), and Step 11 (after line 58):
- Insert: `if (this.isWrappedError(parsed)) { this.throwWrappedError(parsed); }`
- Note: assign the result of `JSON.parse` to a variable first, then check, then yield
+ - Only throw wrapped SSE errors before any chunk has been forwarded (first parsed payload only). Track whether any chunk has been yielded and skip the check after the first yield.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.roo/specs/wrapped-error-interception/tasks.md around lines 38 - 58, Add a
clarifying note to the streaming-provider checklist entries for
CohereProvider.streamChatCompletion, CloudflareProvider.streamChatCompletion,
and GoogleProvider.streamChatCompletion stating that the wrapped-error check
(i.e., parsing the first chunk into a variable and calling
this.isWrappedError(...) / this.throwWrappedError(...)) must only be applied to
the very first parsed SSE chunk before any chunk is forwarded to the client;
place this note alongside the existing instructions for each function so
implementers know to check the first payload only to avoid aborting mid-stream
after partial content has been sent.
| function getStickyModel(messages: ChatMessage[], routingMode: RoutingMode): number | undefined { | ||
| const key = getSessionKey(messages, routingMode); | ||
| if (!key) return undefined; | ||
| if (key === undefined) return undefined; |
There was a problem hiding this comment.
Critical: Incorrect type check causes logic error.
Line 53 checks if (key === undefined), but getSessionKey() (line 41-49) returns an empty string '' when no session key can be derived, not undefined. This condition will never trigger for empty strings, allowing the function to proceed with an empty key and potentially access stickySessionMap.get('').
🐛 Proposed fix
- if (key === undefined) return undefined;
+ if (!key) return undefined;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (key === undefined) return undefined; | |
| if (!key) return undefined; |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@server/src/routes/proxy.ts` at line 53, The current check uses `if (key ===
undefined)` but `getSessionKey()` returns an empty string when no key is
derived, so replace the condition to explicitly guard empty-string (and still
allow undefined) — e.g., change the check around the `key` variable in the proxy
handler to `if (key === '' || key === undefined) return undefined;` so the code
never calls `stickySessionMap.get('')`; refer to `getSessionKey()` and the use
of `stickySessionMap.get(key)` to locate where to change this.
Summary by Sourcery
Introduce provider-agnostic thread protection, transient model cooldowns, and SSE heartbeat/stall safeguards while refining routing logic and surfacing model pools in the fallback UI.
New Features:
Bug Fixes:
Enhancements:
Tests:
Summary by CodeRabbit
New Features
Bug Fixes
Documentation