refactor(proxy): LongCat immediate provider exclusion, non-LongCat model-only skip on 5xx#7
refactor(proxy): LongCat immediate provider exclusion, non-LongCat model-only skip on 5xx#7vi70x3 wants to merge 8 commits into
Conversation
Implements the LongCat session ban feature in proxy.ts. When LongCat detects multiple API key use (auth/rate-limit errors) or returns truncated responses, the platform is banned from the sticky session. Future requests in that session route to non-LongCat models. Changes: - Extend stickySessionMap with bannedPlatforms?: Set<string> - Add isSessionBannedFromPlatform() to check session bans - Add banPlatformFromSession() to record platform bans - Add addLongcatModelsToSkipModels() to skip all LongCat models - Add isTruncatedResponse() to detect truncation keywords - Update getStickyKey() to return undefined for banned platforms - Update setStickyModel() to preserve bannedPlatforms across updates - Update pre-routing logic to check bans before routing - Update error handling to ban LongCat on auth/rate-limit/truncation - Add truncation detection after stream completes - Add truncation detection in mid-stream error handling
… consecutive failures - Extend stickySessionMap with consecutiveFailures tracking per provider - Add recordConsecutiveFailure(), resetConsecutiveFailures(), resetAllConsecutiveFailures() - Replace addLongcatModelsToSkipModels with generic addProviderModelsToSkipModels - Replace LongCat-specific auth/rate-limit ban with general 5xx consecutive failure detection (threshold: 2) - Generalize truncation detection to all providers (post-stream + mid-stream) - Update getStickyKey() to check bannedPlatforms for any platform - Update pre-routing ban check to be generic (any banned platform) - Add success path counter reset on both streaming and non-streaming paths - Remove LongCat-specific auth error ban, rate limit ban, and addLongcatModelsToSkipModels - Rename and rewrite tests from longcat-session-ban to provider-session-ban (32 test cases) - TypeScript compiles cleanly, all 150 tests pass
- Add isBanEligibleStatus() helper restricting to {500,502,503,504}
- Improve mid-stream truncation detection with aggregated error sources
- Pre-routing ban check now skips ALL banned platforms, not just preferredModel's
- Only clear preferredModel when provider is actually banned (not on first 5xx)
- Handle Error objects in isTruncatedResponse (instanceof check before JSON.stringify)
…or other sessions during 3-min cooldown window
…del-only skip on 5xx
📝 WalkthroughWalkthroughThis PR introduces session-scoped platform banning and a sticky-session cooldown mechanism to the proxy layer. When requests encounter consecutive provider failures or truncation, the session bans that platform and routes subsequent requests through fallback models. Sticky sessions persist through cooldown windows without routing to LongCat, while the router preserves preferred models even when they appear in skip lists. ChangesSession Banning & Cooldown Integration
Sequence DiagramsequenceDiagram
participant Client
participant ProxyHandler
participant StickySessionMap
participant Router
participant Provider
Client->>ProxyHandler: POST /chat/completions (sticky session)
ProxyHandler->>StickySessionMap: check sticky session + bans
alt session banned from platform
ProxyHandler->>ProxyHandler: add banned models to skipModels
end
alt cooldown active for sticky LongCat
ProxyHandler->>ProxyHandler: add LongCat models to skipModels (other sessions only)
end
ProxyHandler->>Router: routeRequest with skipModels
Router->>Router: exclude skipModels unless preferred sticky model
Router-->>ProxyHandler: selected model
ProxyHandler->>Provider: forward request
Provider-->>ProxyHandler: 5xx error or truncated response
ProxyHandler->>ProxyHandler: detect ban-eligible status
alt 5xx or truncation
ProxyHandler->>StickySessionMap: recordConsecutiveFailure()
alt threshold reached
StickySessionMap->>StickySessionMap: ban platform + update skipModels
end
end
alt retry available
ProxyHandler->>Router: routeRequest with updated skipModels
Note over Router: LongCat now excluded, fall back to alternate
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Code Review
This pull request implements a session ban and fallback mechanism for LongCat and other providers, including a cooldown safeguard and generalized truncation/5xx error handling. The review feedback highlights a critical logic error in the cooldown safeguard where the active session skips itself while other sessions bypass the cooldown entirely. Additionally, the bypass added to the router's skipModels check breaks the fallback mechanism for sticky sessions during retryable errors. Finally, several newly added consecutive failure tracking functions are identified as dead code because the refactored error handler performs direct banning and skipping.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| // LongCat sticky cooldown: if the sticky model is on LongCat and was used | ||
| // within the last 3 minutes, exclude LongCat from the bandit router for all | ||
| // other sessions. The current sticky session keeps its pinned LongCat route. | ||
| // This prevents LongCat from seeing multiple sessions/keys from the same IP. | ||
| if (preferredModel) { | ||
| const db = getDb(); | ||
| const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined; | ||
| if (prefRow?.platform === 'longcat') { | ||
| const cooldownSessionKey = getSessionKey(normalizedMessages, routingMode); | ||
| const cooldownEntry = cooldownSessionKey ? stickySessionMap.get(cooldownSessionKey) : undefined; | ||
| if (cooldownEntry && Date.now() - cooldownEntry.lastUsed < LONGCAT_STICKY_COOLDOWN_MS) { | ||
| const ageMs = Date.now() - cooldownEntry.lastUsed; | ||
| addProviderModelsToSkipModels(skipModels, 'longcat'); | ||
| console.log(`[Sticky] LongCat cooldown active — excluding LongCat from bandit routing for other sessions | session=${cooldownSessionKey?.slice(0, 8)} | lastUsed=${ageMs}ms ago`); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
🚨 Critical Logic Error in Cooldown Safeguard
There is a fundamental logic error in how the LongCat cooldown safeguard is implemented. The goal of the cooldown is to prevent other sessions from routing to LongCat when a sticky LongCat session is active (hot). However, the current implementation does the exact opposite:
- For the active/hot session itself:
preferredModelis set to LongCat, so it enters this block, retrieves its own session entry, and adds all LongCat models to its ownskipModelsset. It then relies on a router-level bypass (entry.model_db_id !== preferredModelDbId) to avoid skipping itself. - For other sessions: When a different session (or a new session) sends a request, its
preferredModelis eitherundefinedor a different model. Therefore, it never enters this block, never adds LongCat to itsskipModels, and is free to route to LongCat via the bandit router.
This means the cooldown safeguard is completely ineffective at protecting LongCat from concurrent sessions, while introducing unnecessary complexity and bypasses for the active session.
Suggested Fix
To correctly implement the cooldown, the proxy should check if any session in the global stickySessionMap is currently "hot" on LongCat, and if so, add LongCat to the skipModels of the current request only if the current request is not already sticky-pinned to that LongCat model.
// LongCat sticky cooldown: if any OTHER session has used LongCat recently,
// exclude LongCat from the bandit router for the current session.
const currentSessionKey = getSessionKey(normalizedMessages, routingMode);
const now = Date.now();
let isAnyOtherLongcatHot = false;
for (const [key, entry] of stickySessionMap) {
if (key === currentSessionKey) continue;
if (now - entry.lastUsed < LONGCAT_STICKY_COOLDOWN_MS) {
const db = getDb();
const row = db.prepare('SELECT platform FROM models WHERE id = ?').get(entry.modelDbId) as { platform: string } | undefined;
if (row?.platform === 'longcat') {
isAnyOtherLongcatHot = true;
break;
}
}
}
if (isAnyOtherLongcatHot) {
addProviderModelsToSkipModels(skipModels, 'longcat');
console.log(`[Sticky] LongCat cooldown active due to another active session — excluding LongCat from bandit routing`);
}|
|
||
| for (const entry of sorted) { | ||
| if (skipModels?.has(entry.model_db_id)) continue; | ||
| if (skipModels?.has(entry.model_db_id) && entry.model_db_id !== preferredModelDbId) continue; |
There was a problem hiding this comment.
⚠️ Fallback Mechanism Broken for Sticky Sessions
The bypass && entry.model_db_id !== preferredModelDbId added to the skipModels check breaks the fallback mechanism for sticky sessions when their pinned model encounters a retryable error (such as a 5xx or timeout).
How it breaks:
- When a sticky model fails with a retryable error, it is added to
skipModels(e.g., at line 1555 or 1579 inproxy.ts) so that the retry loop can fall back to a different model. - On the next retry attempt,
routeRequestis called with the samepreferredModelDbIdand the updatedskipModelsset. - Because of
&& entry.model_db_id !== preferredModelDbId, the router ignores the fact that the preferred model is inskipModelsand routes to it again. - This forces the proxy to repeatedly retry the failing preferred model (or its other keys) up to
MAX_RETRIES(20 times), instead of falling back to another provider/model immediately.
Solution:
By fixing the cooldown safeguard logic in proxy.ts (so that the active session never adds its own preferred model to skipModels), we can completely remove this bypass from the router. This allows the router to correctly skip the preferred model if it fails during the retry loop.
| if (skipModels?.has(entry.model_db_id) && entry.model_db_id !== preferredModelDbId) continue; | |
| if (skipModels?.has(entry.model_db_id)) continue; |
| function recordConsecutiveFailure( | ||
| messages: ChatMessage[], | ||
| routingMode: RoutingMode, | ||
| provider: string, | ||
| skipModels: Set<number>, | ||
| modelDbId?: number, | ||
| ): void { | ||
| const key = getSessionKey(messages, routingMode); | ||
| if (!key) return; | ||
| let entry = stickySessionMap.get(key); | ||
| if (!entry) { | ||
| if (modelDbId === undefined) return; | ||
| entry = { modelDbId, lastUsed: Date.now() }; | ||
| } | ||
| if (!entry.consecutiveFailures) entry.consecutiveFailures = new Map(); | ||
| const count = (entry.consecutiveFailures.get(provider) ?? 0) + 1; | ||
| entry.consecutiveFailures.set(provider, count); | ||
| entry.lastUsed = Date.now(); | ||
| stickySessionMap.set(key, entry); | ||
| console.log(`[Sticky] consecutive 5xx for ${provider}: ${count}/2 session=${key.slice(0, 8)}`); | ||
| if (count >= 2) { | ||
| if (!entry.bannedPlatforms) entry.bannedPlatforms = new Set(); | ||
| entry.bannedPlatforms.add(provider); | ||
| addProviderModelsToSkipModels(skipModels, provider); | ||
| entry.consecutiveFailures.delete(provider); | ||
| entry.lastUsed = Date.now(); | ||
| stickySessionMap.set(key, entry); | ||
| console.log(`[Sticky] banned platform=${provider} for session=${key.slice(0, 8)} | bannedPlatforms=${Array.from(entry.bannedPlatforms).join(',')}`); | ||
| } | ||
| } |
There was a problem hiding this comment.
🧹 Dead Code: Unused Consecutive Failure Tracking
The functions recordConsecutiveFailure, resetConsecutiveFailures, and resetAllConsecutiveFailures (along with the consecutiveFailures field in stickySessionMap's value type) are now completely unused dead code.
The refactored error handling in handleChatCompletion directly bans LongCat or skips non-LongCat models on 5xx/truncation errors immediately, without using consecutive failure counters.
To keep the codebase clean and maintainable, please remove these unused functions, their exports, and the consecutiveFailures field from the stickySessionMap type definition.
(Note: You will also need to update the newly added provider-session-ban.test.ts to remove tests for these deleted functions, and instead add tests for the actual request-level banning/skipping behavior.)
There was a problem hiding this comment.
Actionable comments posted: 16
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
server/src/routes/proxy.ts (1)
1412-1495:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMid-stream LongCat retryable errors still escape without a session ban.
This catch only bans LongCat on 5xx or truncation. A mid-stream timeout, 429, auth failure, or connection reset will go down the generic
response.failedpath and leave LongCat eligible on the next request, which breaks the “ban LongCat on any retryable failure” contract.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/routes/proxy.ts` around lines 1412 - 1495, The handler currently only bans LongCat for 5xx (via getErrorStatus/isBanEligibleStatus) or truncation (isTruncatedResponse), allowing mid-stream timeout/429/auth/connection-reset errors to fall through to the generic response.failed path; add a reusable check (e.g., isRetryableStreamError(streamErr)) that detects retryable mid-stream failures (status 429, auth failures, timeouts, ECONNRESET, network errors, etc.), call it alongside the 5xx/truncation checks, and if route.platform === 'longcat' perform the same ban actions (banPlatformFromSession(normalizedMessages, routingMode, 'longcat', route.modelDbId) and addProviderModelsToSkipModels(skipModels, 'longcat')) before ending the stream and logging so LongCat is excluded consistently on any retryable mid-stream failure.
🧹 Nitpick comments (1)
.roo/specs/longcat-sticky-cooldown/tasks.md (1)
6-8: 💤 Low valueClean up markdown emphasis markers.
Tasks 2 and 4 have spaces inside emphasis markers (e.g.,
** Replace**and** Update**), which triggers markdown linting warnings. While functional, removing these spaces improves document quality.✨ Suggested cleanup
-- [x] **Replace** existing cooldown check logic in [`handleChatCompletion()`](server/src/routes/proxy.ts:1243) — instead of clearing `preferredModel`/`preferredKeyId`, add LongCat models to `skipModels` via `addProviderModelsToSkipModels(skipModels, 'longcat')`. Keep `preferredModel`/`preferredKeyId` intact. -- [x] **Modify** `skipModels` check in [`server/src/services/router.ts`](server/src/services/router.ts:539) to exclude the sticky model: `if (skipModels?.has(entry.model_db_id) && entry.model_db_id !== preferredModelDbId) continue;` -- [x] **Update** unit tests in [`server/src/__tests__/routes/proxy-tools.test.ts`](server/src/__tests__/routes/proxy-tools.test.ts) to match new behavior: cooldown adds LongCat to skipModels instead of clearing sticky preference +- [x] **Replace** existing cooldown check logic in [`handleChatCompletion()`](server/src/routes/proxy.ts:1243) — instead of clearing `preferredModel`/`preferredKeyId`, add LongCat models to `skipModels` via `addProviderModelsToSkipModels(skipModels, 'longcat')`. Keep `preferredModel`/`preferredKeyId` intact. +- [x] **Modify** `skipModels` check in [`server/src/services/router.ts`](server/src/services/router.ts:539) to exclude the sticky model: `if (skipModels?.has(entry.model_db_id) && entry.model_db_id !== preferredModelDbId) continue;` +- [x] **Update** unit tests in [`server/src/__tests__/routes/proxy-tools.test.ts`](server/src/__tests__/routes/proxy-tools.test.ts) to match new behavior: cooldown adds LongCat to skipModels instead of clearing sticky preferenceAs per coding guidelines, these are markdown formatting improvements flagged by markdownlint-cli2 (MD037).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/longcat-sticky-cooldown/tasks.md around lines 6 - 8, Cleanup the markdown emphasis markers in the checklist lines by removing the extra spaces inside bold markers so they read "**Replace**" and "**Update**" instead of "** Replace**" and "** Update**"; locate the two checklist items referencing replacement of cooldown logic and updating unit tests (the tasks labeled 2 and 4) and edit their text to remove the inner-space in the bold delimiters to satisfy markdownlint MD037 while keeping the rest of the sentences unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.roo/specs/longcat-session-ban/design.md:
- Line 85: In the sentence starting "Records a platform ban in the sticky
session. Called when truncation, auth, or rate-limit errors is detected on
LongCat.", correct the subject-verb agreement by changing "is" to "are" so it
reads "Called when truncation, auth, or rate-limit errors are detected on
LongCat." Use the same surrounding text ("Records a platform ban in the sticky
session" / "LongCat") to locate the line in the design.md.
- Line 5: The doc is contradictory about router changes: update the sentence
that mentions "the router requires minimal changes — only the LongCat smart-auto
preference needs to respect session bans" to instead state that no changes to
router.ts are required; keep the rest as-is and explicitly note that all ban
detection and session management is implemented in proxy.ts and integrated into
handleChatCompletion(), and reference LongCat smart-auto as unaffected so
readers know router.ts does not need edits.
In @.roo/specs/longcat-session-ban/requirements.md:
- Around line 11-15: The Context section has multiple formatting and grammar
issues—remove the stray period and fix capitalization in the opening sentence
(replace ". and It uses" with "and it uses"), normalize the inline code snippet
that currently reads "`{ modelDbId, + optional `keyId` + `lastUsed` }`" to a
clear code span like `{ modelDbId, optional keyId, lastUsed }`, delete or merge
the orphan fragments ("and eviction." and the standalone "and"), break up and
rewrite the run-on paragraph so each idea is a clear sentence referencing the
sticky session implementation in server/src/routes/proxy.ts and the
longcat-sticky-key spec, and ensure references to functions/files
(clearStickyKey(), preferredModel, routeRequest(), shouldSkipModelOnRetry(),
isRetryableError()) remain correct and readable.
- Around line 5-7: Fix the grammar in the Overview sentence and spacing in the
numbered list: change "the system must to:" to "the system must:" in the
sentence starting "When a LongCat sticky session encounters an error — whether
it's an auth failure (401/40403) or a 'truncated' / 'conflict' error from the
provider — the system must to:" and ensure there are spaces after commas in the
numbered list items (e.g., "1. Detect the error, 2. Ban..." → "1. Detect the
error, 2. Ban..." with proper spacing after each comma) so the sentence reads
correctly and the list is properly spaced.
- Around line 24-27: The table in requirements.md has duplicated row number
"11"; update the second occurrence (the row listing `server/src/routes/proxy.ts`
with range 1036-1053) to "12" so row numbers are unique and sequential; ensure
any subsequent rows are renumbered accordingly if needed.
In @.roo/specs/longcat-session-ban/tasks.md:
- Around line 15-16: Task 2 in .roo/specs/longcat-session-ban/tasks.md contains
a duplicated list item "Add diagnostic logging"; remove the redundant duplicate
so only a single "Add diagnostic logging" entry remains for task 2 and ensure
the surrounding list/numbering (the task 2 bullet or numbered list) remains
consistent.
In @.roo/specs/longcat-sticky-cooldown/requirements.md:
- Around line 74-76: The requirement "No Router Changes" is incorrect because
the router's skipModels logic must allow a sticky session's preferred model;
update the router code handling skipModels in server/src/services/router.ts by
modifying the check to skip models only when entry.model_db_id is in skipModels
AND is not equal to the session's preferredModelDbId (i.e., change the
conditional to effectively: if (skipModels?.has(entry.model_db_id) &&
entry.model_db_id !== preferredModelDbId) continue), ensuring the router honors
sticky sessions while still skipping cooldowned models.
In @.roo/specs/provider-5xx-session-ban/design.md:
- Around line 1-10: The spec is out of sync with proxy.ts: update the design
text to describe the runtime behavior where LongCat providers are excluded
immediately on any 5xx or on detected truncation, non-LongCat providers are
skipped at the model level via skipModels.add(route.modelDbId) (no production
2-consecutive-failure threshold), truncation is detected by
isTruncatedResponse(...) using keyword heuristics (e.g., "truncated",
"context_length_exceeded", "token_limit"), and bannedPlatforms /
recordConsecutiveFailure are currently test-only exports; either amend the
document to match this immediate-exclusion behavior or move the older
2-consecutive-5xx design into archive/alternatives with a note that
recordConsecutiveFailure and consecutiveFailures are test-only.
In @.roo/specs/provider-5xx-session-ban/requirements.md:
- Around line 5-12: The requirements claim that both "5xx consecutive failure
ban" and "truncation detection ban" apply to all providers, but the proxy
implementation only calls banPlatformFromSession()/bannedPlatforms for
route.platform === 'longcat' while non-LongCat errors use
skipModels.add(route.modelDbId); update the implementation so both triggers use
banPlatformFromSession() and the bannedPlatforms/session ban flow for all
providers (remove the longcat-only guard around banPlatformFromSession and
ensure truncation and 5xx handling paths call banPlatformFromSession() and/or
update bannedPlatforms consistently), and keep skipModels behavior for
model-level skipping where appropriate (ensure functions referenced:
banPlatformFromSession, bannedPlatforms, skipModels, and the 5xx/truncation
handling code paths).
In @.roo/specs/provider-5xx-session-ban/tasks.md:
- Around line 9-29: Update the task list to reflect the final LongCat/model-skip
design instead of the old "2 consecutive 5xx bans the provider" flow: remove or
replace the checked items that describe "increment consecutiveFailures and ban
at count >= 2" and instead describe the shipped behavior where a LongCat
response triggers immediate LongCat exclusion and non-LongCat model-only
skipping; adjust the entries that mention recordConsecutiveFailure(),
resetConsecutiveFailures(), resetAllConsecutiveFailures(),
banPlatformFromSession(), bannedPlatforms, addProviderModelsToSkipModels(), and
consecutiveFailures to document the new immediate exclusion and model-only skip
behavior (including parameters, sticky entry handling, TTL refresh, and logging
semantics) so follow-up work and tests target the current design.
In `@server/src/__tests__/routes/provider-session-ban.test.ts`:
- Around line 208-259: The tests under recordConsecutiveFailure still assert the
old behavior (platform-wide bans after two 5xxs and truncation-driven bans for
any provider); update the test cases (the ones using getSessionKey,
stickySessionMap, skipModels and recordConsecutiveFailure) to assert the new
semantics: for platform "longcat" expect an immediate platform ban on first
failure (stickySessionMap entry.bannedPlatforms contains 'longcat' after one
recordConsecutiveFailure) while for non-LongCat providers (e.g., 'groq') expect
only the specific model id to be added to skipModels (and
entry.consecutiveFailures incremented) but not a platform-wide ban; adjust the
related assertions in the other referenced blocks (lines 261-312, 394-430,
433-443) to match these rules and remove assumptions about two-failure bans or
truncation causing platform-wide bans.
In `@server/src/__tests__/routes/proxy-tools.test.ts`:
- Around line 895-973: The test never asserts which provider was chosen, so it
doesn't verify non-sticky-session exclusion; update the test (the "excludes
LongCat..." spec) to assert the mocked routing by checking the routedProvider
after the POST — e.g. expect(routedProvider).not.toBe('longcat') or
expect(routedProvider).toBe('groq') — and keep the existing session/key setup
that uses makeMessages and getSessionKey with stickySessionMap to ensure the
otherMessages represent a different session; ensure the mocked fetch spy
assigning routedProvider is in scope so the assertion observes the chosen
provider.
- Around line 840-849: Update the test POST calls to /api/keys to use the admin
credential instead of the unified/v1 helper; specifically, for each request(app,
'POST', '/api/keys', {...}) replace or augment the call so it supplies the admin
API key (the header used by admin-gated routes, e.g., Authorization: `Bearer
<adminKey>` or the test helper variant that attaches the admin key) so these
seeds exercise the /api/* admin gating (affects the calls shown and the other
occurrences at the ranges noted); keep the route and payload unchanged but
ensure auth is the admin key, not the v1/unified helper.
In `@server/src/routes/proxy.ts`:
- Around line 1243-1256: The cooldown check currently only runs when the current
request is already pinned (preferredModel), so it never excludes LongCat for
other sessions; instead, scan stickySessionMap for any sticky entry whose
platform is 'longcat' and whose lastUsed is within LONGCAT_STICKY_COOLDOWN_MS
and whose session key differs from the current session to trigger the exclusion.
Concretely, compute the current session key with
getSessionKey(normalizedMessages, routingMode), then iterate
stickySessionMap.values() (or entries) looking for an entry with entry.platform
=== 'longcat' and Date.now() - entry.lastUsed < LONGCAT_STICKY_COOLDOWN_MS and
entry.sessionKey !== currentSessionKey; if found, call
addProviderModelsToSkipModels(skipModels, 'longcat') (and log) — do this
outside/independent of preferredModel so other sessions see LongCat excluded.
In `@server/src/services/router.ts`:
- Line 539: The current check lets preferredModelDbId bypass all skips; fix by
splitting skipSets into nonOverrideableSkips vs overrideableSkips and change the
routing condition so nonOverrideableSkips always win (e.g., if
(nonOverrideableSkips.has(entry.model_db_id)) continue; else if
(overrideableSkips.has(entry.model_db_id) && entry.model_db_id !==
preferredModelDbId) continue;), and update where skips are added (in proxy.ts)
to place hard failures into nonOverrideableSkips so sticky sessions cannot route
back to a hard-skipped model.
---
Outside diff comments:
In `@server/src/routes/proxy.ts`:
- Around line 1412-1495: The handler currently only bans LongCat for 5xx (via
getErrorStatus/isBanEligibleStatus) or truncation (isTruncatedResponse),
allowing mid-stream timeout/429/auth/connection-reset errors to fall through to
the generic response.failed path; add a reusable check (e.g.,
isRetryableStreamError(streamErr)) that detects retryable mid-stream failures
(status 429, auth failures, timeouts, ECONNRESET, network errors, etc.), call it
alongside the 5xx/truncation checks, and if route.platform === 'longcat' perform
the same ban actions (banPlatformFromSession(normalizedMessages, routingMode,
'longcat', route.modelDbId) and addProviderModelsToSkipModels(skipModels,
'longcat')) before ending the stream and logging so LongCat is excluded
consistently on any retryable mid-stream failure.
---
Nitpick comments:
In @.roo/specs/longcat-sticky-cooldown/tasks.md:
- Around line 6-8: Cleanup the markdown emphasis markers in the checklist lines
by removing the extra spaces inside bold markers so they read "**Replace**" and
"**Update**" instead of "** Replace**" and "** Update**"; locate the two
checklist items referencing replacement of cooldown logic and updating unit
tests (the tasks labeled 2 and 4) and edit their text to remove the inner-space
in the bold delimiters to satisfy markdownlint MD037 while keeping the rest of
the sentences unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 233973c3-d3bf-4112-bddc-808be094799e
📒 Files selected for processing (13)
.roo/specs/longcat-session-ban/design.md.roo/specs/longcat-session-ban/requirements.md.roo/specs/longcat-session-ban/tasks.md.roo/specs/longcat-sticky-cooldown/design.md.roo/specs/longcat-sticky-cooldown/requirements.md.roo/specs/longcat-sticky-cooldown/tasks.md.roo/specs/provider-5xx-session-ban/design.md.roo/specs/provider-5xx-session-ban/requirements.md.roo/specs/provider-5xx-session-ban/tasks.mdserver/src/__tests__/routes/provider-session-ban.test.tsserver/src/__tests__/routes/proxy-tools.test.tsserver/src/routes/proxy.tsserver/src/services/router.ts
|
|
||
| ## Architecture Overview | ||
|
|
||
| The ban mechanism extends the existing sticky session infrastructure in `proxy.ts` and integrates with the retry loop in `handleChatCompletion()`. The router (`router.ts`) requires minimal changes — only the LongCat smart-auto preference needs to respect session bans. All ban detection and session management happens in the proxy layer. |
There was a problem hiding this comment.
Clarify router change requirements.
Line 5 states "the router requires minimal changes — only the LongCat smart-auto preference needs to respect session bans," but lines 374-380 conclude "No router changes needed." This contradiction creates ambiguity about whether router.ts modifications are in scope.
📝 Suggested clarification
Revise line 5 to align with the final decision:
-The ban mechanism extends the existing sticky session infrastructure in `proxy.ts` and integrates with the retry loop in `handleChatCompletion()`. The router (`router.ts`) requires minimal changes — only the LongCat smart-auto preference needs to respect session bans. All ban detection and session management happens in the proxy layer.
+The ban mechanism extends the existing sticky session infrastructure in `proxy.ts` and integrates with the retry loop in `handleChatCompletion()`. The router (`router.ts`) requires no changes — the proxy passes `skipModels` containing all LongCat model IDs, and the router's existing skip logic handles the ban automatically. All ban detection and session management happens in the proxy layer.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| The ban mechanism extends the existing sticky session infrastructure in `proxy.ts` and integrates with the retry loop in `handleChatCompletion()`. The router (`router.ts`) requires minimal changes — only the LongCat smart-auto preference needs to respect session bans. All ban detection and session management happens in the proxy layer. | |
| The ban mechanism extends the existing sticky session infrastructure in `proxy.ts` and integrates with the retry loop in `handleChatCompletion()`. The router (`router.ts`) requires no changes — the proxy passes `skipModels` containing all LongCat model IDs, and the router's existing skip logic handles the ban automatically. All ban detection and session management happens in the proxy layer. |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.roo/specs/longcat-session-ban/design.md at line 5, The doc is contradictory
about router changes: update the sentence that mentions "the router requires
minimal changes — only the LongCat smart-auto preference needs to respect
session bans" to instead state that no changes to router.ts are required; keep
the rest as-is and explicitly note that all ban detection and session management
is implemented in proxy.ts and integrated into handleChatCompletion(), and
reference LongCat smart-auto as unaffected so readers know router.ts does not
need edits.
|
|
||
| ### 2. `banPlatformFromSession()` — [`proxy.ts`](server/src/routes/proxy.ts) | ||
|
|
||
| Records a platform ban in the sticky session. Called when truncation, auth, or rate-limit errors is detected on LongCat. |
There was a problem hiding this comment.
Fix subject-verb agreement.
"truncation, auth, or rate-limit errors is detected" should use "are" for the plural subject.
✏️ Proposed fix
-Records a platform ban in the sticky session. Called when truncation, auth, or rate-limit errors is detected on LongCat.
+Records a platform ban in the sticky session. Called when truncation, auth, or rate-limit errors are detected on LongCat.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| Records a platform ban in the sticky session. Called when truncation, auth, or rate-limit errors is detected on LongCat. | |
| Records a platform ban in the sticky session. Called when truncation, auth, or rate-limit errors are detected on LongCat. |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.roo/specs/longcat-session-ban/design.md at line 85, In the sentence
starting "Records a platform ban in the sticky session. Called when truncation,
auth, or rate-limit errors is detected on LongCat.", correct the subject-verb
agreement by changing "is" to "are" so it reads "Called when truncation, auth,
or rate-limit errors are detected on LongCat." Use the same surrounding text
("Records a platform ban in the sticky session" / "LongCat") to locate the line
in the design.md.
| When a LongCat sticky session encounters an error — whether it's an auth failure (401/40403) or a "truncated" / "conflict" error from the provider — the system must to: | ||
|
|
||
| 1. Detect the error, 2. Ban the entire `longcat` platform for this sticky session, 3. Fall back to the next best non-LongCat model via normal routing, 4. Update the sticky session to point to the new fallback model, 5. Never route this session to LongCat again (until the session expires via TTL) |
There was a problem hiding this comment.
Fix grammar and formatting in Overview section.
Multiple issues:
- Line 5: "must to:" should be "must:"
- Line 7: Missing spaces after commas in the numbered list
✏️ Proposed fixes
-When a LongCat sticky session encounters an error — whether it's an auth failure (401/40403) or a "truncated" / "conflict" error from the provider — the system must to:
+When a LongCat sticky session encounters an error — whether it's an auth failure (401/403) or a "truncated" / "conflict" error from the provider — the system must:
-1. Detect the error, 2. Ban the entire `longcat` platform for this sticky session, 3. Fall back to the next best non-LongCat model via normal routing, 4. Update the sticky session to point to the new fallback model, 5. Never route this session to LongCat again (until the session expires via TTL)
+1. Detect the error
+2. Ban the entire `longcat` platform for this sticky session
+3. Fall back to the next best non-LongCat model via normal routing
+4. Update the sticky session to point to the new fallback model
+5. Never route this session to LongCat again (until the session expires via TTL)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.roo/specs/longcat-session-ban/requirements.md around lines 5 - 7, Fix the
grammar in the Overview sentence and spacing in the numbered list: change "the
system must to:" to "the system must:" in the sentence starting "When a LongCat
sticky session encounters an error — whether it's an auth failure (401/40403) or
a 'truncated' / 'conflict' error from the provider — the system must to:" and
ensure there are spaces after commas in the numbered list items (e.g., "1.
Detect the error, 2. Ban..." → "1. Detect the error, 2. Ban..." with proper
spacing after each comma) so the sentence reads correctly and the list is
properly spaced.
| The existing sticky sessions feature lives in [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:13-112). and It uses an SHA-1 hash of `routingMode + firstUserMessage` to identify sessions, and stores `{ modelDbId, + optional `keyId` + `lastUsed` } with a 30-min TTL and 500-entry max. | ||
|
|
||
| and eviction. | ||
|
|
||
| The existing LongCat sticky key feature ([`longcat-sticky-key` spec](../../roo/specs/longcat-sticky-key/)) extends this to also prefer using the **same API key** within a session. For LongCat specifically, because LongCat benefits from session continuity at the key level. same key = same session context on their server side). The current behavior on auth errors (401/403) is to [`clearStickyKey()`](server/src/routes/proxy.ts:89-98) — which clears the sticky key but **keep the sticky model pinned to LongCat** via [`preferredModel`](server/src/routes/proxy.ts:1036-1037). On retry, [`routeRequest()`](server/src/services/router.ts:458) still has `preferredModel` pointing to LongCat, and tries **another LongCat key** via round-robin. and **LongCat detects different keys usage for the same session** → the "multiple API keys" problem. The [`shouldSkipModelOnRetry()`](server/src/routes/proxy.ts:430-432) function explicitly **does NOT skip the model** for auth errors or rate limit errors — so auth failures on LongCat result in key rotation within the same LongCat model, which is exactly what LongCat detects as "multiple API keys use." for the same session. Similarly, when LongCat returns a "truncated" or "conflict" error ( the provider truncates the response mid-stream, the current behavior is to silently switch to a different key and but the session continues on LongCat with a different key — same problem. The "truncated" error pattern is also detected by [`isRetryableError()`](server/src/routes/proxy.ts:409-4428) which checking for 429, 413, 400, 404, 408, 409, 422, 500, 502, 503, 504, andrate limit`, `quota`, `aborted`, `timeout`, `econnrefused`, `econnreset`, `unauthorized`, `forbidden`, `invalid api key`, `no longer available`, `model not found`, `bad request`, `invalid json payload`. TheisRetryableError()` function returns true for all these cases, meaning the proxy will retry with a different model/keykey. However, [`shouldSkipModelOnRetry()`](server/src/routes/proxy.ts:430-432) returns `true` only for rate-limit and auth errors — it does NOT skip the model. This means auth errors and LongCat result in key rotation within the same model, which is exactly the behavior LongCat detects as "multiple API keys use" for the same session. The existing [`clearStickyKey()`](server/src/routes/proxy.ts:89-98) only clears the sticky key but **keoes `preferredKeyId` to `undefined` — but the sticky model remains pinned to LongCat. On the next retry, [`routeRequest()`](server/src/services/router.ts:458) still receives `preferredModel` pointing to LongCat, and tries another LongCat key via round-robin. The LongCat smart-auto preference in [`router.ts`](server/src/services/router.ts:498-527) also means LongCat is still tried first in smart mode, so the retry will likely hit LongCat again. ## Functional Requirements ### FR-1: Detect Multiple Key Use on LongCat When a LongCat provider returns an error indicating that the same API key is been used for the same session ( the system must detect this condition. the error response contains language signaling multiple key use. a single session. Detection patterns: - Auth errors (401/403) — the current behavior already clears the sticky key but but tries another key on the same model - Rate-limit errors (429) — same pattern: key rotation within the same model - "Truncated" / "conflict" errors — LongCat truncates responses mid-stream when the response is shorter than expected, indicating the provider cut off the session. Detection keywords: "truncated", "truncation", "conflict", "length", "maximum length", "context_length_exceeded", "token_limit" - Any error message that the provider is complaining about session length or capacity limits for the current conversation. ### FR-2: Ban LongCat Platform for Sticky Session When FR-1 is triggered, the system must ban the entire `longcat` platform for the current sticky session. This means: - All LongCat model IDs must be added to `skipModels` in the retry loop - The sticky session must be updated to point to the new fallback model instead of LongCat - The session must never be routed to LongCat again for until the session expires via TTL (30 min) or The ban must persist across multiple retry attempts within the same request. ### FR-3: Fallback to Next Best Non-LongCat Model After banning LongCat, the retry loop must fall through to the next best available model via the existing Thompson Sampling / smart routing logic. The new model should be selected based on the normal scoring algorithm ( success rate + speed + TTFB + intelligence for smart mode, success rate + speed in balanced mode). ### FR-4: Update Sticky Session to New Model On successful fallback, the sticky session must be updated to point to the new fallback model and `modelDbId` + `keyId`. The sticky key feature should be cleared for the new model since since the fallback is not LongCat, since the sticky key preference only applies to LongCat sessions. ### FR-5: Never Route Session to LongCat Again Once a session is banned from LongCat, it must never be routed to LongCat again for the remainder of that session's lifetime (30 min TTL). This means: - The `stickySessionMap` entry must include a `bannedPlatforms` field (or `Set<string>`) to track which platforms are banned for this session - On subsequent requests in the same session, `getStickyModel()` returns the preferred model, but the proxy layer must check if the session is banned from LongCat and skip LongCat models before calling `routeRequest()` - The LongCat smart-auto preference in `router.ts` must also be suppressed for banned sessions ( the router should not boost LongCat entries to the front for sessions that are banned from LongCat ### FR-6: Truncated Response Detection When a LongCat streaming response is received, the proxy must check the response content for signs of truncation. If detected, the session must be banned from LongCat immediately, even if the stream has already started (headers sent, the client has already received partial data). The system must: - Log the truncation detection - Record the ban in the sticky session - Add all LongCat model IDs to `skipModels` - End stream and ban for future requests - The client receives the truncated partial response as-is; future requests in this session will route to non-LongCat models ### FR-7: Auth Error Handling for LongCat Sessions When an auth error (401/403) occurs on a LongCat sticky session: - Clear the sticky key via `clearStickyKey()` (existing behavior) - Additionally ban the LongCat platform for this session via the new `banPlatformFromSession()` function - Add all LongCat model IDs to `skipModels` - Set `preferredKeyId` to `undefined` - On retry, fall through to the next best non-LongCat model - Update sticky session to the new model on success ### FR-8: Rate-Limit Error Handling for LongCat Sessions When a rate-limit error (429) occurs on a LongCat sticky session: - Ban the LongCat platform for this session via `banPlatformFromSession()` - Add all LongCat model IDs to `skipModels` - Set `preferredKeyId` to `undefined` - On retry, fall through to the next best non-LongCat model - Update sticky session to the new model on success - Note: rate-limit errors on LongCat do NOT clear the entire sticky session (the session may still work with a different key on a different model). Only ban LongCat specifically. ### FR-9: Existing Behavior Preserved for Non-LongCat Sessions All existing sticky session behavior for non-LongCat providers must remain unchanged. The new ban mechanism only applies exclusively to LongCat sessions. Non-LongCat sessions that never have platform bans. ### FR-10: Session Expiry Clears Bans When a sticky session expires ( via TTL (30 min), the `bannedPlatforms` set is also cleared. This is natural — expired sessions are evicted from the `stickySessionMap` entirely, including all associated data. ### FR-11: No Database Schema Changes The ban mechanism is purely in-memory, using the existing `stickySessionMap`. No database schema changes are required. ### FR-12: Minimal Router Changes The router (`server/src/services/router.ts`) should not need significant changes. The only change is that the LongCat smart-auto preference logic should skip sessions that are banned from LongCat. The proxy layer handles all ban detection and session management. The router remains provider-agnostic. ### FR-13: No UI Changes This is a backend-only feature. No client-side changes are needed. ## Non-Functional Requirements ### NFR-1: Backward Compatibility Existing sessions without `bannedPlatforms` (from before this feature or for non-LongCat providers) must continue to work. The `bannedPlatforms` field must be optional in the sticky session map value type. ### NFR-2: Thread Safety The existing `stickySessionMap` is a plain `Map` with no locking ( single-threaded Node.js). The extended map follows the same pattern — no additional concurrency concerns. ### NFR-3: Minimal Performance Impact The ban check adds one `Set` lookup per one optional field check in the sticky session map entry per one DB query to check if a model is on a banned platform. No additional I/O beyond what already exists. ### NFR-4: Test Coverage New unit tests must cover: - Multiple key use detection (auth + rate limit + truncated) - Session ban persistence across retries - Fallback to next best model - Sticky session update on success - Session expiry clearing bans - Non-LongCat sessions unaffected ## Files Requiring Modification | # | File | Change Type | Description | |
There was a problem hiding this comment.
Clean up Context section formatting and grammar.
The Context section contains multiple formatting errors:
- Extra period and incorrect capitalization: ". and It uses"
- Malformed inline code with "+ optional"
- Incomplete sentences: "and eviction." and standalone "and"
- Run-on text that is difficult to parse
🧰 Tools
🪛 LanguageTool
[style] ~15-~15: Consider an alternative for the overused word “exactly”.
Context: ...within the same LongCat model, which is exactly what LongCat detects as "multiple API k...
(EXACTLY_PRECISELY)
[grammar] ~15-~15: Ensure spelling is correct
Context: ...404, 408, 409, 422, 500, 502, 503, 504, andrate limit, quota, aborted, timeout`, ...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~15-~15: Ensure spelling is correct
Context: ...proxy will retry with a different model/keykey. However, [shouldSkipModelOnRetry()](...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[style] ~15-~15: Consider an alternative for the overused word “exactly”.
Context: ...otation within the same model, which is exactly the behavior LongCat detects as "multip...
(EXACTLY_PRECISELY)
[grammar] ~15-~15: Ensure spelling is correct
Context: ...89-98) only clears the sticky key but **keoes preferredKeyId to undefined — but t...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🪛 markdownlint-cli2 (0.22.1)
[warning] 11-11: Spaces inside code span elements
(MD038, no-space-in-code)
[warning] 15-15: Spaces inside code span elements
(MD038, no-space-in-code)
[warning] 15-15: Spaces inside code span elements
(MD038, no-space-in-code)
[warning] 15-15: Spaces inside code span elements
(MD038, no-space-in-code)
[warning] 15-15: Spaces inside code span elements
(MD038, no-space-in-code)
[warning] 15-15: Spaces inside code span elements
(MD038, no-space-in-code)
[warning] 15-15: Spaces inside code span elements
(MD038, no-space-in-code)
[warning] 15-15: Spaces inside code span elements
(MD038, no-space-in-code)
[warning] 15-15: Spaces inside code span elements
(MD038, no-space-in-code)
[warning] 15-15: Spaces inside code span elements
(MD038, no-space-in-code)
[warning] 15-15: Spaces inside code span elements
(MD038, no-space-in-code)
[warning] 15-15: Spaces inside code span elements
(MD038, no-space-in-code)
[warning] 15-15: Spaces inside code span elements
(MD038, no-space-in-code)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.roo/specs/longcat-session-ban/requirements.md around lines 11 - 15, The
Context section has multiple formatting and grammar issues—remove the stray
period and fix capitalization in the opening sentence (replace ". and It uses"
with "and it uses"), normalize the inline code snippet that currently reads "`{
modelDbId, + optional `keyId` + `lastUsed` }`" to a clear code span like `{
modelDbId, optional keyId, lastUsed }`, delete or merge the orphan fragments
("and eviction." and the standalone "and"), break up and rewrite the run-on
paragraph so each idea is a clear sentence referencing the sticky session
implementation in server/src/routes/proxy.ts and the longcat-sticky-key spec,
and ensure references to functions/files (clearStickyKey(), preferredModel,
routeRequest(), shouldSkipModelOnRetry(), isRetryableError()) remain correct and
readable.
| | | 8 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1245-1281) | Edit | Update error handling in retry loop to detect multiple key use + ban LongCat | | ||
| | | 9 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1147-1149) | Edit | Add truncated response detection in streaming success path | | ||
| | | 10 | [`server/src/services/router.ts`](server/src/services/router.ts:498-527) | Edit | Skip LongCat boost for banned sessions | | ||
| | | 11 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1036-1053) | Edit | Pass `bannedPlatforms` to `routeRequest()` via `skipModels` | |
There was a problem hiding this comment.
Fix duplicate row numbering in files table.
The table has two rows numbered as row 11 (lines 27-28). This creates ambiguity in the modification checklist.
📋 Suggested fix
Renumber the second occurrence of row 11 to row 12:
| | 10 | [`server/src/services/router.ts`](server/src/services/router.ts:498-527) | Edit | Skip LongCat boost for banned sessions |
- | | 11 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1036-1053) | Edit | Pass `bannedPlatforms` to `routeRequest()` via `skipModels` |
+ | | 12 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1036-1053) | Edit | Pass `bannedPlatforms` to `routeRequest()` via `skipModels` |📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| | | 8 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1245-1281) | Edit | Update error handling in retry loop to detect multiple key use + ban LongCat | | |
| | | 9 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1147-1149) | Edit | Add truncated response detection in streaming success path | | |
| | | 10 | [`server/src/services/router.ts`](server/src/services/router.ts:498-527) | Edit | Skip LongCat boost for banned sessions | | |
| | | 11 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1036-1053) | Edit | Pass `bannedPlatforms` to `routeRequest()` via `skipModels` | | |
| | | 8 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1245-1281) | Edit | Update error handling in retry loop to detect multiple key use + ban LongCat | | |
| | | 9 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1147-1149) | Edit | Add truncated response detection in streaming success path | | |
| | | 10 | [`server/src/services/router.ts`](server/src/services/router.ts:498-527) | Edit | Skip LongCat boost for banned sessions | | |
| | | 12 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1036-1053) | Edit | Pass `bannedPlatforms` to `routeRequest()` via `skipModels` | |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.roo/specs/longcat-session-ban/requirements.md around lines 24 - 27, The
table in requirements.md has duplicated row number "11"; update the second
occurrence (the row listing `server/src/routes/proxy.ts` with range 1036-1053)
to "12" so row numbers are unique and sequential; ensure any subsequent rows are
renumbered accordingly if needed.
| await request(app, 'POST', '/api/keys', { | ||
| platform: 'longcat', | ||
| key: 'lc_cooldown_active_test', | ||
| label: 'cooldown-active-longcat', | ||
| }); | ||
| await request(app, 'POST', '/api/keys', { | ||
| platform: 'groq', | ||
| key: 'gsk_cooldown_active_test', | ||
| label: 'cooldown-active-groq', | ||
| }); |
There was a problem hiding this comment.
Use the admin key for these /api/keys setup calls.
These tests seed providers through /api/keys, but they rely on a helper that only attaches auth for /v1/*. That means the new coverage is not exercising the route-gating contract and can mask overlap regressions between admin and unified credentials.
As per coding guidelines, The admin key must gate /api/* routes; the unified API key must gate /v1/* routes — they must never overlap, and using one against the wrong route returns 401.
Also applies to: 909-918, 989-998, 1058-1062, 1123-1127, 1176-1180
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@server/src/__tests__/routes/proxy-tools.test.ts` around lines 840 - 849,
Update the test POST calls to /api/keys to use the admin credential instead of
the unified/v1 helper; specifically, for each request(app, 'POST', '/api/keys',
{...}) replace or augment the call so it supplies the admin API key (the header
used by admin-gated routes, e.g., Authorization: `Bearer <adminKey>` or the test
helper variant that attaches the admin key) so these seeds exercise the /api/*
admin gating (affects the calls shown and the other occurrences at the ranges
noted); keep the route and payload unchanged but ensure auth is the admin key,
not the v1/unified helper.
| it('excludes LongCat from bandit routing for non-sticky sessions during cooldown', async () => { | ||
| const db = getDb(); | ||
|
|
||
| // Set up a sticky session on LongCat for ONE session (within cooldown) | ||
| const stickyMessages = makeMessages('sticky longcat session during cooldown'); | ||
| const stickyKey = getSessionKey(stickyMessages, 'balanced'); | ||
| const longcatRow = db.prepare('SELECT id FROM models WHERE platform = ? AND enabled = 1').get('longcat') as { id: number } | undefined; | ||
| expect(longcatRow).toBeDefined(); | ||
| (stickySessionMap as Map<any, any>).set(stickyKey, { | ||
| modelDbId: longcatRow!.id, | ||
| lastUsed: Date.now() - 1000, // within cooldown | ||
| }); | ||
|
|
||
| // Add keys for both providers | ||
| await request(app, 'POST', '/api/keys', { | ||
| platform: 'longcat', | ||
| key: 'lc_cooldown_exclusion_test', | ||
| label: 'cooldown-exclusion-longcat', | ||
| }); | ||
| await request(app, 'POST', '/api/keys', { | ||
| platform: 'groq', | ||
| key: 'gsk_cooldown_exclusion_test', | ||
| label: 'cooldown-exclusion-groq', | ||
| }); | ||
|
|
||
| // Now send a request from a DIFFERENT session (no sticky LongCat) | ||
| // This session should NOT be able to route to LongCat because the | ||
| // other session's cooldown excludes LongCat from the bandit router. | ||
| // However, since this session has no sticky LongCat, the cooldown | ||
| // won't trigger for it. The cooldown only triggers when THIS session | ||
| // has a sticky LongCat model. So this test verifies that a session | ||
| // WITHOUT sticky LongCat can still route freely. | ||
| // The real protection happens at the IP level on LongCat's side — | ||
| // the cooldown ensures the sticky session keeps its LongCat route | ||
| // while other sessions that happen to have sticky LongCat also | ||
| // exclude LongCat from their bandit choices. | ||
| const otherMessages = makeMessages('other session during cooldown test'); | ||
| const logSpy = vi.spyOn(console, 'log'); | ||
| const origFetch = global.fetch; | ||
| let routedProvider = ''; | ||
|
|
||
| vi.spyOn(global, 'fetch').mockImplementation(async (url, init) => { | ||
| const urlStr = typeof url === 'string' ? url : url.toString(); | ||
| if (urlStr.startsWith('http://127.0.0.1') || urlStr.startsWith('http://localhost')) { | ||
| return origFetch(url, init); | ||
| } | ||
| if (!urlStr.includes('/chat/completions')) return origFetch(url, init); | ||
|
|
||
| if (urlStr.includes('longcat')) routedProvider = 'longcat'; | ||
| else if (urlStr.includes('groq')) routedProvider = 'groq'; | ||
|
|
||
| const body = JSON.parse((init as any).body); | ||
| return { | ||
| ok: true, | ||
| json: () => Promise.resolve({ | ||
| id: 'chatcmpl-other-session', | ||
| object: 'chat.completion', | ||
| created: 123, | ||
| model: body.model, | ||
| choices: [{ | ||
| index: 0, | ||
| message: { role: 'assistant', content: 'other session test response' }, | ||
| finish_reason: 'stop', | ||
| }], | ||
| usage: { prompt_tokens: 4, completion_tokens: 2, total_tokens: 6 }, | ||
| }), | ||
| } as any; | ||
| }); | ||
|
|
||
| const { status } = await request(app, 'POST', '/v1/chat/completions', { | ||
| messages: otherMessages, | ||
| }); | ||
|
|
||
| expect(status).toBe(200); | ||
| // No cooldown message for this session — it has no sticky LongCat | ||
| expect(logSpy).not.toHaveBeenCalledWith( | ||
| expect.stringContaining('[Sticky] LongCat cooldown active') | ||
| ); | ||
| }); |
There was a problem hiding this comment.
This test never verifies non-sticky-session exclusion.
The test name says LongCat should be excluded for other sessions during cooldown, but the body explicitly documents the opposite behavior and never asserts routedProvider. As written, it passes whether the second session routes to LongCat or not, so it doesn't protect the PR objective.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@server/src/__tests__/routes/proxy-tools.test.ts` around lines 895 - 973, The
test never asserts which provider was chosen, so it doesn't verify
non-sticky-session exclusion; update the test (the "excludes LongCat..." spec)
to assert the mocked routing by checking the routedProvider after the POST —
e.g. expect(routedProvider).not.toBe('longcat') or
expect(routedProvider).toBe('groq') — and keep the existing session/key setup
that uses makeMessages and getSessionKey with stickySessionMap to ensure the
otherMessages represent a different session; ensure the mocked fetch spy
assigning routedProvider is in scope so the assertion observes the chosen
provider.
| // LongCat sticky cooldown: if the sticky model is on LongCat and was used | ||
| // within the last 3 minutes, exclude LongCat from the bandit router for all | ||
| // other sessions. The current sticky session keeps its pinned LongCat route. | ||
| // This prevents LongCat from seeing multiple sessions/keys from the same IP. | ||
| if (preferredModel) { | ||
| const db = getDb(); | ||
| const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined; | ||
| if (prefRow?.platform === 'longcat') { | ||
| const cooldownSessionKey = getSessionKey(normalizedMessages, routingMode); | ||
| const cooldownEntry = cooldownSessionKey ? stickySessionMap.get(cooldownSessionKey) : undefined; | ||
| if (cooldownEntry && Date.now() - cooldownEntry.lastUsed < LONGCAT_STICKY_COOLDOWN_MS) { | ||
| const ageMs = Date.now() - cooldownEntry.lastUsed; | ||
| addProviderModelsToSkipModels(skipModels, 'longcat'); | ||
| console.log(`[Sticky] LongCat cooldown active — excluding LongCat from bandit routing for other sessions | session=${cooldownSessionKey?.slice(0, 8)} | lastUsed=${ageMs}ms ago`); |
There was a problem hiding this comment.
This cooldown never applies to “other sessions.”
The branch only runs when the current request is already pinned to a LongCat preferredModel. Sessions that are not already sticky to LongCat never enter it, so they never add LongCat to skipModels. That means the advertised cross-session cooldown is effectively a no-op.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@server/src/routes/proxy.ts` around lines 1243 - 1256, The cooldown check
currently only runs when the current request is already pinned (preferredModel),
so it never excludes LongCat for other sessions; instead, scan stickySessionMap
for any sticky entry whose platform is 'longcat' and whose lastUsed is within
LONGCAT_STICKY_COOLDOWN_MS and whose session key differs from the current
session to trigger the exclusion. Concretely, compute the current session key
with getSessionKey(normalizedMessages, routingMode), then iterate
stickySessionMap.values() (or entries) looking for an entry with entry.platform
=== 'longcat' and Date.now() - entry.lastUsed < LONGCAT_STICKY_COOLDOWN_MS and
entry.sessionKey !== currentSessionKey; if found, call
addProviderModelsToSkipModels(skipModels, 'longcat') (and log) — do this
outside/independent of preferredModel so other sessions see LongCat excluded.
| { | ||
| const streamTextToCheck = responseStreamContext ? responseStreamContext.outputText : streamedText; | ||
| if (isTruncatedResponse(streamTextToCheck)) { | ||
| if (route.platform === 'longcat') { | ||
| // LongCat: exclude entire provider immediately on truncation | ||
| console.warn(`[Proxy] Truncated stream content detected from LongCat — banning LongCat provider for session`); | ||
| banPlatformFromSession(normalizedMessages, routingMode, 'longcat', route.modelDbId); | ||
| addProviderModelsToSkipModels(skipModels, 'longcat'); | ||
| } else { | ||
| // Non-LongCat: skip only this specific model, other models from same provider remain available | ||
| console.warn(`[Proxy] Truncated stream content detected from ${route.platform} — skipping model ${route.modelId} for session`); | ||
| skipModels.add(route.modelDbId); | ||
| } | ||
| } |
There was a problem hiding this comment.
Don’t record truncated streams as successful sticky hits.
Once isTruncatedResponse() fires, this route has already been classified as a failure signal, but the code still falls through to recordSuccess() and setStickyModel(). That feeds bad success data back into routing and can re-pin the session to the same model/provider you just excluded.
Possible fix
{
const streamTextToCheck = responseStreamContext ? responseStreamContext.outputText : streamedText;
+ let wasTruncated = false;
if (isTruncatedResponse(streamTextToCheck)) {
+ wasTruncated = true;
if (route.platform === 'longcat') {
// LongCat: exclude entire provider immediately on truncation
console.warn(`[Proxy] Truncated stream content detected from LongCat — banning LongCat provider for session`);
banPlatformFromSession(normalizedMessages, routingMode, 'longcat', route.modelDbId);
addProviderModelsToSkipModels(skipModels, 'longcat');
@@
}
}
+
+ if (wasTruncated) {
+ logRequest(route.platform, route.modelId, 'error', estimatedInputTokens, totalOutputTokens, Date.now() - start, ttfbMs, 'truncated response');
+ return;
+ }
}
@@
- recordSuccess(route.modelDbId);
- setStickyModel(normalizedMessages, route.modelDbId, routingMode, route.keyId);
- resetAllConsecutiveFailures(normalizedMessages, routingMode);
+ recordSuccess(route.modelDbId);
+ setStickyModel(normalizedMessages, route.modelDbId, routingMode, route.keyId);
+ resetAllConsecutiveFailures(normalizedMessages, routingMode);Also applies to: 1404-1407
|
|
||
| for (const entry of sorted) { | ||
| if (skipModels?.has(entry.model_db_id)) continue; | ||
| if (skipModels?.has(entry.model_db_id) && entry.model_db_id !== preferredModelDbId) continue; |
There was a problem hiding this comment.
Don't let sticky preference override hard model skips.
Line 539 makes preferredModelDbId bypass skipModels for every skip reason. That breaks the new non-LongCat failure flow: after proxy.ts adds the just-failed model to skipModels on a 5xx/truncation, a sticky session can still route straight back to that same model instead of falling back. Please split this into overrideable vs non-overrideable skips, or clear preferredModelDbId before rerouting on hard failures.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@server/src/services/router.ts` at line 539, The current check lets
preferredModelDbId bypass all skips; fix by splitting skipSets into
nonOverrideableSkips vs overrideableSkips and change the routing condition so
nonOverrideableSkips always win (e.g., if
(nonOverrideableSkips.has(entry.model_db_id)) continue; else if
(overrideableSkips.has(entry.model_db_id) && entry.model_db_id !==
preferredModelDbId) continue;), and update where skips are added (in proxy.ts)
to place hard failures into nonOverrideableSkips so sticky sessions cannot route
back to a hard-skipped model.
Summary
Replaces the previous
recordConsecutiveFailuremechanism (which banned entire providers after 2 consecutive 5xx for ALL providers) with differentiated failure handling:LongCat
banPlatformFromSession()+addProviderModelsToSkipModels()to block all LongCat modelsNon-LongCat providers
skipModels.add(route.modelDbId)), other models from the same provider remain availablesetCooldown) but allow other keys for the same model to be triedChanges in
server/src/routes/proxy.tsAll 156 existing tests pass. TypeScript compiles cleanly.
Summary by CodeRabbit
New Features
Tests