feat: longcat-sticky-cooldown#5
Conversation
Implements the LongCat session ban feature in proxy.ts. When LongCat detects multiple API key use (auth/rate-limit errors) or returns truncated responses, the platform is banned from the sticky session. Future requests in that session route to non-LongCat models. Changes: - Extend stickySessionMap with bannedPlatforms?: Set<string> - Add isSessionBannedFromPlatform() to check session bans - Add banPlatformFromSession() to record platform bans - Add addLongcatModelsToSkipModels() to skip all LongCat models - Add isTruncatedResponse() to detect truncation keywords - Update getStickyKey() to return undefined for banned platforms - Update setStickyModel() to preserve bannedPlatforms across updates - Update pre-routing logic to check bans before routing - Update error handling to ban LongCat on auth/rate-limit/truncation - Add truncation detection after stream completes - Add truncation detection in mid-stream error handling
… consecutive failures - Extend stickySessionMap with consecutiveFailures tracking per provider - Add recordConsecutiveFailure(), resetConsecutiveFailures(), resetAllConsecutiveFailures() - Replace addLongcatModelsToSkipModels with generic addProviderModelsToSkipModels - Replace LongCat-specific auth/rate-limit ban with general 5xx consecutive failure detection (threshold: 2) - Generalize truncation detection to all providers (post-stream + mid-stream) - Update getStickyKey() to check bannedPlatforms for any platform - Update pre-routing ban check to be generic (any banned platform) - Add success path counter reset on both streaming and non-streaming paths - Remove LongCat-specific auth error ban, rate limit ban, and addLongcatModelsToSkipModels - Rename and rewrite tests from longcat-session-ban to provider-session-ban (32 test cases) - TypeScript compiles cleanly, all 150 tests pass
- Add isBanEligibleStatus() helper restricting to {500,502,503,504}
- Improve mid-stream truncation detection with aggregated error sources
- Pre-routing ban check now skips ALL banned platforms, not just preferredModel's
- Only clear preferredModel when provider is actually banned (not on first 5xx)
- Handle Error objects in isTruncatedResponse (instanceof check before JSON.stringify)
📝 WalkthroughWalkthroughThis PR introduces sticky-session platform bans and a 3-minute LongCat cooldown safeguard, generalizing LongCat-specific logic to any provider. Session state now tracks banned platforms and per-provider 5xx failure counts, triggering bans after consecutive failures or truncation detection. All changes are in-memory with no DB schema modifications. ChangesSession Ban & Cooldown Implementation
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Code Review
This pull request generalizes the session ban and fallback mechanisms from LongCat to all providers, introducing a 5xx consecutive failure ban (after two consecutive 5xx errors) and extending truncation detection to all platforms. It also adds a three-minute cooldown safeguard for LongCat sticky sessions to bypass sticky preferences during rapid-fire requests, along with comprehensive unit and integration tests. The reviewer feedback focuses on performance optimizations in server/src/routes/proxy.ts, specifically recommending the reuse of fetched session variables to avoid redundant database queries and TTL checks during pre-routing, and adding a fast-path check to bypass database queries when the preferred model matches the failed route model.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| // Check if session is banned from any platform — add all banned platforms' models to skipModels | ||
| // and clear preferredModel/preferredKeyId if it points to a banned platform. | ||
| const skipModels = new Set<number>(); | ||
| const sessionKey = getSessionKey(normalizedMessages, routingMode); | ||
| if (sessionKey) { | ||
| const entry = stickySessionMap.get(sessionKey); | ||
| if (entry) { | ||
| if (Date.now() - entry.lastUsed > STICKY_TTL_MS) { | ||
| stickySessionMap.delete(sessionKey); | ||
| } else if (entry.bannedPlatforms) { | ||
| for (const platform of entry.bannedPlatforms) { | ||
| addProviderModelsToSkipModels(skipModels, platform); | ||
| console.log(`[Sticky] session banned from ${platform}, adding to skipModels`); | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| if (preferredModel) { | ||
| const db = getDb(); | ||
| const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined; | ||
| if (prefRow && isSessionBannedFromPlatform(normalizedMessages, routingMode, prefRow.platform)) { | ||
| console.log(`[Sticky] skipping preferredModel=${preferredModel} (${prefRow.platform} banned for session)`); | ||
| preferredModel = undefined; | ||
| preferredKeyId = undefined; | ||
| } | ||
| } | ||
|
|
||
| // LongCat sticky cooldown: if the sticky model is on LongCat and was used | ||
| // within the last 3 minutes, bypass sticky preference for this request only. | ||
| // The bandit router picks freely — it may still route to LongCat organically. | ||
| if (preferredModel) { | ||
| const db = getDb(); | ||
| const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined; | ||
| if (prefRow?.platform === 'longcat') { | ||
| const cooldownSessionKey = getSessionKey(normalizedMessages, routingMode); | ||
| const cooldownEntry = cooldownSessionKey ? stickySessionMap.get(cooldownSessionKey) : undefined; | ||
| if (cooldownEntry && Date.now() - cooldownEntry.lastUsed < LONGCAT_STICKY_COOLDOWN_MS) { | ||
| const ageMs = Date.now() - cooldownEntry.lastUsed; | ||
| console.log(`[Sticky] LongCat cooldown active — bypassing sticky preference for session=${cooldownSessionKey?.slice(0, 8)} | lastUsed=${ageMs}ms ago`); | ||
| preferredModel = undefined; | ||
| preferredKeyId = undefined; | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
There is a significant opportunity to optimize the pre-routing ban and cooldown checks. Currently, the code performs redundant database queries (SELECT platform FROM models WHERE id = ?), redundant session key lookups, and redundant TTL checks by calling isSessionBannedFromPlatform and re-fetching the session key/entry for the cooldown check.
By reusing the already fetched sessionKey and entry variables, and combining the database query for preferredModel's platform, we can eliminate all of these redundant operations on the hot path.
// Check if session is banned from any platform — add all banned platforms' models to skipModels
// and clear preferredModel/preferredKeyId if it points to a banned platform.
const skipModels = new Set<number>();
const sessionKey = getSessionKey(normalizedMessages, routingMode);
let entry = sessionKey ? stickySessionMap.get(sessionKey) : undefined;
if (entry) {
if (Date.now() - entry.lastUsed > STICKY_TTL_MS) {
stickySessionMap.delete(sessionKey);
entry = undefined;
} else if (entry.bannedPlatforms) {
for (const platform of entry.bannedPlatforms) {
addProviderModelsToSkipModels(skipModels, platform);
console.log(`[Sticky] session banned from ${platform}, adding to skipModels`);
}
}
}
if (preferredModel) {
const db = getDb();
const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined;
if (prefRow) {
if (entry?.bannedPlatforms?.has(prefRow.platform)) {
console.log(`[Sticky] skipping preferredModel=${preferredModel} (${prefRow.platform} banned for session)`);
preferredModel = undefined;
preferredKeyId = undefined;
} else if (prefRow.platform === 'longcat' && entry) {
if (Date.now() - entry.lastUsed < LONGCAT_STICKY_COOLDOWN_MS) {
const ageMs = Date.now() - entry.lastUsed;
console.log(`[Sticky] LongCat cooldown active — bypassing sticky preference for session=${sessionKey?.slice(0, 8)} | lastUsed=${ageMs}ms ago`);
preferredModel = undefined;
preferredKeyId = undefined;
}
}
}
}| if (preferredModel && isSessionBannedFromPlatform(normalizedMessages, routingMode, route.platform)) { | ||
| const db = getDb(); | ||
| const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined; | ||
| if (prefRow?.platform === route.platform) { | ||
| preferredModel = undefined; | ||
| preferredKeyId = undefined; | ||
| } | ||
| } |
There was a problem hiding this comment.
In the error handler, when a provider is banned, we query the database to check if the preferredModel's platform matches the failed route.platform.
In the vast majority of cases where the preferred model fails, preferredModel is exactly equal to route.modelDbId. We can add a fast-path check to compare these IDs directly, avoiding the database query entirely in the most common failure scenario.
| if (preferredModel && isSessionBannedFromPlatform(normalizedMessages, routingMode, route.platform)) { | |
| const db = getDb(); | |
| const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined; | |
| if (prefRow?.platform === route.platform) { | |
| preferredModel = undefined; | |
| preferredKeyId = undefined; | |
| } | |
| } | |
| if (preferredModel && isSessionBannedFromPlatform(normalizedMessages, routingMode, route.platform)) { | |
| if (preferredModel === route.modelDbId) { | |
| preferredModel = undefined; | |
| preferredKeyId = undefined; | |
| } else { | |
| const db = getDb(); | |
| const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined; | |
| if (prefRow?.platform === route.platform) { | |
| preferredModel = undefined; | |
| preferredKeyId = undefined; | |
| } | |
| } | |
| } |
There was a problem hiding this comment.
Actionable comments posted: 9
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.roo/specs/longcat-session-ban/requirements.md:
- Line 15: The requirements block around the LongCat sticky session logic is
corrupted and must be rewritten: replace the malformed paragraph (around the
broken merged sentences) with a clean, numbered FR list where each functional
requirement is a single concise bullet referencing the exact code symbols
(clearStickyKey, shouldSkipModelOnRetry, isRetryableError, routeRequest,
router.ts smart-auto logic, stickySessionMap, banPlatformFromSession) so
implementers can find the relevant code paths; fix all typos/garbled tokens
(e.g., "401/403", "model/keykey", "TheisRetryableError", "keoes") and ensure
each FR (FR-1..FR-13) is atomic, references the function(s) it affects, and maps
to the corresponding behavior (detection patterns, ban mechanism, skipModels
changes, TTL/expiry, no DB schema change, minimal router tweak, tests) so
reviewers and engineers can implement unambiguously.
- Around line 24-28: The requirements table contains duplicated/malformed rows
(duplicate index 11 and prefixes like "| | 8"/"| | 11") which breaks
traceability; open .roo/specs/longcat-session-ban/requirements.md, remove the
extra duplicate row, renumber the affected rows sequentially, and normalize the
row prefixes so each entry starts with a single pipe and correct index (e.g.,
fix the entries referencing server/src/routes/proxy.ts and
server/src/services/router.ts to have unique indices and consistent "| <index> |
[file](file:range) | Edit | description |" formatting) so all links and indices
are unique and well-formed.
In @.roo/specs/longcat-sticky-cooldown/design.md:
- Around line 56-63: The fenced code block shown (the sequence starting with "1.
preferredModel = getStickyModel(...)" through "6. Retry loop with
routeRequest(...)") lacks a language tag; update the block opening from ``` to a
tagged fence like ```text or ```typescript to satisfy markdown linting and
improve rendering while leaving the block contents unchanged—locate the block in
the longcat-sticky-cooldown design section (the snippet around the numbered
steps referencing getStickyModel, getStickyKey, bannedPlatforms, and
routeRequest) and add the language identifier.
In @.roo/specs/provider-5xx-session-ban/design.md:
- Around line 58-59: The markdown links in design.md that reference server code
(e.g., links pointing to proxy.ts via "../server/src/...") are using incorrect
relative paths; update each occurrence to a normalized path that resolves from
this document (for example remove the leading ".." to use "server/src/..." or
convert to a repo-root absolute/blob URL) so reviewers can click through to the
actual file (search for the mentions of proxy.ts and the other similar
"../server/src/..." links and replace them consistently).
In @.roo/specs/provider-5xx-session-ban/requirements.md:
- Around line 16-21: Update the incorrect relative links in this requirements.md
so they resolve from this file's location: replace the broken
"../server/src/routes/proxy.ts" and "../longcat-session-ban/" references with
correct relative paths that point to server/src/routes/proxy.ts and the
longcat-session-ban spec from this file (e.g., adjust to
"../../server/src/routes/proxy.ts" or the appropriate number of ../ segments);
ensure both the proxy file link and the longcat-session-ban spec link are valid
and navigable in-repo.
In `@server/src/__tests__/routes/provider-session-ban.test.ts`:
- Around line 194-205: The test sets DB pragmas and a transaction manually but
does not guarantee cleanup on failures; update the test around getDb(), the
PRAGMA/BEGIN/DELETE calls and the addProviderModelsToSkipModels assertion so
that PRAGMA foreign_keys and the transaction are always restored using a
try/finally: wrap the DB mutation and expect call in try, call
db.prepare('ROLLBACK').run() and db.prepare('PRAGMA foreign_keys = ON').run() in
the finally block so rollback and pragma restoration always happen even if the
assertion throws.
In `@server/src/__tests__/routes/proxy-tools.test.ts`:
- Around line 840-849: The test setup calls POST /api/keys without admin auth,
which can hide misconfigured key-gating; change the key creation calls in
proxy-tools.test.ts to use the admin-authenticated helper (use the admin key
when calling request for '/api/keys', e.g., the same pattern used elsewhere in
the file) and then add at least one negative assertion that a unified API key
(the v1/unified key created) used against an /api/* route returns 401; locate
the request(...) calls that create keys and the later proxy calls (symbols:
request, '/api/keys', and the unified key variable/name) and update them so
admin auth is used for key setup and add a test asserting 401 when the unified
key is used against an /api/* endpoint.
In `@server/src/routes/proxy.ts`:
- Around line 212-215: The truncation-detection boolean is too broad because it
treats any occurrence of "conflict" as a truncation signal; update the
expression that checks the lower variable in server/src/routes/proxy.ts (the
return statement currently using lower.includes(...)) by removing the
lower.includes('conflict') clause so ordinary conflict errors no longer trigger
a truncation/ban; keep the other specific matchers (e.g., 'truncated',
'context_length_exceeded', 'token_limit', etc.), and run related tests or linter
to ensure no syntax regressions.
- Around line 1243-1257: The cooldown block should not run for client-explicit
model requests; update the condition that starts with "if (preferredModel)" so
it first checks that the preferred model was not explicitly requested by the
client (e.g., a boolean like explicitModelRequested or req.body.model presence).
In practice, add/use the existing flag that indicates an explicit client model
(or create one where routing decides preferredModel) and change the guard to
something like "if (preferredModel && !explicitModelRequested)" before calling
getSessionKey, stickySessionMap, and checking LONGCAT_STICKY_COOLDOWN_MS so
pinned models remain honored for explicit requests (leave preferredModel and
preferredKeyId untouched when explicitModelRequested is true).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 1a1395ab-919c-4e24-be41-3b291c63270c
📒 Files selected for processing (12)
.roo/specs/longcat-session-ban/design.md.roo/specs/longcat-session-ban/requirements.md.roo/specs/longcat-session-ban/tasks.md.roo/specs/longcat-sticky-cooldown/design.md.roo/specs/longcat-sticky-cooldown/requirements.md.roo/specs/longcat-sticky-cooldown/tasks.md.roo/specs/provider-5xx-session-ban/design.md.roo/specs/provider-5xx-session-ban/requirements.md.roo/specs/provider-5xx-session-ban/tasks.mdserver/src/__tests__/routes/provider-session-ban.test.tsserver/src/__tests__/routes/proxy-tools.test.tsserver/src/routes/proxy.ts
|
|
||
| and eviction. | ||
|
|
||
| The existing LongCat sticky key feature ([`longcat-sticky-key` spec](../../roo/specs/longcat-sticky-key/)) extends this to also prefer using the **same API key** within a session. For LongCat specifically, because LongCat benefits from session continuity at the key level. same key = same session context on their server side). The current behavior on auth errors (401/403) is to [`clearStickyKey()`](server/src/routes/proxy.ts:89-98) — which clears the sticky key but **keep the sticky model pinned to LongCat** via [`preferredModel`](server/src/routes/proxy.ts:1036-1037). On retry, [`routeRequest()`](server/src/services/router.ts:458) still has `preferredModel` pointing to LongCat, and tries **another LongCat key** via round-robin. and **LongCat detects different keys usage for the same session** → the "multiple API keys" problem. The [`shouldSkipModelOnRetry()`](server/src/routes/proxy.ts:430-432) function explicitly **does NOT skip the model** for auth errors or rate limit errors — so auth failures on LongCat result in key rotation within the same LongCat model, which is exactly what LongCat detects as "multiple API keys use." for the same session. Similarly, when LongCat returns a "truncated" or "conflict" error ( the provider truncates the response mid-stream, the current behavior is to silently switch to a different key and but the session continues on LongCat with a different key — same problem. The "truncated" error pattern is also detected by [`isRetryableError()`](server/src/routes/proxy.ts:409-4428) which checking for 429, 413, 400, 404, 408, 409, 422, 500, 502, 503, 504, andrate limit`, `quota`, `aborted`, `timeout`, `econnrefused`, `econnreset`, `unauthorized`, `forbidden`, `invalid api key`, `no longer available`, `model not found`, `bad request`, `invalid json payload`. TheisRetryableError()` function returns true for all these cases, meaning the proxy will retry with a different model/keykey. However, [`shouldSkipModelOnRetry()`](server/src/routes/proxy.ts:430-432) returns `true` only for rate-limit and auth errors — it does NOT skip the model. This means auth errors and LongCat result in key rotation within the same model, which is exactly the behavior LongCat detects as "multiple API keys use" for the same session. The existing [`clearStickyKey()`](server/src/routes/proxy.ts:89-98) only clears the sticky key but **keoes `preferredKeyId` to `undefined` — but the sticky model remains pinned to LongCat. On the next retry, [`routeRequest()`](server/src/services/router.ts:458) still receives `preferredModel` pointing to LongCat, and tries another LongCat key via round-robin. The LongCat smart-auto preference in [`router.ts`](server/src/services/router.ts:498-527) also means LongCat is still tried first in smart mode, so the retry will likely hit LongCat again. ## Functional Requirements ### FR-1: Detect Multiple Key Use on LongCat When a LongCat provider returns an error indicating that the same API key is been used for the same session ( the system must detect this condition. the error response contains language signaling multiple key use. a single session. Detection patterns: - Auth errors (401/403) — the current behavior already clears the sticky key but but tries another key on the same model - Rate-limit errors (429) — same pattern: key rotation within the same model - "Truncated" / "conflict" errors — LongCat truncates responses mid-stream when the response is shorter than expected, indicating the provider cut off the session. Detection keywords: "truncated", "truncation", "conflict", "length", "maximum length", "context_length_exceeded", "token_limit" - Any error message that the provider is complaining about session length or capacity limits for the current conversation. ### FR-2: Ban LongCat Platform for Sticky Session When FR-1 is triggered, the system must ban the entire `longcat` platform for the current sticky session. This means: - All LongCat model IDs must be added to `skipModels` in the retry loop - The sticky session must be updated to point to the new fallback model instead of LongCat - The session must never be routed to LongCat again for until the session expires via TTL (30 min) or The ban must persist across multiple retry attempts within the same request. ### FR-3: Fallback to Next Best Non-LongCat Model After banning LongCat, the retry loop must fall through to the next best available model via the existing Thompson Sampling / smart routing logic. The new model should be selected based on the normal scoring algorithm ( success rate + speed + TTFB + intelligence for smart mode, success rate + speed in balanced mode). ### FR-4: Update Sticky Session to New Model On successful fallback, the sticky session must be updated to point to the new fallback model and `modelDbId` + `keyId`. The sticky key feature should be cleared for the new model since since the fallback is not LongCat, since the sticky key preference only applies to LongCat sessions. ### FR-5: Never Route Session to LongCat Again Once a session is banned from LongCat, it must never be routed to LongCat again for the remainder of that session's lifetime (30 min TTL). This means: - The `stickySessionMap` entry must include a `bannedPlatforms` field (or `Set<string>`) to track which platforms are banned for this session - On subsequent requests in the same session, `getStickyModel()` returns the preferred model, but the proxy layer must check if the session is banned from LongCat and skip LongCat models before calling `routeRequest()` - The LongCat smart-auto preference in `router.ts` must also be suppressed for banned sessions ( the router should not boost LongCat entries to the front for sessions that are banned from LongCat ### FR-6: Truncated Response Detection When a LongCat streaming response is received, the proxy must check the response content for signs of truncation. If detected, the session must be banned from LongCat immediately, even if the stream has already started (headers sent, the client has already received partial data). The system must: - Log the truncation detection - Record the ban in the sticky session - Add all LongCat model IDs to `skipModels` - End stream and ban for future requests - The client receives the truncated partial response as-is; future requests in this session will route to non-LongCat models ### FR-7: Auth Error Handling for LongCat Sessions When an auth error (401/403) occurs on a LongCat sticky session: - Clear the sticky key via `clearStickyKey()` (existing behavior) - Additionally ban the LongCat platform for this session via the new `banPlatformFromSession()` function - Add all LongCat model IDs to `skipModels` - Set `preferredKeyId` to `undefined` - On retry, fall through to the next best non-LongCat model - Update sticky session to the new model on success ### FR-8: Rate-Limit Error Handling for LongCat Sessions When a rate-limit error (429) occurs on a LongCat sticky session: - Ban the LongCat platform for this session via `banPlatformFromSession()` - Add all LongCat model IDs to `skipModels` - Set `preferredKeyId` to `undefined` - On retry, fall through to the next best non-LongCat model - Update sticky session to the new model on success - Note: rate-limit errors on LongCat do NOT clear the entire sticky session (the session may still work with a different key on a different model). Only ban LongCat specifically. ### FR-9: Existing Behavior Preserved for Non-LongCat Sessions All existing sticky session behavior for non-LongCat providers must remain unchanged. The new ban mechanism only applies exclusively to LongCat sessions. Non-LongCat sessions that never have platform bans. ### FR-10: Session Expiry Clears Bans When a sticky session expires ( via TTL (30 min), the `bannedPlatforms` set is also cleared. This is natural — expired sessions are evicted from the `stickySessionMap` entirely, including all associated data. ### FR-11: No Database Schema Changes The ban mechanism is purely in-memory, using the existing `stickySessionMap`. No database schema changes are required. ### FR-12: Minimal Router Changes The router (`server/src/services/router.ts`) should not need significant changes. The only change is that the LongCat smart-auto preference logic should skip sessions that are banned from LongCat. The proxy layer handles all ban detection and session management. The router remains provider-agnostic. ### FR-13: No UI Changes This is a backend-only feature. No client-side changes are needed. ## Non-Functional Requirements ### NFR-1: Backward Compatibility Existing sessions without `bannedPlatforms` (from before this feature or for non-LongCat providers) must continue to work. The `bannedPlatforms` field must be optional in the sticky session map value type. ### NFR-2: Thread Safety The existing `stickySessionMap` is a plain `Map` with no locking ( single-threaded Node.js). The extended map follows the same pattern — no additional concurrency concerns. ### NFR-3: Minimal Performance Impact The ban check adds one `Set` lookup per one optional field check in the sticky session map entry per one DB query to check if a model is on a banned platform. No additional I/O beyond what already exists. ### NFR-4: Test Coverage New unit tests must cover: - Multiple key use detection (auth + rate limit + truncated) - Session ban persistence across retries - Fallback to next best model - Sticky session update on success - Session expiry clearing bans - Non-LongCat sessions unaffected ## Files Requiring Modification | # | File | Change Type | Description | |
There was a problem hiding this comment.
Corrupted requirements block needs rewrite before this spec is reliable.
Line 15 is heavily malformed (merged sentences, broken references, duplicated/garbled phrases like 401/40403, model/keykey, TheisRetryableError, keoes). This is likely to mislead implementation and review decisions.
Please split this into clean FR subsections with one requirement per bullet and valid code references.
🧰 Tools
🪛 LanguageTool
[style] ~15-~15: Consider an alternative for the overused word “exactly”.
Context: ...within the same LongCat model, which is exactly what LongCat detects as "multiple API k...
(EXACTLY_PRECISELY)
[grammar] ~15-~15: Ensure spelling is correct
Context: ...404, 408, 409, 422, 500, 502, 503, 504, andrate limit, quota, aborted, timeout`, ...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~15-~15: Ensure spelling is correct
Context: ...proxy will retry with a different model/keykey. However, [shouldSkipModelOnRetry()](...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[style] ~15-~15: Consider an alternative for the overused word “exactly”.
Context: ...otation within the same model, which is exactly the behavior LongCat detects as "multip...
(EXACTLY_PRECISELY)
[grammar] ~15-~15: Ensure spelling is correct
Context: ...89-98) only clears the sticky key but **keoes preferredKeyId to undefined — but t...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🪛 markdownlint-cli2 (0.22.1)
[warning] 15-15: Spaces inside code span elements
(MD038, no-space-in-code)
[warning] 15-15: Spaces inside code span elements
(MD038, no-space-in-code)
[warning] 15-15: Spaces inside code span elements
(MD038, no-space-in-code)
[warning] 15-15: Spaces inside code span elements
(MD038, no-space-in-code)
[warning] 15-15: Spaces inside code span elements
(MD038, no-space-in-code)
[warning] 15-15: Spaces inside code span elements
(MD038, no-space-in-code)
[warning] 15-15: Spaces inside code span elements
(MD038, no-space-in-code)
[warning] 15-15: Spaces inside code span elements
(MD038, no-space-in-code)
[warning] 15-15: Spaces inside code span elements
(MD038, no-space-in-code)
[warning] 15-15: Spaces inside code span elements
(MD038, no-space-in-code)
[warning] 15-15: Spaces inside code span elements
(MD038, no-space-in-code)
[warning] 15-15: Spaces inside code span elements
(MD038, no-space-in-code)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.roo/specs/longcat-session-ban/requirements.md at line 15, The requirements
block around the LongCat sticky session logic is corrupted and must be
rewritten: replace the malformed paragraph (around the broken merged sentences)
with a clean, numbered FR list where each functional requirement is a single
concise bullet referencing the exact code symbols (clearStickyKey,
shouldSkipModelOnRetry, isRetryableError, routeRequest, router.ts smart-auto
logic, stickySessionMap, banPlatformFromSession) so implementers can find the
relevant code paths; fix all typos/garbled tokens (e.g., "401/403",
"model/keykey", "TheisRetryableError", "keoes") and ensure each FR (FR-1..FR-13)
is atomic, references the function(s) it affects, and maps to the corresponding
behavior (detection patterns, ban mechanism, skipModels changes, TTL/expiry, no
DB schema change, minimal router tweak, tests) so reviewers and engineers can
implement unambiguously.
| | | 8 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1245-1281) | Edit | Update error handling in retry loop to detect multiple key use + ban LongCat | | ||
| | | 9 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1147-1149) | Edit | Add truncated response detection in streaming success path | | ||
| | | 10 | [`server/src/services/router.ts`](server/src/services/router.ts:498-527) | Edit | Skip LongCat boost for banned sessions | | ||
| | | 11 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1036-1053) | Edit | Pass `bannedPlatforms` to `routeRequest()` via `skipModels` | | ||
| | 11 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1036-1053) | Edit | Pass `bannedPlatforms` to `routeRequest()` via `skipModels` | ## Out of Scope | - Persistent bans across server restarts ( in-memory only, same as existing sticky sessions) - Changes to the Thompson Sampling algorithm itself - Changes to rate limiting logic - - Changes to the fallback chain ordering in balanced mode - - Client-side UI changes - - Configuration UI for enabling/disabling bans per provider ( hardcoded to LongCat only - Changes to the `OpenAICompatProvider` class | No newline at end of file |
There was a problem hiding this comment.
Fix duplicated/malformed modification table entries.
The table has duplicated index 11 and malformed row prefixes (| | 8, | | 11), which breaks traceability from requirements to implementation tasks.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~28-~28: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ... Sampling algorithm itself - Changes to rate limiting logic - - Changes to the fallback chain...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.roo/specs/longcat-session-ban/requirements.md around lines 24 - 28, The
requirements table contains duplicated/malformed rows (duplicate index 11 and
prefixes like "| | 8"/"| | 11") which breaks traceability; open
.roo/specs/longcat-session-ban/requirements.md, remove the extra duplicate row,
renumber the affected rows sequentially, and normalize the row prefixes so each
entry starts with a single pipe and correct index (e.g., fix the entries
referencing server/src/routes/proxy.ts and server/src/services/router.ts to have
unique indices and consistent "| <index> | [file](file:range) | Edit |
description |" formatting) so all links and indices are unique and well-formed.
| ``` | ||
| 1. preferredModel = getStickyModel(...) // line 1199 | ||
| 2. preferredKeyId = getStickyKey(...) // line 1207-1212 | ||
| 3. skipModels from bannedPlatforms // line 1216-1230 | ||
| 4. Clear preferredModel if on banned platform // line 1232-1240 | ||
| 5. ← INSERT COOLDOWN CHECK HERE | ||
| 6. Retry loop with routeRequest(...) // line 1247+ | ||
| ``` |
There was a problem hiding this comment.
Add a language tag to the fenced code block.
This block should be annotated (e.g., text or typescript) to satisfy markdown linting and improve rendering consistency.
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 56-56: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.roo/specs/longcat-sticky-cooldown/design.md around lines 56 - 63, The
fenced code block shown (the sequence starting with "1. preferredModel =
getStickyModel(...)" through "6. Retry loop with routeRequest(...)") lacks a
language tag; update the block opening from ``` to a tagged fence like ```text
or ```typescript to satisfy markdown linting and improve rendering while leaving
the block contents unchanged—locate the block in the longcat-sticky-cooldown
design section (the snippet around the numbered steps referencing
getStickyModel, getStickyKey, bannedPlatforms, and routeRequest) and add the
language identifier.
| Current value type at [`proxy.ts:16`](../server/src/routes/proxy.ts:16): | ||
| ```typescript |
There was a problem hiding this comment.
Fix relative links to server/src/... paths.
From this directory depth, ../server/... likely won’t resolve correctly. Please normalize links so reviewers can navigate directly to referenced code locations.
Also applies to: 83-84, 123-124, 145-146, 165-166
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.roo/specs/provider-5xx-session-ban/design.md around lines 58 - 59, The
markdown links in design.md that reference server code (e.g., links pointing to
proxy.ts via "../server/src/...") are using incorrect relative paths; update
each occurrence to a normalized path that resolves from this document (for
example remove the leading ".." to use "server/src/..." or convert to a
repo-root absolute/blob URL) so reviewers can click through to the actual file
(search for the mentions of proxy.ts and the other similar "../server/src/..."
links and replace them consistently).
| The existing sticky sessions feature lives in [`server/src/routes/proxy.ts`](../server/src/routes/proxy.ts:16). It uses an SHA-1 hash of `routingMode + firstUserMessage` to identify sessions, and stores `{ modelDbId, keyId?, bannedPlatforms?, lastUsed }` with a 30-min TTL and 500-entry max. | ||
|
|
||
| The existing LongCat session ban ([`longcat-session-ban` spec](../longcat-session-ban/)) added `bannedPlatforms`, `banPlatformFromSession()`, `isSessionBannedFromPlatform()`, `addLongcatModelsToSkipModels()`, and `isTruncatedResponse()`. This spec generalizes that infrastructure: the `bannedPlatforms` set and ban helper functions are reused, the `isTruncatedResponse()` function is retained and generalized to all providers, and the LongCat-specific auth/rate-limit error detection is replaced by general 5xx consecutive failure tracking. | ||
|
|
||
| The retry loop in `handleChatCompletion()` currently has LongCat-specific error handling at lines 1383-1402 that bans LongCat on auth errors and rate-limit errors. This is replaced by general 5xx consecutive failure detection that works for any provider. The truncation detection is retained but generalized from LongCat-only to all providers. | ||
|
|
There was a problem hiding this comment.
Correct relative code links in the Context section.
The ../server/src/... references are likely invalid from this file’s path. Please update to the correct relative path so links are navigable in-repo.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.roo/specs/provider-5xx-session-ban/requirements.md around lines 16 - 21,
Update the incorrect relative links in this requirements.md so they resolve from
this file's location: replace the broken "../server/src/routes/proxy.ts" and
"../longcat-session-ban/" references with correct relative paths that point to
server/src/routes/proxy.ts and the longcat-session-ban spec from this file
(e.g., adjust to "../../server/src/routes/proxy.ts" or the appropriate number of
../ segments); ensure both the proxy file link and the longcat-session-ban spec
link are valid and navigable in-repo.
| it('handles empty provider model list gracefully', () => { | ||
| const db = getDb(); | ||
| db.prepare('PRAGMA foreign_keys = OFF').run(); | ||
| db.prepare('BEGIN').run(); | ||
| db.prepare("DELETE FROM api_keys WHERE platform = 'longcat'").run(); | ||
| db.prepare("DELETE FROM models WHERE platform = 'longcat'").run(); | ||
| const skipModels = new Set<number>(); | ||
| expect(() => addProviderModelsToSkipModels(skipModels, 'longcat')).not.toThrow(); | ||
| expect(skipModels.size).toBe(0); | ||
| db.prepare('ROLLBACK').run(); | ||
| db.prepare('PRAGMA foreign_keys = ON').run(); | ||
| }); |
There was a problem hiding this comment.
Always restore DB pragmas/transaction state with finally.
If this test fails before ROLLBACK/PRAGMA ... ON, later tests can run with leaked DB state and become flaky.
Suggested fix
- db.prepare('PRAGMA foreign_keys = OFF').run();
- db.prepare('BEGIN').run();
- db.prepare("DELETE FROM api_keys WHERE platform = 'longcat'").run();
- db.prepare("DELETE FROM models WHERE platform = 'longcat'").run();
- const skipModels = new Set<number>();
- expect(() => addProviderModelsToSkipModels(skipModels, 'longcat')).not.toThrow();
- expect(skipModels.size).toBe(0);
- db.prepare('ROLLBACK').run();
- db.prepare('PRAGMA foreign_keys = ON').run();
+ db.prepare('PRAGMA foreign_keys = OFF').run();
+ db.prepare('BEGIN').run();
+ try {
+ db.prepare("DELETE FROM api_keys WHERE platform = 'longcat'").run();
+ db.prepare("DELETE FROM models WHERE platform = 'longcat'").run();
+ const skipModels = new Set<number>();
+ expect(() => addProviderModelsToSkipModels(skipModels, 'longcat')).not.toThrow();
+ expect(skipModels.size).toBe(0);
+ } finally {
+ db.prepare('ROLLBACK').run();
+ db.prepare('PRAGMA foreign_keys = ON').run();
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| it('handles empty provider model list gracefully', () => { | |
| const db = getDb(); | |
| db.prepare('PRAGMA foreign_keys = OFF').run(); | |
| db.prepare('BEGIN').run(); | |
| db.prepare("DELETE FROM api_keys WHERE platform = 'longcat'").run(); | |
| db.prepare("DELETE FROM models WHERE platform = 'longcat'").run(); | |
| const skipModels = new Set<number>(); | |
| expect(() => addProviderModelsToSkipModels(skipModels, 'longcat')).not.toThrow(); | |
| expect(skipModels.size).toBe(0); | |
| db.prepare('ROLLBACK').run(); | |
| db.prepare('PRAGMA foreign_keys = ON').run(); | |
| }); | |
| it('handles empty provider model list gracefully', () => { | |
| const db = getDb(); | |
| db.prepare('PRAGMA foreign_keys = OFF').run(); | |
| db.prepare('BEGIN').run(); | |
| try { | |
| db.prepare("DELETE FROM api_keys WHERE platform = 'longcat'").run(); | |
| db.prepare("DELETE FROM models WHERE platform = 'longcat'").run(); | |
| const skipModels = new Set<number>(); | |
| expect(() => addProviderModelsToSkipModels(skipModels, 'longcat')).not.toThrow(); | |
| expect(skipModels.size).toBe(0); | |
| } finally { | |
| db.prepare('ROLLBACK').run(); | |
| db.prepare('PRAGMA foreign_keys = ON').run(); | |
| } | |
| }); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@server/src/__tests__/routes/provider-session-ban.test.ts` around lines 194 -
205, The test sets DB pragmas and a transaction manually but does not guarantee
cleanup on failures; update the test around getDb(), the PRAGMA/BEGIN/DELETE
calls and the addProviderModelsToSkipModels assertion so that PRAGMA
foreign_keys and the transaction are always restored using a try/finally: wrap
the DB mutation and expect call in try, call db.prepare('ROLLBACK').run() and
db.prepare('PRAGMA foreign_keys = ON').run() in the finally block so rollback
and pragma restoration always happen even if the assertion throws.
| await request(app, 'POST', '/api/keys', { | ||
| platform: 'longcat', | ||
| key: 'lc_cooldown_active_test', | ||
| label: 'cooldown-active-longcat', | ||
| }); | ||
| await request(app, 'POST', '/api/keys', { | ||
| platform: 'groq', | ||
| key: 'gsk_cooldown_active_test', | ||
| label: 'cooldown-active-groq', | ||
| }); |
There was a problem hiding this comment.
Use admin-authenticated /api/keys setup (and lock key-separation behavior).
These new tests post to /api/keys without explicitly using the admin key, so they can pass even if /api/* key-gating is misconfigured. Please route setup through explicit admin auth and add at least one negative check that unified key on /api/* returns 401.
As per coding guidelines, "The admin key must gate /api/* routes; the unified API key must gate /v1/* routes — they must never overlap, and using one against the wrong route returns 401".
Also applies to: 904-913, 973-977, 1038-1042, 1091-1095
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@server/src/__tests__/routes/proxy-tools.test.ts` around lines 840 - 849, The
test setup calls POST /api/keys without admin auth, which can hide misconfigured
key-gating; change the key creation calls in proxy-tools.test.ts to use the
admin-authenticated helper (use the admin key when calling request for
'/api/keys', e.g., the same pattern used elsewhere in the file) and then add at
least one negative assertion that a unified API key (the v1/unified key created)
used against an /api/* route returns 401; locate the request(...) calls that
create keys and the later proxy calls (symbols: request, '/api/keys', and the
unified key variable/name) and update them so admin auth is used for key setup
and add a test asserting 401 when the unified key is used against an /api/*
endpoint.
| return lower.includes('truncated') || lower.includes('truncation') || | ||
| lower.includes('context_length_exceeded') || lower.includes('token_limit') || | ||
| lower.includes('maximum length') || lower.includes('response_length_limit') || | ||
| lower.includes('conflict'); |
There was a problem hiding this comment.
Remove broad conflict truncation matcher to avoid false bans.
Line 215 matches any "conflict" text as truncation. With provider-agnostic ban logic, ordinary conflict errors can now incorrectly ban a platform for the session.
Suggested fix
return lower.includes('truncated') || lower.includes('truncation') ||
lower.includes('context_length_exceeded') || lower.includes('token_limit') ||
lower.includes('maximum length') || lower.includes('response_length_limit') ||
- lower.includes('conflict');
+ false;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| return lower.includes('truncated') || lower.includes('truncation') || | |
| lower.includes('context_length_exceeded') || lower.includes('token_limit') || | |
| lower.includes('maximum length') || lower.includes('response_length_limit') || | |
| lower.includes('conflict'); | |
| return lower.includes('truncated') || lower.includes('truncation') || | |
| lower.includes('context_length_exceeded') || lower.includes('token_limit') || | |
| lower.includes('maximum length') || lower.includes('response_length_limit') || | |
| false; |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@server/src/routes/proxy.ts` around lines 212 - 215, The truncation-detection
boolean is too broad because it treats any occurrence of "conflict" as a
truncation signal; update the expression that checks the lower variable in
server/src/routes/proxy.ts (the return statement currently using
lower.includes(...)) by removing the lower.includes('conflict') clause so
ordinary conflict errors no longer trigger a truncation/ban; keep the other
specific matchers (e.g., 'truncated', 'context_length_exceeded', 'token_limit',
etc.), and run related tests or linter to ensure no syntax regressions.
| // LongCat sticky cooldown: if the sticky model is on LongCat and was used | ||
| // within the last 3 minutes, bypass sticky preference for this request only. | ||
| // The bandit router picks freely — it may still route to LongCat organically. | ||
| if (preferredModel) { | ||
| const db = getDb(); | ||
| const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined; | ||
| if (prefRow?.platform === 'longcat') { | ||
| const cooldownSessionKey = getSessionKey(normalizedMessages, routingMode); | ||
| const cooldownEntry = cooldownSessionKey ? stickySessionMap.get(cooldownSessionKey) : undefined; | ||
| if (cooldownEntry && Date.now() - cooldownEntry.lastUsed < LONGCAT_STICKY_COOLDOWN_MS) { | ||
| const ageMs = Date.now() - cooldownEntry.lastUsed; | ||
| console.log(`[Sticky] LongCat cooldown active — bypassing sticky preference for session=${cooldownSessionKey?.slice(0, 8)} | lastUsed=${ageMs}ms ago`); | ||
| preferredModel = undefined; | ||
| preferredKeyId = undefined; | ||
| } |
There was a problem hiding this comment.
Do not apply sticky cooldown to explicit model requests.
At Line 1246, cooldown applies whenever preferredModel exists, including when the client explicitly set model. That can override a pinned model and silently auto-route.
Suggested fix
- if (preferredModel) {
+ if (preferredModel && !requestedModel) {
const db = getDb();
const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined;
if (prefRow?.platform === 'longcat') {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@server/src/routes/proxy.ts` around lines 1243 - 1257, The cooldown block
should not run for client-explicit model requests; update the condition that
starts with "if (preferredModel)" so it first checks that the preferred model
was not explicitly requested by the client (e.g., a boolean like
explicitModelRequested or req.body.model presence). In practice, add/use the
existing flag that indicates an explicit client model (or create one where
routing decides preferredModel) and change the guard to something like "if
(preferredModel && !explicitModelRequested)" before calling getSessionKey,
stickySessionMap, and checking LONGCAT_STICKY_COOLDOWN_MS so pinned models
remain honored for explicit requests (leave preferredModel and preferredKeyId
untouched when explicitModelRequested is true).
Implements the
longcat-sticky-cooldownspec.Adds a 3-minute cooldown safeguard for LongCat sticky sessions: when a sticky session's preferred model is on the LongCat platform and was used within the last 3 minutes, the sticky preference is temporarily bypassed for that request only, allowing the bandit router to pick freely (it may still route to LongCat organically via the smart-mode boost).
Changes:
LONGCAT_STICKY_COOLDOWN_MSconstant (3 min) inserver/src/routes/proxy.tshandleChatCompletion()after the ban check, before the retry loopSummary by CodeRabbit
Release Notes
New Features
Bug Fixes
Tests