feat: longcat-sticky-cooldown by vi70x3 · Pull Request #5 · animaios/api-llm-localhost

vi70x3 · 2026-06-01T22:15:43Z

Implements the longcat-sticky-cooldown spec.

Adds a 3-minute cooldown safeguard for LongCat sticky sessions: when a sticky session's preferred model is on the LongCat platform and was used within the last 3 minutes, the sticky preference is temporarily bypassed for that request only, allowing the bandit router to pick freely (it may still route to LongCat organically via the smart-mode boost).

Changes:

Added LONGCAT_STICKY_COOLDOWN_MS constant (3 min) in server/src/routes/proxy.ts
Added cooldown check logic in handleChatCompletion() after the ban check, before the retry loop
Added 5 unit tests covering: cooldown active, cooldown expired, non-LongCat provider, ban precedence, no sticky session
All 155 tests pass with zero regressions

Summary by CodeRabbit

Release Notes

New Features
- Session-level provider banning: problematic providers are automatically blacklisted for the remainder of a conversation session, triggering seamless fallback to alternative models.
- Sticky session cooldown: recently used models are temporarily deprioritized for 3 minutes to improve routing diversity.
Bug Fixes
- Improved error detection and recovery: consecutive provider failures and truncated responses are now tracked, enabling faster fallback to reliable alternatives.
Tests
- Comprehensive provider session ban and cooldown test coverage added.

Implements the LongCat session ban feature in proxy.ts. When LongCat detects multiple API key use (auth/rate-limit errors) or returns truncated responses, the platform is banned from the sticky session. Future requests in that session route to non-LongCat models. Changes: - Extend stickySessionMap with bannedPlatforms?: Set<string> - Add isSessionBannedFromPlatform() to check session bans - Add banPlatformFromSession() to record platform bans - Add addLongcatModelsToSkipModels() to skip all LongCat models - Add isTruncatedResponse() to detect truncation keywords - Update getStickyKey() to return undefined for banned platforms - Update setStickyModel() to preserve bannedPlatforms across updates - Update pre-routing logic to check bans before routing - Update error handling to ban LongCat on auth/rate-limit/truncation - Add truncation detection after stream completes - Add truncation detection in mid-stream error handling

… consecutive failures - Extend stickySessionMap with consecutiveFailures tracking per provider - Add recordConsecutiveFailure(), resetConsecutiveFailures(), resetAllConsecutiveFailures() - Replace addLongcatModelsToSkipModels with generic addProviderModelsToSkipModels - Replace LongCat-specific auth/rate-limit ban with general 5xx consecutive failure detection (threshold: 2) - Generalize truncation detection to all providers (post-stream + mid-stream) - Update getStickyKey() to check bannedPlatforms for any platform - Update pre-routing ban check to be generic (any banned platform) - Add success path counter reset on both streaming and non-streaming paths - Remove LongCat-specific auth error ban, rate limit ban, and addLongcatModelsToSkipModels - Rename and rewrite tests from longcat-session-ban to provider-session-ban (32 test cases) - TypeScript compiles cleanly, all 150 tests pass

- Add isBanEligibleStatus() helper restricting to {500,502,503,504} - Improve mid-stream truncation detection with aggregated error sources - Pre-routing ban check now skips ALL banned platforms, not just preferredModel's - Only clear preferredModel when provider is actually banned (not on first 5xx) - Handle Error objects in isTruncatedResponse (instanceof check before JSON.stringify)

coderabbitai · 2026-06-01T22:15:54Z

📝 Walkthrough

Walkthrough

This PR introduces sticky-session platform bans and a 3-minute LongCat cooldown safeguard, generalizing LongCat-specific logic to any provider. Session state now tracks banned platforms and per-provider 5xx failure counts, triggering bans after consecutive failures or truncation detection. All changes are in-memory with no DB schema modifications.

Changes

Session Ban & Cooldown Implementation

Layer / File(s)	Summary
Feature Specification & Planning `.roo/specs/longcat-session-ban/`, `.roo/specs/longcat-sticky-cooldown/`, `.roo/specs/provider-5xx-session-ban/*`	Three interlinked specifications define LongCat session bans (auth/rate-limit/truncation triggers), generalized provider 5xx bans (consecutive failure tracking), and a 3-minute LongCat sticky-session cooldown. Includes functional/non-functional requirements, implementation tasks, and test coverage plans.
Sticky Session State Extension `server/src/routes/proxy.ts`	`stickySessionMap` entry type expanded to include `bannedPlatforms` Set and per-provider `consecutiveFailures` Map, enabling session-level ban and failure-state tracking without DB changes.
Ban & Failure Helper Functions `server/src/routes/proxy.ts`	Seven new exported utilities: `isSessionBannedFromPlatform()` checks ban status with TTL awareness, `banPlatformFromSession()` records bans and creates sticky entries, `addProviderModelsToSkipModels()` populates retry skip-sets by provider, `recordConsecutiveFailure()` tracks 5xx failures and triggers bans on threshold, reset helpers clear per-provider and global counters, and `isTruncatedResponse()` detects truncation patterns.
Session Lifecycle & Sticky Model Management `server/src/routes/proxy.ts`	Sticky model selection now checks banned platforms dynamically from the DB; `setStickyModel()` preserves existing ban/failure state when updating preferences. Added `isBanEligibleStatus()` helper to recognize 5xx statuses eligible for ban tracking.
Request Routing & Cooldown Integration `server/src/routes/proxy.ts`	`handleChatCompletion()` now computes `skipModels` from session-level banned platforms, clears preferred routing when preferred platform is banned, and applies 3-minute LongCat cooldown by suppressing sticky preferences when cooldown is active.
Stream & Mid-Stream Error Handling `server/src/routes/proxy.ts`	Post-stream truncation detection bans providers after successful streaming completion. Mid-stream error handling records consecutive 5xx failures, aggregates error text for truncation detection, and gracefully ends SSE streams while banning truncated providers.
Success Path & Failure Counter Reset `server/src/routes/proxy.ts`	Success paths reset per-provider and global failure counters to clear failure state. Outer retryable error handling generalizes 5xx failure tracking for any provider and conditionally clears preferred routing only when actually banned.
Unit Test Coverage `server/src/__tests__/routes/provider-session-ban.test.ts`	Comprehensive unit tests validate each helper: ban detection, ban recording with TTL refresh, provider model filtering, consecutive failure tracking with threshold bans, counter resets, and truncation pattern matching.
Integration Test Coverage `server/src/__tests__/routes/provider-session-ban.test.ts`, `server/src/__tests__/routes/proxy-tools.test.ts`	Integration tests verify end-to-end ban lifecycle, counter reset on success, and TTL expiration. Separate cooldown test suite validates suppression when active, preservation after expiry, non-interference with non-LongCat sessions, and ban precedence.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

vi70x3/freellmapi#2: Earlier work introducing keyId/preferredKeyId support for LongCat sticky sessions; this PR extends that foundation by adding ban and failure-tracking state alongside the key preferences.

Poem

🐰 Hop along, sticky friends,
Bans now guard where retries end,
Five times failed, a provider's plight,
Cooldown rests the LongCat's flight,
Truncation caught, no second chance—
Smart routing leads the session's dance! 🎯

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'feat: longcat-sticky-cooldown' directly aligns with the main feature implemented: a 3-minute cooldown safeguard for LongCat sticky sessions that bypasses sticky preference when recently used.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch spec/longcat-sticky-cooldown

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request generalizes the session ban and fallback mechanisms from LongCat to all providers, introducing a 5xx consecutive failure ban (after two consecutive 5xx errors) and extending truncation detection to all platforms. It also adds a three-minute cooldown safeguard for LongCat sticky sessions to bypass sticky preferences during rapid-fire requests, along with comprehensive unit and integration tests. The reviewer feedback focuses on performance optimizations in server/src/routes/proxy.ts, specifically recommending the reuse of fetched session variables to avoid redundant database queries and TTL checks during pre-routing, and adding a fast-path check to bypass database queries when the preferred model matches the failed route model.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-01T22:17:18Z

+  // Check if session is banned from any platform — add all banned platforms' models to skipModels
+  // and clear preferredModel/preferredKeyId if it points to a banned platform.
+  const skipModels = new Set<number>();
+  const sessionKey = getSessionKey(normalizedMessages, routingMode);
+  if (sessionKey) {
+    const entry = stickySessionMap.get(sessionKey);
+    if (entry) {
+      if (Date.now() - entry.lastUsed > STICKY_TTL_MS) {
+        stickySessionMap.delete(sessionKey);
+      } else if (entry.bannedPlatforms) {
+        for (const platform of entry.bannedPlatforms) {
+          addProviderModelsToSkipModels(skipModels, platform);
+          console.log(`[Sticky] session banned from ${platform}, adding to skipModels`);
+        }
+      }
+    }
+  }
+
+  if (preferredModel) {
+    const db = getDb();
+    const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined;
+    if (prefRow && isSessionBannedFromPlatform(normalizedMessages, routingMode, prefRow.platform)) {
+      console.log(`[Sticky] skipping preferredModel=${preferredModel} (${prefRow.platform} banned for session)`);
+      preferredModel = undefined;
+      preferredKeyId = undefined;
+    }
+  }
+
+  // LongCat sticky cooldown: if the sticky model is on LongCat and was used
+  // within the last 3 minutes, bypass sticky preference for this request only.
+  // The bandit router picks freely — it may still route to LongCat organically.
+  if (preferredModel) {
+    const db = getDb();
+    const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined;
+    if (prefRow?.platform === 'longcat') {
+      const cooldownSessionKey = getSessionKey(normalizedMessages, routingMode);
+      const cooldownEntry = cooldownSessionKey ? stickySessionMap.get(cooldownSessionKey) : undefined;
+      if (cooldownEntry && Date.now() - cooldownEntry.lastUsed < LONGCAT_STICKY_COOLDOWN_MS) {
+        const ageMs = Date.now() - cooldownEntry.lastUsed;
+        console.log(`[Sticky] LongCat cooldown active — bypassing sticky preference for session=${cooldownSessionKey?.slice(0, 8)} | lastUsed=${ageMs}ms ago`);
+        preferredModel = undefined;
+        preferredKeyId = undefined;
      }
    }
  }


There is a significant opportunity to optimize the pre-routing ban and cooldown checks. Currently, the code performs redundant database queries (SELECT platform FROM models WHERE id = ?), redundant session key lookups, and redundant TTL checks by calling isSessionBannedFromPlatform and re-fetching the session key/entry for the cooldown check.

By reusing the already fetched sessionKey and entry variables, and combining the database query for preferredModel's platform, we can eliminate all of these redundant operations on the hot path.

// Check if session is banned from any platform — add all banned platforms' models to skipModels // and clear preferredModel/preferredKeyId if it points to a banned platform. const skipModels = new Set<number>(); const sessionKey = getSessionKey(normalizedMessages, routingMode); let entry = sessionKey ? stickySessionMap.get(sessionKey) : undefined; if (entry) { if (Date.now() - entry.lastUsed > STICKY_TTL_MS) { stickySessionMap.delete(sessionKey); entry = undefined; } else if (entry.bannedPlatforms) { for (const platform of entry.bannedPlatforms) { addProviderModelsToSkipModels(skipModels, platform); console.log(`[Sticky] session banned from ${platform}, adding to skipModels`); } } } if (preferredModel) { const db = getDb(); const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined; if (prefRow) { if (entry?.bannedPlatforms?.has(prefRow.platform)) { console.log(`[Sticky] skipping preferredModel=${preferredModel} (${prefRow.platform} banned for session)`); preferredModel = undefined; preferredKeyId = undefined; } else if (prefRow.platform === 'longcat' && entry) { if (Date.now() - entry.lastUsed < LONGCAT_STICKY_COOLDOWN_MS) { const ageMs = Date.now() - entry.lastUsed; console.log(`[Sticky] LongCat cooldown active — bypassing sticky preference for session=${sessionKey?.slice(0, 8)} | lastUsed=${ageMs}ms ago`); preferredModel = undefined; preferredKeyId = undefined; } } } }

gemini-code-assist · 2026-06-01T22:17:18Z

+        if (preferredModel && isSessionBannedFromPlatform(normalizedMessages, routingMode, route.platform)) {
+          const db = getDb();
+          const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined;
+          if (prefRow?.platform === route.platform) {
+            preferredModel = undefined;
+            preferredKeyId = undefined;
+          }
+        }


In the error handler, when a provider is banned, we query the database to check if the preferredModel's platform matches the failed route.platform.

In the vast majority of cases where the preferred model fails, preferredModel is exactly equal to route.modelDbId. We can add a fast-path check to compare these IDs directly, avoiding the database query entirely in the most common failure scenario.

Suggested change

if (preferredModel && isSessionBannedFromPlatform(normalizedMessages, routingMode, route.platform)) {

const db = getDb();

const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined;

if (prefRow?.platform === route.platform) {

preferredModel = undefined;

preferredKeyId = undefined;

}

}

if (preferredModel && isSessionBannedFromPlatform(normalizedMessages, routingMode, route.platform)) {

if (preferredModel === route.modelDbId) {

preferredModel = undefined;

preferredKeyId = undefined;

} else {

const db = getDb();

const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined;

if (prefRow?.platform === route.platform) {

preferredModel = undefined;

preferredKeyId = undefined;

}

}

}

coderabbitai

Actionable comments posted: 9

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.roo/specs/longcat-session-ban/requirements.md:
- Line 15: The requirements block around the LongCat sticky session logic is
corrupted and must be rewritten: replace the malformed paragraph (around the
broken merged sentences) with a clean, numbered FR list where each functional
requirement is a single concise bullet referencing the exact code symbols
(clearStickyKey, shouldSkipModelOnRetry, isRetryableError, routeRequest,
router.ts smart-auto logic, stickySessionMap, banPlatformFromSession) so
implementers can find the relevant code paths; fix all typos/garbled tokens
(e.g., "401/403", "model/keykey", "TheisRetryableError", "keoes") and ensure
each FR (FR-1..FR-13) is atomic, references the function(s) it affects, and maps
to the corresponding behavior (detection patterns, ban mechanism, skipModels
changes, TTL/expiry, no DB schema change, minimal router tweak, tests) so
reviewers and engineers can implement unambiguously.
- Around line 24-28: The requirements table contains duplicated/malformed rows
(duplicate index 11 and prefixes like "| | 8"/"| | 11") which breaks
traceability; open .roo/specs/longcat-session-ban/requirements.md, remove the
extra duplicate row, renumber the affected rows sequentially, and normalize the
row prefixes so each entry starts with a single pipe and correct index (e.g.,
fix the entries referencing server/src/routes/proxy.ts and
server/src/services/router.ts to have unique indices and consistent "| <index> |
[file](file:range) | Edit | description |" formatting) so all links and indices
are unique and well-formed.

In @.roo/specs/longcat-sticky-cooldown/design.md:
- Around line 56-63: The fenced code block shown (the sequence starting with "1.
preferredModel = getStickyModel(...)" through "6. Retry loop with
routeRequest(...)") lacks a language tag; update the block opening from ``` to a
tagged fence like ```text or ```typescript to satisfy markdown linting and
improve rendering while leaving the block contents unchanged—locate the block in
the longcat-sticky-cooldown design section (the snippet around the numbered
steps referencing getStickyModel, getStickyKey, bannedPlatforms, and
routeRequest) and add the language identifier.

In @.roo/specs/provider-5xx-session-ban/design.md:
- Around line 58-59: The markdown links in design.md that reference server code
(e.g., links pointing to proxy.ts via "../server/src/...") are using incorrect
relative paths; update each occurrence to a normalized path that resolves from
this document (for example remove the leading ".." to use "server/src/..." or
convert to a repo-root absolute/blob URL) so reviewers can click through to the
actual file (search for the mentions of proxy.ts and the other similar
"../server/src/..." links and replace them consistently).

In @.roo/specs/provider-5xx-session-ban/requirements.md:
- Around line 16-21: Update the incorrect relative links in this requirements.md
so they resolve from this file's location: replace the broken
"../server/src/routes/proxy.ts" and "../longcat-session-ban/" references with
correct relative paths that point to server/src/routes/proxy.ts and the
longcat-session-ban spec from this file (e.g., adjust to
"../../server/src/routes/proxy.ts" or the appropriate number of ../ segments);
ensure both the proxy file link and the longcat-session-ban spec link are valid
and navigable in-repo.

In `@server/src/__tests__/routes/provider-session-ban.test.ts`:
- Around line 194-205: The test sets DB pragmas and a transaction manually but
does not guarantee cleanup on failures; update the test around getDb(), the
PRAGMA/BEGIN/DELETE calls and the addProviderModelsToSkipModels assertion so
that PRAGMA foreign_keys and the transaction are always restored using a
try/finally: wrap the DB mutation and expect call in try, call
db.prepare('ROLLBACK').run() and db.prepare('PRAGMA foreign_keys = ON').run() in
the finally block so rollback and pragma restoration always happen even if the
assertion throws.

In `@server/src/__tests__/routes/proxy-tools.test.ts`:
- Around line 840-849: The test setup calls POST /api/keys without admin auth,
which can hide misconfigured key-gating; change the key creation calls in
proxy-tools.test.ts to use the admin-authenticated helper (use the admin key
when calling request for '/api/keys', e.g., the same pattern used elsewhere in
the file) and then add at least one negative assertion that a unified API key
(the v1/unified key created) used against an /api/* route returns 401; locate
the request(...) calls that create keys and the later proxy calls (symbols:
request, '/api/keys', and the unified key variable/name) and update them so
admin auth is used for key setup and add a test asserting 401 when the unified
key is used against an /api/* endpoint.

In `@server/src/routes/proxy.ts`:
- Around line 212-215: The truncation-detection boolean is too broad because it
treats any occurrence of "conflict" as a truncation signal; update the
expression that checks the lower variable in server/src/routes/proxy.ts (the
return statement currently using lower.includes(...)) by removing the
lower.includes('conflict') clause so ordinary conflict errors no longer trigger
a truncation/ban; keep the other specific matchers (e.g., 'truncated',
'context_length_exceeded', 'token_limit', etc.), and run related tests or linter
to ensure no syntax regressions.
- Around line 1243-1257: The cooldown block should not run for client-explicit
model requests; update the condition that starts with "if (preferredModel)" so
it first checks that the preferred model was not explicitly requested by the
client (e.g., a boolean like explicitModelRequested or req.body.model presence).
In practice, add/use the existing flag that indicates an explicit client model
(or create one where routing decides preferredModel) and change the guard to
something like "if (preferredModel && !explicitModelRequested)" before calling
getSessionKey, stickySessionMap, and checking LONGCAT_STICKY_COOLDOWN_MS so
pinned models remain honored for explicit requests (leave preferredModel and
preferredKeyId untouched when explicitModelRequested is true).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 1a1395ab-919c-4e24-be41-3b291c63270c

📥 Commits

Reviewing files that changed from the base of the PR and between d9c5a73 and 7dd4189.

📒 Files selected for processing (12)

.roo/specs/longcat-session-ban/design.md
.roo/specs/longcat-session-ban/requirements.md
.roo/specs/longcat-session-ban/tasks.md
.roo/specs/longcat-sticky-cooldown/design.md
.roo/specs/longcat-sticky-cooldown/requirements.md
.roo/specs/longcat-sticky-cooldown/tasks.md
.roo/specs/provider-5xx-session-ban/design.md
.roo/specs/provider-5xx-session-ban/requirements.md
.roo/specs/provider-5xx-session-ban/tasks.md
server/src/__tests__/routes/provider-session-ban.test.ts
server/src/__tests__/routes/proxy-tools.test.ts
server/src/routes/proxy.ts

coderabbitai · 2026-06-01T22:22:48Z

+
+ and eviction.
+
+ The existing LongCat sticky key feature ([`longcat-sticky-key` spec](../../roo/specs/longcat-sticky-key/)) extends this to also prefer using the **same API key** within a session. For LongCat specifically, because LongCat benefits from session continuity at the key level. same key = same session context on their server side). The current behavior on auth errors (401/403) is to [`clearStickyKey()`](server/src/routes/proxy.ts:89-98) — which clears the sticky key but **keep the sticky model pinned to LongCat** via [`preferredModel`](server/src/routes/proxy.ts:1036-1037). On retry, [`routeRequest()`](server/src/services/router.ts:458) still has `preferredModel` pointing to LongCat, and tries **another LongCat key** via round-robin. and **LongCat detects different keys usage for the same session** → the "multiple API keys" problem. The [`shouldSkipModelOnRetry()`](server/src/routes/proxy.ts:430-432) function explicitly **does NOT skip the model** for auth errors or rate limit errors — so auth failures on LongCat result in key rotation within the same LongCat model, which is exactly what LongCat detects as "multiple API keys use." for the same session. Similarly, when LongCat returns a "truncated" or "conflict" error ( the provider truncates the response mid-stream, the current behavior is to silently switch to a different key and but the session continues on LongCat with a different key — same problem. The "truncated" error pattern is also detected by [`isRetryableError()`](server/src/routes/proxy.ts:409-4428) which checking for 429, 413, 400, 404, 408, 409, 422, 500, 502, 503, 504, andrate limit`, `quota`, `aborted`, `timeout`, `econnrefused`, `econnreset`, `unauthorized`, `forbidden`, `invalid api key`, `no longer available`, `model not found`, `bad request`, `invalid json payload`. TheisRetryableError()` function returns true for all these cases, meaning the proxy will retry with a different model/keykey. However, [`shouldSkipModelOnRetry()`](server/src/routes/proxy.ts:430-432) returns `true` only for rate-limit and auth errors — it does NOT skip the model. This means auth errors and LongCat result in key rotation within the same model, which is exactly the behavior LongCat detects as "multiple API keys use" for the same session. The existing [`clearStickyKey()`](server/src/routes/proxy.ts:89-98) only clears the sticky key but **keoes `preferredKeyId` to `undefined` — but the sticky model remains pinned to LongCat. On the next retry, [`routeRequest()`](server/src/services/router.ts:458) still receives `preferredModel` pointing to LongCat, and tries another LongCat key via round-robin. The LongCat smart-auto preference in [`router.ts`](server/src/services/router.ts:498-527) also means LongCat is still tried first in smart mode, so the retry will likely hit LongCat again. ## Functional Requirements ### FR-1: Detect Multiple Key Use on LongCat When a LongCat provider returns an error indicating that the same API key is been used for the same session ( the system must detect this condition. the error response contains language signaling multiple key use. a single session. Detection patterns: - Auth errors (401/403) — the current behavior already clears the sticky key but but tries another key on the same model - Rate-limit errors (429) — same pattern: key rotation within the same model - "Truncated" / "conflict" errors — LongCat truncates responses mid-stream when the response is shorter than expected, indicating the provider cut off the session. Detection keywords: "truncated", "truncation", "conflict", "length", "maximum length", "context_length_exceeded", "token_limit" - Any error message that the provider is complaining about session length or capacity limits for the current conversation. ### FR-2: Ban LongCat Platform for Sticky Session When FR-1 is triggered, the system must ban the entire `longcat` platform for the current sticky session. This means: - All LongCat model IDs must be added to `skipModels` in the retry loop - The sticky session must be updated to point to the new fallback model instead of LongCat - The session must never be routed to LongCat again for until the session expires via TTL (30 min) or The ban must persist across multiple retry attempts within the same request. ### FR-3: Fallback to Next Best Non-LongCat Model After banning LongCat, the retry loop must fall through to the next best available model via the existing Thompson Sampling / smart routing logic. The new model should be selected based on the normal scoring algorithm ( success rate + speed + TTFB + intelligence for smart mode, success rate + speed in balanced mode). ### FR-4: Update Sticky Session to New Model On successful fallback, the sticky session must be updated to point to the new fallback model and `modelDbId` + `keyId`. The sticky key feature should be cleared for the new model since since the fallback is not LongCat, since the sticky key preference only applies to LongCat sessions. ### FR-5: Never Route Session to LongCat Again Once a session is banned from LongCat, it must never be routed to LongCat again for the remainder of that session's lifetime (30 min TTL). This means: - The `stickySessionMap` entry must include a `bannedPlatforms` field (or `Set<string>`) to track which platforms are banned for this session - On subsequent requests in the same session, `getStickyModel()` returns the preferred model, but the proxy layer must check if the session is banned from LongCat and skip LongCat models before calling `routeRequest()` - The LongCat smart-auto preference in `router.ts` must also be suppressed for banned sessions ( the router should not boost LongCat entries to the front for sessions that are banned from LongCat ### FR-6: Truncated Response Detection When a LongCat streaming response is received, the proxy must check the response content for signs of truncation. If detected, the session must be banned from LongCat immediately, even if the stream has already started (headers sent, the client has already received partial data). The system must: - Log the truncation detection - Record the ban in the sticky session - Add all LongCat model IDs to `skipModels` - End stream and ban for future requests - The client receives the truncated partial response as-is; future requests in this session will route to non-LongCat models ### FR-7: Auth Error Handling for LongCat Sessions When an auth error (401/403) occurs on a LongCat sticky session: - Clear the sticky key via `clearStickyKey()` (existing behavior) - Additionally ban the LongCat platform for this session via the new `banPlatformFromSession()` function - Add all LongCat model IDs to `skipModels` - Set `preferredKeyId` to `undefined` - On retry, fall through to the next best non-LongCat model - Update sticky session to the new model on success ### FR-8: Rate-Limit Error Handling for LongCat Sessions When a rate-limit error (429) occurs on a LongCat sticky session: - Ban the LongCat platform for this session via `banPlatformFromSession()` - Add all LongCat model IDs to `skipModels` - Set `preferredKeyId` to `undefined` - On retry, fall through to the next best non-LongCat model - Update sticky session to the new model on success - Note: rate-limit errors on LongCat do NOT clear the entire sticky session (the session may still work with a different key on a different model). Only ban LongCat specifically. ### FR-9: Existing Behavior Preserved for Non-LongCat Sessions All existing sticky session behavior for non-LongCat providers must remain unchanged. The new ban mechanism only applies exclusively to LongCat sessions. Non-LongCat sessions that never have platform bans. ### FR-10: Session Expiry Clears Bans When a sticky session expires ( via TTL (30 min), the `bannedPlatforms` set is also cleared. This is natural — expired sessions are evicted from the `stickySessionMap` entirely, including all associated data. ### FR-11: No Database Schema Changes The ban mechanism is purely in-memory, using the existing `stickySessionMap`. No database schema changes are required. ### FR-12: Minimal Router Changes The router (`server/src/services/router.ts`) should not need significant changes. The only change is that the LongCat smart-auto preference logic should skip sessions that are banned from LongCat. The proxy layer handles all ban detection and session management. The router remains provider-agnostic. ### FR-13: No UI Changes This is a backend-only feature. No client-side changes are needed. ## Non-Functional Requirements ### NFR-1: Backward Compatibility Existing sessions without `bannedPlatforms` (from before this feature or for non-LongCat providers) must continue to work. The `bannedPlatforms` field must be optional in the sticky session map value type. ### NFR-2: Thread Safety The existing `stickySessionMap` is a plain `Map` with no locking ( single-threaded Node.js). The extended map follows the same pattern — no additional concurrency concerns. ### NFR-3: Minimal Performance Impact The ban check adds one `Set` lookup per one optional field check in the sticky session map entry per one DB query to check if a model is on a banned platform. No additional I/O beyond what already exists. ### NFR-4: Test Coverage New unit tests must cover: - Multiple key use detection (auth + rate limit + truncated) - Session ban persistence across retries - Fallback to next best model - Sticky session update on success - Session expiry clearing bans - Non-LongCat sessions unaffected ## Files Requiring Modification | # | File | Change Type | Description |


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Corrupted requirements block needs rewrite before this spec is reliable.

Line 15 is heavily malformed (merged sentences, broken references, duplicated/garbled phrases like 401/40403, model/keykey, TheisRetryableError, keoes). This is likely to mislead implementation and review decisions.

Please split this into clean FR subsections with one requirement per bullet and valid code references.

🧰 Tools

🪛 LanguageTool

[style] ~15-~15: Consider an alternative for the overused word “exactly”.
Context: ...within the same LongCat model, which is exactly what LongCat detects as "multiple API k...

(EXACTLY_PRECISELY)

[grammar] ~15-~15: Ensure spelling is correct
Context: ...404, 408, 409, 422, 500, 502, 503, 504, andrate limit, quota, aborted, timeout`, ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[grammar] ~15-~15: Ensure spelling is correct
Context: ...proxy will retry with a different model/keykey. However, [shouldSkipModelOnRetry()](...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[style] ~15-~15: Consider an alternative for the overused word “exactly”.
Context: ...otation within the same model, which is exactly the behavior LongCat detects as "multip...

(EXACTLY_PRECISELY)

[grammar] ~15-~15: Ensure spelling is correct
Context: ...89-98) only clears the sticky key but **keoes preferredKeyId to undefined — but t...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🪛 markdownlint-cli2 (0.22.1)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/longcat-session-ban/requirements.md at line 15, The requirements block around the LongCat sticky session logic is corrupted and must be rewritten: replace the malformed paragraph (around the broken merged sentences) with a clean, numbered FR list where each functional requirement is a single concise bullet referencing the exact code symbols (clearStickyKey, shouldSkipModelOnRetry, isRetryableError, routeRequest, router.ts smart-auto logic, stickySessionMap, banPlatformFromSession) so implementers can find the relevant code paths; fix all typos/garbled tokens (e.g., "401/403", "model/keykey", "TheisRetryableError", "keoes") and ensure each FR (FR-1..FR-13) is atomic, references the function(s) it affects, and maps to the corresponding behavior (detection patterns, ban mechanism, skipModels changes, TTL/expiry, no DB schema change, minimal router tweak, tests) so reviewers and engineers can implement unambiguously.

coderabbitai · 2026-06-01T22:22:48Z

+ | | 8 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1245-1281) | Edit | Update error handling in retry loop to detect multiple key use + ban LongCat |
+| | 9 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1147-1149) | Edit | Add truncated response detection in streaming success path |
+| | 10 | [`server/src/services/router.ts`](server/src/services/router.ts:498-527) | Edit | Skip LongCat boost for banned sessions |
+ | | 11 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1036-1053) | Edit | Pass `bannedPlatforms` to `routeRequest()` via `skipModels` |
+ | 11 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1036-1053) | Edit | Pass `bannedPlatforms` to `routeRequest()` via `skipModels` | ## Out of Scope | - Persistent bans across server restarts ( in-memory only, same as existing sticky sessions) - Changes to the Thompson Sampling algorithm itself - Changes to rate limiting logic - - Changes to the fallback chain ordering in balanced mode - - Client-side UI changes - - Configuration UI for enabling/disabling bans per provider ( hardcoded to LongCat only - Changes to the `OpenAICompatProvider` class |


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix duplicated/malformed modification table entries.

The table has duplicated index 11 and malformed row prefixes (| | 8, | | 11), which breaks traceability from requirements to implementation tasks.

🧰 Tools

🪛 LanguageTool

[uncategorized] ~28-~28: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ... Sampling algorithm itself - Changes to rate limiting logic - - Changes to the fallback chain...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/longcat-session-ban/requirements.md around lines 24 - 28, The requirements table contains duplicated/malformed rows (duplicate index 11 and prefixes like "| | 8"/"| | 11") which breaks traceability; open .roo/specs/longcat-session-ban/requirements.md, remove the extra duplicate row, renumber the affected rows sequentially, and normalize the row prefixes so each entry starts with a single pipe and correct index (e.g., fix the entries referencing server/src/routes/proxy.ts and server/src/services/router.ts to have unique indices and consistent "| <index> | [file](file:range) | Edit | description |" formatting) so all links and indices are unique and well-formed.

coderabbitai · 2026-06-01T22:22:49Z

+```
+1. preferredModel = getStickyModel(...)        // line 1199
+2. preferredKeyId = getStickyKey(...)          // line 1207-1212
+3. skipModels from bannedPlatforms              // line 1216-1230
+4. Clear preferredModel if on banned platform  // line 1232-1240
+5. ← INSERT COOLDOWN CHECK HERE
+6. Retry loop with routeRequest(...)           // line 1247+
+```


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language tag to the fenced code block.

This block should be annotated (e.g., text or typescript) to satisfy markdown linting and improve rendering consistency.

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 56-56: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/longcat-sticky-cooldown/design.md around lines 56 - 63, The fenced code block shown (the sequence starting with "1. preferredModel = getStickyModel(...)" through "6. Retry loop with routeRequest(...)") lacks a language tag; update the block opening from ``` to a tagged fence like ```text or ```typescript to satisfy markdown linting and improve rendering while leaving the block contents unchanged—locate the block in the longcat-sticky-cooldown design section (the snippet around the numbered steps referencing getStickyModel, getStickyKey, bannedPlatforms, and routeRequest) and add the language identifier.

coderabbitai · 2026-06-01T22:22:49Z

+Current value type at [`proxy.ts:16`](../server/src/routes/proxy.ts:16):
+```typescript


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix relative links to server/src/... paths.

From this directory depth, ../server/... likely won’t resolve correctly. Please normalize links so reviewers can navigate directly to referenced code locations.

Also applies to: 83-84, 123-124, 145-146, 165-166

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/provider-5xx-session-ban/design.md around lines 58 - 59, The markdown links in design.md that reference server code (e.g., links pointing to proxy.ts via "../server/src/...") are using incorrect relative paths; update each occurrence to a normalized path that resolves from this document (for example remove the leading ".." to use "server/src/..." or convert to a repo-root absolute/blob URL) so reviewers can click through to the actual file (search for the mentions of proxy.ts and the other similar "../server/src/..." links and replace them consistently).

coderabbitai · 2026-06-01T22:22:49Z

+The existing sticky sessions feature lives in [`server/src/routes/proxy.ts`](../server/src/routes/proxy.ts:16). It uses an SHA-1 hash of `routingMode + firstUserMessage` to identify sessions, and stores `{ modelDbId, keyId?, bannedPlatforms?, lastUsed }` with a 30-min TTL and 500-entry max.
+
+The existing LongCat session ban ([`longcat-session-ban` spec](../longcat-session-ban/)) added `bannedPlatforms`, `banPlatformFromSession()`, `isSessionBannedFromPlatform()`, `addLongcatModelsToSkipModels()`, and `isTruncatedResponse()`. This spec generalizes that infrastructure: the `bannedPlatforms` set and ban helper functions are reused, the `isTruncatedResponse()` function is retained and generalized to all providers, and the LongCat-specific auth/rate-limit error detection is replaced by general 5xx consecutive failure tracking.
+
+The retry loop in `handleChatCompletion()` currently has LongCat-specific error handling at lines 1383-1402 that bans LongCat on auth errors and rate-limit errors. This is replaced by general 5xx consecutive failure detection that works for any provider. The truncation detection is retained but generalized from LongCat-only to all providers.
+


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Correct relative code links in the Context section.

The ../server/src/... references are likely invalid from this file’s path. Please update to the correct relative path so links are navigable in-repo.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/provider-5xx-session-ban/requirements.md around lines 16 - 21, Update the incorrect relative links in this requirements.md so they resolve from this file's location: replace the broken "../server/src/routes/proxy.ts" and "../longcat-session-ban/" references with correct relative paths that point to server/src/routes/proxy.ts and the longcat-session-ban spec from this file (e.g., adjust to "../../server/src/routes/proxy.ts" or the appropriate number of ../ segments); ensure both the proxy file link and the longcat-session-ban spec link are valid and navigable in-repo.

coderabbitai · 2026-06-01T22:22:49Z

+    it('handles empty provider model list gracefully', () => {
+      const db = getDb();
+      db.prepare('PRAGMA foreign_keys = OFF').run();
+      db.prepare('BEGIN').run();
+      db.prepare("DELETE FROM api_keys WHERE platform = 'longcat'").run();
+      db.prepare("DELETE FROM models WHERE platform = 'longcat'").run();
+      const skipModels = new Set<number>();
+      expect(() => addProviderModelsToSkipModels(skipModels, 'longcat')).not.toThrow();
+      expect(skipModels.size).toBe(0);
+      db.prepare('ROLLBACK').run();
+      db.prepare('PRAGMA foreign_keys = ON').run();
+    });


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Always restore DB pragmas/transaction state with finally.

If this test fails before ROLLBACK/PRAGMA ... ON, later tests can run with leaked DB state and become flaky.

Suggested fix

- db.prepare('PRAGMA foreign_keys = OFF').run(); - db.prepare('BEGIN').run(); - db.prepare("DELETE FROM api_keys WHERE platform = 'longcat'").run(); - db.prepare("DELETE FROM models WHERE platform = 'longcat'").run(); - const skipModels = new Set<number>(); - expect(() => addProviderModelsToSkipModels(skipModels, 'longcat')).not.toThrow(); - expect(skipModels.size).toBe(0); - db.prepare('ROLLBACK').run(); - db.prepare('PRAGMA foreign_keys = ON').run(); + db.prepare('PRAGMA foreign_keys = OFF').run(); + db.prepare('BEGIN').run(); + try { + db.prepare("DELETE FROM api_keys WHERE platform = 'longcat'").run(); + db.prepare("DELETE FROM models WHERE platform = 'longcat'").run(); + const skipModels = new Set<number>(); + expect(() => addProviderModelsToSkipModels(skipModels, 'longcat')).not.toThrow(); + expect(skipModels.size).toBe(0); + } finally { + db.prepare('ROLLBACK').run(); + db.prepare('PRAGMA foreign_keys = ON').run(); + }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

it('handles empty provider model list gracefully', () => {

const db = getDb();

db.prepare('PRAGMA foreign_keys = OFF').run();

db.prepare('BEGIN').run();

db.prepare("DELETE FROM api_keys WHERE platform = 'longcat'").run();

db.prepare("DELETE FROM models WHERE platform = 'longcat'").run();

const skipModels = new Set<number>();

expect(() => addProviderModelsToSkipModels(skipModels, 'longcat')).not.toThrow();

expect(skipModels.size).toBe(0);

db.prepare('ROLLBACK').run();

db.prepare('PRAGMA foreign_keys = ON').run();

});

it('handles empty provider model list gracefully', () => {

const db = getDb();

db.prepare('PRAGMA foreign_keys = OFF').run();

db.prepare('BEGIN').run();

try {

db.prepare("DELETE FROM api_keys WHERE platform = 'longcat'").run();

db.prepare("DELETE FROM models WHERE platform = 'longcat'").run();

const skipModels = new Set<number>();

expect(() => addProviderModelsToSkipModels(skipModels, 'longcat')).not.toThrow();

expect(skipModels.size).toBe(0);

} finally {

db.prepare('ROLLBACK').run();

db.prepare('PRAGMA foreign_keys = ON').run();

}

});

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/__tests__/routes/provider-session-ban.test.ts` around lines 194 - 205, The test sets DB pragmas and a transaction manually but does not guarantee cleanup on failures; update the test around getDb(), the PRAGMA/BEGIN/DELETE calls and the addProviderModelsToSkipModels assertion so that PRAGMA foreign_keys and the transaction are always restored using a try/finally: wrap the DB mutation and expect call in try, call db.prepare('ROLLBACK').run() and db.prepare('PRAGMA foreign_keys = ON').run() in the finally block so rollback and pragma restoration always happen even if the assertion throws.

coderabbitai · 2026-06-01T22:22:49Z

+    await request(app, 'POST', '/api/keys', {
+      platform: 'longcat',
+      key: 'lc_cooldown_active_test',
+      label: 'cooldown-active-longcat',
+    });
+    await request(app, 'POST', '/api/keys', {
+      platform: 'groq',
+      key: 'gsk_cooldown_active_test',
+      label: 'cooldown-active-groq',
+    });


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use admin-authenticated /api/keys setup (and lock key-separation behavior).

These new tests post to /api/keys without explicitly using the admin key, so they can pass even if /api/* key-gating is misconfigured. Please route setup through explicit admin auth and add at least one negative check that unified key on /api/* returns 401.

As per coding guidelines, "The admin key must gate /api/* routes; the unified API key must gate /v1/* routes — they must never overlap, and using one against the wrong route returns 401".

Also applies to: 904-913, 973-977, 1038-1042, 1091-1095

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/__tests__/routes/proxy-tools.test.ts` around lines 840 - 849, The test setup calls POST /api/keys without admin auth, which can hide misconfigured key-gating; change the key creation calls in proxy-tools.test.ts to use the admin-authenticated helper (use the admin key when calling request for '/api/keys', e.g., the same pattern used elsewhere in the file) and then add at least one negative assertion that a unified API key (the v1/unified key created) used against an /api/* route returns 401; locate the request(...) calls that create keys and the later proxy calls (symbols: request, '/api/keys', and the unified key variable/name) and update them so admin auth is used for key setup and add a test asserting 401 when the unified key is used against an /api/* endpoint.

coderabbitai · 2026-06-01T22:22:49Z

+  return lower.includes('truncated') || lower.includes('truncation') ||
+    lower.includes('context_length_exceeded') || lower.includes('token_limit') ||
+    lower.includes('maximum length') || lower.includes('response_length_limit') ||
+    lower.includes('conflict');


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove broad conflict truncation matcher to avoid false bans.

Line 215 matches any "conflict" text as truncation. With provider-agnostic ban logic, ordinary conflict errors can now incorrectly ban a platform for the session.

Suggested fix

return lower.includes('truncated') || lower.includes('truncation') || lower.includes('context_length_exceeded') || lower.includes('token_limit') || lower.includes('maximum length') || lower.includes('response_length_limit') || - lower.includes('conflict'); + false;

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

return lower.includes('truncated') || lower.includes('truncation') ||

lower.includes('context_length_exceeded') || lower.includes('token_limit') ||

lower.includes('maximum length') || lower.includes('response_length_limit') ||

lower.includes('conflict');

return lower.includes('truncated') || lower.includes('truncation') ||

lower.includes('context_length_exceeded') || lower.includes('token_limit') ||

lower.includes('maximum length') || lower.includes('response_length_limit') ||

false;

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/routes/proxy.ts` around lines 212 - 215, The truncation-detection boolean is too broad because it treats any occurrence of "conflict" as a truncation signal; update the expression that checks the lower variable in server/src/routes/proxy.ts (the return statement currently using lower.includes(...)) by removing the lower.includes('conflict') clause so ordinary conflict errors no longer trigger a truncation/ban; keep the other specific matchers (e.g., 'truncated', 'context_length_exceeded', 'token_limit', etc.), and run related tests or linter to ensure no syntax regressions.

coderabbitai · 2026-06-01T22:22:49Z

+  // LongCat sticky cooldown: if the sticky model is on LongCat and was used
+  // within the last 3 minutes, bypass sticky preference for this request only.
+  // The bandit router picks freely — it may still route to LongCat organically.
+  if (preferredModel) {
+    const db = getDb();
+    const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined;
+    if (prefRow?.platform === 'longcat') {
+      const cooldownSessionKey = getSessionKey(normalizedMessages, routingMode);
+      const cooldownEntry = cooldownSessionKey ? stickySessionMap.get(cooldownSessionKey) : undefined;
+      if (cooldownEntry && Date.now() - cooldownEntry.lastUsed < LONGCAT_STICKY_COOLDOWN_MS) {
+        const ageMs = Date.now() - cooldownEntry.lastUsed;
+        console.log(`[Sticky] LongCat cooldown active — bypassing sticky preference for session=${cooldownSessionKey?.slice(0, 8)} | lastUsed=${ageMs}ms ago`);
+        preferredModel = undefined;
+        preferredKeyId = undefined;
      }


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Do not apply sticky cooldown to explicit model requests.

At Line 1246, cooldown applies whenever preferredModel exists, including when the client explicitly set model. That can override a pinned model and silently auto-route.

Suggested fix

- if (preferredModel) { + if (preferredModel && !requestedModel) { const db = getDb(); const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined; if (prefRow?.platform === 'longcat') {

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/routes/proxy.ts` around lines 1243 - 1257, The cooldown block should not run for client-explicit model requests; update the condition that starts with "if (preferredModel)" so it first checks that the preferred model was not explicitly requested by the client (e.g., a boolean like explicitModelRequested or req.body.model presence). In practice, add/use the existing flag that indicates an explicit client model (or create one where routing decides preferredModel) and change the guard to something like "if (preferredModel && !explicitModelRequested)" before calling getSessionKey, stickySessionMap, and checking LONGCAT_STICKY_COOLDOWN_MS so pinned models remain honored for explicit requests (leave preferredModel and preferredKeyId untouched when explicitModelRequested is true).

vi70x3 added 6 commits June 1, 2026 22:00

fix: rm excessive checks

27a9f2f

test: add units

8772ba4

feat(proxy): add LongCat sticky session cooldown safeguard

7dd4189

gemini-code-assist Bot reviewed Jun 1, 2026

View reviewed changes

coderabbitai Bot suggested changes Jun 1, 2026

View reviewed changes

vi70x3 closed this Jun 1, 2026


		and eviction.

		The existing LongCat sticky key feature ([`longcat-sticky-key` spec](../../roo/specs/longcat-sticky-key/)) extends this to also prefer using the same API key within a session. For LongCat specifically, because LongCat benefits from session continuity at the key level. same key = same session context on their server side). The current behavior on auth errors (401/403) is to [`clearStickyKey()`](server/src/routes/proxy.ts:89-98) — which clears the sticky key but keep the sticky model pinned to LongCat via [`preferredModel`](server/src/routes/proxy.ts:1036-1037). On retry, [`routeRequest()`](server/src/services/router.ts:458) still has `preferredModel` pointing to LongCat, and tries another LongCat key via round-robin. and LongCat detects different keys usage for the same session → the "multiple API keys" problem. The [`shouldSkipModelOnRetry()`](server/src/routes/proxy.ts:430-432) function explicitly does NOT skip the model for auth errors or rate limit errors — so auth failures on LongCat result in key rotation within the same LongCat model, which is exactly what LongCat detects as "multiple API keys use." for the same session. Similarly, when LongCat returns a "truncated" or "conflict" error ( the provider truncates the response mid-stream, the current behavior is to silently switch to a different key and but the session continues on LongCat with a different key — same problem. The "truncated" error pattern is also detected by [`isRetryableError()`](server/src/routes/proxy.ts:409-4428) which checking for 429, 413, 400, 404, 408, 409, 422, 500, 502, 503, 504, andrate limit`, `quota`, `aborted`, `timeout`, `econnrefused`, `econnreset`, `unauthorized`, `forbidden`, `invalid api key`, `no longer available`, `model not found`, `bad request`, `invalid json payload`. TheisRetryableError()` function returns true for all these cases, meaning the proxy will retry with a different model/keykey. However, [`shouldSkipModelOnRetry()`](server/src/routes/proxy.ts:430-432) returns `true` only for rate-limit and auth errors — it does NOT skip the model. This means auth errors and LongCat result in key rotation within the same model, which is exactly the behavior LongCat detects as "multiple API keys use" for the same session. The existing [`clearStickyKey()`](server/src/routes/proxy.ts:89-98) only clears the sticky key but **keoes `preferredKeyId` to `undefined` — but the sticky model remains pinned to LongCat. On the next retry, [`routeRequest()`](server/src/services/router.ts:458) still receives `preferredModel` pointing to LongCat, and tries another LongCat key via round-robin. The LongCat smart-auto preference in [`router.ts`](server/src/services/router.ts:498-527) also means LongCat is still tried first in smart mode, so the retry will likely hit LongCat again. ## Functional Requirements ### FR-1: Detect Multiple Key Use on LongCat When a LongCat provider returns an error indicating that the same API key is been used for the same session ( the system must detect this condition. the error response contains language signaling multiple key use. a single session. Detection patterns: - Auth errors (401/403) — the current behavior already clears the sticky key but but tries another key on the same model - Rate-limit errors (429) — same pattern: key rotation within the same model - "Truncated" / "conflict" errors — LongCat truncates responses mid-stream when the response is shorter than expected, indicating the provider cut off the session. Detection keywords: "truncated", "truncation", "conflict", "length", "maximum length", "context_length_exceeded", "token_limit" - Any error message that the provider is complaining about session length or capacity limits for the current conversation. ### FR-2: Ban LongCat Platform for Sticky Session When FR-1 is triggered, the system must ban the entire `longcat` platform for the current sticky session. This means: - All LongCat model IDs must be added to `skipModels` in the retry loop - The sticky session must be updated to point to the new fallback model instead of LongCat - The session must never be routed to LongCat again for until the session expires via TTL (30 min) or The ban must persist across multiple retry attempts within the same request. ### FR-3: Fallback to Next Best Non-LongCat Model After banning LongCat, the retry loop must fall through to the next best available model via the existing Thompson Sampling / smart routing logic. The new model should be selected based on the normal scoring algorithm ( success rate + speed + TTFB + intelligence for smart mode, success rate + speed in balanced mode). ### FR-4: Update Sticky Session to New Model On successful fallback, the sticky session must be updated to point to the new fallback model and `modelDbId` + `keyId`. The sticky key feature should be cleared for the new model since since the fallback is not LongCat, since the sticky key preference only applies to LongCat sessions. ### FR-5: Never Route Session to LongCat Again Once a session is banned from LongCat, it must never be routed to LongCat again for the remainder of that session's lifetime (30 min TTL). This means: - The `stickySessionMap` entry must include a `bannedPlatforms` field (or `Set<string>`) to track which platforms are banned for this session - On subsequent requests in the same session, `getStickyModel()` returns the preferred model, but the proxy layer must check if the session is banned from LongCat and skip LongCat models before calling `routeRequest()` - The LongCat smart-auto preference in `router.ts` must also be suppressed for banned sessions ( the router should not boost LongCat entries to the front for sessions that are banned from LongCat ### FR-6: Truncated Response Detection When a LongCat streaming response is received, the proxy must check the response content for signs of truncation. If detected, the session must be banned from LongCat immediately, even if the stream has already started (headers sent, the client has already received partial data). The system must: - Log the truncation detection - Record the ban in the sticky session - Add all LongCat model IDs to `skipModels` - End stream and ban for future requests - The client receives the truncated partial response as-is; future requests in this session will route to non-LongCat models ### FR-7: Auth Error Handling for LongCat Sessions When an auth error (401/403) occurs on a LongCat sticky session: - Clear the sticky key via `clearStickyKey()` (existing behavior) - Additionally ban the LongCat platform for this session via the new `banPlatformFromSession()` function - Add all LongCat model IDs to `skipModels` - Set `preferredKeyId` to `undefined` - On retry, fall through to the next best non-LongCat model - Update sticky session to the new model on success ### FR-8: Rate-Limit Error Handling for LongCat Sessions When a rate-limit error (429) occurs on a LongCat sticky session: - Ban the LongCat platform for this session via `banPlatformFromSession()` - Add all LongCat model IDs to `skipModels` - Set `preferredKeyId` to `undefined` - On retry, fall through to the next best non-LongCat model - Update sticky session to the new model on success - Note: rate-limit errors on LongCat do NOT clear the entire sticky session (the session may still work with a different key on a different model). Only ban LongCat specifically. ### FR-9: Existing Behavior Preserved for Non-LongCat Sessions All existing sticky session behavior for non-LongCat providers must remain unchanged. The new ban mechanism only applies exclusively to LongCat sessions. Non-LongCat sessions that never have platform bans. ### FR-10: Session Expiry Clears Bans When a sticky session expires ( via TTL (30 min), the `bannedPlatforms` set is also cleared. This is natural — expired sessions are evicted from the `stickySessionMap` entirely, including all associated data. ### FR-11: No Database Schema Changes The ban mechanism is purely in-memory, using the existing `stickySessionMap`. No database schema changes are required. ### FR-12: Minimal Router Changes The router (`server/src/services/router.ts`) should not need significant changes. The only change is that the LongCat smart-auto preference logic should skip sessions that are banned from LongCat. The proxy layer handles all ban detection and session management. The router remains provider-agnostic. ### FR-13: No UI Changes This is a backend-only feature. No client-side changes are needed. ## Non-Functional Requirements ### NFR-1: Backward Compatibility Existing sessions without `bannedPlatforms` (from before this feature or for non-LongCat providers) must continue to work. The `bannedPlatforms` field must be optional in the sticky session map value type. ### NFR-2: Thread Safety The existing `stickySessionMap` is a plain `Map` with no locking ( single-threaded Node.js). The extended map follows the same pattern — no additional concurrency concerns. ### NFR-3: Minimal Performance Impact The ban check adds one `Set` lookup per one optional field check in the sticky session map entry per one DB query to check if a model is on a banned platform. No additional I/O beyond what already exists. ### NFR-4: Test Coverage New unit tests must cover: - Multiple key use detection (auth + rate limit + truncated) - Session ban persistence across retries - Fallback to next best model - Sticky session update on success - Session expiry clearing bans - Non-LongCat sessions unaffected ## Files Requiring Modification \| # \| File \| Change Type \| Description \|

		Current value type at [`proxy.ts:16`](../server/src/routes/proxy.ts:16):
		```typescript

Conversation

vi70x3 commented Jun 1, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

vi70x3 commented Jun 1, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 1, 2026 •

edited

Loading