feat: longcat sticky session cooldown safeguard by vi70x3 · Pull Request #6 · animaios/api-llm-localhost

vi70x3 · 2026-06-01T22:36:05Z

Implements the longcat-sticky-cooldown spec.

Summary

When a sticky session is pinned to a LongCat model AND was used within the last 3 minutes, exclude LongCat entirely from the bandit router for all other sessions. The current sticky session keeps its pinned LongCat model+key — only that session may use LongCat. This prevents LongCat from seeing multiple sessions/keys from the same IP.

Changes

server/src/routes/proxy.ts — Added LONGCAT_STICKY_COOLDOWN_MS = 3 * 60 * 1000 constant
server/src/routes/proxy.ts — Cooldown check adds LongCat models to skipModels (instead of clearing sticky preference). Sticky session keeps its LongCat pin.
server/src/services/router.ts — Modified skipModels check to exclude the sticky model: && entry.model_db_id !== preferredModelDbId — ensures the sticky session's LongCat model is never skipped
server/src/__tests__/routes/proxy-tools.test.ts — Updated 6 unit tests covering: cooldown active (preserves sticky + excludes LongCat from bandit), cooldown expired, non-LongCat provider, ban precedence, no sticky session, other sessions during cooldown

All 156 tests pass with zero regressions.

Summary by CodeRabbit

New Features
- Automatic per-session provider bans after repeated failures (5xx or truncated responses) with fallback to other providers
- Sticky-session cooldown to temporarily exclude LongCat from routing while preserving current pinned sessions
Bug Fixes
- Better detection of truncated and mid-stream failures to trigger safe fallbacks
- Sticky routing now respects exclusions without unpinning valid sticky sessions
Tests
- Added unit and integration tests covering session bans, cooldowns, truncation detection, and routing behavior

Implements the LongCat session ban feature in proxy.ts. When LongCat detects multiple API key use (auth/rate-limit errors) or returns truncated responses, the platform is banned from the sticky session. Future requests in that session route to non-LongCat models. Changes: - Extend stickySessionMap with bannedPlatforms?: Set<string> - Add isSessionBannedFromPlatform() to check session bans - Add banPlatformFromSession() to record platform bans - Add addLongcatModelsToSkipModels() to skip all LongCat models - Add isTruncatedResponse() to detect truncation keywords - Update getStickyKey() to return undefined for banned platforms - Update setStickyModel() to preserve bannedPlatforms across updates - Update pre-routing logic to check bans before routing - Update error handling to ban LongCat on auth/rate-limit/truncation - Add truncation detection after stream completes - Add truncation detection in mid-stream error handling

… consecutive failures - Extend stickySessionMap with consecutiveFailures tracking per provider - Add recordConsecutiveFailure(), resetConsecutiveFailures(), resetAllConsecutiveFailures() - Replace addLongcatModelsToSkipModels with generic addProviderModelsToSkipModels - Replace LongCat-specific auth/rate-limit ban with general 5xx consecutive failure detection (threshold: 2) - Generalize truncation detection to all providers (post-stream + mid-stream) - Update getStickyKey() to check bannedPlatforms for any platform - Update pre-routing ban check to be generic (any banned platform) - Add success path counter reset on both streaming and non-streaming paths - Remove LongCat-specific auth error ban, rate limit ban, and addLongcatModelsToSkipModels - Rename and rewrite tests from longcat-session-ban to provider-session-ban (32 test cases) - TypeScript compiles cleanly, all 150 tests pass

- Add isBanEligibleStatus() helper restricting to {500,502,503,504} - Improve mid-stream truncation detection with aggregated error sources - Pre-routing ban check now skips ALL banned platforms, not just preferredModel's - Only clear preferredModel when provider is actually banned (not on first 5xx) - Handle Error objects in isTruncatedResponse (instanceof check before JSON.stringify)

…or other sessions during 3-min cooldown window

coderabbitai · 2026-06-01T22:36:15Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 307e6cb3-d9a3-4fd2-8bee-1f09a69b7457

📥 Commits

Reviewing files that changed from the base of the PR and between 5f6f0b0 and 95722bc.

📒 Files selected for processing (1)

server/src/routes/proxy.ts

📝 Walkthrough

Walkthrough

This PR implements a sophisticated sticky-session banning and fallback mechanism for the proxy routing layer. It extends session state to track per-platform bans and consecutive failures, adds provider-agnostic helper functions, and wires error detection throughout the retry loop to automatically ban platforms after consecutive 5xx responses or truncation, with a cooldown feature to exclude recently-pinned LongCat models from other sessions.

Changes

Specification & Design Documentation

Layer / File(s)	Summary
LongCat Session Ban Specification `.roo/specs/longcat-session-ban/design.md`, `.roo/specs/longcat-session-ban/requirements.md`, `.roo/specs/longcat-session-ban/tasks.md`	Design, functional and non-functional requirements, and task checklist for LongCat-specific session banning on auth/rate-limit/truncation errors. Documents ban detection, fallback via skipModels, state preservation across model updates, and edge cases.
LongCat Sticky Cooldown Specification `.roo/specs/longcat-sticky-cooldown/design.md`, `.roo/specs/longcat-sticky-cooldown/requirements.md`, `.roo/specs/longcat-sticky-cooldown/tasks.md`	Design, requirements, and tasks for a 3-minute cooldown window that excludes LongCat from bandit routing for other sessions when a sticky session recently used LongCat, preserving the pinned model via router skip-logic exception.
Provider 5xx Session Ban Specification `.roo/specs/provider-5xx-session-ban/design.md`, `.roo/specs/provider-5xx-session-ban/requirements.md`, `.roo/specs/provider-5xx-session-ban/tasks.md`	Design, requirements, and tasks for generalizing session banning to any provider based on 2 consecutive 5xx responses or truncation detection, tracking per-provider failure counters in-session, and banning with skipModels.

Session Banning Implementation

Layer / File(s)	Summary
Sticky Session State Extension & Helpers `server/src/routes/proxy.ts` (lines 16–234)	Extends `stickySessionMap` to persist `bannedPlatforms` and `consecutiveFailures` per session. Adds exported helper functions: `isSessionBannedFromPlatform`, `banPlatformFromSession`, `addProviderModelsToSkipModels`, `recordConsecutiveFailure`, `resetConsecutiveFailures`, `resetAllConsecutiveFailures`. Includes `isBanEligibleStatus` to classify 5xx codes (500/502/503/504).
Sticky Model & Key Selection with Ban Awareness `server/src/routes/proxy.ts` (lines 74–83, 257–264)	`setStickyModel` preserves `bannedPlatforms` and `consecutiveFailures` when updating model/key. Pre-routing checks whether the sticky session's banned platforms include the sticky model's platform; if so, the sticky model is skipped, enabling fallback.
Router Skip Model Exception `server/src/services/router.ts` (line 539)	Modifies `routeRequest` skip condition to allow `preferredModelDbId` to bypass `skipModels`, ensuring pinned sticky models are always selectable despite being in the skip set.
Request Routing with Ban Derivation & Cooldown `server/src/routes/proxy.ts` (lines 1203–1263)	Pre-routing derives `skipModels` from session `bannedPlatforms`. Implements LongCat cooldown to add LongCat models to `skipModels` for other sessions when the sticky route recently used LongCat (within `LONGCAT_STICKY_COOLDOWN_MS`). Clears `preferredModel`/`preferredKeyId` when the preferred model’s platform is banned. Manages sticky key preference for fallback routes.
Truncation & Error Detection `server/src/routes/proxy.ts` (lines 1340–1350, 1399–1446, 1490, 1510–1536)	Post-stream truncation detection bans the platform if output appears truncated. Mid-stream error handling detects consecutive 5xx failures and truncation across error payload fields; truncation bans the platform and ends streaming gracefully. Outer retry loop tracks consecutive failures per provider and bans on threshold. Success paths reset all consecutive-failure counters. Auth error comment clarifies sticky-key clearing behavior.

Session Banning & Cooldown Tests

Layer / File(s)	Summary
Provider Session Ban Test Suite `server/src/__tests__/routes/provider-session-ban.test.ts`	Comprehensive unit and integration tests for session ban helpers: `isSessionBannedFromPlatform` (TTL expiry, missing state), `banPlatformFromSession` (creation, update, duplicate prevention), `addProviderModelsToSkipModels` (enabled models, empty provider), `recordConsecutiveFailure` (counter increment, ban threshold, provider independence), failure/ban resets, and `isTruncatedResponse` (multiple patterns, non-string inputs). Integration tests validate ban persistence across model changes, TTL behavior, skip-model interaction, consecutive-failure-triggered bans, and counter resets on success.
LongCat Sticky Cooldown Test Suite `server/src/__tests__/routes/proxy-tools.test.ts` (lines 5, 804–1219)	Tests LongCat cooldown: cooldown-active vs cooldown-expired routing outcomes, non-LongCat sticky session exemption, ban precedence over cooldown logging, and no cooldown when sticky session is absent. Validates console logging expectations for cooldown state and routing behavior.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

vi70x3/freellmapi#2: The sticky-session ban logic interacts with LongCat-specific sticky-key behavior by extending session state and suppressing preferredKeyId for banned LongCat platforms.

Poem

🐰 A rabbit bounces through the stickiest of routes,
Banning platforms twice when failures bring doubts,
Cooldowns hush the LongCat's recent gleam,
Truncated streams wake from their broken dream. ✨🐾

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 5.88% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'feat: longcat sticky session cooldown safeguard' directly and accurately describes the main feature implemented—a cooldown mechanism for LongCat sticky sessions. It is concise, specific, and clearly indicates the primary change.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch spec/longcat-sticky-cooldown

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request generalizes the session ban and fallback mechanisms (including 5xx consecutive failures and truncation detection) to all providers, and introduces a 3-minute cooldown safeguard for LongCat sticky sessions. Feedback on the changes highlights a critical logic error in the cooldown safeguard implementation: it currently only checks if the current session is a hot LongCat session instead of checking if any active session in the map is hot, which fails to protect other sessions from routing to LongCat. Additionally, it is recommended to optimize redundant database queries in the hot path.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-01T22:37:45Z

+  if (preferredModel) {
+    const db = getDb();
+    const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined;
+    if (prefRow && isSessionBannedFromPlatform(normalizedMessages, routingMode, prefRow.platform)) {
+      console.log(`[Sticky] skipping preferredModel=${preferredModel} (${prefRow.platform} banned for session)`);
+      preferredModel = undefined;
+      preferredKeyId = undefined;
+    }
+  }
+
+  // LongCat sticky cooldown: if the sticky model is on LongCat and was used
+  // within the last 3 minutes, exclude LongCat from the bandit router for all
+  // other sessions. The current sticky session keeps its pinned LongCat route.
+  // This prevents LongCat from seeing multiple sessions/keys from the same IP.
+  if (preferredModel) {
+    const db = getDb();
+    const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined;
+    if (prefRow?.platform === 'longcat') {
+      const cooldownSessionKey = getSessionKey(normalizedMessages, routingMode);
+      const cooldownEntry = cooldownSessionKey ? stickySessionMap.get(cooldownSessionKey) : undefined;
+      if (cooldownEntry && Date.now() - cooldownEntry.lastUsed < LONGCAT_STICKY_COOLDOWN_MS) {
+        const ageMs = Date.now() - cooldownEntry.lastUsed;
+        addProviderModelsToSkipModels(skipModels, 'longcat');
+        console.log(`[Sticky] LongCat cooldown active — excluding LongCat from bandit routing for other sessions | session=${cooldownSessionKey?.slice(0, 8)} | lastUsed=${ageMs}ms ago`);
      }
    }
  }


Issue: Cooldown Safeguard Logic Error (Other Sessions Not Protected)

The current implementation of the LongCat sticky session cooldown safeguard only checks if the current session has a sticky LongCat model and is within the cooldown window. If a request comes from a different session (e.g., a new session or one sticky on another provider), preferredModel is not LongCat, so the cooldown check is skipped entirely. Consequently, other sessions are never restricted from routing to LongCat during the cooldown period, which completely defeats the purpose of the safeguard.

Additionally, the current code performs the exact same database query (SELECT platform FROM models WHERE id = ?) twice on the same preferredModel in the hot path, which is redundant and inefficient.

Solution

To correctly implement the safeguard, we should check if any active session in stickySessionMap is a hot LongCat session. If so, we exclude LongCat models from bandit routing for all other sessions by adding them to skipModels. The router's bypass logic (entry.model_db_id !== preferredModelDbId) will still allow the hot session itself to use its pinned model.

We can also optimize this by querying the enabled LongCat model IDs once and checking the map in memory, completely avoiding redundant database queries.

let preferredModelPlatform: string | undefined; if (preferredModel) { const db = getDb(); const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined; if (prefRow) { preferredModelPlatform = prefRow.platform; if (isSessionBannedFromPlatform(normalizedMessages, routingMode, prefRow.platform)) { console.log(`[Sticky] skipping preferredModel=${preferredModel} (${prefRow.platform} banned for session)`); preferredModel = undefined; preferredKeyId = undefined; preferredModelPlatform = undefined; } } } // LongCat sticky cooldown: if any sticky session is pinned to a LongCat model // and was used within the last 3 minutes, exclude LongCat entirely from the // bandit router for all other sessions. const now = Date.now(); let hasHotLongcatSession = false; let hotSessionKey = ''; let hotSessionAgeMs = 0; const db = getDb(); const longcatModels = db.prepare("SELECT id FROM models WHERE platform = 'longcat' AND enabled = 1").all() as Array<{ id: number }>; const longcatModelIds = new Set(longcatModels.map(m => m.id)); for (const [sKey, entry] of stickySessionMap) { if (now - entry.lastUsed < LONGCAT_STICKY_COOLDOWN_MS && longcatModelIds.has(entry.modelDbId)) { hasHotLongcatSession = true; hotSessionKey = sKey; hotSessionAgeMs = now - entry.lastUsed; break; } } if (hasHotLongcatSession) { addProviderModelsToSkipModels(skipModels, 'longcat'); console.log(`[Sticky] LongCat cooldown active — excluding LongCat from bandit routing for other sessions | active session=${hotSessionKey.slice(0, 8)} | lastUsed=${hotSessionAgeMs}ms ago`); }

coderabbitai

Actionable comments posted: 6

🧹 Nitpick comments (1)

server/src/__tests__/routes/provider-session-ban.test.ts (1)

196-205: ⚡ Quick win

Use try/finally for transactional test cleanup.

If an assertion fails between Line 196 and Line 205, ROLLBACK/PRAGMA foreign_keys = ON may never run, contaminating later tests.

Suggested test hardening

       const db = getDb();
       db.prepare('PRAGMA foreign_keys = OFF').run();
       db.prepare('BEGIN').run();
-      db.prepare("DELETE FROM api_keys WHERE platform = 'longcat'").run();
-      db.prepare("DELETE FROM models WHERE platform = 'longcat'").run();
-      const skipModels = new Set<number>();
-      expect(() => addProviderModelsToSkipModels(skipModels, 'longcat')).not.toThrow();
-      expect(skipModels.size).toBe(0);
-      db.prepare('ROLLBACK').run();
-      db.prepare('PRAGMA foreign_keys = ON').run();
+      try {
+        db.prepare("DELETE FROM api_keys WHERE platform = 'longcat'").run();
+        db.prepare("DELETE FROM models WHERE platform = 'longcat'").run();
+        const skipModels = new Set<number>();
+        expect(() => addProviderModelsToSkipModels(skipModels, 'longcat')).not.toThrow();
+        expect(skipModels.size).toBe(0);
+      } finally {
+        db.prepare('ROLLBACK').run();
+        db.prepare('PRAGMA foreign_keys = ON').run();
+      }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/__tests__/routes/provider-session-ban.test.ts` around lines 196 -
205, Wrap the transactional block that manipulates the DB and runs assertions in
a try/finally so cleanup always runs: call db.prepare('PRAGMA foreign_keys =
OFF').run(); db.prepare('BEGIN').run(); perform the DELETEs and the test
assertions (including addProviderModelsToSkipModels and assertions on
skipModels) inside the try, and in the finally always execute
db.prepare('ROLLBACK').run(); and db.prepare('PRAGMA foreign_keys = ON').run();
so foreign keys and transaction state are restored even if an assertion throws.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.roo/specs/longcat-session-ban/requirements.md:
- Around line 5-15: The requirements file has a malformed merged block (lines
5–15) that garbles Context and Functional Requirements and introduces typos;
please split that run-on paragraph into distinct sections (Context, Existing
Behavior, Functional Requirements) and reformat FR-1..FR-13 as separate,
numbered bullets, fixing typos and broken sentences (e.g., "keoes", duplicated
phrases, misplaced references) and ensuring each FR references the correct code
symbols: clearStickyKey(), shouldSkipModelOnRetry(), isRetryableError(),
routeRequest(), getStickyModel(), and the LongCat smart-auto logic in router.ts;
ensure detection patterns and ban semantics (bannedPlatforms set on
stickySessionMap, TTL behavior, skipModels update) are clearly stated, keep NFRs
and file-change list intact, and make bannedPlatforms optional for backward
compatibility.

In @.roo/specs/longcat-sticky-cooldown/requirements.md:
- Around line 74-77: Update the requirements doc to remove or reword NFR-3 so it
no longer asserts “No Router Changes” because the implementation requires
allowing preferredModelDbId to bypass skipModels in routeRequest; explicitly
state that the router's routeRequest behavior must allow preferredModelDbId to
override entries in the skipModels set (or remove the NFR-3 clause entirely) and
reference the skipModels and preferredModelDbId behavior in routeRequest to
avoid contradictory acceptance criteria.

In `@server/src/__tests__/routes/proxy-tools.test.ts`:
- Around line 895-973: The test currently asserts the opposite of the intended
cooldown behavior for other sessions; update the test to assert the non-sticky
session does NOT route to LongCat by adding an assertion on the routedProvider
(e.g., expect(routedProvider).toBe('groq') or
expect(routedProvider).not.toBe('longcat')) after the POST to
/v1/chat/completions, remove or correct the contradictory inline comments that
claim no exclusion, and ensure the mocked fetch logic (the routedProvider
variable set in the global.fetch spy) is used for this assertion so the test
actually verifies the cooldown exclusion behavior tied to
stickySessionMap/getSessionKey/makeMessages.
- Around line 839-849: The new POST calls to /api/keys in
server/src/__tests__/routes/proxy-tools.test.ts are missing admin-key
authentication and must be updated to include the admin-key header; update each
request(app, 'POST', '/api/keys', {...}) call to send the admin header (e.g. add
headers: { 'x-admin-key': ADMIN_KEY } or the project’s admin header constant) so
the route is exercised under admin-key gating, and add one negative test that
calls POST /api/keys with a unified API key header (e.g. the unified key header
used for /v1/*) asserting a 401 response to prove unified keys are rejected for
/api/* paths. Ensure you mirror this change for the other occurrences noted
(around lines referenced in the review: 909-918, 989-998, 1058-1062, 1123-1127,
1176-1180).

In `@server/src/routes/proxy.ts`:
- Around line 1247-1257: The cooldown check currently only looks up the current
session key; instead, scan stickySessionMap for any entry (other than the
current session key returned by getSessionKey(normalizedMessages, routingMode))
whose platform is 'longcat' and whose lastUsed is within
LONGCAT_STICKY_COOLDOWN_MS, and if found call
addProviderModelsToSkipModels(skipModels, 'longcat') and log the exclusion
(include the offending session key slice and ageMs); use the same symbols
prefRow/platform check, getSessionKey, stickySessionMap,
LONGCAT_STICKY_COOLDOWN_MS, addProviderModelsToSkipModels, skipModels,
normalizedMessages, and routingMode to locate and update the code.
- Around line 1416-1422: The mid-stream error parser that pushes serialized
error payloads into truncationTexts can throw when JSON.stringify encounters
circular structures; in the block handling streamErr.response.data and
streamErr.body (look for truncationTexts and streamErr in proxy.ts) wrap JSON
serialization in a safe routine: attempt JSON.stringify inside a try/catch and
fall back to a non-throwing representation (e.g., use String(...),
util.inspect(...), or a small safeStringify that returns a placeholder on error)
so that pushing to truncationTexts never throws and preserves original string
values when already strings.

---

Nitpick comments:
In `@server/src/__tests__/routes/provider-session-ban.test.ts`:
- Around line 196-205: Wrap the transactional block that manipulates the DB and
runs assertions in a try/finally so cleanup always runs: call db.prepare('PRAGMA
foreign_keys = OFF').run(); db.prepare('BEGIN').run(); perform the DELETEs and
the test assertions (including addProviderModelsToSkipModels and assertions on
skipModels) inside the try, and in the finally always execute
db.prepare('ROLLBACK').run(); and db.prepare('PRAGMA foreign_keys = ON').run();
so foreign keys and transaction state are restored even if an assertion throws.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: cd1eb9f2-728a-4785-9164-b2607d6815e8

📥 Commits

Reviewing files that changed from the base of the PR and between d9c5a73 and 5f6f0b0.

📒 Files selected for processing (13)

.roo/specs/longcat-session-ban/design.md
.roo/specs/longcat-session-ban/requirements.md
.roo/specs/longcat-session-ban/tasks.md
.roo/specs/longcat-sticky-cooldown/design.md
.roo/specs/longcat-sticky-cooldown/requirements.md
.roo/specs/longcat-sticky-cooldown/tasks.md
.roo/specs/provider-5xx-session-ban/design.md
.roo/specs/provider-5xx-session-ban/requirements.md
.roo/specs/provider-5xx-session-ban/tasks.md
server/src/__tests__/routes/provider-session-ban.test.ts
server/src/__tests__/routes/proxy-tools.test.ts
server/src/routes/proxy.ts
server/src/services/router.ts

coderabbitai · 2026-06-01T22:41:47Z

+When a LongCat sticky session encounters an error — whether it's an auth failure (401/40403) or a "truncated" / "conflict" error from the provider — the system must to:
+
+1. Detect the error, 2. Ban the entire `longcat` platform for this sticky session, 3. Fall back to the next best non-LongCat model via normal routing, 4. Update the sticky session to point to the new fallback model, 5. Never route this session to LongCat again (until the session expires via TTL)
+
+## Context
+
+ The existing sticky sessions feature lives in [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:13-112). and It uses an SHA-1 hash of `routingMode + firstUserMessage` to identify sessions, and stores `{ modelDbId, + optional `keyId` + `lastUsed` } with a 30-min TTL and 500-entry max.
+
+ and eviction.
+
+ The existing LongCat sticky key feature ([`longcat-sticky-key` spec](../../roo/specs/longcat-sticky-key/)) extends this to also prefer using the **same API key** within a session. For LongCat specifically, because LongCat benefits from session continuity at the key level. same key = same session context on their server side). The current behavior on auth errors (401/403) is to [`clearStickyKey()`](server/src/routes/proxy.ts:89-98) — which clears the sticky key but **keep the sticky model pinned to LongCat** via [`preferredModel`](server/src/routes/proxy.ts:1036-1037). On retry, [`routeRequest()`](server/src/services/router.ts:458) still has `preferredModel` pointing to LongCat, and tries **another LongCat key** via round-robin. and **LongCat detects different keys usage for the same session** → the "multiple API keys" problem. The [`shouldSkipModelOnRetry()`](server/src/routes/proxy.ts:430-432) function explicitly **does NOT skip the model** for auth errors or rate limit errors — so auth failures on LongCat result in key rotation within the same LongCat model, which is exactly what LongCat detects as "multiple API keys use." for the same session. Similarly, when LongCat returns a "truncated" or "conflict" error ( the provider truncates the response mid-stream, the current behavior is to silently switch to a different key and but the session continues on LongCat with a different key — same problem. The "truncated" error pattern is also detected by [`isRetryableError()`](server/src/routes/proxy.ts:409-4428) which checking for 429, 413, 400, 404, 408, 409, 422, 500, 502, 503, 504, andrate limit`, `quota`, `aborted`, `timeout`, `econnrefused`, `econnreset`, `unauthorized`, `forbidden`, `invalid api key`, `no longer available`, `model not found`, `bad request`, `invalid json payload`. TheisRetryableError()` function returns true for all these cases, meaning the proxy will retry with a different model/keykey. However, [`shouldSkipModelOnRetry()`](server/src/routes/proxy.ts:430-432) returns `true` only for rate-limit and auth errors — it does NOT skip the model. This means auth errors and LongCat result in key rotation within the same model, which is exactly the behavior LongCat detects as "multiple API keys use" for the same session. The existing [`clearStickyKey()`](server/src/routes/proxy.ts:89-98) only clears the sticky key but **keoes `preferredKeyId` to `undefined` — but the sticky model remains pinned to LongCat. On the next retry, [`routeRequest()`](server/src/services/router.ts:458) still receives `preferredModel` pointing to LongCat, and tries another LongCat key via round-robin. The LongCat smart-auto preference in [`router.ts`](server/src/services/router.ts:498-527) also means LongCat is still tried first in smart mode, so the retry will likely hit LongCat again. ## Functional Requirements ### FR-1: Detect Multiple Key Use on LongCat When a LongCat provider returns an error indicating that the same API key is been used for the same session ( the system must detect this condition. the error response contains language signaling multiple key use. a single session. Detection patterns: - Auth errors (401/403) — the current behavior already clears the sticky key but but tries another key on the same model - Rate-limit errors (429) — same pattern: key rotation within the same model - "Truncated" / "conflict" errors — LongCat truncates responses mid-stream when the response is shorter than expected, indicating the provider cut off the session. Detection keywords: "truncated", "truncation", "conflict", "length", "maximum length", "context_length_exceeded", "token_limit" - Any error message that the provider is complaining about session length or capacity limits for the current conversation. ### FR-2: Ban LongCat Platform for Sticky Session When FR-1 is triggered, the system must ban the entire `longcat` platform for the current sticky session. This means: - All LongCat model IDs must be added to `skipModels` in the retry loop - The sticky session must be updated to point to the new fallback model instead of LongCat - The session must never be routed to LongCat again for until the session expires via TTL (30 min) or The ban must persist across multiple retry attempts within the same request. ### FR-3: Fallback to Next Best Non-LongCat Model After banning LongCat, the retry loop must fall through to the next best available model via the existing Thompson Sampling / smart routing logic. The new model should be selected based on the normal scoring algorithm ( success rate + speed + TTFB + intelligence for smart mode, success rate + speed in balanced mode). ### FR-4: Update Sticky Session to New Model On successful fallback, the sticky session must be updated to point to the new fallback model and `modelDbId` + `keyId`. The sticky key feature should be cleared for the new model since since the fallback is not LongCat, since the sticky key preference only applies to LongCat sessions. ### FR-5: Never Route Session to LongCat Again Once a session is banned from LongCat, it must never be routed to LongCat again for the remainder of that session's lifetime (30 min TTL). This means: - The `stickySessionMap` entry must include a `bannedPlatforms` field (or `Set<string>`) to track which platforms are banned for this session - On subsequent requests in the same session, `getStickyModel()` returns the preferred model, but the proxy layer must check if the session is banned from LongCat and skip LongCat models before calling `routeRequest()` - The LongCat smart-auto preference in `router.ts` must also be suppressed for banned sessions ( the router should not boost LongCat entries to the front for sessions that are banned from LongCat ### FR-6: Truncated Response Detection When a LongCat streaming response is received, the proxy must check the response content for signs of truncation. If detected, the session must be banned from LongCat immediately, even if the stream has already started (headers sent, the client has already received partial data). The system must: - Log the truncation detection - Record the ban in the sticky session - Add all LongCat model IDs to `skipModels` - End stream and ban for future requests - The client receives the truncated partial response as-is; future requests in this session will route to non-LongCat models ### FR-7: Auth Error Handling for LongCat Sessions When an auth error (401/403) occurs on a LongCat sticky session: - Clear the sticky key via `clearStickyKey()` (existing behavior) - Additionally ban the LongCat platform for this session via the new `banPlatformFromSession()` function - Add all LongCat model IDs to `skipModels` - Set `preferredKeyId` to `undefined` - On retry, fall through to the next best non-LongCat model - Update sticky session to the new model on success ### FR-8: Rate-Limit Error Handling for LongCat Sessions When a rate-limit error (429) occurs on a LongCat sticky session: - Ban the LongCat platform for this session via `banPlatformFromSession()` - Add all LongCat model IDs to `skipModels` - Set `preferredKeyId` to `undefined` - On retry, fall through to the next best non-LongCat model - Update sticky session to the new model on success - Note: rate-limit errors on LongCat do NOT clear the entire sticky session (the session may still work with a different key on a different model). Only ban LongCat specifically. ### FR-9: Existing Behavior Preserved for Non-LongCat Sessions All existing sticky session behavior for non-LongCat providers must remain unchanged. The new ban mechanism only applies exclusively to LongCat sessions. Non-LongCat sessions that never have platform bans. ### FR-10: Session Expiry Clears Bans When a sticky session expires ( via TTL (30 min), the `bannedPlatforms` set is also cleared. This is natural — expired sessions are evicted from the `stickySessionMap` entirely, including all associated data. ### FR-11: No Database Schema Changes The ban mechanism is purely in-memory, using the existing `stickySessionMap`. No database schema changes are required. ### FR-12: Minimal Router Changes The router (`server/src/services/router.ts`) should not need significant changes. The only change is that the LongCat smart-auto preference logic should skip sessions that are banned from LongCat. The proxy layer handles all ban detection and session management. The router remains provider-agnostic. ### FR-13: No UI Changes This is a backend-only feature. No client-side changes are needed. ## Non-Functional Requirements ### NFR-1: Backward Compatibility Existing sessions without `bannedPlatforms` (from before this feature or for non-LongCat providers) must continue to work. The `bannedPlatforms` field must be optional in the sticky session map value type. ### NFR-2: Thread Safety The existing `stickySessionMap` is a plain `Map` with no locking ( single-threaded Node.js). The extended map follows the same pattern — no additional concurrency concerns. ### NFR-3: Minimal Performance Impact The ban check adds one `Set` lookup per one optional field check in the sticky session map entry per one DB query to check if a model is on a banned platform. No additional I/O beyond what already exists. ### NFR-4: Test Coverage New unit tests must cover: - Multiple key use detection (auth + rate limit + truncated) - Session ban persistence across retries - Fallback to next best model - Sticky session update on success - Session expiry clearing bans - Non-LongCat sessions unaffected ## Files Requiring Modification | # | File | Change Type | Description |


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix malformed requirement block formatting.

Line 5 through Line 15 is merged into a run-on block with broken section boundaries and typos, making functional requirements ambiguous to implement and review.

🧰 Tools

🪛 LanguageTool

[style] ~15-~15: Consider an alternative for the overused word “exactly”.
Context: ...within the same LongCat model, which is exactly what LongCat detects as "multiple API k...

(EXACTLY_PRECISELY)

[grammar] ~15-~15: Ensure spelling is correct
Context: ...404, 408, 409, 422, 500, 502, 503, 504, andrate limit, quota, aborted, timeout`, ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[grammar] ~15-~15: Ensure spelling is correct
Context: ...proxy will retry with a different model/keykey. However, [shouldSkipModelOnRetry()](...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[style] ~15-~15: Consider an alternative for the overused word “exactly”.
Context: ...otation within the same model, which is exactly the behavior LongCat detects as "multip...

(EXACTLY_PRECISELY)

[grammar] ~15-~15: Ensure spelling is correct
Context: ...89-98) only clears the sticky key but **keoes preferredKeyId to undefined — but t...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🪛 markdownlint-cli2 (0.22.1)

[warning] 11-11: Spaces inside code span elements

(MD038, no-space-in-code)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/longcat-session-ban/requirements.md around lines 5 - 15, The requirements file has a malformed merged block (lines 5–15) that garbles Context and Functional Requirements and introduces typos; please split that run-on paragraph into distinct sections (Context, Existing Behavior, Functional Requirements) and reformat FR-1..FR-13 as separate, numbered bullets, fixing typos and broken sentences (e.g., "keoes", duplicated phrases, misplaced references) and ensuring each FR references the correct code symbols: clearStickyKey(), shouldSkipModelOnRetry(), isRetryableError(), routeRequest(), getStickyModel(), and the LongCat smart-auto logic in router.ts; ensure detection patterns and ban semantics (bannedPlatforms set on stickySessionMap, TTL behavior, skipModels update) are clearly stated, keep NFRs and file-change list intact, and make bannedPlatforms optional for backward compatibility.

coderabbitai · 2026-06-01T22:41:47Z

+### NFR-3: No Router Changes
+
+The router already supports `skipModels` — no changes to [`server/src/services/router.ts`](server/src/services/router.ts) are needed. The cooldown only adds LongCat model IDs to the existing `skipModels` set in the proxy layer.
+


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove the “No Router Changes” requirement — it conflicts with the implemented contract.

NFR-3 says no router changes are needed, but this feature depends on allowing preferredModelDbId to bypass skipModels in routeRequest(). Please update this requirement to avoid contradictory acceptance criteria.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/longcat-sticky-cooldown/requirements.md around lines 74 - 77, Update the requirements doc to remove or reword NFR-3 so it no longer asserts “No Router Changes” because the implementation requires allowing preferredModelDbId to bypass skipModels in routeRequest; explicitly state that the router's routeRequest behavior must allow preferredModelDbId to override entries in the skipModels set (or remove the NFR-3 clause entirely) and reference the skipModels and preferredModelDbId behavior in routeRequest to avoid contradictory acceptance criteria.

coderabbitai · 2026-06-01T22:41:47Z

+    // Add API keys via proper endpoint (encrypts correctly) so routing can succeed
+    await request(app, 'POST', '/api/keys', {
+      platform: 'longcat',
+      key: 'lc_cooldown_active_test',
+      label: 'cooldown-active-longcat',
+    });
+    await request(app, 'POST', '/api/keys', {
+      platform: 'groq',
+      key: 'gsk_cooldown_active_test',
+      label: 'cooldown-active-groq',
+    });


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

New /api/keys test calls are missing admin-key authentication coverage.

Per route-gating policy, /api/* must be admin-key protected and distinct from /v1/* unified-key auth. These additions keep codifying unauthenticated /api/keys calls, which can mask regressions in access control. Add explicit admin-key headers (and ideally one 401 negative test with unified key on /api/*). As per coding guidelines, "The admin key must gate /api/* routes; the unified API key must gate /v1/* routes — they must never overlap, and using one against the wrong route returns 401".

Also applies to: 909-918, 989-998, 1058-1062, 1123-1127, 1176-1180

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/__tests__/routes/proxy-tools.test.ts` around lines 839 - 849, The new POST calls to /api/keys in server/src/__tests__/routes/proxy-tools.test.ts are missing admin-key authentication and must be updated to include the admin-key header; update each request(app, 'POST', '/api/keys', {...}) call to send the admin header (e.g. add headers: { 'x-admin-key': ADMIN_KEY } or the project’s admin header constant) so the route is exercised under admin-key gating, and add one negative test that calls POST /api/keys with a unified API key header (e.g. the unified key header used for /v1/*) asserting a 401 response to prove unified keys are rejected for /api/* paths. Ensure you mirror this change for the other occurrences noted (around lines referenced in the review: 909-918, 989-998, 1058-1062, 1123-1127, 1176-1180).

coderabbitai · 2026-06-01T22:41:47Z

+  it('excludes LongCat from bandit routing for non-sticky sessions during cooldown', async () => {
+    const db = getDb();
+
+    // Set up a sticky session on LongCat for ONE session (within cooldown)
+    const stickyMessages = makeMessages('sticky longcat session during cooldown');
+    const stickyKey = getSessionKey(stickyMessages, 'balanced');
+    const longcatRow = db.prepare('SELECT id FROM models WHERE platform = ? AND enabled = 1').get('longcat') as { id: number } | undefined;
+    expect(longcatRow).toBeDefined();
+    (stickySessionMap as Map<any, any>).set(stickyKey, {
+      modelDbId: longcatRow!.id,
+      lastUsed: Date.now() - 1000, // within cooldown
+    });
+
+    // Add keys for both providers
+    await request(app, 'POST', '/api/keys', {
+      platform: 'longcat',
+      key: 'lc_cooldown_exclusion_test',
+      label: 'cooldown-exclusion-longcat',
+    });
+    await request(app, 'POST', '/api/keys', {
+      platform: 'groq',
+      key: 'gsk_cooldown_exclusion_test',
+      label: 'cooldown-exclusion-groq',
+    });
+
+    // Now send a request from a DIFFERENT session (no sticky LongCat)
+    // This session should NOT be able to route to LongCat because the
+    // other session's cooldown excludes LongCat from the bandit router.
+    // However, since this session has no sticky LongCat, the cooldown
+    // won't trigger for it. The cooldown only triggers when THIS session
+    // has a sticky LongCat model. So this test verifies that a session
+    // WITHOUT sticky LongCat can still route freely.
+    // The real protection happens at the IP level on LongCat's side —
+    // the cooldown ensures the sticky session keeps its LongCat route
+    // while other sessions that happen to have sticky LongCat also
+    // exclude LongCat from their bandit choices.
+    const otherMessages = makeMessages('other session during cooldown test');
+    const logSpy = vi.spyOn(console, 'log');
+    const origFetch = global.fetch;
+    let routedProvider = '';
+
+    vi.spyOn(global, 'fetch').mockImplementation(async (url, init) => {
+      const urlStr = typeof url === 'string' ? url : url.toString();
+      if (urlStr.startsWith('http://127.0.0.1') || urlStr.startsWith('http://localhost')) {
+        return origFetch(url, init);
+      }
+      if (!urlStr.includes('/chat/completions')) return origFetch(url, init);
+
+      if (urlStr.includes('longcat')) routedProvider = 'longcat';
+      else if (urlStr.includes('groq')) routedProvider = 'groq';
+
+      const body = JSON.parse((init as any).body);
+      return {
+        ok: true,
+        json: () => Promise.resolve({
+          id: 'chatcmpl-other-session',
+          object: 'chat.completion',
+          created: 123,
+          model: body.model,
+          choices: [{
+            index: 0,
+            message: { role: 'assistant', content: 'other session test response' },
+            finish_reason: 'stop',
+          }],
+          usage: { prompt_tokens: 4, completion_tokens: 2, total_tokens: 6 },
+        }),
+      } as any;
+    });
+
+    const { status } = await request(app, 'POST', '/v1/chat/completions', {
+      messages: otherMessages,
+    });
+
+    expect(status).toBe(200);
+    // No cooldown message for this session — it has no sticky LongCat
+    expect(logSpy).not.toHaveBeenCalledWith(
+      expect.stringContaining('[Sticky] LongCat cooldown active')
+    );
+  });


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

This test asserts the opposite of the cooldown requirement for “other sessions.”

The test name says non-sticky sessions should be excluded from LongCat during cooldown, but the inline assertions/comments expect no exclusion and never verify provider selection. That weakens coverage for the core safeguard. Assert that the non-sticky request does not route to LongCat (e.g., expect(routedProvider).toBe('groq') or not.toBe('longcat')) and remove contradictory commentary.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/__tests__/routes/proxy-tools.test.ts` around lines 895 - 973, The test currently asserts the opposite of the intended cooldown behavior for other sessions; update the test to assert the non-sticky session does NOT route to LongCat by adding an assertion on the routedProvider (e.g., expect(routedProvider).toBe('groq') or expect(routedProvider).not.toBe('longcat')) after the POST to /v1/chat/completions, remove or correct the contradictory inline comments that claim no exclusion, and ensure the mocked fetch logic (the routedProvider variable set in the global.fetch spy) is used for this assertion so the test actually verifies the cooldown exclusion behavior tied to stickySessionMap/getSessionKey/makeMessages.

coderabbitai · 2026-06-01T22:41:47Z

+  if (preferredModel) {
+    const db = getDb();
+    const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined;
+    if (prefRow?.platform === 'longcat') {
+      const cooldownSessionKey = getSessionKey(normalizedMessages, routingMode);
+      const cooldownEntry = cooldownSessionKey ? stickySessionMap.get(cooldownSessionKey) : undefined;
+      if (cooldownEntry && Date.now() - cooldownEntry.lastUsed < LONGCAT_STICKY_COOLDOWN_MS) {
+        const ageMs = Date.now() - cooldownEntry.lastUsed;
+        addProviderModelsToSkipModels(skipModels, 'longcat');
+        console.log(`[Sticky] LongCat cooldown active — excluding LongCat from bandit routing for other sessions | session=${cooldownSessionKey?.slice(0, 8)} | lastUsed=${ageMs}ms ago`);
      }


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Cooldown scope only checks the current session, not “other sessions.”

At Line 1251-Line 1253, cooldown state is derived from the current session key only. That means sessions without a LongCat preferred model won’t have LongCat excluded, even if another sticky LongCat session used it within the cooldown window.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/routes/proxy.ts` around lines 1247 - 1257, The cooldown check currently only looks up the current session key; instead, scan stickySessionMap for any entry (other than the current session key returned by getSessionKey(normalizedMessages, routingMode)) whose platform is 'longcat' and whose lastUsed is within LONGCAT_STICKY_COOLDOWN_MS, and if found call addProviderModelsToSkipModels(skipModels, 'longcat') and log the exclusion (include the offending session key slice and ageMs); use the same symbols prefRow/platform check, getSessionKey, stickySessionMap, LONGCAT_STICKY_COOLDOWN_MS, addProviderModelsToSkipModels, skipModels, normalizedMessages, and routingMode to locate and update the code.

…del-only skip on 5xx

vi70x3 added 7 commits June 1, 2026 22:00

fix: rm excessive checks

27a9f2f

test: add units

8772ba4

feat(proxy): add LongCat sticky session cooldown safeguard

7dd4189

feat(router): longcat sticky cooldown — exclude LongCat from bandit f…

5f6f0b0

…or other sessions during 3-min cooldown window

gemini-code-assist Bot reviewed Jun 1, 2026

View reviewed changes

coderabbitai Bot suggested changes Jun 1, 2026

View reviewed changes

vi70x3 force-pushed the spec/longcat-sticky-cooldown branch from 666040e to 5f6f0b0 Compare June 1, 2026 23:16

refactor(proxy): LongCat immediate provider exclusion, non-LongCat mo…

95722bc

…del-only skip on 5xx

vi70x3 closed this Jun 2, 2026

		### NFR-3: No Router Changes

		The router already supports `skipModels` — no changes to [`server/src/services/router.ts`](server/src/services/router.ts) are needed. The cooldown only adds LongCat model IDs to the existing `skipModels` set in the proxy layer.

Conversation

vi70x3 commented Jun 1, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 1, 2026

Choose a reason for hiding this comment

Issue: Cooldown Safeguard Logic Error (Other Sessions Not Protected)

Solution

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

vi70x3 commented Jun 1, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 1, 2026 •

edited

Loading