feat: longcat sticky session cooldown safeguard#6
Conversation
Implements the LongCat session ban feature in proxy.ts. When LongCat detects multiple API key use (auth/rate-limit errors) or returns truncated responses, the platform is banned from the sticky session. Future requests in that session route to non-LongCat models. Changes: - Extend stickySessionMap with bannedPlatforms?: Set<string> - Add isSessionBannedFromPlatform() to check session bans - Add banPlatformFromSession() to record platform bans - Add addLongcatModelsToSkipModels() to skip all LongCat models - Add isTruncatedResponse() to detect truncation keywords - Update getStickyKey() to return undefined for banned platforms - Update setStickyModel() to preserve bannedPlatforms across updates - Update pre-routing logic to check bans before routing - Update error handling to ban LongCat on auth/rate-limit/truncation - Add truncation detection after stream completes - Add truncation detection in mid-stream error handling
… consecutive failures - Extend stickySessionMap with consecutiveFailures tracking per provider - Add recordConsecutiveFailure(), resetConsecutiveFailures(), resetAllConsecutiveFailures() - Replace addLongcatModelsToSkipModels with generic addProviderModelsToSkipModels - Replace LongCat-specific auth/rate-limit ban with general 5xx consecutive failure detection (threshold: 2) - Generalize truncation detection to all providers (post-stream + mid-stream) - Update getStickyKey() to check bannedPlatforms for any platform - Update pre-routing ban check to be generic (any banned platform) - Add success path counter reset on both streaming and non-streaming paths - Remove LongCat-specific auth error ban, rate limit ban, and addLongcatModelsToSkipModels - Rename and rewrite tests from longcat-session-ban to provider-session-ban (32 test cases) - TypeScript compiles cleanly, all 150 tests pass
- Add isBanEligibleStatus() helper restricting to {500,502,503,504}
- Improve mid-stream truncation detection with aggregated error sources
- Pre-routing ban check now skips ALL banned platforms, not just preferredModel's
- Only clear preferredModel when provider is actually banned (not on first 5xx)
- Handle Error objects in isTruncatedResponse (instanceof check before JSON.stringify)
…or other sessions during 3-min cooldown window
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis PR implements a sophisticated sticky-session banning and fallback mechanism for the proxy routing layer. It extends session state to track per-platform bans and consecutive failures, adds provider-agnostic helper functions, and wires error detection throughout the retry loop to automatically ban platforms after consecutive 5xx responses or truncation, with a cooldown feature to exclude recently-pinned LongCat models from other sessions. ChangesSpecification & Design Documentation
Session Banning Implementation
Session Banning & Cooldown Tests
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Code Review
This pull request generalizes the session ban and fallback mechanisms (including 5xx consecutive failures and truncation detection) to all providers, and introduces a 3-minute cooldown safeguard for LongCat sticky sessions. Feedback on the changes highlights a critical logic error in the cooldown safeguard implementation: it currently only checks if the current session is a hot LongCat session instead of checking if any active session in the map is hot, which fails to protect other sessions from routing to LongCat. Additionally, it is recommended to optimize redundant database queries in the hot path.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| if (preferredModel) { | ||
| const db = getDb(); | ||
| const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined; | ||
| if (prefRow && isSessionBannedFromPlatform(normalizedMessages, routingMode, prefRow.platform)) { | ||
| console.log(`[Sticky] skipping preferredModel=${preferredModel} (${prefRow.platform} banned for session)`); | ||
| preferredModel = undefined; | ||
| preferredKeyId = undefined; | ||
| } | ||
| } | ||
|
|
||
| // LongCat sticky cooldown: if the sticky model is on LongCat and was used | ||
| // within the last 3 minutes, exclude LongCat from the bandit router for all | ||
| // other sessions. The current sticky session keeps its pinned LongCat route. | ||
| // This prevents LongCat from seeing multiple sessions/keys from the same IP. | ||
| if (preferredModel) { | ||
| const db = getDb(); | ||
| const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined; | ||
| if (prefRow?.platform === 'longcat') { | ||
| const cooldownSessionKey = getSessionKey(normalizedMessages, routingMode); | ||
| const cooldownEntry = cooldownSessionKey ? stickySessionMap.get(cooldownSessionKey) : undefined; | ||
| if (cooldownEntry && Date.now() - cooldownEntry.lastUsed < LONGCAT_STICKY_COOLDOWN_MS) { | ||
| const ageMs = Date.now() - cooldownEntry.lastUsed; | ||
| addProviderModelsToSkipModels(skipModels, 'longcat'); | ||
| console.log(`[Sticky] LongCat cooldown active — excluding LongCat from bandit routing for other sessions | session=${cooldownSessionKey?.slice(0, 8)} | lastUsed=${ageMs}ms ago`); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Issue: Cooldown Safeguard Logic Error (Other Sessions Not Protected)
The current implementation of the LongCat sticky session cooldown safeguard only checks if the current session has a sticky LongCat model and is within the cooldown window. If a request comes from a different session (e.g., a new session or one sticky on another provider), preferredModel is not LongCat, so the cooldown check is skipped entirely. Consequently, other sessions are never restricted from routing to LongCat during the cooldown period, which completely defeats the purpose of the safeguard.
Additionally, the current code performs the exact same database query (SELECT platform FROM models WHERE id = ?) twice on the same preferredModel in the hot path, which is redundant and inefficient.
Solution
To correctly implement the safeguard, we should check if any active session in stickySessionMap is a hot LongCat session. If so, we exclude LongCat models from bandit routing for all other sessions by adding them to skipModels. The router's bypass logic (entry.model_db_id !== preferredModelDbId) will still allow the hot session itself to use its pinned model.
We can also optimize this by querying the enabled LongCat model IDs once and checking the map in memory, completely avoiding redundant database queries.
let preferredModelPlatform: string | undefined;
if (preferredModel) {
const db = getDb();
const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined;
if (prefRow) {
preferredModelPlatform = prefRow.platform;
if (isSessionBannedFromPlatform(normalizedMessages, routingMode, prefRow.platform)) {
console.log(`[Sticky] skipping preferredModel=${preferredModel} (${prefRow.platform} banned for session)`);
preferredModel = undefined;
preferredKeyId = undefined;
preferredModelPlatform = undefined;
}
}
}
// LongCat sticky cooldown: if any sticky session is pinned to a LongCat model
// and was used within the last 3 minutes, exclude LongCat entirely from the
// bandit router for all other sessions.
const now = Date.now();
let hasHotLongcatSession = false;
let hotSessionKey = '';
let hotSessionAgeMs = 0;
const db = getDb();
const longcatModels = db.prepare("SELECT id FROM models WHERE platform = 'longcat' AND enabled = 1").all() as Array<{ id: number }>;
const longcatModelIds = new Set(longcatModels.map(m => m.id));
for (const [sKey, entry] of stickySessionMap) {
if (now - entry.lastUsed < LONGCAT_STICKY_COOLDOWN_MS && longcatModelIds.has(entry.modelDbId)) {
hasHotLongcatSession = true;
hotSessionKey = sKey;
hotSessionAgeMs = now - entry.lastUsed;
break;
}
}
if (hasHotLongcatSession) {
addProviderModelsToSkipModels(skipModels, 'longcat');
console.log(`[Sticky] LongCat cooldown active — excluding LongCat from bandit routing for other sessions | active session=${hotSessionKey.slice(0, 8)} | lastUsed=${hotSessionAgeMs}ms ago`);
}There was a problem hiding this comment.
Actionable comments posted: 6
🧹 Nitpick comments (1)
server/src/__tests__/routes/provider-session-ban.test.ts (1)
196-205: ⚡ Quick winUse
try/finallyfor transactional test cleanup.If an assertion fails between Line 196 and Line 205,
ROLLBACK/PRAGMA foreign_keys = ONmay never run, contaminating later tests.Suggested test hardening
const db = getDb(); db.prepare('PRAGMA foreign_keys = OFF').run(); db.prepare('BEGIN').run(); - db.prepare("DELETE FROM api_keys WHERE platform = 'longcat'").run(); - db.prepare("DELETE FROM models WHERE platform = 'longcat'").run(); - const skipModels = new Set<number>(); - expect(() => addProviderModelsToSkipModels(skipModels, 'longcat')).not.toThrow(); - expect(skipModels.size).toBe(0); - db.prepare('ROLLBACK').run(); - db.prepare('PRAGMA foreign_keys = ON').run(); + try { + db.prepare("DELETE FROM api_keys WHERE platform = 'longcat'").run(); + db.prepare("DELETE FROM models WHERE platform = 'longcat'").run(); + const skipModels = new Set<number>(); + expect(() => addProviderModelsToSkipModels(skipModels, 'longcat')).not.toThrow(); + expect(skipModels.size).toBe(0); + } finally { + db.prepare('ROLLBACK').run(); + db.prepare('PRAGMA foreign_keys = ON').run(); + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/__tests__/routes/provider-session-ban.test.ts` around lines 196 - 205, Wrap the transactional block that manipulates the DB and runs assertions in a try/finally so cleanup always runs: call db.prepare('PRAGMA foreign_keys = OFF').run(); db.prepare('BEGIN').run(); perform the DELETEs and the test assertions (including addProviderModelsToSkipModels and assertions on skipModels) inside the try, and in the finally always execute db.prepare('ROLLBACK').run(); and db.prepare('PRAGMA foreign_keys = ON').run(); so foreign keys and transaction state are restored even if an assertion throws.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.roo/specs/longcat-session-ban/requirements.md:
- Around line 5-15: The requirements file has a malformed merged block (lines
5–15) that garbles Context and Functional Requirements and introduces typos;
please split that run-on paragraph into distinct sections (Context, Existing
Behavior, Functional Requirements) and reformat FR-1..FR-13 as separate,
numbered bullets, fixing typos and broken sentences (e.g., "keoes", duplicated
phrases, misplaced references) and ensuring each FR references the correct code
symbols: clearStickyKey(), shouldSkipModelOnRetry(), isRetryableError(),
routeRequest(), getStickyModel(), and the LongCat smart-auto logic in router.ts;
ensure detection patterns and ban semantics (bannedPlatforms set on
stickySessionMap, TTL behavior, skipModels update) are clearly stated, keep NFRs
and file-change list intact, and make bannedPlatforms optional for backward
compatibility.
In @.roo/specs/longcat-sticky-cooldown/requirements.md:
- Around line 74-77: Update the requirements doc to remove or reword NFR-3 so it
no longer asserts “No Router Changes” because the implementation requires
allowing preferredModelDbId to bypass skipModels in routeRequest; explicitly
state that the router's routeRequest behavior must allow preferredModelDbId to
override entries in the skipModels set (or remove the NFR-3 clause entirely) and
reference the skipModels and preferredModelDbId behavior in routeRequest to
avoid contradictory acceptance criteria.
In `@server/src/__tests__/routes/proxy-tools.test.ts`:
- Around line 895-973: The test currently asserts the opposite of the intended
cooldown behavior for other sessions; update the test to assert the non-sticky
session does NOT route to LongCat by adding an assertion on the routedProvider
(e.g., expect(routedProvider).toBe('groq') or
expect(routedProvider).not.toBe('longcat')) after the POST to
/v1/chat/completions, remove or correct the contradictory inline comments that
claim no exclusion, and ensure the mocked fetch logic (the routedProvider
variable set in the global.fetch spy) is used for this assertion so the test
actually verifies the cooldown exclusion behavior tied to
stickySessionMap/getSessionKey/makeMessages.
- Around line 839-849: The new POST calls to /api/keys in
server/src/__tests__/routes/proxy-tools.test.ts are missing admin-key
authentication and must be updated to include the admin-key header; update each
request(app, 'POST', '/api/keys', {...}) call to send the admin header (e.g. add
headers: { 'x-admin-key': ADMIN_KEY } or the project’s admin header constant) so
the route is exercised under admin-key gating, and add one negative test that
calls POST /api/keys with a unified API key header (e.g. the unified key header
used for /v1/*) asserting a 401 response to prove unified keys are rejected for
/api/* paths. Ensure you mirror this change for the other occurrences noted
(around lines referenced in the review: 909-918, 989-998, 1058-1062, 1123-1127,
1176-1180).
In `@server/src/routes/proxy.ts`:
- Around line 1247-1257: The cooldown check currently only looks up the current
session key; instead, scan stickySessionMap for any entry (other than the
current session key returned by getSessionKey(normalizedMessages, routingMode))
whose platform is 'longcat' and whose lastUsed is within
LONGCAT_STICKY_COOLDOWN_MS, and if found call
addProviderModelsToSkipModels(skipModels, 'longcat') and log the exclusion
(include the offending session key slice and ageMs); use the same symbols
prefRow/platform check, getSessionKey, stickySessionMap,
LONGCAT_STICKY_COOLDOWN_MS, addProviderModelsToSkipModels, skipModels,
normalizedMessages, and routingMode to locate and update the code.
- Around line 1416-1422: The mid-stream error parser that pushes serialized
error payloads into truncationTexts can throw when JSON.stringify encounters
circular structures; in the block handling streamErr.response.data and
streamErr.body (look for truncationTexts and streamErr in proxy.ts) wrap JSON
serialization in a safe routine: attempt JSON.stringify inside a try/catch and
fall back to a non-throwing representation (e.g., use String(...),
util.inspect(...), or a small safeStringify that returns a placeholder on error)
so that pushing to truncationTexts never throws and preserves original string
values when already strings.
---
Nitpick comments:
In `@server/src/__tests__/routes/provider-session-ban.test.ts`:
- Around line 196-205: Wrap the transactional block that manipulates the DB and
runs assertions in a try/finally so cleanup always runs: call db.prepare('PRAGMA
foreign_keys = OFF').run(); db.prepare('BEGIN').run(); perform the DELETEs and
the test assertions (including addProviderModelsToSkipModels and assertions on
skipModels) inside the try, and in the finally always execute
db.prepare('ROLLBACK').run(); and db.prepare('PRAGMA foreign_keys = ON').run();
so foreign keys and transaction state are restored even if an assertion throws.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: cd1eb9f2-728a-4785-9164-b2607d6815e8
📒 Files selected for processing (13)
.roo/specs/longcat-session-ban/design.md.roo/specs/longcat-session-ban/requirements.md.roo/specs/longcat-session-ban/tasks.md.roo/specs/longcat-sticky-cooldown/design.md.roo/specs/longcat-sticky-cooldown/requirements.md.roo/specs/longcat-sticky-cooldown/tasks.md.roo/specs/provider-5xx-session-ban/design.md.roo/specs/provider-5xx-session-ban/requirements.md.roo/specs/provider-5xx-session-ban/tasks.mdserver/src/__tests__/routes/provider-session-ban.test.tsserver/src/__tests__/routes/proxy-tools.test.tsserver/src/routes/proxy.tsserver/src/services/router.ts
| When a LongCat sticky session encounters an error — whether it's an auth failure (401/40403) or a "truncated" / "conflict" error from the provider — the system must to: | ||
|
|
||
| 1. Detect the error, 2. Ban the entire `longcat` platform for this sticky session, 3. Fall back to the next best non-LongCat model via normal routing, 4. Update the sticky session to point to the new fallback model, 5. Never route this session to LongCat again (until the session expires via TTL) | ||
|
|
||
| ## Context | ||
|
|
||
| The existing sticky sessions feature lives in [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:13-112). and It uses an SHA-1 hash of `routingMode + firstUserMessage` to identify sessions, and stores `{ modelDbId, + optional `keyId` + `lastUsed` } with a 30-min TTL and 500-entry max. | ||
|
|
||
| and eviction. | ||
|
|
||
| The existing LongCat sticky key feature ([`longcat-sticky-key` spec](../../roo/specs/longcat-sticky-key/)) extends this to also prefer using the **same API key** within a session. For LongCat specifically, because LongCat benefits from session continuity at the key level. same key = same session context on their server side). The current behavior on auth errors (401/403) is to [`clearStickyKey()`](server/src/routes/proxy.ts:89-98) — which clears the sticky key but **keep the sticky model pinned to LongCat** via [`preferredModel`](server/src/routes/proxy.ts:1036-1037). On retry, [`routeRequest()`](server/src/services/router.ts:458) still has `preferredModel` pointing to LongCat, and tries **another LongCat key** via round-robin. and **LongCat detects different keys usage for the same session** → the "multiple API keys" problem. The [`shouldSkipModelOnRetry()`](server/src/routes/proxy.ts:430-432) function explicitly **does NOT skip the model** for auth errors or rate limit errors — so auth failures on LongCat result in key rotation within the same LongCat model, which is exactly what LongCat detects as "multiple API keys use." for the same session. Similarly, when LongCat returns a "truncated" or "conflict" error ( the provider truncates the response mid-stream, the current behavior is to silently switch to a different key and but the session continues on LongCat with a different key — same problem. The "truncated" error pattern is also detected by [`isRetryableError()`](server/src/routes/proxy.ts:409-4428) which checking for 429, 413, 400, 404, 408, 409, 422, 500, 502, 503, 504, andrate limit`, `quota`, `aborted`, `timeout`, `econnrefused`, `econnreset`, `unauthorized`, `forbidden`, `invalid api key`, `no longer available`, `model not found`, `bad request`, `invalid json payload`. TheisRetryableError()` function returns true for all these cases, meaning the proxy will retry with a different model/keykey. However, [`shouldSkipModelOnRetry()`](server/src/routes/proxy.ts:430-432) returns `true` only for rate-limit and auth errors — it does NOT skip the model. This means auth errors and LongCat result in key rotation within the same model, which is exactly the behavior LongCat detects as "multiple API keys use" for the same session. The existing [`clearStickyKey()`](server/src/routes/proxy.ts:89-98) only clears the sticky key but **keoes `preferredKeyId` to `undefined` — but the sticky model remains pinned to LongCat. On the next retry, [`routeRequest()`](server/src/services/router.ts:458) still receives `preferredModel` pointing to LongCat, and tries another LongCat key via round-robin. The LongCat smart-auto preference in [`router.ts`](server/src/services/router.ts:498-527) also means LongCat is still tried first in smart mode, so the retry will likely hit LongCat again. ## Functional Requirements ### FR-1: Detect Multiple Key Use on LongCat When a LongCat provider returns an error indicating that the same API key is been used for the same session ( the system must detect this condition. the error response contains language signaling multiple key use. a single session. Detection patterns: - Auth errors (401/403) — the current behavior already clears the sticky key but but tries another key on the same model - Rate-limit errors (429) — same pattern: key rotation within the same model - "Truncated" / "conflict" errors — LongCat truncates responses mid-stream when the response is shorter than expected, indicating the provider cut off the session. Detection keywords: "truncated", "truncation", "conflict", "length", "maximum length", "context_length_exceeded", "token_limit" - Any error message that the provider is complaining about session length or capacity limits for the current conversation. ### FR-2: Ban LongCat Platform for Sticky Session When FR-1 is triggered, the system must ban the entire `longcat` platform for the current sticky session. This means: - All LongCat model IDs must be added to `skipModels` in the retry loop - The sticky session must be updated to point to the new fallback model instead of LongCat - The session must never be routed to LongCat again for until the session expires via TTL (30 min) or The ban must persist across multiple retry attempts within the same request. ### FR-3: Fallback to Next Best Non-LongCat Model After banning LongCat, the retry loop must fall through to the next best available model via the existing Thompson Sampling / smart routing logic. The new model should be selected based on the normal scoring algorithm ( success rate + speed + TTFB + intelligence for smart mode, success rate + speed in balanced mode). ### FR-4: Update Sticky Session to New Model On successful fallback, the sticky session must be updated to point to the new fallback model and `modelDbId` + `keyId`. The sticky key feature should be cleared for the new model since since the fallback is not LongCat, since the sticky key preference only applies to LongCat sessions. ### FR-5: Never Route Session to LongCat Again Once a session is banned from LongCat, it must never be routed to LongCat again for the remainder of that session's lifetime (30 min TTL). This means: - The `stickySessionMap` entry must include a `bannedPlatforms` field (or `Set<string>`) to track which platforms are banned for this session - On subsequent requests in the same session, `getStickyModel()` returns the preferred model, but the proxy layer must check if the session is banned from LongCat and skip LongCat models before calling `routeRequest()` - The LongCat smart-auto preference in `router.ts` must also be suppressed for banned sessions ( the router should not boost LongCat entries to the front for sessions that are banned from LongCat ### FR-6: Truncated Response Detection When a LongCat streaming response is received, the proxy must check the response content for signs of truncation. If detected, the session must be banned from LongCat immediately, even if the stream has already started (headers sent, the client has already received partial data). The system must: - Log the truncation detection - Record the ban in the sticky session - Add all LongCat model IDs to `skipModels` - End stream and ban for future requests - The client receives the truncated partial response as-is; future requests in this session will route to non-LongCat models ### FR-7: Auth Error Handling for LongCat Sessions When an auth error (401/403) occurs on a LongCat sticky session: - Clear the sticky key via `clearStickyKey()` (existing behavior) - Additionally ban the LongCat platform for this session via the new `banPlatformFromSession()` function - Add all LongCat model IDs to `skipModels` - Set `preferredKeyId` to `undefined` - On retry, fall through to the next best non-LongCat model - Update sticky session to the new model on success ### FR-8: Rate-Limit Error Handling for LongCat Sessions When a rate-limit error (429) occurs on a LongCat sticky session: - Ban the LongCat platform for this session via `banPlatformFromSession()` - Add all LongCat model IDs to `skipModels` - Set `preferredKeyId` to `undefined` - On retry, fall through to the next best non-LongCat model - Update sticky session to the new model on success - Note: rate-limit errors on LongCat do NOT clear the entire sticky session (the session may still work with a different key on a different model). Only ban LongCat specifically. ### FR-9: Existing Behavior Preserved for Non-LongCat Sessions All existing sticky session behavior for non-LongCat providers must remain unchanged. The new ban mechanism only applies exclusively to LongCat sessions. Non-LongCat sessions that never have platform bans. ### FR-10: Session Expiry Clears Bans When a sticky session expires ( via TTL (30 min), the `bannedPlatforms` set is also cleared. This is natural — expired sessions are evicted from the `stickySessionMap` entirely, including all associated data. ### FR-11: No Database Schema Changes The ban mechanism is purely in-memory, using the existing `stickySessionMap`. No database schema changes are required. ### FR-12: Minimal Router Changes The router (`server/src/services/router.ts`) should not need significant changes. The only change is that the LongCat smart-auto preference logic should skip sessions that are banned from LongCat. The proxy layer handles all ban detection and session management. The router remains provider-agnostic. ### FR-13: No UI Changes This is a backend-only feature. No client-side changes are needed. ## Non-Functional Requirements ### NFR-1: Backward Compatibility Existing sessions without `bannedPlatforms` (from before this feature or for non-LongCat providers) must continue to work. The `bannedPlatforms` field must be optional in the sticky session map value type. ### NFR-2: Thread Safety The existing `stickySessionMap` is a plain `Map` with no locking ( single-threaded Node.js). The extended map follows the same pattern — no additional concurrency concerns. ### NFR-3: Minimal Performance Impact The ban check adds one `Set` lookup per one optional field check in the sticky session map entry per one DB query to check if a model is on a banned platform. No additional I/O beyond what already exists. ### NFR-4: Test Coverage New unit tests must cover: - Multiple key use detection (auth + rate limit + truncated) - Session ban persistence across retries - Fallback to next best model - Sticky session update on success - Session expiry clearing bans - Non-LongCat sessions unaffected ## Files Requiring Modification | # | File | Change Type | Description | |
There was a problem hiding this comment.
Fix malformed requirement block formatting.
Line 5 through Line 15 is merged into a run-on block with broken section boundaries and typos, making functional requirements ambiguous to implement and review.
🧰 Tools
🪛 LanguageTool
[style] ~15-~15: Consider an alternative for the overused word “exactly”.
Context: ...within the same LongCat model, which is exactly what LongCat detects as "multiple API k...
(EXACTLY_PRECISELY)
[grammar] ~15-~15: Ensure spelling is correct
Context: ...404, 408, 409, 422, 500, 502, 503, 504, andrate limit, quota, aborted, timeout`, ...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~15-~15: Ensure spelling is correct
Context: ...proxy will retry with a different model/keykey. However, [shouldSkipModelOnRetry()](...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[style] ~15-~15: Consider an alternative for the overused word “exactly”.
Context: ...otation within the same model, which is exactly the behavior LongCat detects as "multip...
(EXACTLY_PRECISELY)
[grammar] ~15-~15: Ensure spelling is correct
Context: ...89-98) only clears the sticky key but **keoes preferredKeyId to undefined — but t...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🪛 markdownlint-cli2 (0.22.1)
[warning] 11-11: Spaces inside code span elements
(MD038, no-space-in-code)
[warning] 15-15: Spaces inside code span elements
(MD038, no-space-in-code)
[warning] 15-15: Spaces inside code span elements
(MD038, no-space-in-code)
[warning] 15-15: Spaces inside code span elements
(MD038, no-space-in-code)
[warning] 15-15: Spaces inside code span elements
(MD038, no-space-in-code)
[warning] 15-15: Spaces inside code span elements
(MD038, no-space-in-code)
[warning] 15-15: Spaces inside code span elements
(MD038, no-space-in-code)
[warning] 15-15: Spaces inside code span elements
(MD038, no-space-in-code)
[warning] 15-15: Spaces inside code span elements
(MD038, no-space-in-code)
[warning] 15-15: Spaces inside code span elements
(MD038, no-space-in-code)
[warning] 15-15: Spaces inside code span elements
(MD038, no-space-in-code)
[warning] 15-15: Spaces inside code span elements
(MD038, no-space-in-code)
[warning] 15-15: Spaces inside code span elements
(MD038, no-space-in-code)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.roo/specs/longcat-session-ban/requirements.md around lines 5 - 15, The
requirements file has a malformed merged block (lines 5–15) that garbles Context
and Functional Requirements and introduces typos; please split that run-on
paragraph into distinct sections (Context, Existing Behavior, Functional
Requirements) and reformat FR-1..FR-13 as separate, numbered bullets, fixing
typos and broken sentences (e.g., "keoes", duplicated phrases, misplaced
references) and ensuring each FR references the correct code symbols:
clearStickyKey(), shouldSkipModelOnRetry(), isRetryableError(), routeRequest(),
getStickyModel(), and the LongCat smart-auto logic in router.ts; ensure
detection patterns and ban semantics (bannedPlatforms set on stickySessionMap,
TTL behavior, skipModels update) are clearly stated, keep NFRs and file-change
list intact, and make bannedPlatforms optional for backward compatibility.
| ### NFR-3: No Router Changes | ||
|
|
||
| The router already supports `skipModels` — no changes to [`server/src/services/router.ts`](server/src/services/router.ts) are needed. The cooldown only adds LongCat model IDs to the existing `skipModels` set in the proxy layer. | ||
|
|
There was a problem hiding this comment.
Remove the “No Router Changes” requirement — it conflicts with the implemented contract.
NFR-3 says no router changes are needed, but this feature depends on allowing preferredModelDbId to bypass skipModels in routeRequest(). Please update this requirement to avoid contradictory acceptance criteria.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.roo/specs/longcat-sticky-cooldown/requirements.md around lines 74 - 77,
Update the requirements doc to remove or reword NFR-3 so it no longer asserts
“No Router Changes” because the implementation requires allowing
preferredModelDbId to bypass skipModels in routeRequest; explicitly state that
the router's routeRequest behavior must allow preferredModelDbId to override
entries in the skipModels set (or remove the NFR-3 clause entirely) and
reference the skipModels and preferredModelDbId behavior in routeRequest to
avoid contradictory acceptance criteria.
| // Add API keys via proper endpoint (encrypts correctly) so routing can succeed | ||
| await request(app, 'POST', '/api/keys', { | ||
| platform: 'longcat', | ||
| key: 'lc_cooldown_active_test', | ||
| label: 'cooldown-active-longcat', | ||
| }); | ||
| await request(app, 'POST', '/api/keys', { | ||
| platform: 'groq', | ||
| key: 'gsk_cooldown_active_test', | ||
| label: 'cooldown-active-groq', | ||
| }); |
There was a problem hiding this comment.
New /api/keys test calls are missing admin-key authentication coverage.
Per route-gating policy, /api/* must be admin-key protected and distinct from /v1/* unified-key auth. These additions keep codifying unauthenticated /api/keys calls, which can mask regressions in access control. Add explicit admin-key headers (and ideally one 401 negative test with unified key on /api/*). As per coding guidelines, "The admin key must gate /api/* routes; the unified API key must gate /v1/* routes — they must never overlap, and using one against the wrong route returns 401".
Also applies to: 909-918, 989-998, 1058-1062, 1123-1127, 1176-1180
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@server/src/__tests__/routes/proxy-tools.test.ts` around lines 839 - 849, The
new POST calls to /api/keys in server/src/__tests__/routes/proxy-tools.test.ts
are missing admin-key authentication and must be updated to include the
admin-key header; update each request(app, 'POST', '/api/keys', {...}) call to
send the admin header (e.g. add headers: { 'x-admin-key': ADMIN_KEY } or the
project’s admin header constant) so the route is exercised under admin-key
gating, and add one negative test that calls POST /api/keys with a unified API
key header (e.g. the unified key header used for /v1/*) asserting a 401 response
to prove unified keys are rejected for /api/* paths. Ensure you mirror this
change for the other occurrences noted (around lines referenced in the review:
909-918, 989-998, 1058-1062, 1123-1127, 1176-1180).
| it('excludes LongCat from bandit routing for non-sticky sessions during cooldown', async () => { | ||
| const db = getDb(); | ||
|
|
||
| // Set up a sticky session on LongCat for ONE session (within cooldown) | ||
| const stickyMessages = makeMessages('sticky longcat session during cooldown'); | ||
| const stickyKey = getSessionKey(stickyMessages, 'balanced'); | ||
| const longcatRow = db.prepare('SELECT id FROM models WHERE platform = ? AND enabled = 1').get('longcat') as { id: number } | undefined; | ||
| expect(longcatRow).toBeDefined(); | ||
| (stickySessionMap as Map<any, any>).set(stickyKey, { | ||
| modelDbId: longcatRow!.id, | ||
| lastUsed: Date.now() - 1000, // within cooldown | ||
| }); | ||
|
|
||
| // Add keys for both providers | ||
| await request(app, 'POST', '/api/keys', { | ||
| platform: 'longcat', | ||
| key: 'lc_cooldown_exclusion_test', | ||
| label: 'cooldown-exclusion-longcat', | ||
| }); | ||
| await request(app, 'POST', '/api/keys', { | ||
| platform: 'groq', | ||
| key: 'gsk_cooldown_exclusion_test', | ||
| label: 'cooldown-exclusion-groq', | ||
| }); | ||
|
|
||
| // Now send a request from a DIFFERENT session (no sticky LongCat) | ||
| // This session should NOT be able to route to LongCat because the | ||
| // other session's cooldown excludes LongCat from the bandit router. | ||
| // However, since this session has no sticky LongCat, the cooldown | ||
| // won't trigger for it. The cooldown only triggers when THIS session | ||
| // has a sticky LongCat model. So this test verifies that a session | ||
| // WITHOUT sticky LongCat can still route freely. | ||
| // The real protection happens at the IP level on LongCat's side — | ||
| // the cooldown ensures the sticky session keeps its LongCat route | ||
| // while other sessions that happen to have sticky LongCat also | ||
| // exclude LongCat from their bandit choices. | ||
| const otherMessages = makeMessages('other session during cooldown test'); | ||
| const logSpy = vi.spyOn(console, 'log'); | ||
| const origFetch = global.fetch; | ||
| let routedProvider = ''; | ||
|
|
||
| vi.spyOn(global, 'fetch').mockImplementation(async (url, init) => { | ||
| const urlStr = typeof url === 'string' ? url : url.toString(); | ||
| if (urlStr.startsWith('http://127.0.0.1') || urlStr.startsWith('http://localhost')) { | ||
| return origFetch(url, init); | ||
| } | ||
| if (!urlStr.includes('/chat/completions')) return origFetch(url, init); | ||
|
|
||
| if (urlStr.includes('longcat')) routedProvider = 'longcat'; | ||
| else if (urlStr.includes('groq')) routedProvider = 'groq'; | ||
|
|
||
| const body = JSON.parse((init as any).body); | ||
| return { | ||
| ok: true, | ||
| json: () => Promise.resolve({ | ||
| id: 'chatcmpl-other-session', | ||
| object: 'chat.completion', | ||
| created: 123, | ||
| model: body.model, | ||
| choices: [{ | ||
| index: 0, | ||
| message: { role: 'assistant', content: 'other session test response' }, | ||
| finish_reason: 'stop', | ||
| }], | ||
| usage: { prompt_tokens: 4, completion_tokens: 2, total_tokens: 6 }, | ||
| }), | ||
| } as any; | ||
| }); | ||
|
|
||
| const { status } = await request(app, 'POST', '/v1/chat/completions', { | ||
| messages: otherMessages, | ||
| }); | ||
|
|
||
| expect(status).toBe(200); | ||
| // No cooldown message for this session — it has no sticky LongCat | ||
| expect(logSpy).not.toHaveBeenCalledWith( | ||
| expect.stringContaining('[Sticky] LongCat cooldown active') | ||
| ); | ||
| }); |
There was a problem hiding this comment.
This test asserts the opposite of the cooldown requirement for “other sessions.”
The test name says non-sticky sessions should be excluded from LongCat during cooldown, but the inline assertions/comments expect no exclusion and never verify provider selection. That weakens coverage for the core safeguard. Assert that the non-sticky request does not route to LongCat (e.g., expect(routedProvider).toBe('groq') or not.toBe('longcat')) and remove contradictory commentary.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@server/src/__tests__/routes/proxy-tools.test.ts` around lines 895 - 973, The
test currently asserts the opposite of the intended cooldown behavior for other
sessions; update the test to assert the non-sticky session does NOT route to
LongCat by adding an assertion on the routedProvider (e.g.,
expect(routedProvider).toBe('groq') or
expect(routedProvider).not.toBe('longcat')) after the POST to
/v1/chat/completions, remove or correct the contradictory inline comments that
claim no exclusion, and ensure the mocked fetch logic (the routedProvider
variable set in the global.fetch spy) is used for this assertion so the test
actually verifies the cooldown exclusion behavior tied to
stickySessionMap/getSessionKey/makeMessages.
| if (preferredModel) { | ||
| const db = getDb(); | ||
| const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined; | ||
| if (prefRow?.platform === 'longcat') { | ||
| const cooldownSessionKey = getSessionKey(normalizedMessages, routingMode); | ||
| const cooldownEntry = cooldownSessionKey ? stickySessionMap.get(cooldownSessionKey) : undefined; | ||
| if (cooldownEntry && Date.now() - cooldownEntry.lastUsed < LONGCAT_STICKY_COOLDOWN_MS) { | ||
| const ageMs = Date.now() - cooldownEntry.lastUsed; | ||
| addProviderModelsToSkipModels(skipModels, 'longcat'); | ||
| console.log(`[Sticky] LongCat cooldown active — excluding LongCat from bandit routing for other sessions | session=${cooldownSessionKey?.slice(0, 8)} | lastUsed=${ageMs}ms ago`); | ||
| } |
There was a problem hiding this comment.
Cooldown scope only checks the current session, not “other sessions.”
At Line 1251-Line 1253, cooldown state is derived from the current session key only. That means sessions without a LongCat preferred model won’t have LongCat excluded, even if another sticky LongCat session used it within the cooldown window.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@server/src/routes/proxy.ts` around lines 1247 - 1257, The cooldown check
currently only looks up the current session key; instead, scan stickySessionMap
for any entry (other than the current session key returned by
getSessionKey(normalizedMessages, routingMode)) whose platform is 'longcat' and
whose lastUsed is within LONGCAT_STICKY_COOLDOWN_MS, and if found call
addProviderModelsToSkipModels(skipModels, 'longcat') and log the exclusion
(include the offending session key slice and ageMs); use the same symbols
prefRow/platform check, getSessionKey, stickySessionMap,
LONGCAT_STICKY_COOLDOWN_MS, addProviderModelsToSkipModels, skipModels,
normalizedMessages, and routingMode to locate and update the code.
666040e to
5f6f0b0
Compare
…del-only skip on 5xx
Implements the
longcat-sticky-cooldownspec.Summary
When a sticky session is pinned to a LongCat model AND was used within the last 3 minutes, exclude LongCat entirely from the bandit router for all other sessions. The current sticky session keeps its pinned LongCat model+key — only that session may use LongCat. This prevents LongCat from seeing multiple sessions/keys from the same IP.
Changes
server/src/routes/proxy.ts— AddedLONGCAT_STICKY_COOLDOWN_MS = 3 * 60 * 1000constantserver/src/routes/proxy.ts— Cooldown check adds LongCat models toskipModels(instead of clearing sticky preference). Sticky session keeps its LongCat pin.server/src/services/router.ts— ModifiedskipModelscheck to exclude the sticky model:&& entry.model_db_id !== preferredModelDbId— ensures the sticky session's LongCat model is never skippedserver/src/__tests__/routes/proxy-tools.test.ts— Updated 6 unit tests covering: cooldown active (preserves sticky + excludes LongCat from bandit), cooldown expired, non-LongCat provider, ban precedence, no sticky session, other sessions during cooldownAll 156 tests pass with zero regressions.
Summary by CodeRabbit
New Features
Bug Fixes
Tests