feat: longcat-sticky-cooldown #5

Closed
vi70x3 wants to merge 6 commits into main from spec/longcat-sticky-cooldown

Conversation

@vi70x3 vi70x3 commented Jun 1, 2026

Collaborator

Implements the longcat-sticky-cooldown spec.

Adds a 3-minute cooldown safeguard for LongCat sticky sessions: when a sticky session's preferred model is on the LongCat platform and was used within the last 3 minutes, the sticky preference is temporarily bypassed for that request only, allowing the bandit router to pick freely (it may still route to LongCat organically via the smart-mode boost).

Changes:

  • Added LONGCAT_STICKY_COOLDOWN_MS constant (3 min) in server/src/routes/proxy.ts
  • Added cooldown check logic in handleChatCompletion() after the ban check, before the retry loop
  • Added 5 unit tests covering: cooldown active, cooldown expired, non-LongCat provider, ban precedence, no sticky session
  • All 155 tests pass with zero regressions
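
The five test scenarios listed above can be read as one decision function. The following is a minimal sketch under stated assumptions, not the code in `proxy.ts`: `StickyEntry`, `StickyDecision`, and `decideSticky` are hypothetical names introduced only for illustration, and the platform is passed in directly instead of being looked up in the database.

```typescript
// Hypothetical, simplified session shape; the real entry type in proxy.ts is not shown here.
interface StickyEntry {
  lastUsed: number;               // epoch ms of the session's last request
  bannedPlatforms?: Set<string>;  // platforms banned for this session, if any
}

const LONGCAT_STICKY_COOLDOWN_MS = 3 * 60 * 1000; // 3 minutes, as stated in the PR description

type StickyDecision = "keep" | "bypass-banned" | "bypass-cooldown";

// Decide whether the sticky preference should be honored for a single request.
function decideSticky(
  entry: StickyEntry | undefined,
  preferredPlatform: string | undefined,
  now: number,
): StickyDecision {
  // No sticky session or no preferred model: nothing to bypass.
  if (!entry || !preferredPlatform) return "keep";
  // A session ban takes precedence over the cooldown.
  if (entry.bannedPlatforms && entry.bannedPlatforms.has(preferredPlatform)) {
    return "bypass-banned";
  }
  // Cooldown applies only to LongCat and only within the 3-minute window.
  if (preferredPlatform === "longcat" && now - entry.lastUsed < LONGCAT_STICKY_COOLDOWN_MS) {
    return "bypass-cooldown";
  }
  return "keep";
}
```

Passing `now` as a parameter rather than calling `Date.now()` inside the function is a choice made for this sketch so that each scenario can be exercised without mocking the clock.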

Summary by CodeRabbit

Release Notes

  • New Features

    • Session-level provider banning: problematic providers are automatically blacklisted for the remainder of a conversation session, triggering seamless fallback to alternative models.
    • Sticky session cooldown: recently used models are temporarily deprioritized for 3 minutes to improve routing diversity.
  • Bug Fixes

    • Improved error detection and recovery: consecutive provider failures and truncated responses are now tracked, enabling faster fallback to reliable alternatives.
  • Tests

    • Comprehensive provider session ban and cooldown test coverage added.

vi70x3 added 6 commits June 1, 2026 22:00
Implements the LongCat session ban feature in proxy.ts. When LongCat
detects multiple API key use (auth/rate-limit errors) or returns
truncated responses, the platform is banned from the sticky session.
Future requests in that session route to non-LongCat models.

Changes:
- Extend stickySessionMap with bannedPlatforms?: Set<string>
- Add isSessionBannedFromPlatform() to check session bans
- Add banPlatformFromSession() to record platform bans
- Add addLongcatModelsToSkipModels() to skip all LongCat models
- Add isTruncatedResponse() to detect truncation keywords
- Update getStickyKey() to return undefined for banned platforms
- Update setStickyModel() to preserve bannedPlatforms across updates
- Update pre-routing logic to check bans before routing
- Update error handling to ban LongCat on auth/rate-limit/truncation
- Add truncation detection after stream completes
- Add truncation detection in mid-stream error handling
… consecutive failures

- Extend stickySessionMap with consecutiveFailures tracking per provider
- Add recordConsecutiveFailure(), resetConsecutiveFailures(), resetAllConsecutiveFailures()
- Replace addLongcatModelsToSkipModels with generic addProviderModelsToSkipModels
- Replace LongCat-specific auth/rate-limit ban with general 5xx consecutive failure detection (threshold: 2)
- Generalize truncation detection to all providers (post-stream + mid-stream)
- Update getStickyKey() to check bannedPlatforms for any platform
- Update pre-routing ban check to be generic (any banned platform)
- Add success path counter reset on both streaming and non-streaming paths
- Remove LongCat-specific auth error ban, rate limit ban, and addLongcatModelsToSkipModels
- Rename and rewrite tests from longcat-session-ban to provider-session-ban (32 test cases)
- TypeScript compiles cleanly, all 150 tests pass
- Add isBanEligibleStatus() helper restricting to {500,502,503,504}
- Improve mid-stream truncation detection with aggregated error sources
- Pre-routing ban check now skips ALL banned platforms, not just preferredModel's
- Only clear preferredModel when provider is actually banned (not on first 5xx)
- Handle Error objects in isTruncatedResponse (instanceof check before JSON.stringify)
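
The consecutive-failure rule described in these commits (ban after 2 consecutive failures, restricted to statuses 500, 502, 503, and 504, with the counter reset on success) could look roughly like the sketch below. The function names are taken from the commit messages, but their signatures and the `SessionState` shape are assumptions made for this example; the real helpers operate on `stickySessionMap` and are not shown on this page.

```typescript
const BAN_THRESHOLD = 2;                    // consecutive 5xx failures before a ban (from the commit message)
const BAN_ELIGIBLE = [500, 502, 503, 504];  // statuses that count toward a ban (from the commit message)

// Hypothetical per-session state, reduced to the two fields this sketch needs.
interface SessionState {
  bannedPlatforms: Set<string>;
  consecutiveFailures: Map<string, number>;
}

function isBanEligibleStatus(status: number): boolean {
  return BAN_ELIGIBLE.indexOf(status) !== -1;
}

// Returns true when this failure reached the threshold and the platform was banned.
function recordConsecutiveFailure(state: SessionState, platform: string, status: number): boolean {
  if (!isBanEligibleStatus(status)) return false; // e.g. 429 or 401 never counts here
  const count = (state.consecutiveFailures.get(platform) ?? 0) + 1;
  state.consecutiveFailures.set(platform, count);
  if (count >= BAN_THRESHOLD) {
    state.bannedPlatforms.add(platform);
    return true;
  }
  return false;
}

// Called on a successful response so that earlier failures no longer count as consecutive.
function resetConsecutiveFailures(state: SessionState, platform: string): void {
  state.consecutiveFailures.delete(platform);
}
```

With this shape, a single 5xx followed by a success leaves the provider unbanned, which matches the commit note that `preferredModel` is cleared only once the provider is actually banned.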

coderabbitai Bot commented Jun 1, 2026


📝 Walkthrough

This PR introduces sticky-session platform bans and a 3-minute LongCat cooldown safeguard, generalizing LongCat-specific logic to any provider. Session state now tracks banned platforms and per-provider 5xx failure counts, triggering bans after consecutive failures or truncation detection. All changes are in-memory with no DB schema modifications.

Changes

Session Ban & Cooldown Implementation

| Layer / File(s) | Summary |
| --- | --- |
| Feature Specification & Planning (`.roo/specs/longcat-session-ban/*`, `.roo/specs/longcat-sticky-cooldown/*`, `.roo/specs/provider-5xx-session-ban/*`) | Three interlinked specifications define LongCat session bans (auth/rate-limit/truncation triggers), generalized provider 5xx bans (consecutive failure tracking), and a 3-minute LongCat sticky-session cooldown. Includes functional/non-functional requirements, implementation tasks, and test coverage plans. |
| Sticky Session State Extension (`server/src/routes/proxy.ts`) | stickySessionMap entry type expanded to include bannedPlatforms Set and per-provider consecutiveFailures Map, enabling session-level ban and failure-state tracking without DB changes. |
| Ban & Failure Helper Functions (`server/src/routes/proxy.ts`) | Seven new exported utilities: isSessionBannedFromPlatform() checks ban status with TTL awareness, banPlatformFromSession() records bans and creates sticky entries, addProviderModelsToSkipModels() populates retry skip-sets by provider, recordConsecutiveFailure() tracks 5xx failures and triggers bans on threshold, reset helpers clear per-provider and global counters, and isTruncatedResponse() detects truncation patterns. |
| Session Lifecycle & Sticky Model Management (`server/src/routes/proxy.ts`) | Sticky model selection now checks banned platforms dynamically from the DB; setStickyModel() preserves existing ban/failure state when updating preferences. Added isBanEligibleStatus() helper to recognize 5xx statuses eligible for ban tracking. |
| Request Routing & Cooldown Integration (`server/src/routes/proxy.ts`) | handleChatCompletion() now computes skipModels from session-level banned platforms, clears preferred routing when preferred platform is banned, and applies 3-minute LongCat cooldown by suppressing sticky preferences when cooldown is active. |
| Stream & Mid-Stream Error Handling (`server/src/routes/proxy.ts`) | Post-stream truncation detection bans providers after successful streaming completion. Mid-stream error handling records consecutive 5xx failures, aggregates error text for truncation detection, and gracefully ends SSE streams while banning truncated providers. |
| Success Path & Failure Counter Reset (`server/src/routes/proxy.ts`) | Success paths reset per-provider and global failure counters to clear failure state. Outer retryable error handling generalizes 5xx failure tracking for any provider and conditionally clears preferred routing only when actually banned. |
| Unit Test Coverage (`server/src/__tests__/routes/provider-session-ban.test.ts`) | Comprehensive unit tests validate each helper: ban detection, ban recording with TTL refresh, provider model filtering, consecutive failure tracking with threshold bans, counter resets, and truncation pattern matching. |
| Integration Test Coverage (`server/src/__tests__/routes/provider-session-ban.test.ts`, `server/src/__tests__/routes/proxy-tools.test.ts`) | Integration tests verify end-to-end ban lifecycle, counter reset on success, and TTL expiration. Separate cooldown test suite validates suppression when active, preservation after expiry, non-interference with non-LongCat sessions, and ban precedence. |
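
To make the state extension summarized above more concrete, here is a small sketch of a sticky entry that carries ban and failure state, a `setStickyModel()` that preserves that state, and a TTL-aware ban check. It is an illustration under assumptions: the field names `modelDbId` and `lastUsed`, the explicit `now` parameter, and every signature are hypothetical, and the 30-minute TTL is taken from the quoted spec text rather than from `proxy.ts` itself.

```typescript
const STICKY_TTL_MS = 30 * 60 * 1000; // 30 min, per the quoted spec; assumed for this sketch

// Hypothetical entry shape; both new fields are optional so older entries stay valid.
interface StickyEntry {
  modelDbId?: number;
  lastUsed: number;
  bannedPlatforms?: Set<string>;
  consecutiveFailures?: Map<string, number>;
}

const stickySessionMap = new Map<string, StickyEntry>();

// Update the preferred model while carrying over existing ban and failure state.
function setStickyModel(sessionKey: string, modelDbId: number, now: number): void {
  const prev = stickySessionMap.get(sessionKey);
  const next: StickyEntry = { modelDbId, lastUsed: now };
  if (prev && prev.bannedPlatforms) next.bannedPlatforms = prev.bannedPlatforms;
  if (prev && prev.consecutiveFailures) next.consecutiveFailures = prev.consecutiveFailures;
  stickySessionMap.set(sessionKey, next);
}

// Record a ban, creating the entry if the session has none yet, and refresh the TTL.
function banPlatformFromSession(sessionKey: string, platform: string, now: number): void {
  const entry: StickyEntry = stickySessionMap.get(sessionKey) ?? { lastUsed: now };
  const banned = entry.bannedPlatforms ?? new Set<string>();
  banned.add(platform);
  entry.bannedPlatforms = banned;
  entry.lastUsed = now;
  stickySessionMap.set(sessionKey, entry);
}

// TTL-aware check: an expired session is evicted, which also drops its bans.
function isSessionBannedFromPlatform(sessionKey: string, platform: string, now: number): boolean {
  const entry = stickySessionMap.get(sessionKey);
  if (!entry) return false;
  if (now - entry.lastUsed > STICKY_TTL_MS) {
    stickySessionMap.delete(sessionKey);
    return false;
  }
  return entry.bannedPlatforms !== undefined && entry.bannedPlatforms.has(platform);
}
```

Because bans live on the same entry as the sticky preference, no separate expiry logic is needed in this sketch: evicting the entry clears everything at once.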

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • vi70x3/freellmapi#2: Earlier work introducing keyId/preferredKeyId support for LongCat sticky sessions; this PR extends that foundation by adding ban and failure-tracking state alongside the key preferences.

Poem

🐰 Hop along, sticky friends,
Bans now guard where retries end,
Five times failed, a provider's plight,
Cooldown rests the LongCat's flight,
Truncation caught, no second chance—
Smart routing leads the session's dance! 🎯

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'feat: longcat-sticky-cooldown' directly aligns with the main feature implemented: a 3-minute cooldown safeguard for LongCat sticky sessions that bypasses sticky preference when recently used. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request generalizes the session ban and fallback mechanisms from LongCat to all providers, introducing a 5xx consecutive failure ban (after two consecutive 5xx errors) and extending truncation detection to all platforms. It also adds a three-minute cooldown safeguard for LongCat sticky sessions to bypass sticky preferences during rapid-fire requests, along with comprehensive unit and integration tests. The reviewer feedback focuses on performance optimizations in server/src/routes/proxy.ts, specifically recommending the reuse of fetched session variables to avoid redundant database queries and TTL checks during pre-routing, and adding a fast-path check to bypass database queries when the preferred model matches the failed route model.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +1215 to 1259
  // Check if session is banned from any platform — add all banned platforms' models to skipModels
  // and clear preferredModel/preferredKeyId if it points to a banned platform.
  const skipModels = new Set<number>();
  const sessionKey = getSessionKey(normalizedMessages, routingMode);
  if (sessionKey) {
    const entry = stickySessionMap.get(sessionKey);
    if (entry) {
      if (Date.now() - entry.lastUsed > STICKY_TTL_MS) {
        stickySessionMap.delete(sessionKey);
      } else if (entry.bannedPlatforms) {
        for (const platform of entry.bannedPlatforms) {
          addProviderModelsToSkipModels(skipModels, platform);
          console.log(`[Sticky] session banned from ${platform}, adding to skipModels`);
        }
      }
    }
  }

  if (preferredModel) {
    const db = getDb();
    const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined;
    if (prefRow && isSessionBannedFromPlatform(normalizedMessages, routingMode, prefRow.platform)) {
      console.log(`[Sticky] skipping preferredModel=${preferredModel} (${prefRow.platform} banned for session)`);
      preferredModel = undefined;
      preferredKeyId = undefined;
    }
  }

  // LongCat sticky cooldown: if the sticky model is on LongCat and was used
  // within the last 3 minutes, bypass sticky preference for this request only.
  // The bandit router picks freely — it may still route to LongCat organically.
  if (preferredModel) {
    const db = getDb();
    const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined;
    if (prefRow?.platform === 'longcat') {
      const cooldownSessionKey = getSessionKey(normalizedMessages, routingMode);
      const cooldownEntry = cooldownSessionKey ? stickySessionMap.get(cooldownSessionKey) : undefined;
      if (cooldownEntry && Date.now() - cooldownEntry.lastUsed < LONGCAT_STICKY_COOLDOWN_MS) {
        const ageMs = Date.now() - cooldownEntry.lastUsed;
        console.log(`[Sticky] LongCat cooldown active — bypassing sticky preference for session=${cooldownSessionKey?.slice(0, 8)} | lastUsed=${ageMs}ms ago`);
        preferredModel = undefined;
        preferredKeyId = undefined;
      }
    }
  }

medium

There is a significant opportunity to optimize the pre-routing ban and cooldown checks. Currently, the code performs redundant database queries (SELECT platform FROM models WHERE id = ?), redundant session key lookups, and redundant TTL checks by calling isSessionBannedFromPlatform and re-fetching the session key/entry for the cooldown check.

By reusing the already fetched sessionKey and entry variables, and combining the database query for preferredModel's platform, we can eliminate all of these redundant operations on the hot path.

  // Check if session is banned from any platform — add all banned platforms' models to skipModels
  // and clear preferredModel/preferredKeyId if it points to a banned platform.
  const skipModels = new Set<number>();
  const sessionKey = getSessionKey(normalizedMessages, routingMode);
  let entry = sessionKey ? stickySessionMap.get(sessionKey) : undefined;
  if (entry) {
    if (Date.now() - entry.lastUsed > STICKY_TTL_MS) {
      stickySessionMap.delete(sessionKey);
      entry = undefined;
    } else if (entry.bannedPlatforms) {
      for (const platform of entry.bannedPlatforms) {
        addProviderModelsToSkipModels(skipModels, platform);
        console.log(`[Sticky] session banned from ${platform}, adding to skipModels`);
      }
    }
  }

  if (preferredModel) {
    const db = getDb();
    const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined;
    if (prefRow) {
      if (entry?.bannedPlatforms?.has(prefRow.platform)) {
        console.log(`[Sticky] skipping preferredModel=${preferredModel} (${prefRow.platform} banned for session)`);
        preferredModel = undefined;
        preferredKeyId = undefined;
      } else if (prefRow.platform === 'longcat' && entry) {
        if (Date.now() - entry.lastUsed < LONGCAT_STICKY_COOLDOWN_MS) {
          const ageMs = Date.now() - entry.lastUsed;
          console.log(`[Sticky] LongCat cooldown active — bypassing sticky preference for session=${sessionKey?.slice(0, 8)} | lastUsed=${ageMs}ms ago`);
          preferredModel = undefined;
          preferredKeyId = undefined;
        }
      }
    }
  }

Comment on lines +1516 to +1523
  if (preferredModel && isSessionBannedFromPlatform(normalizedMessages, routingMode, route.platform)) {
    const db = getDb();
    const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined;
    if (prefRow?.platform === route.platform) {
      preferredModel = undefined;
      preferredKeyId = undefined;
    }
  }

medium

In the error handler, when a provider is banned, we query the database to check if the preferredModel's platform matches the failed route.platform.

In the vast majority of cases where the preferred model fails, preferredModel is exactly equal to route.modelDbId. We can add a fast-path check to compare these IDs directly, avoiding the database query entirely in the most common failure scenario.

Suggested change
- if (preferredModel && isSessionBannedFromPlatform(normalizedMessages, routingMode, route.platform)) {
-   const db = getDb();
-   const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined;
-   if (prefRow?.platform === route.platform) {
-     preferredModel = undefined;
-     preferredKeyId = undefined;
-   }
- }
+ if (preferredModel && isSessionBannedFromPlatform(normalizedMessages, routingMode, route.platform)) {
+   if (preferredModel === route.modelDbId) {
+     preferredModel = undefined;
+     preferredKeyId = undefined;
+   } else {
+     const db = getDb();
+     const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined;
+     if (prefRow?.platform === route.platform) {
+       preferredModel = undefined;
+       preferredKeyId = undefined;
+     }
+   }
+ }

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 9

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.roo/specs/longcat-session-ban/requirements.md:
- Line 15: The requirements block around the LongCat sticky session logic is
corrupted and must be rewritten: replace the malformed paragraph (around the
broken merged sentences) with a clean, numbered FR list where each functional
requirement is a single concise bullet referencing the exact code symbols
(clearStickyKey, shouldSkipModelOnRetry, isRetryableError, routeRequest,
router.ts smart-auto logic, stickySessionMap, banPlatformFromSession) so
implementers can find the relevant code paths; fix all typos/garbled tokens
(e.g., "401/403", "model/keykey", "TheisRetryableError", "keoes") and ensure
each FR (FR-1..FR-13) is atomic, references the function(s) it affects, and maps
to the corresponding behavior (detection patterns, ban mechanism, skipModels
changes, TTL/expiry, no DB schema change, minimal router tweak, tests) so
reviewers and engineers can implement unambiguously.
- Around line 24-28: The requirements table contains duplicated/malformed rows
(duplicate index 11 and prefixes like "| | 8"/"| | 11") which breaks
traceability; open .roo/specs/longcat-session-ban/requirements.md, remove the
extra duplicate row, renumber the affected rows sequentially, and normalize the
row prefixes so each entry starts with a single pipe and correct index (e.g.,
fix the entries referencing server/src/routes/proxy.ts and
server/src/services/router.ts to have unique indices and consistent "| <index> |
[file](file:range) | Edit | description |" formatting) so all links and indices
are unique and well-formed.

In @.roo/specs/longcat-sticky-cooldown/design.md:
- Around line 56-63: The fenced code block shown (the sequence starting with "1.
preferredModel = getStickyModel(...)" through "6. Retry loop with
routeRequest(...)") lacks a language tag; update the block opening from ``` to a
tagged fence like ```text or ```typescript to satisfy markdown linting and
improve rendering while leaving the block contents unchanged—locate the block in
the longcat-sticky-cooldown design section (the snippet around the numbered
steps referencing getStickyModel, getStickyKey, bannedPlatforms, and
routeRequest) and add the language identifier.

In @.roo/specs/provider-5xx-session-ban/design.md:
- Around line 58-59: The markdown links in design.md that reference server code
(e.g., links pointing to proxy.ts via "../server/src/...") are using incorrect
relative paths; update each occurrence to a normalized path that resolves from
this document (for example remove the leading ".." to use "server/src/..." or
convert to a repo-root absolute/blob URL) so reviewers can click through to the
actual file (search for the mentions of proxy.ts and the other similar
"../server/src/..." links and replace them consistently).

In @.roo/specs/provider-5xx-session-ban/requirements.md:
- Around line 16-21: Update the incorrect relative links in this requirements.md
so they resolve from this file's location: replace the broken
"../server/src/routes/proxy.ts" and "../longcat-session-ban/" references with
correct relative paths that point to server/src/routes/proxy.ts and the
longcat-session-ban spec from this file (e.g., adjust to
"../../server/src/routes/proxy.ts" or the appropriate number of ../ segments);
ensure both the proxy file link and the longcat-session-ban spec link are valid
and navigable in-repo.

In `@server/src/__tests__/routes/provider-session-ban.test.ts`:
- Around line 194-205: The test sets DB pragmas and a transaction manually but
does not guarantee cleanup on failures; update the test around getDb(), the
PRAGMA/BEGIN/DELETE calls and the addProviderModelsToSkipModels assertion so
that PRAGMA foreign_keys and the transaction are always restored using a
try/finally: wrap the DB mutation and expect call in try, call
db.prepare('ROLLBACK').run() and db.prepare('PRAGMA foreign_keys = ON').run() in
the finally block so rollback and pragma restoration always happen even if the
assertion throws.

In `@server/src/__tests__/routes/proxy-tools.test.ts`:
- Around line 840-849: The test setup calls POST /api/keys without admin auth,
which can hide misconfigured key-gating; change the key creation calls in
proxy-tools.test.ts to use the admin-authenticated helper (use the admin key
when calling request for '/api/keys', e.g., the same pattern used elsewhere in
the file) and then add at least one negative assertion that a unified API key
(the v1/unified key created) used against an /api/* route returns 401; locate
the request(...) calls that create keys and the later proxy calls (symbols:
request, '/api/keys', and the unified key variable/name) and update them so
admin auth is used for key setup and add a test asserting 401 when the unified
key is used against an /api/* endpoint.

In `@server/src/routes/proxy.ts`:
- Around line 212-215: The truncation-detection boolean is too broad because it
treats any occurrence of "conflict" as a truncation signal; update the
expression that checks the lower variable in server/src/routes/proxy.ts (the
return statement currently using lower.includes(...)) by removing the
lower.includes('conflict') clause so ordinary conflict errors no longer trigger
a truncation/ban; keep the other specific matchers (e.g., 'truncated',
'context_length_exceeded', 'token_limit', etc.), and run related tests or linter
to ensure no syntax regressions.
- Around line 1243-1257: The cooldown block should not run for client-explicit
model requests; update the condition that starts with "if (preferredModel)" so
it first checks that the preferred model was not explicitly requested by the
client (e.g., a boolean like explicitModelRequested or req.body.model presence).
In practice, add/use the existing flag that indicates an explicit client model
(or create one where routing decides preferredModel) and change the guard to
something like "if (preferredModel && !explicitModelRequested)" before calling
getSessionKey, stickySessionMap, and checking LONGCAT_STICKY_COOLDOWN_MS so
pinned models remain honored for explicit requests (leave preferredModel and
preferredKeyId untouched when explicitModelRequested is true).
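
The truncation-matcher finding for `server/src/routes/proxy.ts` (drop the broad `'conflict'` match, keep the specific patterns) together with the earlier commit note about handling `Error` objects can be illustrated as follows. This is a hedged sketch, not the function from the PR: the pattern list contains only the three keywords named in the review comment, the real list may be longer, and the `unknown` parameter type is an assumption.

```typescript
// Only the matchers named in the review comment; the real list in proxy.ts may differ.
const TRUNCATION_PATTERNS = ["truncated", "context_length_exceeded", "token_limit"];

function isTruncatedResponse(payload: unknown): boolean {
  let text = "";
  if (payload instanceof Error) {
    // JSON.stringify(new Error("x")) yields "{}", so read the message directly.
    text = payload.message;
  } else if (typeof payload === "string") {
    text = payload;
  } else {
    try {
      // JSON.stringify returns undefined for undefined input; fall back to "".
      text = JSON.stringify(payload) ?? "";
    } catch {
      text = ""; // e.g. circular structures
    }
  }
  const lower = text.toLowerCase();
  // No 'conflict' matcher: an ordinary 409 Conflict must not trigger a session ban.
  return TRUNCATION_PATTERNS.some((pattern) => lower.indexOf(pattern) !== -1);
}
```

For example, `isTruncatedResponse(new Error("Response truncated by provider"))` returns `true`, while `isTruncatedResponse("409 Conflict")` returns `false`.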

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 1a1395ab-919c-4e24-be41-3b291c63270c

📥 Commits

Reviewing files that changed from the base of the PR and between d9c5a73 and 7dd4189.

📒 Files selected for processing (12)
  • .roo/specs/longcat-session-ban/design.md
  • .roo/specs/longcat-session-ban/requirements.md
  • .roo/specs/longcat-session-ban/tasks.md
  • .roo/specs/longcat-sticky-cooldown/design.md
  • .roo/specs/longcat-sticky-cooldown/requirements.md
  • .roo/specs/longcat-sticky-cooldown/tasks.md
  • .roo/specs/provider-5xx-session-ban/design.md
  • .roo/specs/provider-5xx-session-ban/requirements.md
  • .roo/specs/provider-5xx-session-ban/tasks.md
  • server/src/__tests__/routes/provider-session-ban.test.ts
  • server/src/__tests__/routes/proxy-tools.test.ts
  • server/src/routes/proxy.ts


and eviction.

The existing LongCat sticky key feature ([`longcat-sticky-key` spec](../../roo/specs/longcat-sticky-key/)) extends this to also prefer using the **same API key** within a session. For LongCat specifically, because LongCat benefits from session continuity at the key level. same key = same session context on their server side). The current behavior on auth errors (401/403) is to [`clearStickyKey()`](server/src/routes/proxy.ts:89-98) — which clears the sticky key but **keep the sticky model pinned to LongCat** via [`preferredModel`](server/src/routes/proxy.ts:1036-1037). On retry, [`routeRequest()`](server/src/services/router.ts:458) still has `preferredModel` pointing to LongCat, and tries **another LongCat key** via round-robin. and **LongCat detects different keys usage for the same session** → the "multiple API keys" problem. The [`shouldSkipModelOnRetry()`](server/src/routes/proxy.ts:430-432) function explicitly **does NOT skip the model** for auth errors or rate limit errors — so auth failures on LongCat result in key rotation within the same LongCat model, which is exactly what LongCat detects as "multiple API keys use." for the same session. Similarly, when LongCat returns a "truncated" or "conflict" error ( the provider truncates the response mid-stream, the current behavior is to silently switch to a different key and but the session continues on LongCat with a different key — same problem. The "truncated" error pattern is also detected by [`isRetryableError()`](server/src/routes/proxy.ts:409-4428) which checking for 429, 413, 400, 404, 408, 409, 422, 500, 502, 503, 504, andrate limit`, `quota`, `aborted`, `timeout`, `econnrefused`, `econnreset`, `unauthorized`, `forbidden`, `invalid api key`, `no longer available`, `model not found`, `bad request`, `invalid json payload`. TheisRetryableError()` function returns true for all these cases, meaning the proxy will retry with a different model/keykey. 
However, [`shouldSkipModelOnRetry()`](server/src/routes/proxy.ts:430-432) returns `true` only for rate-limit and auth errors — it does NOT skip the model. This means auth errors and LongCat result in key rotation within the same model, which is exactly the behavior LongCat detects as "multiple API keys use" for the same session. The existing [`clearStickyKey()`](server/src/routes/proxy.ts:89-98) only clears the sticky key but **keoes `preferredKeyId` to `undefined` — but the sticky model remains pinned to LongCat. On the next retry, [`routeRequest()`](server/src/services/router.ts:458) still receives `preferredModel` pointing to LongCat, and tries another LongCat key via round-robin. The LongCat smart-auto preference in [`router.ts`](server/src/services/router.ts:498-527) also means LongCat is still tried first in smart mode, so the retry will likely hit LongCat again. ## Functional Requirements ### FR-1: Detect Multiple Key Use on LongCat When a LongCat provider returns an error indicating that the same API key is been used for the same session ( the system must detect this condition. the error response contains language signaling multiple key use. a single session. Detection patterns: - Auth errors (401/403) — the current behavior already clears the sticky key but but tries another key on the same model - Rate-limit errors (429) — same pattern: key rotation within the same model - "Truncated" / "conflict" errors — LongCat truncates responses mid-stream when the response is shorter than expected, indicating the provider cut off the session. Detection keywords: "truncated", "truncation", "conflict", "length", "maximum length", "context_length_exceeded", "token_limit" - Any error message that the provider is complaining about session length or capacity limits for the current conversation. ### FR-2: Ban LongCat Platform for Sticky Session When FR-1 is triggered, the system must ban the entire `longcat` platform for the current sticky session. 
This means:

- All LongCat model IDs must be added to `skipModels` in the retry loop
- The sticky session must be updated to point to the new fallback model instead of LongCat
- The session must never be routed to LongCat again until the session expires via TTL (30 min)

The ban must persist across multiple retry attempts within the same request.

### FR-3: Fallback to Next Best Non-LongCat Model

After banning LongCat, the retry loop must fall through to the next best available model via the existing Thompson Sampling / smart routing logic. The new model should be selected based on the normal scoring algorithm (success rate + speed + TTFB + intelligence for smart mode, success rate + speed in balanced mode).

### FR-4: Update Sticky Session to New Model

On successful fallback, the sticky session must be updated with the new fallback model's `modelDbId` + `keyId`. The sticky key feature should be cleared for the new model, since the fallback is not LongCat and the sticky key preference only applies to LongCat sessions.

### FR-5: Never Route Session to LongCat Again

Once a session is banned from LongCat, it must never be routed to LongCat again for the remainder of that session's lifetime (30 min TTL). This means:

- The `stickySessionMap` entry must include a `bannedPlatforms` field (or `Set<string>`) to track which platforms are banned for this session
- On subsequent requests in the same session, `getStickyModel()` returns the preferred model, but the proxy layer must check if the session is banned from LongCat and skip LongCat models before calling `routeRequest()`
- The LongCat smart-auto preference in `router.ts` must also be suppressed for banned sessions (the router should not boost LongCat entries to the front for sessions that are banned from LongCat)

### FR-6: Truncated Response Detection

When a LongCat streaming response is received, the proxy must check the response content for signs of truncation. If detected, the session must be banned from LongCat immediately, even if the stream has already started (headers sent, the client has already received partial data). The system must:

- Log the truncation detection
- Record the ban in the sticky session
- Add all LongCat model IDs to `skipModels`
- End the stream and ban LongCat for future requests
- The client receives the truncated partial response as-is; future requests in this session will route to non-LongCat models

### FR-7: Auth Error Handling for LongCat Sessions

When an auth error (401/403) occurs on a LongCat sticky session:

- Clear the sticky key via `clearStickyKey()` (existing behavior)
- Additionally ban the LongCat platform for this session via the new `banPlatformFromSession()` function
- Add all LongCat model IDs to `skipModels`
- Set `preferredKeyId` to `undefined`
- On retry, fall through to the next best non-LongCat model
- Update sticky session to the new model on success

### FR-8: Rate-Limit Error Handling for LongCat Sessions

When a rate-limit error (429) occurs on a LongCat sticky session:

- Ban the LongCat platform for this session via `banPlatformFromSession()`
- Add all LongCat model IDs to `skipModels`
- Set `preferredKeyId` to `undefined`
- On retry, fall through to the next best non-LongCat model
- Update sticky session to the new model on success
- Note: rate-limit errors on LongCat do NOT clear the entire sticky session (the session may still work with a different key on a different model). Only ban LongCat specifically.

### FR-9: Existing Behavior Preserved for Non-LongCat Sessions

All existing sticky session behavior for non-LongCat providers must remain unchanged. The new ban mechanism applies exclusively to LongCat sessions. Non-LongCat sessions never have platform bans.

### FR-10: Session Expiry Clears Bans

When a sticky session expires via TTL (30 min), the `bannedPlatforms` set is also cleared. This is natural: expired sessions are evicted from the `stickySessionMap` entirely, including all associated data.

### FR-11: No Database Schema Changes

The ban mechanism is purely in-memory, using the existing `stickySessionMap`. No database schema changes are required.

### FR-12: Minimal Router Changes

The router (`server/src/services/router.ts`) should not need significant changes. The only change is that the LongCat smart-auto preference logic should skip sessions that are banned from LongCat. The proxy layer handles all ban detection and session management. The router remains provider-agnostic.

### FR-13: No UI Changes

This is a backend-only feature. No client-side changes are needed.

## Non-Functional Requirements

### NFR-1: Backward Compatibility

Existing sessions without `bannedPlatforms` (from before this feature or for non-LongCat providers) must continue to work. The `bannedPlatforms` field must be optional in the sticky session map value type.

### NFR-2: Thread Safety

The existing `stickySessionMap` is a plain `Map` with no locking (single-threaded Node.js). The extended map follows the same pattern, with no additional concurrency concerns.

### NFR-3: Minimal Performance Impact

The ban check adds one `Set` lookup, one optional field check in the sticky session map entry, and one DB query to check if a model is on a banned platform. No additional I/O beyond what already exists.

### NFR-4: Test Coverage

New unit tests must cover:

- Multiple key use detection (auth + rate limit + truncated)
- Session ban persistence across retries
- Fallback to next best model
- Sticky session update on success
- Session expiry clearing bans
- Non-LongCat sessions unaffected

## Files Requiring Modification

| # | File | Change Type | Description |

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Corrupted requirements block needs rewrite before this spec is reliable.

Line 15 is heavily malformed (merged sentences, broken references, duplicated/garbled phrases like 401/40403, model/keykey, TheisRetryableError, keoes). This is likely to mislead implementation and review decisions.

Please split this into clean FR subsections with one requirement per bullet and valid code references.

🧰 Tools
🪛 LanguageTool

[style] ~15-~15: Consider an alternative for the overused word “exactly”.
Context: ...within the same LongCat model, which is exactly what LongCat detects as "multiple API k...

(EXACTLY_PRECISELY)


[grammar] ~15-~15: Ensure spelling is correct
Context: ...404, 408, 409, 422, 500, 502, 503, 504, andrate limit, quota, aborted, timeout`, ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~15-~15: Ensure spelling is correct
Context: ...proxy will retry with a different model/keykey. However, [shouldSkipModelOnRetry()](...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[style] ~15-~15: Consider an alternative for the overused word “exactly”.
Context: ...otation within the same model, which is exactly the behavior LongCat detects as "multip...

(EXACTLY_PRECISELY)


[grammar] ~15-~15: Ensure spelling is correct
Context: ...89-98) only clears the sticky key but **keoes preferredKeyId to undefined — but t...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🪛 markdownlint-cli2 (0.22.1)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)


🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/longcat-session-ban/requirements.md at line 15, The requirements
block around the LongCat sticky session logic is corrupted and must be
rewritten: replace the malformed paragraph (around the broken merged sentences)
with a clean, numbered FR list where each functional requirement is a single
concise bullet referencing the exact code symbols (clearStickyKey,
shouldSkipModelOnRetry, isRetryableError, routeRequest, router.ts smart-auto
logic, stickySessionMap, banPlatformFromSession) so implementers can find the
relevant code paths; fix all typos/garbled tokens (e.g., "401/403",
"model/keykey", "TheisRetryableError", "keoes") and ensure each FR (FR-1..FR-13)
is atomic, references the function(s) it affects, and maps to the corresponding
behavior (detection patterns, ban mechanism, skipModels changes, TTL/expiry, no
DB schema change, minimal router tweak, tests) so reviewers and engineers can
implement unambiguously.
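
As an illustration of what FR-5, FR-10 and NFR-1 ask for, here is a minimal sketch of an in-memory session ban. The names `stickySessionMap`, `bannedPlatforms`, `banPlatformFromSession()` and `isSessionBannedFromPlatform()` come from the specs quoted in this PR; the entry shape, the signatures, the `STICKY_TTL_MS` constant and the function bodies are assumptions made for this example and may not match `proxy.ts`.

```typescript
// Assumed shape of a sticky-session entry. Field names follow the spec text,
// but the real type in proxy.ts may differ.
interface StickyEntry {
  modelDbId: number;
  keyId?: number;
  bannedPlatforms?: Set<string>; // optional, so older entries stay valid (NFR-1)
  lastUsed: number;              // epoch milliseconds
}

// Hypothetical constant name; the 30-minute TTL is the value stated in the spec.
const STICKY_TTL_MS = 30 * 60 * 1000;

const stickySessionMap = new Map<string, StickyEntry>();

// Record a platform ban on an existing session. No-op if the session is gone.
function banPlatformFromSession(sessionKey: string, platform: string): void {
  const entry = stickySessionMap.get(sessionKey);
  if (!entry) return;
  if (!entry.bannedPlatforms) entry.bannedPlatforms = new Set<string>();
  entry.bannedPlatforms.add(platform);
}

// True only while the session is live and the platform has been banned.
function isSessionBannedFromPlatform(
  sessionKey: string,
  platform: string,
  now: number = Date.now(),
): boolean {
  const entry = stickySessionMap.get(sessionKey);
  if (!entry) return false;
  if (now - entry.lastUsed > STICKY_TTL_MS) {
    // Evicting the expired entry drops its bans as well (FR-10).
    stickySessionMap.delete(sessionKey);
    return false;
  }
  return entry.bannedPlatforms ? entry.bannedPlatforms.has(platform) : false;
}
```

Keeping the ban on the session entry itself means expiry needs no separate cleanup path, which is the property FR-10 relies on.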

Comment on lines +24 to +28
| | 8 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1245-1281) | Edit | Update error handling in retry loop to detect multiple key use + ban LongCat |
| | 9 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1147-1149) | Edit | Add truncated response detection in streaming success path |
| | 10 | [`server/src/services/router.ts`](server/src/services/router.ts:498-527) | Edit | Skip LongCat boost for banned sessions |
| | 11 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1036-1053) | Edit | Pass `bannedPlatforms` to `routeRequest()` via `skipModels` |
| 11 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1036-1053) | Edit | Pass `bannedPlatforms` to `routeRequest()` via `skipModels` |

## Out of Scope

- Persistent bans across server restarts (in-memory only, same as existing sticky sessions)
- Changes to the Thompson Sampling algorithm itself
- Changes to rate limiting logic
- Changes to the fallback chain ordering in balanced mode
- Client-side UI changes
- Configuration UI for enabling/disabling bans per provider (hardcoded to LongCat only)
- Changes to the `OpenAICompatProvider` class

No newline at end of file

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix duplicated/malformed modification table entries.

The table has duplicated index 11 and malformed row prefixes (| | 8, | | 11), which breaks traceability from requirements to implementation tasks.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~28-~28: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ... Sampling algorithm itself - Changes to rate limiting logic - - Changes to the fallback chain...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/longcat-session-ban/requirements.md around lines 24 - 28, The
requirements table contains duplicated/malformed rows (duplicate index 11 and
prefixes like "| | 8"/"| | 11") which breaks traceability; open
.roo/specs/longcat-session-ban/requirements.md, remove the extra duplicate row,
renumber the affected rows sequentially, and normalize the row prefixes so each
entry starts with a single pipe and correct index (e.g., fix the entries
referencing server/src/routes/proxy.ts and server/src/services/router.ts to have
unique indices and consistent "| <index> | [file](file:range) | Edit |
description |" formatting) so all links and indices are unique and well-formed.

Comment on lines +56 to +63
```
1. preferredModel = getStickyModel(...) // line 1199
2. preferredKeyId = getStickyKey(...) // line 1207-1212
3. skipModels from bannedPlatforms // line 1216-1230
4. Clear preferredModel if on banned platform // line 1232-1240
5. ← INSERT COOLDOWN CHECK HERE
6. Retry loop with routeRequest(...) // line 1247+
```

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language tag to the fenced code block.

This block should be annotated (e.g., text or typescript) to satisfy markdown linting and improve rendering consistency.

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 56-56: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/longcat-sticky-cooldown/design.md around lines 56 - 63, The
fenced code block shown (the sequence starting with "1. preferredModel =
getStickyModel(...)" through "6. Retry loop with routeRequest(...)") lacks a
language tag; update the block opening from ``` to a tagged fence like ```text
or ```typescript to satisfy markdown linting and improve rendering while leaving
the block contents unchanged—locate the block in the longcat-sticky-cooldown
design section (the snippet around the numbered steps referencing
getStickyModel, getStickyKey, bannedPlatforms, and routeRequest) and add the
language identifier.

Comment on lines +58 to +59
Current value type at [`proxy.ts:16`](../server/src/routes/proxy.ts:16):
```typescript

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix relative links to server/src/... paths.

From this directory depth, ../server/... likely won’t resolve correctly. Please normalize links so reviewers can navigate directly to referenced code locations.

Also applies to: 83-84, 123-124, 145-146, 165-166

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/provider-5xx-session-ban/design.md around lines 58 - 59, The
markdown links in design.md that reference server code (e.g., links pointing to
proxy.ts via "../server/src/...") are using incorrect relative paths; update
each occurrence to a normalized path that resolves from this document (for
example remove the leading ".." to use "server/src/..." or convert to a
repo-root absolute/blob URL) so reviewers can click through to the actual file
(search for the mentions of proxy.ts and the other similar "../server/src/..."
links and replace them consistently).

Comment on lines +16 to +21
The existing sticky sessions feature lives in [`server/src/routes/proxy.ts`](../server/src/routes/proxy.ts:16). It uses an SHA-1 hash of `routingMode + firstUserMessage` to identify sessions, and stores `{ modelDbId, keyId?, bannedPlatforms?, lastUsed }` with a 30-min TTL and 500-entry max.

The existing LongCat session ban ([`longcat-session-ban` spec](../longcat-session-ban/)) added `bannedPlatforms`, `banPlatformFromSession()`, `isSessionBannedFromPlatform()`, `addLongcatModelsToSkipModels()`, and `isTruncatedResponse()`. This spec generalizes that infrastructure: the `bannedPlatforms` set and ban helper functions are reused, the `isTruncatedResponse()` function is retained and generalized to all providers, and the LongCat-specific auth/rate-limit error detection is replaced by general 5xx consecutive failure tracking.

The retry loop in `handleChatCompletion()` currently has LongCat-specific error handling at lines 1383-1402 that bans LongCat on auth errors and rate-limit errors. This is replaced by general 5xx consecutive failure detection that works for any provider. The truncation detection is retained but generalized from LongCat-only to all providers.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Correct relative code links in the Context section.

The ../server/src/... references are likely invalid from this file’s path. Please update to the correct relative path so links are navigable in-repo.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/provider-5xx-session-ban/requirements.md around lines 16 - 21,
Update the incorrect relative links in this requirements.md so they resolve from
this file's location: replace the broken "../server/src/routes/proxy.ts" and
"../longcat-session-ban/" references with correct relative paths that point to
server/src/routes/proxy.ts and the longcat-session-ban spec from this file
(e.g., adjust to "../../server/src/routes/proxy.ts" or the appropriate number of
../ segments); ensure both the proxy file link and the longcat-session-ban spec
link are valid and navigable in-repo.
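
To check how many `../` segments a link needs, the relative path can be computed rather than guessed. The sketch below is a small pure function written for this example (`relativeLink` is a hypothetical name, not something in the repository). It assumes both paths are given relative to the repository root with `/` separators, and that the spec files live where the review prompts say they do.

```typescript
// Compute the relative path from the directory containing `fromFile` to `toFile`.
// Both arguments are assumed to be repo-root-relative POSIX-style paths.
function relativeLink(fromFile: string, toFile: string): string {
  const fromDir = fromFile.split('/').slice(0, -1); // drop the file name
  const to = toFile.split('/');

  // Count the leading directory segments the two paths share.
  let common = 0;
  while (
    common < fromDir.length &&
    common < to.length - 1 &&
    fromDir[common] === to[common]
  ) {
    common++;
  }

  // One ".." per remaining directory in the source, then the rest of the target.
  const ups = fromDir.slice(common).map(() => '..');
  return ups.concat(to.slice(common)).join('/');
}

const specFile = '.roo/specs/provider-5xx-session-ban/requirements.md';

// Link from the spec to the proxy source file.
const proxyLink = relativeLink(specFile, 'server/src/routes/proxy.ts');

// Link from the spec to the sibling longcat-session-ban spec.
const siblingLink = relativeLink(
  specFile,
  '.roo/specs/longcat-session-ban/requirements.md',
);
```

Under these path assumptions, `proxyLink` comes out with three `../` segments, while `siblingLink` needs only one, so the sibling spec link may already resolve even though the `server/src/...` links do not.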

Comment on lines +194 to +205
it('handles empty provider model list gracefully', () => {
  const db = getDb();
  db.prepare('PRAGMA foreign_keys = OFF').run();
  db.prepare('BEGIN').run();
  db.prepare("DELETE FROM api_keys WHERE platform = 'longcat'").run();
  db.prepare("DELETE FROM models WHERE platform = 'longcat'").run();
  const skipModels = new Set<number>();
  expect(() => addProviderModelsToSkipModels(skipModels, 'longcat')).not.toThrow();
  expect(skipModels.size).toBe(0);
  db.prepare('ROLLBACK').run();
  db.prepare('PRAGMA foreign_keys = ON').run();
});

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Always restore DB pragmas/transaction state with finally.

If this test fails before ROLLBACK/PRAGMA ... ON, later tests can run with leaked DB state and become flaky.

Suggested fix
-      db.prepare('PRAGMA foreign_keys = OFF').run();
-      db.prepare('BEGIN').run();
-      db.prepare("DELETE FROM api_keys WHERE platform = 'longcat'").run();
-      db.prepare("DELETE FROM models WHERE platform = 'longcat'").run();
-      const skipModels = new Set<number>();
-      expect(() => addProviderModelsToSkipModels(skipModels, 'longcat')).not.toThrow();
-      expect(skipModels.size).toBe(0);
-      db.prepare('ROLLBACK').run();
-      db.prepare('PRAGMA foreign_keys = ON').run();
+      db.prepare('PRAGMA foreign_keys = OFF').run();
+      db.prepare('BEGIN').run();
+      try {
+        db.prepare("DELETE FROM api_keys WHERE platform = 'longcat'").run();
+        db.prepare("DELETE FROM models WHERE platform = 'longcat'").run();
+        const skipModels = new Set<number>();
+        expect(() => addProviderModelsToSkipModels(skipModels, 'longcat')).not.toThrow();
+        expect(skipModels.size).toBe(0);
+      } finally {
+        db.prepare('ROLLBACK').run();
+        db.prepare('PRAGMA foreign_keys = ON').run();
+      }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
it('handles empty provider model list gracefully', () => {
  const db = getDb();
  db.prepare('PRAGMA foreign_keys = OFF').run();
  db.prepare('BEGIN').run();
  try {
    db.prepare("DELETE FROM api_keys WHERE platform = 'longcat'").run();
    db.prepare("DELETE FROM models WHERE platform = 'longcat'").run();
    const skipModels = new Set<number>();
    expect(() => addProviderModelsToSkipModels(skipModels, 'longcat')).not.toThrow();
    expect(skipModels.size).toBe(0);
  } finally {
    db.prepare('ROLLBACK').run();
    db.prepare('PRAGMA foreign_keys = ON').run();
  }
});
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/__tests__/routes/provider-session-ban.test.ts` around lines 194 -
205, The test sets DB pragmas and a transaction manually but does not guarantee
cleanup on failures; update the test around getDb(), the PRAGMA/BEGIN/DELETE
calls and the addProviderModelsToSkipModels assertion so that PRAGMA
foreign_keys and the transaction are always restored using a try/finally: wrap
the DB mutation and expect call in try, call db.prepare('ROLLBACK').run() and
db.prepare('PRAGMA foreign_keys = ON').run() in the finally block so rollback
and pragma restoration always happen even if the assertion throws.
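
If several tests need the same setup and teardown, the try/finally pattern can be factored into a helper. This is only a sketch: `withRolledBackMutation` and the `MinimalDb` interface are hypothetical names introduced here, and the only thing assumed about the real database handle is the `prepare(sql).run()` shape used in the test above.

```typescript
// Minimal stand-in for the database handle used in the test
// (only the `prepare(sql).run()` shape is assumed).
interface MinimalDb {
  prepare(sql: string): { run(): unknown };
}

// Hypothetical helper: run `body` inside a transaction with foreign keys off,
// then always roll back and re-enable foreign keys, even if `body` throws.
function withRolledBackMutation(db: MinimalDb, body: () => void): void {
  db.prepare('PRAGMA foreign_keys = OFF').run();
  db.prepare('BEGIN').run();
  try {
    body();
  } finally {
    // Cleanup runs on both the success path and the failure path;
    // any error thrown by `body` still propagates to the caller.
    db.prepare('ROLLBACK').run();
    db.prepare('PRAGMA foreign_keys = ON').run();
  }
}
```

A test body would then pass its deletes and assertions as the callback, so a failing `expect` can no longer leave the transaction open for later tests.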

Comment on lines +840 to +849
await request(app, 'POST', '/api/keys', {
  platform: 'longcat',
  key: 'lc_cooldown_active_test',
  label: 'cooldown-active-longcat',
});
await request(app, 'POST', '/api/keys', {
  platform: 'groq',
  key: 'gsk_cooldown_active_test',
  label: 'cooldown-active-groq',
});

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use admin-authenticated /api/keys setup (and lock key-separation behavior).

These new tests post to /api/keys without explicitly using the admin key, so they can pass even if /api/* key-gating is misconfigured. Please route setup through explicit admin auth and add at least one negative check that unified key on /api/* returns 401.

As per coding guidelines, "The admin key must gate /api/* routes; the unified API key must gate /v1/* routes — they must never overlap, and using one against the wrong route returns 401".

Also applies to: 904-913, 973-977, 1038-1042, 1091-1095

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/__tests__/routes/proxy-tools.test.ts` around lines 840 - 849, The
test setup calls POST /api/keys without admin auth, which can hide misconfigured
key-gating; change the key creation calls in proxy-tools.test.ts to use the
admin-authenticated helper (use the admin key when calling request for
'/api/keys', e.g., the same pattern used elsewhere in the file) and then add at
least one negative assertion that a unified API key (the v1/unified key created)
used against an /api/* route returns 401; locate the request(...) calls that
create keys and the later proxy calls (symbols: request, '/api/keys', and the
unified key variable/name) and update them so admin auth is used for key setup
and add a test asserting 401 when the unified key is used against an /api/*
endpoint.
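
The key-separation rule quoted from the coding guidelines can be written down as a small truth table, which is useful as an oracle for both the positive and the negative assertions. The function below is a sketch for this discussion only: `expectedAuthStatus` and `KeyKind` are hypothetical names, it does not reflect how the server's middleware is implemented, and treating routes outside `/api/*` and `/v1/*` as ungated is an assumption.

```typescript
type KeyKind = 'admin' | 'unified';

// Return the status the auth layer is expected to produce for a route/key pair,
// following the guideline: admin key gates /api/*, unified key gates /v1/*.
function expectedAuthStatus(path: string, key: KeyKind | undefined): 200 | 401 {
  const required: KeyKind | undefined = path.startsWith('/api/')
    ? 'admin'
    : path.startsWith('/v1/')
      ? 'unified'
      : undefined;

  // Assumption: routes outside the two gated prefixes need no key.
  if (required === undefined) return 200;

  // The wrong key, or no key at all, must be rejected.
  return key === required ? 200 : 401;
}
```

For example, `expectedAuthStatus('/api/keys', 'unified')` evaluates to `401`, which is the negative check the comment asks the tests to add.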

Comment on lines +212 to +215
return lower.includes('truncated') || lower.includes('truncation') ||
lower.includes('context_length_exceeded') || lower.includes('token_limit') ||
lower.includes('maximum length') || lower.includes('response_length_limit') ||
lower.includes('conflict');

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove broad conflict truncation matcher to avoid false bans.

Line 215 matches any "conflict" text as truncation. With provider-agnostic ban logic, ordinary conflict errors can now incorrectly ban a platform for the session.

Suggested fix
   return lower.includes('truncated') || lower.includes('truncation') ||
     lower.includes('context_length_exceeded') || lower.includes('token_limit') ||
     lower.includes('maximum length') || lower.includes('response_length_limit') ||
-    lower.includes('conflict');
+    false;
Suggested change
return lower.includes('truncated') || lower.includes('truncation') ||
lower.includes('context_length_exceeded') || lower.includes('token_limit') ||
lower.includes('maximum length') || lower.includes('response_length_limit') ||
false;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/routes/proxy.ts` around lines 212 - 215, The truncation-detection
boolean is too broad because it treats any occurrence of "conflict" as a
truncation signal; update the expression that checks the lower variable in
server/src/routes/proxy.ts (the return statement currently using
lower.includes(...)) by removing the lower.includes('conflict') clause so
ordinary conflict errors no longer trigger a truncation/ban; keep the other
specific matchers (e.g., 'truncated', 'context_length_exceeded', 'token_limit',
etc.), and run related tests or linter to ensure no syntax regressions.
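
One way to keep the matcher list easy to audit is to hold the markers in an array, so that removing `'conflict'` is a one-line change with no dangling `||`. This is a sketch, not the code in `proxy.ts`: the function name `isTruncatedResponse` appears in the specs, but its real signature is not shown in this PR, so the single string parameter and the `TRUNCATION_MARKERS` constant are assumptions.

```typescript
// Markers taken from the snippet under review, minus the broad 'conflict' entry.
const TRUNCATION_MARKERS: readonly string[] = [
  'truncated',
  'truncation',
  'context_length_exceeded',
  'token_limit',
  'maximum length',
  'response_length_limit',
];

// Assumed signature: takes the error or response text and reports whether
// any truncation marker occurs in it, ignoring case.
function isTruncatedResponse(message: string): boolean {
  const lower = message.toLowerCase();
  return TRUNCATION_MARKERS.some((marker) => lower.includes(marker));
}
```

With this form, a message that only mentions a conflict no longer counts as truncation, so it cannot trigger a session ban through this path.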

Comment on lines +1243 to 1257
// LongCat sticky cooldown: if the sticky model is on LongCat and was used
// within the last 3 minutes, bypass sticky preference for this request only.
// The bandit router picks freely — it may still route to LongCat organically.
if (preferredModel) {
const db = getDb();
const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined;
if (prefRow?.platform === 'longcat') {
const cooldownSessionKey = getSessionKey(normalizedMessages, routingMode);
const cooldownEntry = cooldownSessionKey ? stickySessionMap.get(cooldownSessionKey) : undefined;
if (cooldownEntry && Date.now() - cooldownEntry.lastUsed < LONGCAT_STICKY_COOLDOWN_MS) {
const ageMs = Date.now() - cooldownEntry.lastUsed;
console.log(`[Sticky] LongCat cooldown active — bypassing sticky preference for session=${cooldownSessionKey?.slice(0, 8)} | lastUsed=${ageMs}ms ago`);
preferredModel = undefined;
preferredKeyId = undefined;
}

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Do not apply sticky cooldown to explicit model requests.

At Line 1246, cooldown applies whenever preferredModel exists, including when the client explicitly set model. That can override a pinned model and silently auto-route.

Suggested fix
-  if (preferredModel) {
+  if (preferredModel && !requestedModel) {
     const db = getDb();
     const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined;
     if (prefRow?.platform === 'longcat') {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/routes/proxy.ts` around lines 1243 - 1257, The cooldown block
should not run for client-explicit model requests; update the condition that
starts with "if (preferredModel)" so it first checks that the preferred model
was not explicitly requested by the client (e.g., a boolean like
explicitModelRequested or req.body.model presence). In practice, add/use the
existing flag that indicates an explicit client model (or create one where
routing decides preferredModel) and change the guard to something like "if
(preferredModel && !explicitModelRequested)" before calling getSessionKey,
stickySessionMap, and checking LONGCAT_STICKY_COOLDOWN_MS so pinned models
remain honored for explicit requests (leave preferredModel and preferredKeyId
untouched when explicitModelRequested is true).
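
The cooldown decision, including the explicit-model guard discussed above, can be isolated as a pure function that is easy to unit test without a database or a clock. `LONGCAT_STICKY_COOLDOWN_MS` and its 3-minute value come from the PR description; `shouldBypassStickyForCooldown`, the `CooldownInput` shape and the `explicitModelRequested` flag are hypothetical and would need to be wired to whatever the handler actually tracks.

```typescript
const LONGCAT_STICKY_COOLDOWN_MS = 3 * 60 * 1000; // 3 minutes, per the PR description

// Hypothetical input shape: everything the decision needs, passed in explicitly.
interface CooldownInput {
  platform: string | undefined;    // platform of the sticky model, if any
  lastUsed: number | undefined;    // epoch ms of the sticky entry's last use
  explicitModelRequested: boolean; // true when the client pinned a model
  now: number;                     // current time in epoch ms
}

// Decide whether the sticky preference should be bypassed for this request only.
function shouldBypassStickyForCooldown(input: CooldownInput): boolean {
  // A model pinned by the client is always honored.
  if (input.explicitModelRequested) return false;
  // The cooldown only applies to LongCat sticky sessions.
  if (input.platform !== 'longcat') return false;
  // Without a sticky entry there is nothing to cool down.
  if (input.lastUsed === undefined) return false;
  return input.now - input.lastUsed < LONGCAT_STICKY_COOLDOWN_MS;
}
```

The handler would clear `preferredModel` and `preferredKeyId` only when this returns `true`, leaving the stored sticky entry itself untouched.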

@vi70x3 vi70x3 closed this Jun 1, 2026