Skip to content

Spec/longcat sticky cooldown#8

Merged
vi70x3 merged 10 commits into
mainfrom
spec/longcat-sticky-cooldown
Jun 2, 2026
Merged

Spec/longcat sticky cooldown#8
vi70x3 merged 10 commits into
mainfrom
spec/longcat-sticky-cooldown

Conversation

@vi70x3

@vi70x3 vi70x3 commented Jun 2, 2026

Copy link
Copy Markdown
Collaborator

Summary by CodeRabbit

  • New Features

    • Session-scoped provider bans with automatic per-session fallback to alternative models/providers.
    • Sticky-session cooldown that preserves a session's pinned model while preventing other sessions from routing to that provider briefly.
    • Improved truncation detection for streamed and non-streamed responses to trigger safe fallbacks.
  • Tests

    • Added unit and integration tests covering session bans, cooldown behavior, truncation detection, and routing/fallback scenarios.

vi70x3 added 9 commits June 1, 2026 22:00
Implements the LongCat session ban feature in proxy.ts. When LongCat
detects multiple API key use (auth/rate-limit errors) or returns
truncated responses, the platform is banned from the sticky session.
Future requests in that session route to non-LongCat models.

Changes:
- Extend stickySessionMap with bannedPlatforms?: Set<string>
- Add isSessionBannedFromPlatform() to check session bans
- Add banPlatformFromSession() to record platform bans
- Add addLongcatModelsToSkipModels() to skip all LongCat models
- Add isTruncatedResponse() to detect truncation keywords
- Update getStickyKey() to return undefined for banned platforms
- Update setStickyModel() to preserve bannedPlatforms across updates
- Update pre-routing logic to check bans before routing
- Update error handling to ban LongCat on auth/rate-limit/truncation
- Add truncation detection after stream completes
- Add truncation detection in mid-stream error handling
… consecutive failures

- Extend stickySessionMap with consecutiveFailures tracking per provider
- Add recordConsecutiveFailure(), resetConsecutiveFailures(), resetAllConsecutiveFailures()
- Replace addLongcatModelsToSkipModels with generic addProviderModelsToSkipModels
- Replace LongCat-specific auth/rate-limit ban with general 5xx consecutive failure detection (threshold: 2)
- Generalize truncation detection to all providers (post-stream + mid-stream)
- Update getStickyKey() to check bannedPlatforms for any platform
- Update pre-routing ban check to be generic (any banned platform)
- Add success path counter reset on both streaming and non-streaming paths
- Remove LongCat-specific auth error ban, rate limit ban, and addLongcatModelsToSkipModels
- Rename and rewrite tests from longcat-session-ban to provider-session-ban (32 test cases)
- TypeScript compiles cleanly, all 150 tests pass
- Add isBanEligibleStatus() helper restricting to {500,502,503,504}
- Improve mid-stream truncation detection with aggregated error sources
- Pre-routing ban check now skips ALL banned platforms, not just preferredModel's
- Only clear preferredModel when provider is actually banned (not on first 5xx)
- Handle Error objects in isTruncatedResponse (instanceof check before JSON.stringify)
…or other sessions during 3-min cooldown window
## Verification Results (8 findings)

| # | Finding | Verdict | Action |
|---|---------|---------|--------|
| 1 | Cooldown safeguard logic error | INVALID (by design) | No change needed |
| 2 | Router skipModels bypass | INVALID (already implemented) | No change needed |
| 3 | Dead code: consecutive failure tracking | PARTIALLY VALID | Fixed |
| 4 | Mid-stream retryable error handling | VALID | Fixed |
| 5 | provider-session-ban.test.ts old assertions | INVALID (already updated) | No change needed |
| 6 | proxy-tools.test.ts missing provider assertion | PARTIALLY VALID | Fixed |
| 7 | proxy-tools.test.ts missing admin auth | INVALID (intentional in test env) | No change needed |
| 8 | Documentation/specs issues | VALID | Fixed |

## Changes Made

### `server/src/routes/proxy.ts`
- **Removed dead code**: `recordConsecutiveFailure`, `resetConsecutiveFailures` functions, `consecutiveFailures` field from stickySessionMap type, and their exports. Kept `resetAllConsecutiveFailures` (actively used in production success paths).
- **Added `isRetryableStreamError()` helper**: Detects 429, 401, 403, 408, timeouts, ECONNRESET, ECONNREFUSED, socket hang up, network errors, aborted, rate limit/quota messages, and auth failure messages.
- **Added mid-stream retryable error handling for LongCat**: New condition block after truncation check that bans LongCat via `banPlatformFromSession()` + `addProviderModelsToSkipModels()` on any retryable mid-stream error (not just 5xx/truncation).

### `server/src/__tests__/routes/provider-session-ban.test.ts`
- Removed imports and 7 test cases for deleted `recordConsecutiveFailure` and `resetConsecutiveFailures` functions
- Refactored integration tests to use `banPlatformFromSession()` directly instead of `recordConsecutiveFailure`
- Updated `resetAllConsecutiveFailures` test for no-op behavior
- Added new integration test for `banPlatformFromSession` → `isSessionBannedFromPlatform` lifecycle
- **Result: 36/36 tests pass**

### `server/src/__tests__/routes/proxy-tools.test.ts`
- Added `expect(routedProvider).not.toBe('')` assertion to the "excludes LongCat from bandit routing for non-sticky sessions during cooldown" test
- **Result: 16/16 tests pass**

### Documentation/Specs (8 files)
- `.roo/specs/longcat-session-ban/design.md`: Fixed grammar ("errors are detected"), updated router statement to clarify no router.ts changes needed
- `.roo/specs/longcat-session-ban/requirements.md`: Fixed grammar ("must:"), normalized code spans, deleted orphan fragments, fixed table row numbering
- `.roo/specs/longcat-session-ban/tasks.md`: Removed duplicate "Add diagnostic logging" entry
- `.roo/specs/longcat-sticky-cooldown/requirements.md`: Updated "No Router Changes" to accurately describe the skipModels bypass
- `.roo/specs/longcat-sticky-cooldown/design.md`: Updated skipModels code reference to include `&& entry.model_db_id !== preferredModelDbId`
- `.roo/specs/provider-5xx-session-ban/design.md`: Full rewrite to match current runtime behavior (immediate LongCat exclusion, model-level non-LongCat skip, removed dead code references)
- `.roo/specs/provider-5xx-session-ban/requirements.md`: Updated to describe differentiated LongCat vs non-LongCat behavior
- `.roo/specs/provider-5xx-session-ban/tasks.md`: Updated to reflect shipped behavior, removed dead code task references

## Final Validation
- TypeScript compilation: ✅ Clean (0 errors)
- Full test suite: ✅ 15 test files, 148 tests pass, 0 failures, no regressions
@coderabbitai

coderabbitai Bot commented Jun 2, 2026

Copy link
Copy Markdown

Review Change Stack

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 8aa3ac23-426e-4439-bf7d-67433bc2806d

📥 Commits

Reviewing files that changed from the base of the PR and between e0661b7 and 258f04e.

📒 Files selected for processing (1)
  • server/src/routes/proxy.ts

📝 Walkthrough

Walkthrough

This PR implements per-session provider bans, a LongCat sticky-session cooldown safeguard, truncation detection, and retry/routing refactors: adds ban helpers and skipModels population, preserves ban state in sticky entries, adjusts router to let preferred models bypass skips, and adds unit/integration tests covering bans and cooldown.

Changes

Session Ban & Cooldown Feature

Layer / File(s) Summary
Design & Requirements Specifications
.roo/specs/longcat-session-ban/*, .roo/specs/longcat-sticky-cooldown/*, .roo/specs/provider-5xx-session-ban/*
Designs, requirements, and task lists for LongCat session ban & fallback, LongCat sticky cooldown safeguard, and provider 5xx session ban (provider-agnostic ban/skip behavior).
Session Ban State & Helper Functions
server/src/routes/proxy.ts
Adds bannedPlatforms?: Set<string> to stickySessionMap, LONGCAT_STICKY_COOLDOWN_MS, and exported helpers: isSessionBannedFromPlatform(), banPlatformFromSession(), addProviderModelsToSkipModels(), resetAllConsecutiveFailures(), and isTruncatedResponse(); preserves bannedPlatforms across sticky updates.
Sticky Cooldown Safeguard Implementation
server/src/routes/proxy.ts, server/src/services/router.ts
Implements cooldown by adding LongCat models to skipModels when a sticky LongCat session was used recently, and updates router routeRequest so preferredModelDbId can bypass skipModels.
Sticky Model Routing with Ban & Cooldown Integration
server/src/routes/proxy.ts
Refactors routing preference setup to derive preferredKeyId from sticky entries, build skipModels from per-session bans and cooldown, clear sticky preferences when banned, and initialize retry state with integrated preferences prior to routing.
Error Handling & Retry Refactor with Ban-Eligibility Logic
server/src/routes/proxy.ts
Adds isBanEligibleStatus() and isRetryableStreamError() helpers; applies truncation detection post-stream and mid-stream to trigger provider bans or model skips (LongCat vs non-LongCat differences); reworks mid-stream handling to end SSE gracefully for retryable/truncation cases; restructures non-stream retry behavior for ban-eligible 5xx vs other retryable errors.
Provider Session Ban Test Coverage
server/src/__tests__/routes/provider-session-ban.test.ts
New tests covering ban checking, recording with TTL refresh and field preservation, skip-model population, lifecycle across model changes and TTL expiry, provider isolation, and truncation detection.
Sticky Cooldown Integration Tests
server/src/__tests__/routes/proxy-tools.test.ts
New cooldown tests verifying sticky LongCat routing during cooldown, exclusion of other sessions, cooldown expiry behavior, non-LongCat sticky behavior, ban precedence, and no-sticky-session scenarios.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • vi70x3/freellmapi#2: Both PRs modify sticky-session and preferred-key logic in server/src/routes/proxy.ts; this PR adds ban detection and cooldown suppression layered on top of sticky key behavior.

Poem

🐰 Soft paws tap on the session map,
Bans and cooldowns in a tidy wrap.
LongCat bowed out, fallbacks stand tall,
Sticky keys remember, skipModels call.
Hopping onward — routing wins for all!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Title check ⚠️ Warning The title 'Spec/longcat sticky cooldown' is only partially related to the changeset. While the PR does include specs for LongCat sticky cooldown, it also contains substantial implementation code and specs for related features (session banning, provider 5xx handling), making the title incomplete and not representative of the full scope. Consider a more comprehensive title such as 'Implement LongCat session ban, sticky cooldown, and provider 5xx handling' or keep it narrower by clarifying this PR includes implementation beyond just the spec.
Docstring Coverage ⚠️ Warning Docstring coverage is 5.56% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch spec/longcat-sticky-cooldown

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements a session ban and fallback mechanism for the LongCat provider, along with a 3-minute cooldown safeguard and differentiated 5xx error handling for LongCat versus non-LongCat providers. It introduces platform-wide bans for LongCat upon 5xx errors or truncation, while non-LongCat providers fallback at the model level. The review feedback highlights an optimization opportunity to cache the platform of the preferred model in a local variable, avoiding redundant synchronous database queries during the session ban and cooldown checks.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +1191 to 1217
if (preferredModel) {
const db = getDb();
const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined;
if (prefRow && isSessionBannedFromPlatform(normalizedMessages, routingMode, prefRow.platform)) {
console.log(`[Sticky] skipping preferredModel=${preferredModel} (${prefRow.platform} banned for session)`);
preferredModel = undefined;
preferredKeyId = undefined;
}
}

// LongCat sticky cooldown: if the sticky model is on LongCat and was used
// within the last 3 minutes, exclude LongCat from the bandit router for all
// other sessions. The current sticky session keeps its pinned LongCat route.
// This prevents LongCat from seeing multiple sessions/keys from the same IP.
if (preferredModel) {
const db = getDb();
const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined;
if (prefRow?.platform === 'longcat') {
const cooldownSessionKey = getSessionKey(normalizedMessages, routingMode);
const cooldownEntry = cooldownSessionKey ? stickySessionMap.get(cooldownSessionKey) : undefined;
if (cooldownEntry && Date.now() - cooldownEntry.lastUsed < LONGCAT_STICKY_COOLDOWN_MS) {
const ageMs = Date.now() - cooldownEntry.lastUsed;
addProviderModelsToSkipModels(skipModels, 'longcat');
console.log(`[Sticky] LongCat cooldown active — excluding LongCat from bandit routing for other sessions | session=${cooldownSessionKey?.slice(0, 8)} | lastUsed=${ageMs}ms ago`);
}
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The database is queried twice consecutively to retrieve the platform of the preferredModel (once for the session ban check and once for the LongCat sticky cooldown check). We can optimize this by querying the platform once, storing it in a local variable preferredModelPlatform, and reusing it. This avoids redundant synchronous database queries on every chat completion request.

Additionally, this preferredModelPlatform variable can be reused in the error handling blocks below (e.g., lines 1438, 1539, 1559) to eliminate even more redundant database queries.

  let preferredModelPlatform: string | undefined;
  if (preferredModel) {
    const db = getDb();
    const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined;
    preferredModelPlatform = prefRow?.platform;
    if (preferredModelPlatform && isSessionBannedFromPlatform(normalizedMessages, routingMode, preferredModelPlatform)) {
      console.log(`[Sticky] skipping preferredModel=${preferredModel} (${preferredModelPlatform} banned for session)`);
      preferredModel = undefined;
      preferredKeyId = undefined;
      preferredModelPlatform = undefined;
    }
  }

  // LongCat sticky cooldown: if the sticky model is on LongCat and was used
  // within the last 3 minutes, exclude LongCat from the bandit router for all
  // other sessions. The current sticky session keeps its pinned LongCat route.
  // This prevents LongCat from seeing multiple sessions/keys from the same IP.
  if (preferredModel && preferredModelPlatform === 'longcat') {
    const cooldownSessionKey = getSessionKey(normalizedMessages, routingMode);
    const cooldownEntry = cooldownSessionKey ? stickySessionMap.get(cooldownSessionKey) : undefined;
    if (cooldownEntry && Date.now() - cooldownEntry.lastUsed < LONGCAT_STICKY_COOLDOWN_MS) {
      const ageMs = Date.now() - cooldownEntry.lastUsed;
      addProviderModelsToSkipModels(skipModels, 'longcat');
      console.log(`[Sticky] LongCat cooldown active — excluding LongCat from bandit routing for other sessions | session=${cooldownSessionKey?.slice(0, 8)} | lastUsed=${ageMs}ms ago`);
    }
  }

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (7)
server/src/routes/proxy.ts (3)

139-144: 💤 Low value

Consider removing this dead code.

This function is a no-op but is still exported and called in two places. If consecutive failure tracking was removed, the function and its call sites could be cleaned up to reduce noise.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/routes/proxy.ts` around lines 139 - 144, The function
resetAllConsecutiveFailures is a no-op and should be removed along with its
calls and any exports/imports: delete the resetAllConsecutiveFailures
declaration in server/src/routes/proxy.ts, remove its export (if exported) and
delete the two call sites that invoke resetAllConsecutiveFailures so callers no
longer reference it; ensure you also remove any imports or references elsewhere
to avoid unused symbol errors and run tests/type-check to confirm no remaining
usages.

1369-1462: ⚖️ Poor tradeoff

Consider extracting duplicated graceful stream termination logic.

The pattern of ending a stream gracefully (write completion event + res.end()) appears three times in this error handling block (lines 1410-1425, 1444-1459, and 1470-1485). While the surrounding context differs, the core termination logic could be extracted into a helper to reduce duplication.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/routes/proxy.ts` around lines 1369 - 1462, Extract the duplicated
"end stream gracefully" logic into a small helper (e.g., endStreamGracefully)
that accepts (res, responseStreamContext) and performs the try/catch: if
responseStreamContext call writeResponseStreamEvent(...) with its responseId and
outputText, otherwise write 'data: [DONE]\n\n', then call res.end(); replace the
three inline try/catch blocks in the truncation branch, the LongCat mid-stream
retryable branch, and the earlier truncation handling with a single call to this
helper, keeping the surrounding ban/skip logic and subsequent logRequest/return
unchanged.

1209-1209: 💤 Low value

Minor: sessionKey is already computed at line 1176.

cooldownSessionKey duplicates the sessionKey variable computed earlier.

🔧 Suggested simplification
-      const cooldownSessionKey = getSessionKey(normalizedMessages, routingMode);
-      const cooldownEntry = cooldownSessionKey ? stickySessionMap.get(cooldownSessionKey) : undefined;
+      const cooldownEntry = sessionKey ? stickySessionMap.get(sessionKey) : undefined;

And update the log message to use sessionKey instead of cooldownSessionKey.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/routes/proxy.ts` at line 1209, The code creates a duplicate
session key: remove the redundant declaration of cooldownSessionKey that calls
getSessionKey(normalizedMessages, routingMode) since sessionKey was already
computed earlier; replace any uses of cooldownSessionKey (including log
messages) with the existing sessionKey variable to avoid duplication and keep a
single source of truth (ensure functions/methods referencing cooldownSessionKey
now reference sessionKey instead).
server/src/__tests__/routes/proxy-tools.test.ts (1)

895-975: ⚡ Quick win

Test name overstates what is asserted.

The title promises that LongCat is "excluded from bandit routing for non-sticky sessions," but the only behavioral check is expect(routedProvider).not.toBe('') (Line 974), which passes regardless of which provider was chosen — including LongCat. As the inline comment (Lines 920-930) itself notes, no exclusion is enforced for a session without a sticky LongCat entry, so there is nothing concrete being verified about exclusion.

Either rename the test to reflect the real behavior (a non-sticky session routes freely and emits no cooldown log) or strengthen it to assert the intended exclusion. As-is it provides false confidence.

♻️ Suggested rename to match actual assertions
-  it('excludes LongCat from bandit routing for non-sticky sessions during cooldown', async () => {
+  it('routes a non-sticky session freely and emits no cooldown log during another session\'s cooldown', async () => {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/__tests__/routes/proxy-tools.test.ts` around lines 895 - 975, The
test title claims LongCat is excluded from bandit routing but the assertions
only check that a non-sticky session routes freely and that no cooldown log was
emitted; update the test to reflect that by changing the it(...) description
string (the test name) to something like "allows non-sticky sessions to route
freely and does not log LongCat cooldown" (referencing the existing test using
variables/functions: routedProvider, logSpy, stickySessionMap, makeMessages,
request) and adjust the inline comment to match the actual behavior;
alternatively, if you want to enforce the original behavior, replace the final
assertion to assert routedProvider !== 'longcat' (and ensure the mocked
providers include both longcat and groq) instead of routedProvider !== ''.
.roo/specs/longcat-session-ban/requirements.md (1)

13-15: ⚡ Quick win

Simplify complex run-on sentences in Context section.

Lines 13-15 contain extremely long, complex sentences (200+ words each) that are difficult to parse. Breaking these into shorter sentences would significantly improve readability.

♻️ Suggested restructuring

Consider breaking the Context section into shorter, more digestible sentences. For example, line 13 could be split into:

  • Current behavior summary (what happens on auth errors)
  • The problem (why this causes multiple key usage)
  • Reference to shouldSkipModelOnRetry behavior
  • The consequence (LongCat detects this pattern)

This would make the context section much easier to understand for reviewers and implementers.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/longcat-session-ban/requirements.md around lines 13 - 15, The
Context section contains very long, run-on sentences that hinder clarity;
rewrite lines 13-15 to split each complex sentence into multiple short sentences
and bullets covering: current behavior on auth/truncated errors (call sites:
clearStickyKey(), preferredKeyId reset but preferredModel remains), how
routeRequest() and router.ts smart-auto cause retries with different LongCat
keys, and how shouldSkipModelOnRetry() and isRetryableError() contribute to key
rotation; keep references to clearStickyKey, preferredModel, preferredKeyId,
routeRequest, shouldSkipModelOnRetry, isRetryableError, and the smart-auto logic
so readers can trace the flow, and ensure each new sentence conveys one idea
(behavior, cause, check function, consequence).
.roo/specs/longcat-session-ban/design.md (1)

5-5: ⚡ Quick win

Clarify router modification stance.

The document states "No changes to router.ts are required" (line 5) but later discusses router boost logic considerations (lines 374-380). While the conclusion is correct (no changes needed because skipModels handles it), the initial absolute statement could be confusing when read alongside the later discussion.

💡 Suggested clarification

Consider softening line 5 to:

-The ban mechanism extends the existing sticky session infrastructure in `proxy.ts` and integrates with the retry loop in `handleChatCompletion()`. **No changes to `router.ts` are required.** All ban detection and session management is implemented in `proxy.ts` and integrated into `handleChatCompletion()`.
+The ban mechanism extends the existing sticky session infrastructure in `proxy.ts` and integrates with the retry loop in `handleChatCompletion()`. **No functional changes to `router.ts` are required** — the existing `skipModels` mechanism effectively suppresses banned providers. All ban detection and session management is implemented in `proxy.ts` and integrated into `handleChatCompletion()`.

Also applies to: 374-380

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/longcat-session-ban/design.md at line 5, Change the absolute
statement "No changes to `router.ts` are required" in the design doc to a
softened clarification that explicitly notes router logic remains unchanged
because the proxy supplies `skipModels` to suppress LongCat routing and the
existing LongCat boost logic is harmless when entries are skipped; update the
initial sentence near `proxy.ts`/`handleChatCompletion()` and the later
discussion around the LongCat boost (the paragraphs referencing `skipModels`,
`router.ts`, and boost logic) so both places state that no code change to
`router.ts` is necessary but explain why (i.e., proxy-controlled `skipModels`
prevents LongCat routing and makes boost a no-op for skipped entries).
.roo/specs/provider-5xx-session-ban/tasks.md (1)

24-30: ⚡ Quick win

Clarify the rationale for asymmetric LongCat vs non-LongCat error handling.

Task 4 specifies that LongCat 5xx errors trigger both banPlatformFromSession() and addProviderModelsToSkipModels() (platform-wide ban + immediate model exclusion), while non-LongCat 5xx errors only skip the individual failed model with no platform ban. This asymmetric treatment is a significant design decision but lacks documented justification.

Without clear rationale, implementers may question whether this difference is intentional or a specification error. Consider adding a brief explanation of why LongCat requires stricter handling (e.g., reliability characteristics, cost considerations, or architectural constraints).

📝 Suggested clarification to add

Add a note after line 30:

  - **Rationale**: LongCat receives platform-wide banning because [insert reason: e.g., "cascading failures are common" or "to minimize cost exposure" or "architectural limitation requires full platform exclusion"]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/provider-5xx-session-ban/tasks.md around lines 24 - 30, Add a
short rationale explaining why LongCat 5xx errors get stricter treatment than
non-LongCat errors by documenting that LongCat triggers both
banPlatformFromSession() and addProviderModelsToSkipModels() (platform-wide ban
+ immediate model exclusion) while non-LongCat only uses
skipModels.add(route.modelDbId); insert a one-sentence note after the task list
(after line 30) that briefly explains the reason (e.g., reliability, cost
exposure, or architectural constraints) so readers understand the intent behind
the asymmetric handling.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.roo/specs/longcat-session-ban/design.md:
- Line 397: Update the flowchart node label that currently reads STREAM[Stream
Comple] to the correct spelling STREAM[Stream Complete]; locate the STREAM node
in the diagram (paired with CHECK{Check streamedText for Truncation}) and
correct the label text to "Stream Complete".
- Around line 127-133: The isTruncatedResponse function is incomplete and
currently ends mid-return; finish its implementation by returning a boolean that
checks the lowercased string for common truncation indicators (e.g.,
'truncated', 'truncation', 'truncated output', 'response truncated',
'truncated...' and similar variants) and ensure it handles non-string inputs
safely (using String(errOrContent) as started) and returns false when input is
falsy; update the function body for isTruncatedResponse to include these checks
and close the function properly.

In @.roo/specs/longcat-session-ban/requirements.md:
- Line 5: Fix the typo in the requirements text where the HTTP status codes are
incorrect: replace the string "40403" with the correct "401/403" in the sentence
starting with "When a LongCat sticky session encounters an error — whether it's
an auth failure..." so the auth failure codes read "401/403". Ensure only the
status code token is updated and the rest of the sentence remains unchanged.

In @.roo/specs/provider-5xx-session-ban/tasks.md:
- Around line 32-42: The truncation handling is inconsistent: when
isTruncatedResponse() detects truncation you call
banPlatformFromSession(route.platform) but only for LongCat also call
addProviderModelsToSkipModels(), allowing other providers' models to still be
retried; update the truncation paths (post-stream and mid-stream) so that
whenever isTruncatedResponse(...) returns true you both call
banPlatformFromSession(route.platform) and
addProviderModelsToSkipModels(route.platform) (keeping LongCat behavior intact),
and ensure skipModels is consulted before retries (e.g., where getStickyKey()
and retry selection occur) so banned platform models are immediately excluded
from the current request retry loop.

In `@server/src/__tests__/routes/provider-session-ban.test.ts`:
- Around line 192-203: The test "handles empty provider model list gracefully"
opens a manual transaction and toggles PRAGMA foreign_keys but doesn't guarantee
cleanup on error; wrap the manual transaction and PRAGMA toggling in a
try/finally so that db.prepare('ROLLBACK').run() and db.prepare('PRAGMA
foreign_keys = ON').run() always execute, ensuring no open transaction or
foreign_keys=OFF leaks; modify the test around getDb(), the BEGIN/DELETE calls
and the expects so that the BEGIN and PRAGMA OFF are paired with a finally block
that always rolls back and restores PRAGMA foreign_keys regardless of
addProviderModelsToSkipModels or expect failures.

In `@server/src/routes/proxy.ts`:
- Around line 159-162: The check that detects truncation errors currently treats
any message containing "conflict" as a truncation indicator (the
lower.includes('conflict') term), which is too broad and causes false positives;
remove that clause or replace it with a much more specific pattern (e.g., only
match a documented/provider-specific truncation phrase that includes "conflict"
or a combined check like lower.includes('conflict') && (lower.includes('trunc')
|| lower.includes('length'))) so the truncation-detection predicate no longer
flags generic "conflict" errors; update the truncation helper in proxy.ts where
lower.includes('conflict') appears and adjust/add unit tests accordingly.

---

Nitpick comments:
In @.roo/specs/longcat-session-ban/design.md:
- Line 5: Change the absolute statement "No changes to `router.ts` are required"
in the design doc to a softened clarification that explicitly notes router logic
remains unchanged because the proxy supplies `skipModels` to suppress LongCat
routing and the existing LongCat boost logic is harmless when entries are
skipped; update the initial sentence near `proxy.ts`/`handleChatCompletion()`
and the later discussion around the LongCat boost (the paragraphs referencing
`skipModels`, `router.ts`, and boost logic) so both places state that no code
change to `router.ts` is necessary but explain why (i.e., proxy-controlled
`skipModels` prevents LongCat routing and makes boost a no-op for skipped
entries).

In @.roo/specs/longcat-session-ban/requirements.md:
- Around line 13-15: The Context section contains very long, run-on sentences
that hinder clarity; rewrite lines 13-15 to split each complex sentence into
multiple short sentences and bullets covering: current behavior on
auth/truncated errors (call sites: clearStickyKey(), preferredKeyId reset but
preferredModel remains), how routeRequest() and router.ts smart-auto cause
retries with different LongCat keys, and how shouldSkipModelOnRetry() and
isRetryableError() contribute to key rotation; keep references to
clearStickyKey, preferredModel, preferredKeyId, routeRequest,
shouldSkipModelOnRetry, isRetryableError, and the smart-auto logic so readers
can trace the flow, and ensure each new sentence conveys one idea (behavior,
cause, check function, consequence).

In @.roo/specs/provider-5xx-session-ban/tasks.md:
- Around line 24-30: Add a short rationale explaining why LongCat 5xx errors get
stricter treatment than non-LongCat errors by documenting that LongCat triggers
both banPlatformFromSession() and addProviderModelsToSkipModels() (platform-wide
ban + immediate model exclusion) while non-LongCat only uses
skipModels.add(route.modelDbId); insert a one-sentence note after the task list
(after line 30) that briefly explains the reason (e.g., reliability, cost
exposure, or architectural constraints) so readers understand the intent behind
the asymmetric handling.

In `@server/src/__tests__/routes/proxy-tools.test.ts`:
- Around line 895-975: The test title claims LongCat is excluded from bandit
routing but the assertions only check that a non-sticky session routes freely
and that no cooldown log was emitted; update the test to reflect that by
changing the it(...) description string (the test name) to something like
"allows non-sticky sessions to route freely and does not log LongCat cooldown"
(referencing the existing test using variables/functions: routedProvider,
logSpy, stickySessionMap, makeMessages, request) and adjust the inline comment
to match the actual behavior; alternatively, if you want to enforce the original
behavior, replace the final assertion to assert routedProvider !== 'longcat'
(and ensure the mocked providers include both longcat and groq) instead of
routedProvider !== ''.

In `@server/src/routes/proxy.ts`:
- Around line 139-144: The function resetAllConsecutiveFailures is a no-op and
should be removed along with its calls and any exports/imports: delete the
resetAllConsecutiveFailures declaration in server/src/routes/proxy.ts, remove
its export (if exported) and delete the two call sites that invoke
resetAllConsecutiveFailures so callers no longer reference it; ensure you also
remove any imports or references elsewhere to avoid unused symbol errors and run
tests/type-check to confirm no remaining usages.
- Around line 1369-1462: Extract the duplicated "end stream gracefully" logic
into a small helper (e.g., endStreamGracefully) that accepts (res,
responseStreamContext) and performs the try/catch: if responseStreamContext call
writeResponseStreamEvent(...) with its responseId and outputText, otherwise
write 'data: [DONE]\n\n', then call res.end(); replace the three inline
try/catch blocks in the truncation branch, the LongCat mid-stream retryable
branch, and the earlier truncation handling with a single call to this helper,
keeping the surrounding ban/skip logic and subsequent logRequest/return
unchanged.
- Line 1209: The code creates a duplicate session key: remove the redundant
declaration of cooldownSessionKey that calls getSessionKey(normalizedMessages,
routingMode) since sessionKey was already computed earlier; replace any uses of
cooldownSessionKey (including log messages) with the existing sessionKey
variable to avoid duplication and keep a single source of truth (ensure
functions/methods referencing cooldownSessionKey now reference sessionKey
instead).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: e661720e-e722-46b2-b5f5-70004638678f

📥 Commits

Reviewing files that changed from the base of the PR and between d9c5a73 and e0661b7.

📒 Files selected for processing (13)
  • .roo/specs/longcat-session-ban/design.md
  • .roo/specs/longcat-session-ban/requirements.md
  • .roo/specs/longcat-session-ban/tasks.md
  • .roo/specs/longcat-sticky-cooldown/design.md
  • .roo/specs/longcat-sticky-cooldown/requirements.md
  • .roo/specs/longcat-sticky-cooldown/tasks.md
  • .roo/specs/provider-5xx-session-ban/design.md
  • .roo/specs/provider-5xx-session-ban/requirements.md
  • .roo/specs/provider-5xx-session-ban/tasks.md
  • server/src/__tests__/routes/provider-session-ban.test.ts
  • server/src/__tests__/routes/proxy-tools.test.ts
  • server/src/routes/proxy.ts
  • server/src/services/router.ts

Comment on lines +127 to +133
function isTruncatedResponse(errOrContent: any): boolean {
if (!errOrContent) return false;
const str = String(errOrContent).toLowerCase();
// Truncation indicators from LongCat and similar providers
return str.includes('truncated')
|| str.includes('truncation')
```

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Incomplete code block in function definition.

The isTruncatedResponse() function definition is truncated mid-line at line 133. The function body is incomplete, showing only the start of the return statement.

📝 Suggested completion

Based on the requirements and usage elsewhere in the document, the function should likely be:

 function isTruncatedResponse(errOrContent: any): boolean {
   if (!errOrContent) return false;
   const str = String(errOrContent).toLowerCase();
   // Truncation indicators from LongCat and similar providers
   return str.includes('truncated')
     || str.includes('truncation')
+    || str.includes('conflict');
+}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
function isTruncatedResponse(errOrContent: any): boolean {
if (!errOrContent) return false;
const str = String(errOrContent).toLowerCase();
// Truncation indicators from LongCat and similar providers
return str.includes('truncated')
|| str.includes('truncation')
```
function isTruncatedResponse(errOrContent: any): boolean {
if (!errOrContent) return false;
const str = String(errOrContent).toLowerCase();
// Truncation indicators from LongCat and similar providers
return str.includes('truncated')
|| str.includes('truncation')
|| str.includes('conflict');
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/longcat-session-ban/design.md around lines 127 - 133, The
isTruncatedResponse function is incomplete and currently ends mid-return; finish
its implementation by returning a boolean that checks the lowercased string for
common truncation indicators (e.g., 'truncated', 'truncation', 'truncated
output', 'response truncated', 'truncated...' and similar variants) and ensure
it handles non-string inputs safely (using String(errOrContent) as started) and
returns false when input is falsy; update the function body for
isTruncatedResponse to include these checks and close the function properly.

BAN3 --> RETRY

subgraph Stream Detection
STREAM[Stream Comple] --> CHECK{Check streamedText for Truncation}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix typo in flowchart label.

Line 397 contains "Stream Comple" which should be "Stream Complete".

📝 Proposed fix
-        STREAM[Stream Comple] --> CHECK{Check streamedText for Truncation}
+        STREAM[Stream Complete] --> CHECK{Check streamedText for Truncation}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
STREAM[Stream Comple] --> CHECK{Check streamedText for Truncation}
STREAM[Stream Complete] --> CHECK{Check streamedText for Truncation}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/longcat-session-ban/design.md at line 397, Update the flowchart
node label that currently reads STREAM[Stream Comple] to the correct spelling
STREAM[Stream Complete]; locate the STREAM node in the diagram (paired with
CHECK{Check streamedText for Truncation}) and correct the label text to "Stream
Complete".


## Overview

When a LongCat sticky session encounters an error — whether it's an auth failure (401/40403) or a "truncated" / "conflict" error from the provider — the system must:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix typo in HTTP status codes.

Line 5 contains "40403" which should be "401/403".

📝 Proposed fix
-When a LongCat sticky session encounters an error — whether it's an auth failure (401/40403) or a "truncated" / "conflict" error from the provider — the system must:
+When a LongCat sticky session encounters an error — whether it's an auth failure (401/403) or a "truncated" / "conflict" error from the provider — the system must:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
When a LongCat sticky session encounters an error — whether it's an auth failure (401/40403) or a "truncated" / "conflict" error from the provider — the system must:
When a LongCat sticky session encounters an error — whether it's an auth failure (401/403) or a "truncated" / "conflict" error from the provider — the system must:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/longcat-session-ban/requirements.md at line 5, Fix the typo in
the requirements text where the HTTP status codes are incorrect: replace the
string "40403" with the correct "401/403" in the sentence starting with "When a
LongCat sticky session encounters an error — whether it's an auth failure..." so
the auth failure codes read "401/403". Ensure only the status code token is
updated and the rest of the sentence remains unchanged.

Comment on lines +32 to +42
- [x] 5. Implement generalized truncation detection for all providers
- In the post-stream truncation check:
- Remove the `route.platform === 'longcat'` guard
- Apply `isTruncatedResponse()` check to any provider
- Call `banPlatformFromSession()` for the detected provider
- For LongCat, also call `addProviderModelsToSkipModels()`
- Update log message to use `route.platform` instead of hardcoded `'longcat'`
- In the mid-stream error handling:
- Apply `isTruncatedResponse()` to partial stream content for any provider
- If truncated: call `banPlatformFromSession()` for the detected provider
- Keep the existing mid-stream error SSE event behavior

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚖️ Poor tradeoff

Inconsistent truncation handling may allow banned platforms to retry.

Task 5 specifies that truncation from ANY provider triggers banPlatformFromSession() (line 36), but only LongCat additionally calls addProviderModelsToSkipModels() (line 37). This creates a gap:

  1. Non-LongCat truncation bans the platform from the session (affects future requests)
  2. BUT doesn't add its models to skipModels for the current retry loop
  3. Unless the session has a sticky model from that platform (triggering Task 3's pre-routing check), other models from the banned platform remain available for retry in the current request

This means a non-LongCat platform that just returned a truncated response could be retried with a different model in the same request, which seems inconsistent with the intent of platform banning.

Recommendation: Either call addProviderModelsToSkipModels() for ALL providers when truncation is detected (consistent with the platform ban), or document why deferred exclusion (via future getStickyKey() checks) is the intended behavior for non-LongCat platforms.

♻️ Option 1: Consistent immediate exclusion (recommended)
  - In the post-stream truncation check:
    - Remove the `route.platform === 'longcat'` guard
    - Apply `isTruncatedResponse()` check to any provider
    - Call `banPlatformFromSession()` for the detected provider
-   - For LongCat, also call `addProviderModelsToSkipModels()`
+   - Call `addProviderModelsToSkipModels()` for the detected provider (immediate exclusion from current retry)
    - Update log message to use `route.platform` instead of hardcoded `'longcat'`
  - In the mid-stream error handling:
    - Apply `isTruncatedResponse()` to partial stream content for any provider
-   - If truncated: call `banPlatformFromSession()` for the detected provider
+   - If truncated: call `banPlatformFromSession()` and `addProviderModelsToSkipModels()` for the detected provider
    - Keep the existing mid-stream error SSE event behavior
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/provider-5xx-session-ban/tasks.md around lines 32 - 42, The
truncation handling is inconsistent: when isTruncatedResponse() detects
truncation you call banPlatformFromSession(route.platform) but only for LongCat
also call addProviderModelsToSkipModels(), allowing other providers' models to
still be retried; update the truncation paths (post-stream and mid-stream) so
that whenever isTruncatedResponse(...) returns true you both call
banPlatformFromSession(route.platform) and
addProviderModelsToSkipModels(route.platform) (keeping LongCat behavior intact),
and ensure skipModels is consulted before retries (e.g., where getStickyKey()
and retry selection occur) so banned platform models are immediately excluded
from the current request retry loop.

Comment on lines +192 to +203
it('handles empty provider model list gracefully', () => {
const db = getDb();
db.prepare('PRAGMA foreign_keys = OFF').run();
db.prepare('BEGIN').run();
db.prepare("DELETE FROM api_keys WHERE platform = 'longcat'").run();
db.prepare("DELETE FROM models WHERE platform = 'longcat'").run();
const skipModels = new Set<number>();
expect(() => addProviderModelsToSkipModels(skipModels, 'longcat')).not.toThrow();
expect(skipModels.size).toBe(0);
db.prepare('ROLLBACK').run();
db.prepare('PRAGMA foreign_keys = ON').run();
});

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Guard the manual transaction with try/finally to avoid leaking DB state across tests.

If addProviderModelsToSkipModels throws or either expect fails, control exits before ROLLBACK and PRAGMA foreign_keys = ON run. The transaction stays open and foreign_keys remains OFF, and the in-transaction DELETE FROM models WHERE platform = 'longcat' becomes visible to subsequent tests — causing cascading failures in the integration suites that rely on a LongCat model row existing.

🛡️ Proposed fix
     const db = getDb();
     db.prepare('PRAGMA foreign_keys = OFF').run();
     db.prepare('BEGIN').run();
-    db.prepare("DELETE FROM api_keys WHERE platform = 'longcat'").run();
-    db.prepare("DELETE FROM models WHERE platform = 'longcat'").run();
-    const skipModels = new Set<number>();
-    expect(() => addProviderModelsToSkipModels(skipModels, 'longcat')).not.toThrow();
-    expect(skipModels.size).toBe(0);
-    db.prepare('ROLLBACK').run();
-    db.prepare('PRAGMA foreign_keys = ON').run();
+    try {
+      db.prepare("DELETE FROM api_keys WHERE platform = 'longcat'").run();
+      db.prepare("DELETE FROM models WHERE platform = 'longcat'").run();
+      const skipModels = new Set<number>();
+      expect(() => addProviderModelsToSkipModels(skipModels, 'longcat')).not.toThrow();
+      expect(skipModels.size).toBe(0);
+    } finally {
+      db.prepare('ROLLBACK').run();
+      db.prepare('PRAGMA foreign_keys = ON').run();
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
it('handles empty provider model list gracefully', () => {
const db = getDb();
db.prepare('PRAGMA foreign_keys = OFF').run();
db.prepare('BEGIN').run();
db.prepare("DELETE FROM api_keys WHERE platform = 'longcat'").run();
db.prepare("DELETE FROM models WHERE platform = 'longcat'").run();
const skipModels = new Set<number>();
expect(() => addProviderModelsToSkipModels(skipModels, 'longcat')).not.toThrow();
expect(skipModels.size).toBe(0);
db.prepare('ROLLBACK').run();
db.prepare('PRAGMA foreign_keys = ON').run();
});
it('handles empty provider model list gracefully', () => {
const db = getDb();
db.prepare('PRAGMA foreign_keys = OFF').run();
db.prepare('BEGIN').run();
try {
db.prepare("DELETE FROM api_keys WHERE platform = 'longcat'").run();
db.prepare("DELETE FROM models WHERE platform = 'longcat'").run();
const skipModels = new Set<number>();
expect(() => addProviderModelsToSkipModels(skipModels, 'longcat')).not.toThrow();
expect(skipModels.size).toBe(0);
} finally {
db.prepare('ROLLBACK').run();
db.prepare('PRAGMA foreign_keys = ON').run();
}
});
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/__tests__/routes/provider-session-ban.test.ts` around lines 192 -
203, The test "handles empty provider model list gracefully" opens a manual
transaction and toggles PRAGMA foreign_keys but doesn't guarantee cleanup on
error; wrap the manual transaction and PRAGMA toggling in a try/finally so that
db.prepare('ROLLBACK').run() and db.prepare('PRAGMA foreign_keys = ON').run()
always execute, ensuring no open transaction or foreign_keys=OFF leaks; modify
the test around getDb(), the BEGIN/DELETE calls and the expects so that the
BEGIN and PRAGMA OFF are paired with a finally block that always rolls back and
restores PRAGMA foreign_keys regardless of addProviderModelsToSkipModels or
expect failures.

Comment thread server/src/routes/proxy.ts Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant