-
Notifications
You must be signed in to change notification settings - Fork 0
refactor(proxy): LongCat immediate provider exclusion, non-LongCat model-only skip on 5xx #7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
53b509e
27a9f2f
8772ba4
7f5e0b2
f212b00
7dd4189
5f6f0b0
95722bc
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Large diffs are not rendered by default.
| Original file line number | Diff line number | Diff line change | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,28 @@ | ||||||||||||||||||
| # Requirements: LongCat Session Ban & Fallback | ||||||||||||||||||
|
|
||||||||||||||||||
| ## Overview | ||||||||||||||||||
|
|
||||||||||||||||||
| When a LongCat sticky session encounters an error — whether it's an auth failure (401/40403) or a "truncated" / "conflict" error from the provider — the system must to: | ||||||||||||||||||
|
|
||||||||||||||||||
| 1. Detect the error, 2. Ban the entire `longcat` platform for this sticky session, 3. Fall back to the next best non-LongCat model via normal routing, 4. Update the sticky session to point to the new fallback model, 5. Never route this session to LongCat again (until the session expires via TTL) | ||||||||||||||||||
|
|
||||||||||||||||||
| ## Context | ||||||||||||||||||
|
|
||||||||||||||||||
| The existing sticky sessions feature lives in [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:13-112). and It uses an SHA-1 hash of `routingMode + firstUserMessage` to identify sessions, and stores `{ modelDbId, + optional `keyId` + `lastUsed` } with a 30-min TTL and 500-entry max. | ||||||||||||||||||
|
|
||||||||||||||||||
| and eviction. | ||||||||||||||||||
|
|
||||||||||||||||||
| The existing LongCat sticky key feature ([`longcat-sticky-key` spec](../../roo/specs/longcat-sticky-key/)) extends this to also prefer using the **same API key** within a session. For LongCat specifically, because LongCat benefits from session continuity at the key level. same key = same session context on their server side). The current behavior on auth errors (401/403) is to [`clearStickyKey()`](server/src/routes/proxy.ts:89-98) — which clears the sticky key but **keep the sticky model pinned to LongCat** via [`preferredModel`](server/src/routes/proxy.ts:1036-1037). On retry, [`routeRequest()`](server/src/services/router.ts:458) still has `preferredModel` pointing to LongCat, and tries **another LongCat key** via round-robin. and **LongCat detects different keys usage for the same session** → the "multiple API keys" problem. The [`shouldSkipModelOnRetry()`](server/src/routes/proxy.ts:430-432) function explicitly **does NOT skip the model** for auth errors or rate limit errors — so auth failures on LongCat result in key rotation within the same LongCat model, which is exactly what LongCat detects as "multiple API keys use." for the same session. Similarly, when LongCat returns a "truncated" or "conflict" error ( the provider truncates the response mid-stream, the current behavior is to silently switch to a different key and but the session continues on LongCat with a different key — same problem. The "truncated" error pattern is also detected by [`isRetryableError()`](server/src/routes/proxy.ts:409-4428) which checking for 429, 413, 400, 404, 408, 409, 422, 500, 502, 503, 504, andrate limit`, `quota`, `aborted`, `timeout`, `econnrefused`, `econnreset`, `unauthorized`, `forbidden`, `invalid api key`, `no longer available`, `model not found`, `bad request`, `invalid json payload`. TheisRetryableError()` function returns true for all these cases, meaning the proxy will retry with a different model/keykey. However, [`shouldSkipModelOnRetry()`](server/src/routes/proxy.ts:430-432) returns `true` only for rate-limit and auth errors — it does NOT skip the model. This means auth errors and LongCat result in key rotation within the same model, which is exactly the behavior LongCat detects as "multiple API keys use" for the same session. The existing [`clearStickyKey()`](server/src/routes/proxy.ts:89-98) only clears the sticky key but **keoes `preferredKeyId` to `undefined` — but the sticky model remains pinned to LongCat. On the next retry, [`routeRequest()`](server/src/services/router.ts:458) still receives `preferredModel` pointing to LongCat, and tries another LongCat key via round-robin. The LongCat smart-auto preference in [`router.ts`](server/src/services/router.ts:498-527) also means LongCat is still tried first in smart mode, so the retry will likely hit LongCat again. ## Functional Requirements ### FR-1: Detect Multiple Key Use on LongCat When a LongCat provider returns an error indicating that the same API key is been used for the same session ( the system must detect this condition. the error response contains language signaling multiple key use. a single session. Detection patterns: - Auth errors (401/403) — the current behavior already clears the sticky key but but tries another key on the same model - Rate-limit errors (429) — same pattern: key rotation within the same model - "Truncated" / "conflict" errors — LongCat truncates responses mid-stream when the response is shorter than expected, indicating the provider cut off the session. Detection keywords: "truncated", "truncation", "conflict", "length", "maximum length", "context_length_exceeded", "token_limit" - Any error message that the provider is complaining about session length or capacity limits for the current conversation. ### FR-2: Ban LongCat Platform for Sticky Session When FR-1 is triggered, the system must ban the entire `longcat` platform for the current sticky session. This means: - All LongCat model IDs must be added to `skipModels` in the retry loop - The sticky session must be updated to point to the new fallback model instead of LongCat - The session must never be routed to LongCat again for until the session expires via TTL (30 min) or The ban must persist across multiple retry attempts within the same request. ### FR-3: Fallback to Next Best Non-LongCat Model After banning LongCat, the retry loop must fall through to the next best available model via the existing Thompson Sampling / smart routing logic. The new model should be selected based on the normal scoring algorithm ( success rate + speed + TTFB + intelligence for smart mode, success rate + speed in balanced mode). ### FR-4: Update Sticky Session to New Model On successful fallback, the sticky session must be updated to point to the new fallback model and `modelDbId` + `keyId`. The sticky key feature should be cleared for the new model since since the fallback is not LongCat, since the sticky key preference only applies to LongCat sessions. ### FR-5: Never Route Session to LongCat Again Once a session is banned from LongCat, it must never be routed to LongCat again for the remainder of that session's lifetime (30 min TTL). This means: - The `stickySessionMap` entry must include a `bannedPlatforms` field (or `Set<string>`) to track which platforms are banned for this session - On subsequent requests in the same session, `getStickyModel()` returns the preferred model, but the proxy layer must check if the session is banned from LongCat and skip LongCat models before calling `routeRequest()` - The LongCat smart-auto preference in `router.ts` must also be suppressed for banned sessions ( the router should not boost LongCat entries to the front for sessions that are banned from LongCat ### FR-6: Truncated Response Detection When a LongCat streaming response is received, the proxy must check the response content for signs of truncation. If detected, the session must be banned from LongCat immediately, even if the stream has already started (headers sent, the client has already received partial data). The system must: - Log the truncation detection - Record the ban in the sticky session - Add all LongCat model IDs to `skipModels` - End stream and ban for future requests - The client receives the truncated partial response as-is; future requests in this session will route to non-LongCat models ### FR-7: Auth Error Handling for LongCat Sessions When an auth error (401/403) occurs on a LongCat sticky session: - Clear the sticky key via `clearStickyKey()` (existing behavior) - Additionally ban the LongCat platform for this session via the new `banPlatformFromSession()` function - Add all LongCat model IDs to `skipModels` - Set `preferredKeyId` to `undefined` - On retry, fall through to the next best non-LongCat model - Update sticky session to the new model on success ### FR-8: Rate-Limit Error Handling for LongCat Sessions When a rate-limit error (429) occurs on a LongCat sticky session: - Ban the LongCat platform for this session via `banPlatformFromSession()` - Add all LongCat model IDs to `skipModels` - Set `preferredKeyId` to `undefined` - On retry, fall through to the next best non-LongCat model - Update sticky session to the new model on success - Note: rate-limit errors on LongCat do NOT clear the entire sticky session (the session may still work with a different key on a different model). Only ban LongCat specifically. ### FR-9: Existing Behavior Preserved for Non-LongCat Sessions All existing sticky session behavior for non-LongCat providers must remain unchanged. The new ban mechanism only applies exclusively to LongCat sessions. Non-LongCat sessions that never have platform bans. ### FR-10: Session Expiry Clears Bans When a sticky session expires ( via TTL (30 min), the `bannedPlatforms` set is also cleared. This is natural — expired sessions are evicted from the `stickySessionMap` entirely, including all associated data. ### FR-11: No Database Schema Changes The ban mechanism is purely in-memory, using the existing `stickySessionMap`. No database schema changes are required. ### FR-12: Minimal Router Changes The router (`server/src/services/router.ts`) should not need significant changes. The only change is that the LongCat smart-auto preference logic should skip sessions that are banned from LongCat. The proxy layer handles all ban detection and session management. The router remains provider-agnostic. ### FR-13: No UI Changes This is a backend-only feature. No client-side changes are needed. ## Non-Functional Requirements ### NFR-1: Backward Compatibility Existing sessions without `bannedPlatforms` (from before this feature or for non-LongCat providers) must continue to work. The `bannedPlatforms` field must be optional in the sticky session map value type. ### NFR-2: Thread Safety The existing `stickySessionMap` is a plain `Map` with no locking ( single-threaded Node.js). The extended map follows the same pattern — no additional concurrency concerns. ### NFR-3: Minimal Performance Impact The ban check adds one `Set` lookup per one optional field check in the sticky session map entry per one DB query to check if a model is on a banned platform. No additional I/O beyond what already exists. ### NFR-4: Test Coverage New unit tests must cover: - Multiple key use detection (auth + rate limit + truncated) - Session ban persistence across retries - Fallback to next best model - Sticky session update on success - Session expiry clearing bans - Non-LongCat sessions unaffected ## Files Requiring Modification | # | File | Change Type | Description | | ||||||||||||||||||
|
Comment on lines
+11
to
+15
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Clean up Context section formatting and grammar. The Context section contains multiple formatting errors:
🧰 Tools🪛 LanguageTool[style] ~15-~15: Consider an alternative for the overused word “exactly”. (EXACTLY_PRECISELY) [grammar] ~15-~15: Ensure spelling is correct (QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1) [grammar] ~15-~15: Ensure spelling is correct (QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1) [style] ~15-~15: Consider an alternative for the overused word “exactly”. (EXACTLY_PRECISELY) [grammar] ~15-~15: Ensure spelling is correct (QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1) 🪛 markdownlint-cli2 (0.22.1)[warning] 11-11: Spaces inside code span elements (MD038, no-space-in-code) [warning] 15-15: Spaces inside code span elements (MD038, no-space-in-code) [warning] 15-15: Spaces inside code span elements (MD038, no-space-in-code) [warning] 15-15: Spaces inside code span elements (MD038, no-space-in-code) [warning] 15-15: Spaces inside code span elements (MD038, no-space-in-code) [warning] 15-15: Spaces inside code span elements (MD038, no-space-in-code) [warning] 15-15: Spaces inside code span elements (MD038, no-space-in-code) [warning] 15-15: Spaces inside code span elements (MD038, no-space-in-code) [warning] 15-15: Spaces inside code span elements (MD038, no-space-in-code) [warning] 15-15: Spaces inside code span elements (MD038, no-space-in-code) [warning] 15-15: Spaces inside code span elements (MD038, no-space-in-code) [warning] 15-15: Spaces inside code span elements (MD038, no-space-in-code) [warning] 15-15: Spaces inside code span elements (MD038, no-space-in-code) 🤖 Prompt for AI Agents |
||||||||||||||||||
| |---|---|---|---| | ||||||||||||||||||
| | | 1 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:16) | Edit | Extend `stickySessionMap` value type to include `bannedPlatforms` | | ||||||||||||||||||
| | | 2 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:34-52) | Edit | Add `isSessionBannedFromPlatform()` helper function | | ||||||||||||||||||
| | | 3 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:54-79) | Edit | Update `getStickyKey()` to check bans before returning key | | ||||||||||||||||||
| | | 4 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:81-87) | Edit | Update `clearStickyModel()` to also clear `bannedPlatforms` | | ||||||||||||||||||
| | | 5 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:100-112) | Edit | Update `setStickyModel()` to also store `bannedPlatforms` | | ||||||||||||||||||
| | | 6 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1039-1053) | Edit | Add `banPlatformFromSession()` function | | ||||||||||||||||||
| | | 7 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1036-1053) | Edit | Add `addLongcatModelsToSkipModels()` helper | | ||||||||||||||||||
| | | 8 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1245-1281) | Edit | Update error handling in retry loop to detect multiple key use + ban LongCat | | ||||||||||||||||||
| | | 9 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1147-1149) | Edit | Add truncated response detection in streaming success path | | ||||||||||||||||||
| | | 10 | [`server/src/services/router.ts`](server/src/services/router.ts:498-527) | Edit | Skip LongCat boost for banned sessions | | ||||||||||||||||||
| | | 11 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1036-1053) | Edit | Pass `bannedPlatforms` to `routeRequest()` via `skipModels` | | ||||||||||||||||||
|
Comment on lines
+24
to
+27
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fix duplicate row numbering in files table. The table has two rows numbered as row 11 (lines 27-28). This creates ambiguity in the modification checklist. 📋 Suggested fixRenumber the second occurrence of row 11 to row 12: | | 10 | [`server/src/services/router.ts`](server/src/services/router.ts:498-527) | Edit | Skip LongCat boost for banned sessions |
- | | 11 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1036-1053) | Edit | Pass `bannedPlatforms` to `routeRequest()` via `skipModels` |
+ | | 12 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1036-1053) | Edit | Pass `bannedPlatforms` to `routeRequest()` via `skipModels` |📝 Committable suggestion
Suggested change
🤖 Prompt for AI Agents |
||||||||||||||||||
| | 11 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1036-1053) | Edit | Pass `bannedPlatforms` to `routeRequest()` via `skipModels` | ## Out of Scope | - Persistent bans across server restarts ( in-memory only, same as existing sticky sessions) - Changes to the Thompson Sampling algorithm itself - Changes to rate limiting logic - - Changes to the fallback chain ordering in balanced mode - - Client-side UI changes - - Configuration UI for enabling/disabling bans per provider ( hardcoded to LongCat only - Changes to the `OpenAICompatProvider` class | | ||||||||||||||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,113 @@ | ||
| # Tasks: LongCat Session Ban & Fallback | ||
|
|
||
| ## Implementation Steps | ||
|
|
||
| - [x] 1. Extend `stickySessionMap` value type in `proxy.ts` | ||
| - Edit line 16: add `bannedPlatforms?: Set<string>` to the map value type | ||
| - This is the foundational type change — all other changes depend on it | ||
|
|
||
| - [x] 2. Add `isSessionBannedFromPlatform()` function in `proxy.ts` | ||
| - Add after `getStickyKey()` (after line 79) | ||
| - Parameters: `messages`, `routingMode`, `platform` | ||
| - Returns `boolean` — checks if the session's `bannedPlatforms` set contains the given platform | ||
| - Includes TTL check (expired sessions have no bans) | ||
| - Add diagnostic logging | ||
|
|
||
| - Add diagnostic logging | ||
|
Comment on lines
+15
to
+16
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Remove duplicate instruction. Lines 15-16 both state "Add diagnostic logging" for task 2. One should be removed. 🔧 Proposed fix - Includes TTL check (expired sessions have no bans)
- Add diagnostic logging
-
- - Add diagnostic logging🤖 Prompt for AI Agents |
||
|
|
||
| - [x] 3. Add `banPlatformFromSession()` function in `proxy.ts` | ||
| - Add after `isSessionBannedFromPlatform()` | ||
| - Parameters: `messages`, `routingMode`, `platform` | ||
| - Creates or adds to `bannedPlatforms` set in the sticky session entry | ||
| - Refreshes `lastUsed` TTL so the ban persists | ||
| - Add diagnostic logging with banned platforms list | ||
|
|
||
| - [x] 4. Add `addLongcatModelsToSkipModels()` helper in `proxy.ts` | ||
| - Add after `banPlatformFromSession()` | ||
| - Queries DB for all enabled LongCat model IDs: `SELECT id FROM models WHERE platform = 'longcat' AND enabled = 1` | ||
| - Adds each to the `skipModels` set | ||
| - Add diagnostic logging with count and IDs | ||
|
|
||
| - [x] 5. Add `isTruncatedResponse()` function in `proxy.ts` | ||
| - Add after `addLongcatModelsToSkipModels()` | ||
| - Parameters: `errOrContent: any` | ||
| - Returns `boolean` — checks for truncation keywords in stringified input | ||
| - Keywords: 'truncated', 'truncation' | ||
| - Case-insensitive matching | ||
|
|
||
| - [x] 6. Update `getStickyKey()` to check session bans | ||
| - In `getStickyKey()` (lines 54-79), after TTL check, add: | ||
| - Look up the sticky model's platform via DB query | ||
| - If the model's platform is in `entry.bannedPlatforms`, return `undefined` | ||
| - Add diagnostic logging for skipped sticky keys due to bans | ||
|
|
||
| - [x] 7. Update `setStickyModel()` to preserve `bannedPlatforms` | ||
| - In `setStickyModel()` (lines 100-112), before setting the new entry: | ||
| - Get existing entry from `stickySessionMap` | ||
| - Preserve `bannedPlatforms` from existing entry (if any) | ||
| - Include `bannedPlatforms` in the new map entry | ||
| - Update log message to include banned platforms count when present | ||
|
|
||
| - [x] 8. Update pre-routing logic in `handleChatCompletion()` | ||
| - After determining `preferredModel` and `preferredKeyId` (around lines 1035-1053): | ||
| - Check `isSessionBannedFromPlatform(normalizedMessages, routingMode, 'longcat')` | ||
| - If banned: | ||
| - Call `addLongcatModelsToSkipModels(skipModels)` | ||
| - If `preferredModel` points to a LongCat model, set `preferredModel = undefined` and `preferredKeyId = undefined` | ||
| - Add diagnostic logging | ||
| - Move `skipModels` initialization earlier (before the ban check) or create it at the ban check point | ||
| - Note: `skipModels` is currently initialized at line 1058 — need to ensure it exists before the ban check | ||
|
|
||
| - [x] 9. Update error handling in retry loop for LongCat-specific bans | ||
| - In the `catch (err)` block (around lines 1245-1282): | ||
| - After logging the request error, check if `route.platform === 'longcat'` | ||
| - If LongCat + auth error: call `banPlatformFromSession()`, `addLongcatModelsToSkipModels()`, clear `preferredKeyId` | ||
| - If LongCat + rate-limit error: call `banPlatformFromSession()`, `addLongcatModelsToSkipModels()`, clear `preferredKeyId` | ||
| - If LongCat + truncated response: call `banPlatformFromSession()`, `addLongcatModelsToSkipModels()`, clear `preferredKeyId` | ||
| - Keep existing auth error handling for non-LongCat (`clearStickyKey()` + `preferredKeyId = undefined`) | ||
| - Keep existing `isRetryableError()` and `shouldSkipModelOnRetry()` logic | ||
| - Keep existing non-retryable error handling (`clearStickyModel()`) | ||
|
|
||
| - [x] 10. Add truncated response detection after stream completes | ||
| - After the streaming `for await` loop completes (around line 1133-1147): | ||
| - Check `route.platform === 'longcat'` and `isTruncatedResponse(streamedText)` | ||
| - If detected: call `banPlatformFromSession(normalizedMessages, routingMode, 'longcat')` | ||
| - Add diagnostic logging | ||
| - Note: the stream has already been sent to the client — no retry within the same request | ||
| - Future requests in this session will route to non-LongCat models | ||
| - For Responses API streaming: check `responseStreamContext.outputText` instead of `streamedText` | ||
|
|
||
| - [x] 11. Add truncation detection in mid-stream error handling | ||
| - In the `catch (streamErr)` block for mid-stream errors (around lines 1185-1214): | ||
| - Check `route.platform === 'longcat'` and `isTruncatedResponse(streamErr.message)` | ||
| - If detected: | ||
| - Call `banPlatformFromSession(normalizedMessages, routingMode, 'longcat')` | ||
| - End the stream gracefully (send completion event, not error event) | ||
| - Return — client receives truncated response as-is | ||
| - If not truncation: keep existing mid-stream error behavior (send error SSE event + return) | ||
|
|
||
| - Add diagnostic logging for both paths | ||
|
|
||
| - [x] 12. Verify TypeScript compilation | ||
| - Run `npx tsc --noEmit` in the `server/` directory | ||
| - Ensure no type errors from the new `bannedPlatforms` field or new functions | ||
|
|
||
| - [ ] 13. Run existing tests | ||
| - Run `npm test` in the `server/` directory | ||
| - Verify no regressions in router tests, proxy tests, or sticky session behavior | ||
|
|
||
| - [ ] 14. Add new unit tests for ban functionality | ||
| - Test `isSessionBannedFromPlatform()` — no session, expired session, banned session, non-banned session | ||
| - Test `banPlatformFromSession()` — adds platform to banned set, preserves existing bans | ||
| - Test `isTruncatedResponse()` — various truncation keywords, non-truncation strings | ||
| - Test `addLongcatModelsToSkipModels()` — adds LongCat model IDs to skip set | ||
| - Test `setStickyModel()` preserves `bannedPlatforms` when updating sticky model | ||
| - Test `getStickyKey()` returns `undefined` when session is banned from model's platform | ||
|
|
||
| - [ ] 15. Manual integration testing | ||
| - Add a LongCat API key via the Keys page | ||
| - Send a chat completion request and verify it routes through LongCat | ||
| - Send a second request with same first user message — verify sticky key is used | ||
| - Simulate auth error (disable key mid-session) — verify LongCat is banned and fallback occurs | ||
| - Send a third request — verify it routes to non-LongCat model ( LongCat is still banned) | ||
| - Wait for session TTL to expire (30 min) — verify LongCat is no longer banned | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix grammar and formatting in Overview section.
Multiple issues:
✏️ Proposed fixes
🤖 Prompt for AI Agents