animaios · vi70x3 · Jun 2, 2026
diff --git a/.roo/specs/disable-sticky-on-auto/design.md b/.roo/specs/disable-sticky-on-auto/design.md
@@ -0,0 +1,97 @@
+# Design: Disable Sticky Threads on Auto Endpoint
+
+## Design Approach
+
+**Single-point guard in `getSessionKey()`** — modify [`getSessionKey()`](server/src/routes/proxy.ts:25) to return an empty string when `routingMode === 'balanced'`. This cascades through all sticky session functions because every one of them calls `getSessionKey()` first and returns early when the key is empty.
+
+## Why This Approach
+
+Every sticky session function in [`proxy.ts`](server/src/routes/proxy.ts) follows the same pattern:
+
+```
+function stickyOp(messages, routingMode, ...) {
+  const key = getSessionKey(messages, routingMode);
+  if (!key) return <no-op value>;   // undefined, false, or early return
+  ...operate on stickySessionMap using key...
+}
+```
+
+By making `getSessionKey()` return `''` for balanced mode, all downstream functions automatically become no-ops:
+
+| Function | No-op return when key is empty | Effect for balanced mode |
+|---|---|---|
+| [`getStickyModel()`](server/src/routes/proxy.ts:35) | `undefined` | No model pinning → free routing every request |
+| [`getStickyKey()`](server/src/routes/proxy.ts:55) | `undefined` | No key pinning → round-robin key selection |
+| [`setStickyModel()`](server/src/routes/proxy.ts:199) | early return | No sticky entries ever created |
+| [`clearStickyModel()`](server/src/routes/proxy.ts:180) | early return | No-op — nothing to clear |
+| [`clearStickyKey()`](server/src/routes/proxy.ts:188) | early return | No-op — nothing to clear |
+| [`isSessionBannedFromPlatform()`](server/src/routes/proxy.ts:92) | `false` | No platform bans checked |
+| [`banPlatformFromSession()`](server/src/routes/proxy.ts:108) | early return | No platform bans recorded |
+
+Direct `stickySessionMap` accesses in [`handleChatCompletion()`](server/src/routes/proxy.ts:1057) also use `getSessionKey()` and guard on the result being truthy, so they are automatically skipped:
+
+- **Session ban skipModels** (lines 1176–1189): `if (sessionKey)` guard → skipped when key is `''`
+- **LongCat sticky cooldown** (lines 1205–1217): `cooldownSessionKey ? ... : undefined` → skipped when key is `''`
+
+## Changes Required
+
+### 1. Modify `getSessionKey()` in `server/src/routes/proxy.ts`
+
+```typescript
+function getSessionKey(messages: ChatMessage[], routingMode: RoutingMode): string {
+  // Sticky sessions only apply to smart/auto-smart routing.
+  // Balanced/auto uses free routing on every request.
+  if (routingMode === 'balanced') return '';
+
+  const firstUser = messages.find(m => m.role === 'user');
+  if (!firstUser || typeof firstUser.content !== 'string') return '';
+  return crypto.createHash('sha1').update(`${routingMode}:${firstUser.content}`).digest('hex');
+}
+```
+
+This is the **only code change** needed. All other functions and call sites remain untouched.
+
+### 2. Update tests in `server/src/__tests__/routes/provider-session-ban.test.ts`
+
+Add test cases verifying that balanced mode skips sticky operations:
+- `getStickyModel()` returns `undefined` for balanced mode even when a sticky entry exists for the same messages under smart mode
+- `isSessionBannedFromPlatform()` returns `false` for balanced mode
+- `banPlatformFromSession()` does not create entries for balanced mode
+- `setStickyModel()` does not create entries for balanced mode
+
+### 3. No changes to `server/src/services/router.ts`
+
+The router itself does not interact with sticky sessions — it only receives `preferredModel` and `preferredKeyId` as optional parameters. When those are `undefined` (which they will be for balanced mode), the router already does free routing.
+
+## Flow Diagram
+
+```mermaid
+flowchart TD
+    A[Request arrives] --> B{model field?}
+    B -->|Explicit model| C[Pin to requested model]
+    B -->|No model field| D{routingMode?}
+    D -->|balanced| E[getSessionKey returns empty string]
+    D -->|smart| F[getSessionKey returns hash]
+    E --> G[All sticky functions return no-op]
+    G --> H[Free Thompson Sampling routing]
+    F --> I[Sticky model/key lookup]
+    I --> J{Sticky hit?}
+    J -->|Yes| K[Pin to sticky model + key]
+    J -->|No| H
+    K --> L[Route to pinned model]
+    H --> M[Route to best sampled model]
+    L --> N[On success: setStickyModel saves for smart]
+    M --> O[On success: setStickyModel saves for smart, no-op for balanced]
+```
+
+## Edge Cases
+
+- **Mode switch mid-conversation**: Session keys include `routingMode` in the hash, and balanced mode now yields no key at all, so a balanced request can never read or overwrite a smart-mode entry for the same messages. No cross-contamination.
+- **`stickySessionMap` size cleanup**: Since balanced mode never creates entries, the map only grows from smart-mode sessions. Existing eviction logic remains sufficient.
+- **`responseSessionMap`**: Separate from sticky sessions — used for the Responses API `previous_response_id` feature. Unaffected by this change.
+- **Per-request `skipModels`/`skipKeys`**: These are intra-request retry state, not sticky state. They remain active for both modes.
+
+## Risks
+
+- **Low risk**: The change is a single early-return in one function. All downstream behavior is already designed to handle empty keys gracefully.
+- **No backward compatibility concern**: Existing smart-mode sessions continue working identically. Balanced-mode sessions simply stop being created — there is no data to migrate or lose.
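The cascade described in `design.md` above can be illustrated with a minimal, self-contained sketch. It is not the code in `proxy.ts`: the types are simplified, the map value shape is assumed, and `tinyHash` is a hypothetical dependency-free stand-in for the SHA-1 digest the real function computes. Only the guard and the `if (!key)` early returns mirror the design.

```typescript
// Sketch only: simplified stand-ins for the types and map in proxy.ts.
type RoutingMode = 'balanced' | 'smart';
interface ChatMessage { role: 'system' | 'user' | 'assistant'; content: string }

// Hypothetical stand-in for crypto.createHash('sha1') (FNV-1a, 32-bit).
function tinyHash(input: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16);
}

// Assumed value shape; the real map stores more than a model ID.
const stickySessionMap = new Map<string, { modelDbId: number }>();

function getSessionKey(messages: ChatMessage[], routingMode: RoutingMode): string {
  if (routingMode === 'balanced') return ''; // the single-point guard
  const firstUser = messages.find(m => m.role === 'user');
  if (!firstUser) return '';
  return tinyHash(`${routingMode}:${firstUser.content}`);
}

function setStickyModel(messages: ChatMessage[], mode: RoutingMode, modelDbId: number): void {
  const key = getSessionKey(messages, mode);
  if (!key) return; // empty key: nothing is stored
  stickySessionMap.set(key, { modelDbId });
}

function getStickyModel(messages: ChatMessage[], mode: RoutingMode): number | undefined {
  const key = getSessionKey(messages, mode);
  if (!key) return undefined; // empty key: no pinning
  return stickySessionMap.get(key)?.modelDbId;
}

const msgs: ChatMessage[] = [{ role: 'user', content: 'hello' }];
setStickyModel(msgs, 'balanced', 7); // no-op under the guard
setStickyModel(msgs, 'smart', 42);   // stored under the smart-mode key
```

The same two calls are roughly what the T2 test cases would exercise: a smart-mode entry exists for the messages, yet a balanced lookup for those messages still returns `undefined`.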
diff --git a/.roo/specs/disable-sticky-on-auto/requirements.md b/.roo/specs/disable-sticky-on-auto/requirements.md
@@ -0,0 +1,44 @@
+# Requirements: Disable Sticky Threads on Auto Endpoint
+
+## Summary
+
+Disable the sticky session/thread feature on the `freellmapi/auto` (balanced routing) endpoint, keeping it active only on the `freellmapi/auto-smart` (smart routing) endpoint.
+
+## Background
+
+The sticky session system in [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts) pins a conversation to the same model and API key across multiple turns. This prevents mid-conversation model switching, which can cause hallucinations or inconsistent tone.
+
+Currently, sticky sessions operate for **both** routing modes:
+- `'balanced'` — used by `freellmapi/auto`
+- `'smart'` — used by `freellmapi/auto-smart`
+
+The balanced/auto endpoint uses Thompson Sampling with speed-weighted scoring, intentionally exploring different models to find the best throughput. Sticky sessions contradict this design — they prevent exploration by pinning to whatever model happened to serve the first turn.
+
+The smart/auto-smart endpoint prioritizes intelligence and consistency, where sticky sessions are desirable to maintain coherent conversations.
+
+## Requirements
+
+### R1: No sticky model pinning on balanced/auto endpoint
+When `routingMode === 'balanced'`, the system must **not** read or write sticky model preferences. Calls to [`getStickyModel()`](server/src/routes/proxy.ts:35) and [`setStickyModel()`](server/src/routes/proxy.ts:199) must be skipped or act as no-ops for balanced mode.
+
+### R2: No sticky key pinning on balanced/auto endpoint
+When `routingMode === 'balanced'`, the system must **not** read or write sticky API key preferences. Calls to [`getStickyKey()`](server/src/routes/proxy.ts:55) must be skipped or act as no-ops for balanced mode.
+
+### R3: No session-level platform bans on balanced/auto endpoint
+When `routingMode === 'balanced'`, the system must **not** track or check session-level platform bans. Calls to [`isSessionBannedFromPlatform()`](server/src/routes/proxy.ts:92) and [`banPlatformFromSession()`](server/src/routes/proxy.ts:108), and the related `skipModels` logic driven by session bans, must be skipped or act as no-ops for balanced mode.
+
+### R4: Sticky sessions remain fully active on smart/auto-smart endpoint
+All sticky session functionality (model pinning, key pinning, platform bans) must continue working unchanged when `routingMode === 'smart'`.
+
+### R5: Per-request retry skip logic remains for both modes
+The `skipModels` and `skipKeys` sets used within a single request's retry loop must continue working for both modes. These are intra-request fallback mechanisms, not cross-request sticky state.
+
+### R6: Existing tests must pass
+All existing tests in [`provider-session-ban.test.ts`](server/src/__tests__/routes/provider-session-ban.test.ts) and [`full-flow.test.ts`](server/src/__tests__/integration/full-flow.test.ts) must continue passing. New test cases should verify that balanced mode skips sticky operations.
+
+## Out of Scope
+
+- Changing the routing algorithm for either mode
+- Removing the sticky session infrastructure (functions, maps) — they remain available for smart mode
+- Modifying the `/v1/models` endpoint or model ID constants
+- Changing how `getSessionKey()` hashes messages (the balanced-mode early return runs before hashing and leaves the hash itself unchanged)
diff --git a/.roo/specs/disable-sticky-on-auto/tasks.md b/.roo/specs/disable-sticky-on-auto/tasks.md
@@ -0,0 +1,16 @@
+# Tasks: Disable Sticky Threads on Auto Endpoint
+
+## Task List
+
+- [x] **T1: Modify `getSessionKey()` in `server/src/routes/proxy.ts`** — Add an early return for `routingMode === 'balanced'` that returns an empty string, disabling all sticky session operations for the auto/balanced endpoint. This is the single code change that cascades through all sticky functions.
+
+- [x] **T2: Add balanced-mode sticky skip tests in `server/src/__tests__/routes/provider-session-ban.test.ts`** — Add a new `describe` block verifying that balanced mode skips sticky operations:
+  - `getStickyModel()` returns `undefined` for balanced mode even when a smart-mode sticky entry exists for the same messages
+  - `isSessionBannedFromPlatform()` returns `false` for balanced mode
+  - `banPlatformFromSession()` does not create entries for balanced mode
+  - `setStickyModel()` does not create entries for balanced mode
+  - `getSessionKey()` returns `''` for balanced mode
+
+- [x] **T3: Run existing test suite** — Verify all existing tests in `provider-session-ban.test.ts` and `full-flow.test.ts` still pass after the change.
+
+- [ ] **T4: Manual smoke test** — Send a request to `freellmapi/auto` and confirm logs show `[Sticky] miss key= | msgs=... → free routing` (empty key prefix) rather than a sticky hit. Send a follow-up request with the same first user message and confirm it routes freely again rather than pinning.
diff --git a/.roo/specs/generalized-thread-protection/design.md b/.roo/specs/generalized-thread-protection/design.md
@@ -0,0 +1,152 @@
+# Design: Generalized Thread Protection Scanner
+
+## Architecture Overview
+
+The thread protection scanner replaces all hardcoded `route.platform === 'longcat'` branches in `handleChatCompletion()` with a dynamic, provider-agnostic decision engine. The scanner evaluates error context against configurable per-platform protection rules to determine whether to ban an entire provider or just a single model.
+
+The scanner lives in a new module `server/src/services/threadProtection.ts` and is called from the retry loop in `proxy.ts`. It returns a `ThreadProtectionAction` that tells the caller exactly what to do.
+
+```mermaid
+graph TD
+    subgraph Proxy [proxy.ts — handleChatCompletion]
+        RETRY[Retry loop catch block] --> SCAN{evaluateThreadProtection}
+        STREAM_ERR[Mid-stream error handler] --> SCAN
+        TRUNC[Truncation detector] --> SCAN
+    end
+
+    subgraph Scanner [threadProtection.ts]
+        SCAN --> RULES{Protection rules lookup}
+        RULES -->|platform config| DECIDE{Decide action}
+        RULES -->|default| DECIDE
+        DECIDE --> ACTION[ThreadProtectionAction]
+    end
+
+    ACTION -->|banProvider| BAN[banPlatformFromSession + addProviderModelsToSkipModels]
+    ACTION -->|skipModel| SKIP[skipModels.add]
+    ACTION -->|clearStickyIfPinned| CLEAR[preferredModel = undefined]
+```
+
+## Protection Rules
+
+Each platform can be configured with a protection level that determines how aggressively the scanner responds to errors:
+
+| Level | Behavior on 5xx | Behavior on truncation | Behavior on retryable error |
+|-------|----------------|----------------------|---------------------------|
+| `provider-ban` | Ban entire provider | Ban entire provider | Ban entire provider |
+| `model-skip` | Skip single model | Skip single model | Skip single model |
+| `off` | No protection action | No protection action | No protection action |
+
+### Configuration
+
+The `THREAD_PROTECTION_PLATFORMS` env var is a comma-separated list of `platform:level` pairs:
+
+```
+THREAD_PROTECTION_PLATFORMS="longcat:provider-ban,groq:model-skip"
+```
+
+When unset, the scanner uses a **default protection map** hardcoded in the module that preserves the existing LongCat behavior (`longcat → provider-ban`) and applies `model-skip` to all other platforms. This ensures full backward compatibility — existing deployments see zero behavior change without any env var configuration.
+
+## Scanner API
+
+```typescript
+// server/src/services/threadProtection.ts
+
+export type ProtectionLevel = 'provider-ban' | 'model-skip' | 'off';
+
+export type ErrorContextKind = '5xx' | 'truncation' | 'retryable';
+
+export interface ErrorContext {
+  platform: string;
+  kind: ErrorContextKind;
+  /** Whether the error occurred mid-stream (after SSE headers sent) */
+  midStream: boolean;
+  /** The model DB ID — always available */
+  modelDbId: number;
+  /** The error object, for logging */
+  error?: unknown;
+}
+
+export interface ThreadProtectionAction {
+  /** Ban the entire platform for this session */
+  banProvider: boolean;
+  /** Skip just this model */
+  skipModel: boolean;
+  /** Clear sticky model/key if pinned to this platform */
+  clearStickyIfPinned: boolean;
+  /** Human-readable reason for logging */
+  reason: string;
+}
+
+export declare function evaluateThreadProtection(ctx: ErrorContext): ThreadProtectionAction;
+```
+
+## Decision Matrix
+
+The `evaluateThreadProtection` function implements this decision matrix:
+
+| Protection Level | `5xx` | `truncation` | `retryable` |
+|------------------|-------|--------------|-------------|
+| `provider-ban` | `banProvider=true, skipModel=false, clearStickyIfPinned=true` | `banProvider=true, skipModel=false, clearStickyIfPinned=true` | `banProvider=true, skipModel=false, clearStickyIfPinned=true` |
+| `model-skip` | `banProvider=false, skipModel=true, clearStickyIfPinned=false` | `banProvider=false, skipModel=true, clearStickyIfPinned=false` | `banProvider=false, skipModel=true, clearStickyIfPinned=false` |
+| `off` | All false | All false | All false |
+
+## Integration Points in proxy.ts
+
+The scanner replaces 6 hardcoded `longcat` blocks:
+
+### 1. Stream truncation detection (line ~1394)
+```typescript
+// BEFORE:
+if (route.platform === 'longcat') {
+  banPlatformFromSession(..., 'longcat', ...);
+  addProviderModelsToSkipModels(skipModels, 'longcat');
+} else {
+  skipModels.add(route.modelDbId);
+}
+
+// AFTER:
+const action = evaluateThreadProtection({
+  platform: route.platform, kind: 'truncation', midStream: false, modelDbId: route.modelDbId,
+});
+if (action.banProvider) {
+  banPlatformFromSession(normalizedMessages, routingMode, route.platform, route.modelDbId);
+  addProviderModelsToSkipModels(skipModels, route.platform);
+}
+if (action.skipModel) skipModels.add(route.modelDbId);
+if (action.clearStickyIfPinned) { /* clear sticky if pinned to this platform */ }
+```
+
+### 2. Mid-stream 5xx (line ~1467)
+### 3. Mid-stream truncation error (line ~1492)
+### 4. Mid-stream retryable error (line ~1523)
+### 5. Non-stream 5xx (line ~1624)
+### 6. Non-stream retryable error (line ~1645)
+
+All 6 blocks follow the same pattern: replace the `if (route.platform === 'longcat') { ... } else { ... }` with a single `evaluateThreadProtection()` call.
+
+## Sticky Cooldown Generalization
+
+The LongCat sticky cooldown check (lines ~1210–1222) is also generalized. Instead of checking `prefRow?.platform === 'longcat'`, it checks whether the sticky platform has the `provider-ban` protection level:
+
+```typescript
+// BEFORE:
+if (prefRow?.platform === 'longcat') { ... addProviderModelsToSkipModels(skipModels, 'longcat'); ... }
+
+// AFTER:
+const stickyProtection = prefRow ? getProtectionLevel(prefRow.platform) : undefined;
+if (prefRow && stickyProtection === 'provider-ban') {
+  // Apply cooldown exclusion for provider-ban platforms
+  addProviderModelsToSkipModels(skipModels, prefRow.platform);
+}
+```
+
+This ensures that any future platform configured with `provider-ban` automatically gets the same cooldown protection.
+
+## Files to Modify
+
+| # | File | Change |
+|---|------|--------|
+| 1 | `server/src/services/threadProtection.ts` | **Create** — new scanner module |
+| 2 | `server/src/routes/proxy.ts` | Replace 6 hardcoded `longcat` blocks + cooldown block with scanner calls |
+| 3 | `server/src/__tests__/services/threadProtection.test.ts` | **Create** — unit tests for the scanner |
+| 4 | `server/src/__tests__/routes/proxy-tools.test.ts` | Update test assertions to use generic protection log messages |
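As a rough illustration of the decision matrix and configuration format in `design.md` above, the scanner could look something like the sketch below. Several parts are assumptions and not taken from the spec: the `parseProtectionConfig` helper, the optional second parameter on `evaluateThreadProtection` (added so the sketch can run without reading the environment), silently ignoring malformed pairs, and falling back to `model-skip` for platforms not listed in a configured value. The `getProtectionLevel` signature is also a guess, since the spec calls it but does not declare it.

```typescript
// Sketch only: one possible shape for threadProtection.ts, not the shipped module.
type ProtectionLevel = 'provider-ban' | 'model-skip' | 'off';
type ErrorContextKind = '5xx' | 'truncation' | 'retryable';

interface ErrorContext {
  platform: string; kind: ErrorContextKind; midStream: boolean; modelDbId: number; error?: unknown;
}
interface ThreadProtectionAction {
  banProvider: boolean; skipModel: boolean; clearStickyIfPinned: boolean; reason: string;
}

function isProtectionLevel(v: string): v is ProtectionLevel {
  return v === 'provider-ban' || v === 'model-skip' || v === 'off';
}

// Parses "longcat:provider-ban,groq:model-skip". When unset, returns the
// default map from the design (longcat keeps provider-ban).
function parseProtectionConfig(raw: string | undefined): Map<string, ProtectionLevel> {
  const levels = new Map<string, ProtectionLevel>();
  if (!raw) return levels.set('longcat', 'provider-ban');
  for (const pair of raw.split(',')) {
    const [platform, level] = pair.split(':').map(s => s.trim());
    // Assumption: malformed or unknown pairs are ignored, not treated as errors.
    if (platform && level && isProtectionLevel(level)) levels.set(platform, level);
  }
  return levels;
}

function getProtectionLevel(platform: string, levels: Map<string, ProtectionLevel>): ProtectionLevel {
  return levels.get(platform) ?? 'model-skip'; // every other platform defaults to model-skip
}

function evaluateThreadProtection(
  ctx: ErrorContext,
  levels: Map<string, ProtectionLevel> = parseProtectionConfig(undefined),
): ThreadProtectionAction {
  const level = getProtectionLevel(ctx.platform, levels);
  // The matrix is identical for every error kind, so only the level decides the action.
  const banProvider = level === 'provider-ban';
  return {
    banProvider,
    skipModel: level === 'model-skip',
    clearStickyIfPinned: banProvider,
    reason: `${ctx.platform} ${ctx.kind}${ctx.midStream ? ' (mid-stream)' : ''} -> ${level}`,
  };
}

const exampleAction = evaluateThreadProtection({
  platform: 'longcat', kind: 'truncation', midStream: false, modelDbId: 1,
});
```

In the real module the raw string would presumably come from `process.env.THREAD_PROTECTION_PLATFORMS`. Passing the parsed map as a parameter is just a convenience that lets the unit tests cover custom configurations.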
diff --git a/.roo/specs/generalized-thread-protection/requirements.md b/.roo/specs/generalized-thread-protection/requirements.md
@@ -0,0 +1,5 @@
+# Requirements: Generalized Thread Protection Scanner
+
+## Problem Statement
+
+The proxy route handler (`server/src/routes/proxy.ts`) contains 6+ hardcoded branches that special-case the `longcat` platform.
diff --git a/.roo/specs/generalized-thread-protection/tasks.md b/.roo/specs/generalized-thread-protection/tasks.md
@@ -0,0 +1,12 @@
+# Tasks: Generalized Thread Protection (Exclusive Model Sessions)
+
+## Implementation Tasks
+
+- [x] T-1: Rename `LONGCAT_STICKY_COOLDOWN_MS` to `THREAD_COOLDOWN_MS` in [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:18) and update all references throughout the file
+- [x] T-2: Remove the hardcoded LongCat cooldown block (the `if (preferredModel)` block checking `prefRow?.platform === 'longcat'` and calling `addProviderModelsToSkipModels(skipModels, 'longcat')`)
+- [x] T-3: Remove the hardcoded Owl Alpha cooldown block (the `if (preferredModel)` block checking `prefRow?.platform === 'openrouter' && prefRow?.model_id === 'owl-alpha'` and calling `skipModels.add(preferredModel)`)
+- [x] T-4: Insert the generalized thread protection scanner at the same location where the removed blocks were, after the session ban sticky override and before the retry loop — including the `activeCooldownModels` collection loop, the exhaustion protection SQL query, and the conditional `skipModels` addition
+- [ ] T-5: Verify the execution order of the `skipModels` pipeline: session bans → transient cooldowns → global cooldown sticky override → session ban sticky override → thread protection scanner → retry loop
+- [ ] T-6: Create [`server/src/__tests__/routes/thread-protection.test.ts`](server/src/__tests__/routes/thread-protection.test.ts) with unit tests covering: dynamic exclusivity, exhaustion bypass, self-preservation, expired entries, and multiple busy models
+- [ ] T-7: Run the existing test suite to confirm no regressions in routing, fallback, or provider-session-ban tests
+- [ ] T-8: Manual smoke test: send two concurrent requests from different sessions and verify thread protection logs appear correctly, and that the second session routes to an alternative model