97 changes: 97 additions & 0 deletions .roo/specs/disable-sticky-on-auto/design.md
@@ -0,0 +1,97 @@
# Design: Disable Sticky Threads on Auto Endpoint

## Design Approach

**Single-point guard in `getSessionKey()`** — modify [`getSessionKey()`](server/src/routes/proxy.ts:25) to return an empty string when `routingMode === 'balanced'`. This cascades through all sticky session functions because every one of them calls `getSessionKey()` first and returns early when the key is empty.

## Why This Approach

Every sticky session function in [`proxy.ts`](server/src/routes/proxy.ts) follows the same pattern:

```typescript
function stickyOp(messages: ChatMessage[], routingMode: RoutingMode /* , ...other args */) {
  const key = getSessionKey(messages, routingMode);
  if (!key) return; // no-op: returns undefined, false, or exits early depending on the function
  // ...operate on stickySessionMap using key...
}
```

By making `getSessionKey()` return `''` for balanced mode, all downstream functions automatically become no-ops:

| Function | No-op return when key is empty | Effect for balanced mode |
|---|---|---|
| [`getStickyModel()`](server/src/routes/proxy.ts:35) | `undefined` | No model pinning → free routing every request |
| [`getStickyKey()`](server/src/routes/proxy.ts:55) | `undefined` | No key pinning → round-robin key selection |
| [`setStickyModel()`](server/src/routes/proxy.ts:199) | early return | No sticky entries ever created |
| [`clearStickyModel()`](server/src/routes/proxy.ts:180) | early return | No-op — nothing to clear |
| [`clearStickyKey()`](server/src/routes/proxy.ts:188) | early return | No-op — nothing to clear |
| [`isSessionBannedFromPlatform()`](server/src/routes/proxy.ts:92) | `false` | No platform bans checked |
| [`banPlatformFromSession()`](server/src/routes/proxy.ts:108) | early return | No platform bans recorded |
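
The cascade can be demonstrated with a minimal, self-contained sketch. It rests on several assumptions. `ChatMessage` and `RoutingMode` are simplified stand-ins. `stickySessionMap` is a plain `Map`. The entry shape `StickyEntry` and the helper `getEntry` are hypothetical. The function signatures are trimmed; for example, the real `banPlatformFromSession()` also takes a model ID. Only the `getSessionKey()` body follows the design exactly.

```typescript
import { createHash } from 'node:crypto';

// Simplified stand-ins for the real proxy.ts types (assumptions).
type RoutingMode = 'balanced' | 'smart';
interface ChatMessage { role: string; content: unknown }
// Hypothetical entry shape; the real stickySessionMap value may differ.
interface StickyEntry { modelDbId?: number; bannedPlatforms: Set<string> }

const stickySessionMap = new Map<string, StickyEntry>();

function getSessionKey(messages: ChatMessage[], routingMode: RoutingMode): string {
  if (routingMode === 'balanced') return ''; // the single-point guard
  const firstUser = messages.find(m => m.role === 'user');
  if (!firstUser || typeof firstUser.content !== 'string') return '';
  return createHash('sha1').update(`${routingMode}:${firstUser.content}`).digest('hex');
}

// Hypothetical helper: fetch or lazily create the entry for a key.
function getEntry(key: string): StickyEntry {
  let entry = stickySessionMap.get(key);
  if (!entry) {
    entry = { bannedPlatforms: new Set() };
    stickySessionMap.set(key, entry);
  }
  return entry;
}

function getStickyModel(messages: ChatMessage[], mode: RoutingMode): number | undefined {
  const key = getSessionKey(messages, mode);
  if (!key) return undefined; // balanced: never reads
  return stickySessionMap.get(key)?.modelDbId;
}

function setStickyModel(messages: ChatMessage[], mode: RoutingMode, modelDbId: number): void {
  const key = getSessionKey(messages, mode);
  if (!key) return; // balanced: never writes
  getEntry(key).modelDbId = modelDbId;
}

function isSessionBannedFromPlatform(messages: ChatMessage[], mode: RoutingMode, platform: string): boolean {
  const key = getSessionKey(messages, mode);
  if (!key) return false; // balanced: no bans checked
  return stickySessionMap.get(key)?.bannedPlatforms.has(platform) ?? false;
}

function banPlatformFromSession(messages: ChatMessage[], mode: RoutingMode, platform: string): void {
  const key = getSessionKey(messages, mode);
  if (!key) return; // balanced: no bans recorded
  getEntry(key).bannedPlatforms.add(platform);
}

// Usage: smart mode pins; balanced mode neither reads nor writes.
const msgs: ChatMessage[] = [{ role: 'user', content: 'hello' }];
setStickyModel(msgs, 'smart', 42);
setStickyModel(msgs, 'balanced', 7);
banPlatformFromSession(msgs, 'balanced', 'longcat');
console.log(getStickyModel(msgs, 'smart'));    // 42
console.log(getStickyModel(msgs, 'balanced')); // undefined
console.log(stickySessionMap.size);            // 1
```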

Direct `stickySessionMap` accesses in [`handleChatCompletion()`](server/src/routes/proxy.ts:1057) also use `getSessionKey()` and guard on the result being truthy, so they are automatically skipped:

- **Session ban skipModels** (lines 1176–1189): `if (sessionKey)` guard → skipped when key is `''`
- **LongCat sticky cooldown** (lines 1205–1217): `cooldownSessionKey ? ... : undefined` → skipped when key is `''`

## Changes Required

### 1. Modify `getSessionKey()` in `server/src/routes/proxy.ts`

```typescript
function getSessionKey(messages: ChatMessage[], routingMode: RoutingMode): string {
  // Sticky sessions only apply to smart/auto-smart routing.
  // Balanced/auto uses free routing on every request.
  if (routingMode === 'balanced') return '';

  const firstUser = messages.find(m => m.role === 'user');
  if (!firstUser || typeof firstUser.content !== 'string') return '';
  return crypto.createHash('sha1').update(`${routingMode}:${firstUser.content}`).digest('hex');
}
```

This is the **only code change** needed. All other functions and call sites remain untouched.

### 2. Update tests in `server/src/__tests__/routes/provider-session-ban.test.ts`

Add test cases verifying that balanced mode skips sticky operations:
- `getStickyModel()` returns `undefined` for balanced mode even when a sticky entry exists for the same messages under smart mode
- `isSessionBannedFromPlatform()` returns `false` for balanced mode
- `banPlatformFromSession()` does not create entries for balanced mode
- `setStickyModel()` does not create entries for balanced mode

### 3. No changes to `server/src/services/router.ts`

The router itself does not interact with sticky sessions — it only receives `preferredModel` and `preferredKeyId` as optional parameters. When those are `undefined` (which they will be for balanced mode), the router already does free routing.
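
The following sketch illustrates why `undefined` preferences already mean free routing. The selection helper is hypothetical: `pickModel`, `Candidate`, and the injected `sample` scorer are all assumptions. `sample` stands in for the real Thompson Sampling draw in `router.ts`, whose signature this spec does not show.

```typescript
// Hypothetical candidate shape; the real router types are not shown in this spec.
interface Candidate { modelDbId: number; keyId: number }

function pickModel(
  candidates: Candidate[],
  skipModels: Set<number>,
  sample: (c: Candidate) => number, // stand-in for a Thompson Sampling draw
  preferredModel?: number,
  preferredKeyId?: number,
): Candidate | undefined {
  const eligible = candidates.filter(c => !skipModels.has(c.modelDbId));
  // Sticky path: only taken when a preference was actually passed in.
  if (preferredModel !== undefined) {
    const pinned =
      eligible.find(c => c.modelDbId === preferredModel && c.keyId === preferredKeyId) ??
      eligible.find(c => c.modelDbId === preferredModel);
    if (pinned) return pinned;
  }
  // Free routing: pick the eligible candidate with the highest sampled score.
  let best: Candidate | undefined;
  let bestScore = -Infinity;
  for (const c of eligible) {
    const score = sample(c);
    if (score > bestScore) {
      best = c;
      bestScore = score;
    }
  }
  return best;
}

const pool: Candidate[] = [
  { modelDbId: 1, keyId: 10 },
  { modelDbId: 2, keyId: 20 },
];
const score = (c: Candidate) => c.modelDbId; // deterministic scorer for the demo
console.log(pickModel(pool, new Set(), score, 1, 10)?.modelDbId); // 1 (pinned)
console.log(pickModel(pool, new Set(), score)?.modelDbId);        // 2 (free routing)
```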

## Flow Diagram

```mermaid
flowchart TD
A[Request arrives] --> B{model field?}
B -->|Explicit model| C[Pin to requested model]
B -->|No model field| D{routingMode?}
D -->|balanced| E[getSessionKey returns empty string]
D -->|smart| F[getSessionKey returns hash]
E --> G[All sticky functions return no-op]
G --> H[Free Thompson Sampling routing]
F --> I[Sticky model/key lookup]
I --> J{Sticky hit?}
J -->|Yes| K[Pin to sticky model + key]
J -->|No| H
K --> L[Route to pinned model]
H --> M[Route to best sampled model]
L --> N[On success: setStickyModel saves for smart]
M --> O[On success: setStickyModel is no-op for balanced]
```

## Edge Cases

- **Mode switch mid-conversation**: Session keys include `routingMode` in the hash, so balanced and smart entries for the same messages are distinct. No cross-contamination.
- **`stickySessionMap` size cleanup**: Since balanced mode never creates entries, the map only grows from smart-mode sessions. Existing eviction logic remains sufficient.
- **`responseSessionMap`**: Separate from sticky sessions — used for the Responses API `previous_response_id` feature. Unaffected by this change.
- **Per-request `skipModels`/`skipKeys`**: These are intra-request retry state, not sticky state. They remain active for both modes.

## Risks

- **Low risk**: The change is a single early-return in one function. All downstream behavior is already designed to handle empty keys gracefully.
- **No backward compatibility concern**: Existing smart-mode sessions continue working identically. Balanced-mode sessions simply stop being created — there is no data to migrate or lose.
44 changes: 44 additions & 0 deletions .roo/specs/disable-sticky-on-auto/requirements.md
@@ -0,0 +1,44 @@
# Requirements: Disable Sticky Threads on Auto Endpoint

## Summary

Disable the sticky session/thread feature on the `freellmapi/auto` (balanced routing) endpoint, keeping it active only on the `freellmapi/auto-smart` (smart routing) endpoint.

## Background

The sticky session system in [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts) pins a conversation to the same model and API key across multiple turns. This prevents mid-conversation model switching, which can cause hallucinations or inconsistent tone.

Currently, sticky sessions operate for **both** routing modes:
- `'balanced'` — used by `freellmapi/auto`
- `'smart'` — used by `freellmapi/auto-smart`

The balanced/auto endpoint uses Thompson Sampling with speed-weighted scoring, intentionally exploring different models to find the best throughput. Sticky sessions contradict this design — they prevent exploration by pinning to whatever model happened to serve the first turn.

The smart/auto-smart endpoint prioritizes intelligence and consistency, where sticky sessions are desirable to maintain coherent conversations.

## Requirements

### R1: No sticky model pinning on balanced/auto endpoint
When `routingMode === 'balanced'`, the system must **not** read or write sticky model preferences. Calls to [`getStickyModel()`](server/src/routes/proxy.ts:35) and [`setStickyModel()`](server/src/routes/proxy.ts:199) must be skipped for balanced mode.

### R2: No sticky key pinning on balanced/auto endpoint
When `routingMode === 'balanced'`, the system must **not** read or write sticky API key preferences. Calls to [`getStickyKey()`](server/src/routes/proxy.ts:55) must be skipped for balanced mode.

### R3: No session-level platform bans on balanced/auto endpoint
When `routingMode === 'balanced'`, the system must **not** track or check session-level platform bans. Calls to [`isSessionBannedFromPlatform()`](server/src/routes/proxy.ts:92), [`banPlatformFromSession()`](server/src/routes/proxy.ts:108), and related `skipModels` logic from session bans must be skipped for balanced mode.

### R4: Sticky sessions remain fully active on smart/auto-smart endpoint
All sticky session functionality (model pinning, key pinning, platform bans) must continue working unchanged when `routingMode === 'smart'`.

### R5: Per-request retry skip logic remains for both modes
The `skipModels` and `skipKeys` sets used within a single request's retry loop must continue working for both modes. These are intra-request fallback mechanisms, not cross-request sticky state.

### R6: Existing tests must pass
All existing tests in [`provider-session-ban.test.ts`](server/src/__tests__/routes/provider-session-ban.test.ts) and [`full-flow.test.ts`](server/src/__tests__/integration/full-flow.test.ts) must continue passing. New test cases should verify that balanced mode skips sticky operations.

## Out of Scope

- Changing the routing algorithm for either mode
- Removing the sticky session infrastructure (functions, maps) — they remain available for smart mode
- Modifying the `/v1/models` endpoint or model ID constants
- Changing how `getSessionKey()` hashes messages
16 changes: 16 additions & 0 deletions .roo/specs/disable-sticky-on-auto/tasks.md
@@ -0,0 +1,16 @@
# Tasks: Disable Sticky Threads on Auto Endpoint

## Task List

- [x] **T1: Modify `getSessionKey()` in `server/src/routes/proxy.ts`** — Add an early return for `routingMode === 'balanced'` that returns an empty string, disabling all sticky session operations for the auto/balanced endpoint. This is the single code change that cascades through all sticky functions.

- [x] **T2: Add balanced-mode sticky skip tests in `server/src/__tests__/routes/provider-session-ban.test.ts`** — Add a new `describe` block verifying that balanced mode skips sticky operations:
- `getStickyModel()` returns `undefined` for balanced mode even when a smart-mode sticky entry exists for the same messages
- `isSessionBannedFromPlatform()` returns `false` for balanced mode
- `banPlatformFromSession()` does not create entries for balanced mode
- `setStickyModel()` does not create entries for balanced mode
- `getSessionKey()` returns `''` for balanced mode

- [x] **T3: Run existing test suite** — Verify all existing tests in `provider-session-ban.test.ts` and `full-flow.test.ts` still pass after the change.

- [ ] **T4: Manual smoke test** — Send a request to `freellmapi/auto` and confirm logs show `[Sticky] miss key= | msgs=... → free routing` (empty key prefix) rather than a sticky hit. Send a follow-up request with the same first user message and confirm it routes freely again rather than pinning.
152 changes: 152 additions & 0 deletions .roo/specs/generalized-thread-protection/design.md
@@ -0,0 +1,152 @@
# Design: Generalized Thread Protection Scanner

## Architecture Overview

The thread protection scanner replaces all hardcoded `route.platform === 'longcat'` branches in `handleChatCompletion()` with a dynamic, provider-agnostic decision engine. The scanner evaluates error context against configurable per-platform protection rules to determine whether to ban an entire provider or just a single model.

The scanner lives in a new module `server/src/services/threadProtection.ts` and is called from the retry loop in `proxy.ts`. It returns a `ThreadProtectionAction` that tells the caller exactly what to do.

```mermaid
graph TD
subgraph Proxy [proxy.ts — handleChatCompletion]
RETRY[Retry loop catch block] --> SCAN{threadProtection.scan}
STREAM_ERR[Mid-stream error handler] --> SCAN
TRUNC[Truncation detector] --> SCAN
end

subgraph Scanner [threadProtection.ts]
SCAN --> RULES{Protection rules lookup}
RULES -->|platform config| DECIDE{Decide action}
RULES -->|default| DECIDE
DECIDE --> ACTION[ThreadProtectionAction]
end

ACTION -->|banProvider| BAN[banPlatformFromSession + addProviderModelsToSkipModels]
ACTION -->|skipModel| SKIP[skipModels.add]
ACTION -->|clearSticky| CLEAR[preferredModel = undefined]
```

## Protection Rules

Each platform can be configured with a protection level that determines how aggressively the scanner responds to errors:

| Level | Behavior on 5xx | Behavior on truncation | Behavior on retryable error |
|-------|----------------|----------------------|---------------------------|
| `provider-ban` | Ban entire provider | Ban entire provider | Ban entire provider |
| `model-skip` | Skip single model | Skip single model | Skip single model |
| `off` | No protection action | No protection action | No protection action |

### Configuration

The `THREAD_PROTECTION_PLATFORMS` env var is a comma-separated list of `platform:level` pairs:

```
THREAD_PROTECTION_PLATFORMS="longcat:provider-ban,groq:model-skip"
```

When unset, the scanner uses a **default protection map** hardcoded in the module that preserves the existing LongCat behavior (`longcat → provider-ban`) and applies `model-skip` to all other platforms. This ensures full backward compatibility — existing deployments see zero behavior change without any env var configuration.
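
The env var and the default map might be combined as in this sketch. The names `parseProtectionConfig` and `getProtectionLevel` are hypothetical; the latter matches the call used in the sticky cooldown snippet. Three further choices are assumptions the spec leaves open:

- Malformed pairs are ignored rather than rejected.
- Unlisted platforms fall back to `model-skip`.
- Setting the env var replaces the defaults entirely.

```typescript
type ProtectionLevel = 'provider-ban' | 'model-skip' | 'off';

const LEVELS: readonly ProtectionLevel[] = ['provider-ban', 'model-skip', 'off'];

// Default map: preserves existing LongCat behavior.
const DEFAULT_PROTECTION = new Map<string, ProtectionLevel>([['longcat', 'provider-ban']]);
// Assumed fallback for any platform not listed.
const FALLBACK_LEVEL: ProtectionLevel = 'model-skip';

// Parses "platform:level,platform:level". Malformed pairs are skipped (assumed policy).
function parseProtectionConfig(raw: string | undefined): Map<string, ProtectionLevel> {
  if (!raw || raw.trim() === '') return new Map(DEFAULT_PROTECTION);
  const result = new Map<string, ProtectionLevel>();
  for (const pair of raw.split(',')) {
    const [platform, level] = pair.split(':').map(s => s.trim());
    if (platform && LEVELS.includes(level as ProtectionLevel)) {
      result.set(platform, level as ProtectionLevel);
    }
  }
  return result;
}

const protectionMap = parseProtectionConfig(process.env.THREAD_PROTECTION_PLATFORMS);

function getProtectionLevel(platform: string, map = protectionMap): ProtectionLevel {
  return map.get(platform) ?? FALLBACK_LEVEL;
}

// Usage with an explicit string instead of the env var:
const cfg = parseProtectionConfig('longcat:provider-ban, groq:model-skip, bad, x:nope');
console.log(getProtectionLevel('groq', cfg));    // model-skip
console.log(getProtectionLevel('longcat', cfg)); // provider-ban
console.log(cfg.size);                           // 2
```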

## Scanner API

```typescript
// server/src/services/threadProtection.ts

export type ProtectionLevel = 'provider-ban' | 'model-skip' | 'off';

export type ErrorContextKind = '5xx' | 'truncation' | 'retryable';

export interface ErrorContext {
  platform: string;
  kind: ErrorContextKind;
  /** Whether the error occurred mid-stream (after SSE headers sent) */
  midStream: boolean;
  /** The model DB ID; always available */
  modelDbId: number;
  /** The error object, for logging */
  error?: unknown;
}

export interface ThreadProtectionAction {
  /** Ban the entire platform for this session */
  banProvider: boolean;
  /** Skip just this model */
  skipModel: boolean;
  /** Clear sticky model/key if pinned to this platform */
  clearStickyIfPinned: boolean;
  /** Human-readable reason for logging */
  reason: string;
}

// Signatures only; implementations are not part of this design sketch.
export declare function evaluateThreadProtection(ctx: ErrorContext): ThreadProtectionAction;
/** Resolves a platform's configured level (used by the sticky cooldown check) */
export declare function getProtectionLevel(platform: string): ProtectionLevel;
```

## Decision Matrix

The `evaluateThreadProtection` function implements this decision matrix:

| Protection Level | `5xx` | `truncation` | `retryable` |
|------------------|-------|--------------|-------------|
| `provider-ban` | `banProvider=true, skipModel=false, clearStickyIfPinned=true` | `banProvider=true, skipModel=false, clearStickyIfPinned=true` | `banProvider=true, skipModel=false, clearStickyIfPinned=true` |
| `model-skip` | `banProvider=false, skipModel=true, clearStickyIfPinned=false` | `banProvider=false, skipModel=true, clearStickyIfPinned=false` | `banProvider=false, skipModel=true, clearStickyIfPinned=false` |
| `off` | All false | All false | All false |
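
The matrix can be implemented in a few lines, as this sketch shows. It assumes the level lookup can be injected through an optional second parameter, `levelOf`. That parameter is a hypothetical addition to the signature above, so a single-argument call still works. Its default mirrors the documented fallback (`longcat` gets `provider-ban`, everything else `model-skip`). Every column is identical within a row, so `kind` and `midStream` only affect the log reason.

```typescript
type ProtectionLevel = 'provider-ban' | 'model-skip' | 'off';
type ErrorContextKind = '5xx' | 'truncation' | 'retryable';

interface ErrorContext {
  platform: string;
  kind: ErrorContextKind;
  midStream: boolean;
  modelDbId: number;
  error?: unknown;
}

interface ThreadProtectionAction {
  banProvider: boolean;
  skipModel: boolean;
  clearStickyIfPinned: boolean;
  reason: string;
}

// Assumed default lookup: longcat -> provider-ban, everything else -> model-skip.
const defaultLevelOf = (platform: string): ProtectionLevel =>
  platform === 'longcat' ? 'provider-ban' : 'model-skip';

function evaluateThreadProtection(
  ctx: ErrorContext,
  levelOf: (platform: string) => ProtectionLevel = defaultLevelOf, // hypothetical injection point
): ThreadProtectionAction {
  const level = levelOf(ctx.platform);
  const reason = `${level} on ${ctx.platform} (model ${ctx.modelDbId}) after ${ctx.kind}${ctx.midStream ? ' mid-stream' : ''}`;
  // The flags depend only on the level; the switch is exhaustive over ProtectionLevel.
  switch (level) {
    case 'provider-ban':
      return { banProvider: true, skipModel: false, clearStickyIfPinned: true, reason };
    case 'model-skip':
      return { banProvider: false, skipModel: true, clearStickyIfPinned: false, reason };
    case 'off':
      return { banProvider: false, skipModel: false, clearStickyIfPinned: false, reason };
  }
}

const a = evaluateThreadProtection({ platform: 'longcat', kind: '5xx', midStream: true, modelDbId: 3 });
console.log(a.banProvider, a.skipModel); // true false
const b = evaluateThreadProtection({ platform: 'groq', kind: 'truncation', midStream: false, modelDbId: 9 });
console.log(b.banProvider, b.skipModel); // false true
```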

## Integration Points in proxy.ts

The scanner replaces 6 hardcoded `longcat` blocks:

### 1. Stream truncation detection (line ~1394)
```typescript
// BEFORE:
if (route.platform === 'longcat') {
  banPlatformFromSession(..., 'longcat', ...);
  addProviderModelsToSkipModels(skipModels, 'longcat');
} else {
  skipModels.add(route.modelDbId);
}

// AFTER:
const action = evaluateThreadProtection({
  platform: route.platform,
  kind: 'truncation',
  midStream: false,
  modelDbId: route.modelDbId,
});
if (action.banProvider) {
  banPlatformFromSession(normalizedMessages, routingMode, route.platform, route.modelDbId);
  addProviderModelsToSkipModels(skipModels, route.platform);
}
if (action.skipModel) skipModels.add(route.modelDbId);
if (action.clearStickyIfPinned) {
  /* clear sticky if pinned to this platform */
}
```

### 2. Mid-stream 5xx (line ~1467)
### 3. Mid-stream truncation error (line ~1492)
### 4. Mid-stream retryable error (line ~1523)
### 5. Non-stream 5xx (line ~1624)
### 6. Non-stream retryable error (line ~1645)

All 6 blocks follow the same pattern: replace the `if (route.platform === 'longcat') { ... } else { ... }` with a single `evaluateThreadProtection()` call.
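
All six blocks share this shape, so the action handling could be factored into one helper. This is a hypothetical sketch: `applyThreadProtection`, `Route`, and the `ProtectionEffects` callbacks are illustrative names. The callbacks stand in for `banPlatformFromSession`, `addProviderModelsToSkipModels`, and the sticky-clearing step, whose body the spec leaves elided.

```typescript
interface ThreadProtectionAction {
  banProvider: boolean;
  skipModel: boolean;
  clearStickyIfPinned: boolean;
  reason: string;
}

// Minimal route shape (assumption): only the fields the blocks above use.
interface Route { platform: string; modelDbId: number }

// Hypothetical side-effect hooks; in proxy.ts these would call the real session helpers.
interface ProtectionEffects {
  banProvider(platform: string, modelDbId: number): void;
  skipProviderModels(platform: string): void;
  clearStickyIfPinned(platform: string): void;
}

function applyThreadProtection(
  action: ThreadProtectionAction,
  route: Route,
  skipModels: Set<number>,
  effects: ProtectionEffects,
): void {
  if (action.banProvider) {
    effects.banProvider(route.platform, route.modelDbId);
    effects.skipProviderModels(route.platform);
  }
  if (action.skipModel) skipModels.add(route.modelDbId);
  if (action.clearStickyIfPinned) effects.clearStickyIfPinned(route.platform);
}

// Usage: a provider-ban action records the ban, skips the whole provider, and clears the pin.
const calls: string[] = [];
const skipModels = new Set<number>();
const effects: ProtectionEffects = {
  banProvider: (platform, modelDbId) => { calls.push(`ban:${platform}:${modelDbId}`); },
  skipProviderModels: platform => { calls.push(`skip-all:${platform}`); },
  clearStickyIfPinned: platform => { calls.push(`clear:${platform}`); },
};
applyThreadProtection(
  { banProvider: true, skipModel: false, clearStickyIfPinned: true, reason: 'provider-ban' },
  { platform: 'longcat', modelDbId: 3 },
  skipModels,
  effects,
);
console.log(calls.join(', ')); // ban:longcat:3, skip-all:longcat, clear:longcat
```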

## Sticky Cooldown Generalization

The LongCat sticky cooldown check (line ~1210-1222) is also generalized. Instead of checking `prefRow?.platform === 'longcat'`, it checks whether the sticky platform has `provider-ban` protection level:

```typescript
// BEFORE:
if (prefRow?.platform === 'longcat') { ... addProviderModelsToSkipModels(skipModels, 'longcat'); ... }

// AFTER:
const stickyProtection = getProtectionLevel(prefRow?.platform ?? '');
if (stickyProtection === 'provider-ban') {
  // Apply cooldown exclusion for provider-ban platforms
  addProviderModelsToSkipModels(skipModels, prefRow!.platform);
}
```

This ensures that any future platform configured with `provider-ban` automatically gets the same cooldown protection.

## Files to Modify

| # | File | Change |
|---|------|--------|
| 1 | `server/src/services/threadProtection.ts` | **Create** — new scanner module |
| 2 | `server/src/routes/proxy.ts` | Replace 6 hardcoded `longcat` blocks + cooldown block with scanner calls |
| 3 | `server/src/__tests__/services/threadProtection.test.ts` | **Create** — unit tests for the scanner |
| 4 | `server/src/__tests__/routes/proxy-tools.test.ts` | Update test assertions to use generic protection log messages |
5 changes: 5 additions & 0 deletions .roo/specs/generalized-thread-protection/requirements.md
@@ -0,0 +1,5 @@
# Requirements: Generalized Thread Protection Scanner

## Problem Statement

The proxy route handler (`server/src/routes/proxy.ts`) contains 6+ hardcoded branches that special-case the `longcat`{
Comment on lines +1 to +5


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Requirements document is incomplete.

The file appears truncated at line 5, cutting off mid-sentence. The requirements section is missing, which is critical for understanding the feature scope and acceptance criteria.

The file ends with:

The proxy route handler (`server/src/routes/proxy.ts`) contains 6+ hardcoded branches that special-case the `longcat`{

This looks like an incomplete save or file corruption. Please complete the requirements document with:

  • Problem statement (appears to be started)
  • List of requirements with acceptance criteria
  • Out of scope items
  • Dependencies (if any)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/generalized-thread-protection/requirements.md around lines 1 - 5,
The requirements.md file is truncated mid-sentence; finish the document by
completing the Problem Statement (referencing the proxy route handler in
server/src/routes/proxy.ts and the existing special-casing of "longcat"), then
add a clear numbered List of Requirements with measurable Acceptance Criteria
(e.g., remove hardcoded branches, implement configurable rules, pass
unit/integration tests, performance and security checks), an Out of Scope
section stating what will not be changed (e.g., unrelated routes, legacy
clients), and a Dependencies section listing impacted components
(server/src/routes/proxy.ts, any config files, tests, and deployment steps);
ensure each requirement includes an acceptance test or success metric and that
terminology matches identifiers like "longcat" and "proxy route handler" so
reviewers can map requirements to code.

12 changes: 12 additions & 0 deletions .roo/specs/generalized-thread-protection/tasks.md
@@ -0,0 +1,12 @@
# Tasks: Generalized Thread Protection (Exclusive Model Sessions)

## Implementation Tasks

- [x] T-1: Rename `LONGCAT_STICKY_COOLDOWN_MS` to `THREAD_COOLDOWN_MS` in [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:18) and update all references throughout the file
- [x] T-2: Remove the hardcoded LongCat cooldown block (the `if (preferredModel)` block checking `prefRow?.platform === 'longcat'` and calling `addProviderModelsToSkipModels(skipModels, 'longcat')`)
- [x] T-3: Remove the hardcoded Owl Alpha cooldown block (the `if (preferredModel)` block checking `prefRow?.platform === 'openrouter' && prefRow?.model_id === 'owl-alpha'` and calling `skipModels.add(preferredModel)`)
- [x] T-4: Insert the generalized thread protection scanner at the same location where the removed blocks were, after the session ban sticky override and before the retry loop — including the `activeCooldownModels` collection loop, the exhaustion protection SQL query, and the conditional `skipModels` addition
- [ ] T-5: Verify the execution order of the `skipModels` pipeline: session bans → transient cooldowns → global cooldown sticky override → session ban sticky override → thread protection scanner → retry loop
- [ ] T-6: Create [`server/src/__tests__/routes/thread-protection.test.ts`](server/src/__tests__/routes/thread-protection.test.ts) with unit tests covering: dynamic exclusivity, exhaustion bypass, self-preservation, expired entries, and multiple busy models
- [ ] T-7: Run the existing test suite to confirm no regressions in routing, fallback, or provider-session-ban tests
- [ ] T-8: Manual smoke test: send two concurrent requests from different sessions and verify thread protection logs appear correctly, and that the second session routes to an alternative model