feat: generalized thread protection scanner #10

Closed
vi70x3 wants to merge 2 commits into main from spec/generalized-thread-protection

@vi70x3 vi70x3 commented Jun 2, 2026

Collaborator

Implements the generalized-thread-protection spec. Replaces hardcoded LongCat and Owl Alpha cooldown blocks with a dynamic thread protection scanner.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added automatic heartbeat and stall detection for streaming connections to prevent timeouts
    • Enabled detection and handling of wrapped errors in HTTP 200 responses
    • Implemented dynamic model exclusion during provider cooldown windows
  • Tests

    • Added comprehensive test coverage for stream heartbeat and stall protection mechanisms
  • Documentation

    • Added design specifications for thread protection and wrapped error handling

coderabbitai Bot commented Jun 2, 2026

Walkthrough

This PR implements two independent resilience features and establishes thread-protection infrastructure: (1) wrapped-error detection for upstream payloads returned with HTTP 200 status codes, (2) SSE stream keepalive heartbeats and stall timeouts to prevent indefinite stream hangs, and (3) generalized thread-protection types and sticky-cooldown refactoring to replace hardcoded provider-specific branching.

Changes

Error Handling and Resilience

| Layer | File(s) | Summary |
| --- | --- | --- |
| Wrapped Error Interception: Contract and Base Provider | `server/src/providers/base.ts`, `.roo/specs/wrapped-error-interception/*` | Design and requirements documented for detecting upstream error payloads in HTTP 200 responses. `BaseProvider` gains protected helpers `isWrappedError()` and `throwWrappedError()`, and `extractErrorMessage()` visibility changed from private to protected for code reuse. |
| Wrapped Error Interception: Provider Implementation | `server/src/providers/openai-compat.ts`, `server/src/providers/cohere.ts`, `server/src/providers/cloudflare.ts`, `server/src/providers/google.ts` | Both non-streaming `chatCompletion()` and streaming `streamChatCompletion()` paths in all four providers add wrapped-error detection immediately after JSON parsing, throwing `ProviderApiError` to terminate streams on error detection. |
| Stream Keepalive and Stall Protection | `server/src/routes/proxy.ts`, `server/src/__tests__/routes/stream-heartbeat-stall.test.ts` | Exports `streamKeepaliveConfig` and extends streaming retry loop with heartbeat intervals and stall detection: tracks last chunk timestamp, emits SSE keep-alive comments, aborts streams exceeding `MAX_STREAM_STALL_MS`, responds with error frames (in-stream) or HTTP 504 (pre-stream), and cleans up on completion or disconnect. Comprehensive test suite validates all scenarios. |
| Thread Protection Framework: Types and Integration | `server/src/services/threadProtection.ts`, `server/src/routes/proxy.ts`, `.roo/specs/generalized-thread-protection/*`, `server/src/__tests__/routes/proxy-tools.test.ts` | Thread-protection service module introduces `ProtectionLevel`, `ErrorContext`, and `ThreadProtectionAction` types, with placeholder `evaluateThreadProtection()`. Proxy imports the function and replaces LongCat-specific sticky-cooldown logic with generic thread-protection mechanism. Test setup updated to clear transient model cooldowns. |
| Tooling: Streaming Proxy Script | `fix_streaming.py` | Python script to locate and modify the streaming conditional block in `proxy.ts` by scanning for the `if (stream) {` marker. |

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant ProxyRoute
  participant StreamSource
  ProxyRoute->>StreamSource: start streaming
  StreamSource-->>ProxyRoute: first chunk (after delay)
  ProxyRoute->>ProxyRoute: track lastChunkTimestamp
  activate ProxyRoute
  ProxyRoute->>ProxyRoute: start keepalive interval
  alt chunk arrives before stall threshold
    StreamSource-->>ProxyRoute: chunk
    ProxyRoute->>ProxyRoute: update lastChunkTimestamp
    ProxyRoute-->>Client: SSE data
  else stall exceeds MAX_STREAM_STALL_MS
    ProxyRoute->>StreamSource: abort
    alt headers already sent
      ProxyRoute-->>Client: SSE error frame + [DONE]
    else pre-stream stall
      ProxyRoute-->>Client: HTTP 504 + error payload
    end
  end
  deactivate ProxyRoute
  ProxyRoute->>ProxyRoute: clear keepalive interval
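The stall branch of the sequence diagram can be sketched as a pure decision function. This is only an illustration under stated assumptions, not the code in `proxy.ts`: `decideStallAction`, `StallState`, and the action shapes are hypothetical names, and the 30-second value for `MAX_STREAM_STALL_MS` is a placeholder. Only the constant's name and the split between an in-stream error frame and a pre-stream HTTP 504 come from the walkthrough.

```typescript
// Hypothetical sketch of the stall check described in the walkthrough.
// MAX_STREAM_STALL_MS is named in the PR; the value here is an assumption.
const MAX_STREAM_STALL_MS = 30000;

interface StallState {
  lastChunkAt: number;  // timestamp (ms) of the most recent upstream chunk
  headersSent: boolean; // true once SSE headers have been written to the client
}

type StallAction =
  | { kind: 'continue' }                                         // stream is healthy
  | { kind: 'sse-error'; frames: string[] }                      // stalled mid-stream
  | { kind: 'http-504'; body: { error: { message: string } } };  // stalled pre-stream

function decideStallAction(
  state: StallState,
  now: number,
  maxStallMs: number = MAX_STREAM_STALL_MS,
): StallAction {
  const idleMs = now - state.lastChunkAt;
  if (idleMs <= maxStallMs) {
    return { kind: 'continue' };
  }

  const message = `Upstream stream stalled for ${idleMs} ms`;
  if (state.headersSent) {
    // Headers are already out, so the only option is an error frame plus [DONE].
    return {
      kind: 'sse-error',
      frames: [`data: ${JSON.stringify({ error: { message } })}\n\n`, 'data: [DONE]\n\n'],
    };
  }
  // Nothing has been written yet, so a regular HTTP error response is still possible.
  return { kind: 'http-504', body: { error: { message } } };
}
```

Keeping the decision separate from the timer makes it testable without fake clocks: a keepalive interval would call it with `Date.now()` and act on the result.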

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • vi70x3/freellmapi#9: Both PRs modify server/src/routes/proxy.ts to replace provider-level hardcoded branching with generalized logic for model exclusion and retry handling.
  • vi70x3/freellmapi#8: Directly related in removing hardcoded longcat branching from proxy.ts sticky-cooldown logic in favor of generalized thread-protection decision engine.
  • vi70x3/freellmapi#2: Overlaps with sticky-session retry and pinning behavior changes that the generalized thread-protection logic is now replacing.

Poem

🐰 Stream keepalive and wrapped errors caught,
Thread protection rises, provider branching fought,
No more hardcoded paths for just the one,
Resilience blooms—the heartbeat's begun! 💚

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: generalized thread protection scanner' directly and accurately describes the main feature being implemented: replacing hardcoded platform-specific cooldown logic with a generalized, provider-agnostic thread protection mechanism. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a generalized thread protection scanner, wrapped error payload detection on HTTP 200 responses, and stream heartbeat/stall protection. While the wrapped error handling and heartbeat features are well-structured, several critical issues must be addressed. Specifically, the thread protection scanner is incomplete as the main evaluation function is a placeholder and the proxy integration is missing. Additionally, there is a test suite crash due to an invalid import in the proxy tests, a potential logic bug in the exhaustion protection check, a potential NaN status code issue in the base provider, and both the helper Python script and requirements document are truncated.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +815 to +817
// Dynamic import to get the same module instance used by the running app
const { transientModelCooldowns: cooldowns } = await import('../../routes/proxy.js');
(cooldowns as Map<any, any>).clear();

critical

Destructuring transientModelCooldowns from ../../routes/proxy.js yields undefined because transientModelCooldowns is not defined or exported in server/src/routes/proxy.ts. Calling .clear() on it then throws a TypeError, causing the test suite to crash.

Suggested change
- // Dynamic import to get the same module instance used by the running app
- const { transientModelCooldowns: cooldowns } = await import('../../routes/proxy.js');
- (cooldowns as Map<any, any>).clear();
+ (stickySessionMap as Map<any, any>).clear();
+ const db = getDb();

Comment on lines +1241 to +1248
// Check if cooldown would exhaust all models
let wouldExhaustAll = true;
for (const id of allEnabledIds) {
if (!activeCooldownModels.has(id)) {
wouldExhaustAll = false;
break;
}
}

high

The exhaustion protection check wouldExhaustAll only checks if activeCooldownModels contains all enabled models. It does not take into account the models that are already present in skipModels (e.g., models from banned platforms or previously failed attempts).

If skipModels already contains some models, adding activeCooldownModels to skipModels can exhaust all enabled models, causing the router to fail and return a 503/502 error to the user.

The check should verify if the union of activeCooldownModels and skipModels would exhaust all enabled models.

Suggested change
- // Check if cooldown would exhaust all models
- let wouldExhaustAll = true;
- for (const id of allEnabledIds) {
-   if (!activeCooldownModels.has(id)) {
-     wouldExhaustAll = false;
-     break;
-   }
- }
+ // Check if cooldown would exhaust all models
+ let wouldExhaustAll = true;
+ for (const id of allEnabledIds) {
+   if (!activeCooldownModels.has(id) && !skipModels.has(id)) {
+     wouldExhaustAll = false;
+     break;
+   }
+ }
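The union check suggested here can also be written as a small standalone helper, which makes the rule easy to unit test. This is a sketch only: `wouldExhaustAllModels` is a hypothetical name, and the parameters simply mirror the variables in the review comment rather than the actual code in `proxy.ts`.

```typescript
// Hypothetical standalone version of the exhaustion check.
// A model stays available only if it is in neither the cooldown set nor the skip set.
function wouldExhaustAllModels<T>(
  allEnabledIds: readonly T[],
  activeCooldownModels: ReadonlySet<T>,
  skipModels: ReadonlySet<T>,
): boolean {
  // every() returns true for an empty list, matching the original loop,
  // which starts from wouldExhaustAll = true.
  return allEnabledIds.every(
    (id) => activeCooldownModels.has(id) || skipModels.has(id),
  );
}
```

With enabled models `[1, 2, 3]`, a cooldown on `{1, 3}` alone leaves model 2 available, but the same cooldown combined with `skipModels = {2}` exhausts the pool, which is the case the original check misses.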

import { Router } from 'express';
import type { Request, Response } from 'express';
import { z } from 'zod';
import { evaluateThreadProtection } from '../services/threadProtection.js';

high

The evaluateThreadProtection function is imported here but never used in this file. The 6 hardcoded longcat blocks described in the design spec (.roo/specs/generalized-thread-protection/design.md) have not been replaced with calls to evaluateThreadProtection.
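As a rough illustration of what using the import could look like, the sketch below applies a returned action to the skip set. It is an assumption-laden example, not the intended implementation: `applyThreadProtection` and `ModelsByPlatform` are hypothetical, numeric model IDs are assumed, and the platform lookup stands in for the `addProviderModelsToSkipModels` helper that CodeRabbit's review mentions but this page does not show.

```typescript
// Field names follow the placeholder's return value shown in this review.
interface ThreadProtectionAction {
  banProvider: boolean;
  skipModel: boolean;
  clearStickyIfPinned: boolean;
  reason: string;
}

// Hypothetical lookup from platform name to the DB ids of its models.
type ModelsByPlatform = ReadonlyMap<string, readonly number[]>;

function applyThreadProtection(
  action: ThreadProtectionAction,
  skipModels: Set<number>,
  platform: string | undefined,
  preferredModel: number,
  modelsByPlatform: ModelsByPlatform,
): void {
  if (action.banProvider && platform) {
    // Provider ban: exclude every model served by the failing platform.
    for (const id of modelsByPlatform.get(platform) ?? []) {
      skipModels.add(id);
    }
  }
  if (action.skipModel) {
    // Model skip: exclude only the model that just failed.
    skipModels.add(preferredModel);
  }
}
```

Each hardcoded branch would then reduce to one `evaluateThreadProtection(...)` call followed by one `applyThreadProtection(...)` call.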

Comment on lines +20 to +23
export function evaluateThreadProtection(_ctx: ErrorContext): ThreadProtectionAction {
// Placeholder implementation: no protection
return { banProvider: false, skipModel: false, clearStickyIfPinned: false, reason: 'off' };
}

high

The evaluateThreadProtection function is currently just a placeholder returning a static action with no protection. The actual scanner logic and decision matrix described in the design spec have not been implemented.
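For orientation, a minimal sketch of a non-placeholder scanner follows. It is built only from details quoted elsewhere in this review: the `off`, `provider-ban`, and `model-skip` levels, the `THREAD_PROTECTION_PLATFORMS="longcat:provider-ban,groq:model-skip"` format, and the suggested defaults (longcat gets `provider-ban`, everything else `model-skip`). Several things are assumptions: the raw config string is passed as a parameter instead of being read from the environment, `modelDbId` is assumed numeric, and the reason strings are illustrative. It should not be read as the design spec's actual decision matrix.

```typescript
type ProtectionLevel = 'off' | 'provider-ban' | 'model-skip';

interface ErrorContext {
  platform: string;
  kind: string;       // e.g. 'retryable'; the full set of kinds is not shown in the PR
  midStream: boolean;
  modelDbId: number;  // assumed numeric
}

interface ThreadProtectionAction {
  banProvider: boolean;
  skipModel: boolean;
  clearStickyIfPinned: boolean;
  reason: string;
}

const LEVELS: readonly string[] = ['off', 'provider-ban', 'model-skip'];

// Parses "longcat:provider-ban,groq:model-skip"; malformed entries are ignored.
function parsePlatformLevels(raw: string | undefined): Map<string, ProtectionLevel> {
  const levels = new Map<string, ProtectionLevel>();
  for (const entry of (raw ?? '').split(',')) {
    const parts = entry.split(':').map((part) => part.trim());
    const platform = parts[0];
    const level = parts[1];
    if (platform && level && LEVELS.indexOf(level) !== -1) {
      levels.set(platform.toLowerCase(), level as ProtectionLevel);
    }
  }
  return levels;
}

function getProtectionLevel(platform: string, raw?: string): ProtectionLevel {
  const configured = parsePlatformLevels(raw).get(platform.toLowerCase());
  if (configured) return configured;
  // Assumed defaults, taken from the review text rather than the design spec.
  return platform.toLowerCase() === 'longcat' ? 'provider-ban' : 'model-skip';
}

function evaluateThreadProtection(ctx: ErrorContext, raw?: string): ThreadProtectionAction {
  const level = getProtectionLevel(ctx.platform, raw);
  if (level === 'provider-ban') {
    return { banProvider: true, skipModel: false, clearStickyIfPinned: true,
      reason: `${ctx.platform} ${ctx.kind} error - provider-ban level` };
  }
  if (level === 'model-skip') {
    return { banProvider: false, skipModel: true, clearStickyIfPinned: false,
      reason: `${ctx.platform} ${ctx.kind} error - model-skip level` };
  }
  return { banProvider: false, skipModel: false, clearStickyIfPinned: false, reason: 'off' };
}
```

Passing the raw string in explicitly keeps the function pure. A caller could supply the environment variable's value at the call site.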

Comment thread fix_streaming.py
Comment on lines +13 to +15
# Find the if(stream) line
stream_start = None
for i, line in{ No newline at end of file

high

The Python script is truncated and contains a syntax error at line 15 (for i, line in{). This will cause the script to fail with a SyntaxError if executed.

Comment on lines +142 to +145
error.status =
typeof errPayload === 'object' && errPayload !== null && 'code' in (errPayload as Record<string, unknown>)
? Number((errPayload as Record<string, unknown>).code)
: 200;

medium

If errPayload.code is a non-numeric string (e.g., "rate_limit_error"), calling Number(...) on it will return NaN. This can cause unexpected behavior in downstream error helpers like isRateLimitError or isRetryableError which expect a valid numeric status code.

Consider checking if the parsed number is a valid number, and default to 200 if it is NaN.

Suggested change
- error.status =
-   typeof errPayload === 'object' && errPayload !== null && 'code' in (errPayload as Record<string, unknown>)
-     ? Number((errPayload as Record<string, unknown>).code)
-     : 200;
+ const parsedCode = typeof errPayload === 'object' && errPayload !== null && 'code' in (errPayload as Record<string, unknown>)
+   ? Number((errPayload as Record<string, unknown>).code)
+   : 200;
+ error.status = isNaN(parsedCode) ? 200 : parsedCode;
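A slightly stricter variant of this guard is sketched below, together with the string-payload handling that CodeRabbit's review raises for `extractErrorMessage()`. Both helpers are hypothetical stand-ins (`parseWrappedStatus` and `extractErrorMessageSketch` are not names from the PR), and the fallback message is an assumption. The stricter check matters because `Number(null)` and `Number('')` both evaluate to `0`, which passes an `isNaN` test but is not a usable HTTP status.

```typescript
// Hypothetical helper: derive a numeric status from a wrapped error payload.
function parseWrappedStatus(errPayload: unknown, fallback = 200): number {
  if (typeof errPayload !== 'object' || errPayload === null) return fallback;
  const code = (errPayload as Record<string, unknown>).code;
  // Only numbers and non-empty strings are candidates; null, '' and objects are rejected.
  if (typeof code !== 'number' && typeof code !== 'string') return fallback;
  if (typeof code === 'string' && code.trim() === '') return fallback;
  const parsed = Number(code);
  return Number.isFinite(parsed) ? parsed : fallback;
}

// Hypothetical stand-in for BaseProvider.extractErrorMessage(), which is not shown here.
function extractErrorMessageSketch(body: unknown, fallback = 'Unknown provider error'): string {
  if (typeof body !== 'object' || body === null) return fallback;
  const err = (body as Record<string, unknown>).error;
  // String form: {"error": "Rate limit exceeded"}
  if (typeof err === 'string' && err.length > 0) return err;
  // Object form: {"error": {"message": "..."}}
  if (typeof err === 'object' && err !== null) {
    const msg = (err as Record<string, unknown>).message;
    if (typeof msg === 'string' && msg.length > 0) return msg;
  }
  return fallback;
}
```

A payload such as `{ code: 'rate_limit_error' }` then maps to the fallback status instead of `NaN`, while `{ code: '429' }` still maps to 429.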


## Problem Statement

The proxy route handler (`server/src/routes/proxy.ts`) contains 6+ hardcoded branches that special-case the `longcat`{ No newline at end of file

medium

The markdown file is truncated and ends abruptly with `longcat`{. Please complete the requirements specification.

});
} else {
res.write(`data: ${JSON.stringify(payload)}\n\n`);
res.write('data: [DONE]\n\n');;

SUGGESTION: Double semicolon on this line - res.write('data: [DONE]\n\n');; should be res.write('data: [DONE]\n\n');

kilo-code-bot Bot commented Jun 2, 2026

Code Review Summary

Status: 2 Issues Found | Recommendation: Address before merge

Overview

| Severity | Count |
| --- | --- |
| CRITICAL | 0 |
| WARNING | 0 |
| SUGGESTION | 1 |

Issue Details

SUGGESTION

| File | Line | Issue |
| --- | --- | --- |
| server/src/routes/proxy.ts | 1347 | Double semicolon typo: `;;` should be `;` |

Other Observations (not in diff)

Issues found in unchanged code that cannot receive inline comments:

| File | Line | Issue |
| --- | --- | --- |
| server/src/routes/proxy.ts | 5 | Imported evaluateThreadProtection from threadProtection.js is never used in this file |
| server/src/services/threadProtection.ts | 20-22 | Placeholder function evaluateThreadProtection exists but is never called; consider removing if unused |

Files Reviewed (15 files)
  • server/src/providers/base.ts - No issues
  • server/src/providers/openai-compat.ts - No issues
  • server/src/providers/cohere.ts - No issues
  • server/src/providers/cloudflare.ts - No issues
  • server/src/providers/google.ts - No issues
  • server/src/routes/proxy.ts - 2 issues
  • server/src/services/threadProtection.ts - No issues (placeholder)
  • server/src/__tests__/routes/proxy-tools.test.ts - No issues
  • server/src/__tests__/routes/stream-heartbeat-stall.test.ts - No issues
  • .roo/specs/wrapped-error-interception/design.md - No issues (spec)
  • .roo/specs/wrapped-error-interception/requirements.md - No issues (spec)
  • .roo/specs/wrapped-error-interception/tasks.md - No issues (spec)
  • .roo/specs/generalized-thread-protection/design.md - No issues (spec)
  • .roo/specs/generalized-thread-protection/requirements.md - No issues (spec)
  • fix_streaming.py - No issues (incomplete script)

Reviewed by laguna-m.1-20260312:free · 1,266,531 tokens

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 13

🧹 Nitpick comments (1)
.roo/specs/generalized-thread-protection/design.md (1)

43-45: 💤 Low value

Add language identifier to code fence.

The code fence at line 43 is missing a language identifier, which improves syntax highlighting and accessibility.

📝 Proposed fix
-```
+```bash
 THREAD_PROTECTION_PLATFORMS="longcat:provider-ban,groq:model-skip"
</details>

As per coding guidelines: markdownlint-cli2 flagged missing language specifier.

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/generalized-thread-protection/design.md around lines 43 - 45, The
code fence containing
THREAD_PROTECTION_PLATFORMS="longcat:provider-ban,groq:model-skip" lacks a
language identifier; update the fenced block that contains the
THREAD_PROTECTION_PLATFORMS assignment to include a language tag (e.g., bash) so
the fence opens with the bash identifier, ensuring the block around the
THREAD_PROTECTION_PLATFORMS string is modified accordingly to satisfy
markdownlint-cli2.


</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.roo/specs/generalized-thread-protection/requirements.md:

  • Around line 1-5: The requirements document ends mid-sentence ("special-case
    the longcat{") and must be completed or the fragment removed; update the
    Requirements: Generalized Thread Protection Scanner doc to finish the Problem
    Statement by describing that the proxy route handler has 6+ hardcoded branches
    that special-case longcat, then add clear objectives, scope, and acceptance
    criteria for the generalized scanner (use the design.md as the source to draft
    the missing text), or delete the unfinished sentence if unintentional—ensure
    mention of the proxy route handler and the longcat special-case so reviewers
    can trace the requirement to the implementation.

In @.roo/specs/wrapped-error-interception/design.md:

  • Around line 108-120: The try/catch around JSON.parse currently swallows any
    synchronous throws inside it (including ProviderApiError thrown by
    throwWrappedError), so in openai-compat.ts, cohere.ts and cloudflare.ts move the
    wrapped-error detection and throw out of the parse try block: only JSON.parse
    and parsing-related validations remain inside try/catch (to skip malformed
    chunks), then after successful parse call this.isWrappedError(parsed) and, if
    true, call this.throwWrappedError(parsed) so ProviderApiError propagates; mirror
    the safe pattern used in google.ts and ensure generator yield only occurs after
    the wrapped-error check.

In @fix_streaming.py:

  • Around line 13-15: The file fix_streaming.py currently contains an incomplete
    for loop ("for i, line in{") causing a SyntaxError; either finish the script or
    remove it. To fix, complete the iteration over the source lines (replace the
    malformed "for i, line in{" with a proper iterable, e.g., enumerate(lines) or
    similar), implement the logic that searches for the "if(stream)" marker and
    sets/uses the stream_start variable, and ensure the loop and file end with valid
    Python syntax (close blocks and handle cases where "if(stream)" is not found).
    Alternatively, if this file is not needed, delete it to avoid introducing a
    broken file into the repo.

In @server/src/__tests__/routes/proxy-tools.test.ts:

  • Around line 813-817: The test fails because transientModelCooldowns is not
    exported from the proxy router; either export the actual cooldown Map from the
    router module (e.g., add and export a Map named transientModelCooldowns
    alongside stickySessionMap and the set/clear helpers in proxy.ts) so the test
    can import and call .clear(), or add a dedicated exported helper (e.g.,
    clearTransientModelCooldowns) in proxy.ts and update the test to call that
    helper instead; ensure the exported symbol is the same runtime instance the
    router uses so clearing it affects the router state.

In @server/src/providers/base.ts:

  • Around line 135-149: The wrapped-error path in throwWrappedError uses
    extractErrorMessage which currently ignores string-valued error payloads (Format
    2), causing the fallback message to be used; update extractErrorMessage to
    detect when body.error (or the extracted err.error) is a string and return that
    string as the message (instead of attempting to read .message), then leave
    throwWrappedError as-is so ProviderApiError contains the actual string message
    for formats like {"error": "Rate limit exceeded"}; reference the
    extractErrorMessage function and the throwWrappedError method to locate and
    change the logic.
  • Around line 142-145: The wrapped-error handling should guard against
    non-numeric codes and string error payloads: in the block that sets
    error.status (the code that currently does Number((errPayload as
    Record<string, unknown>).code)), validate that errPayload.code exists and is a
    finite numeric value before using it, otherwise fall back to 200 (or undefined)
    so proxy.ts’s getErrorStatus() and isRateLimitError() work correctly; also
    update extractErrorMessage() to handle body.error being a string (use the
    string directly) as well as the object form error.message so message-based
    rate-limit detection sees the real text. Ensure you reference and fix the error
    assignment logic in the provider (error.status setting) and the
    extractErrorMessage routine to cover both string and object error shapes.

In @server/src/providers/cloudflare.ts:

  • Around line 123-131: The parsing try/catch currently wraps both JSON.parse and
    the wrapped-error check so throwWrappedError(parsed) can be swallowed; refactor
    so JSON.parse(data) is the only operation inside the try/catch (catch continues
    to skip malformed chunks), then after a successful parse check
    this.isWrappedError(parsed) and call this.throwWrappedError(parsed) outside the
    catch block so the thrown stream error is not suppressed.

In @server/src/providers/cohere.ts:

  • Around line 114-122: The JSON.parse try/catch is currently swallowing
    exceptions thrown by throwWrappedError; separate parsing from the wrapped-error
    check by limiting the try/catch to only JSON.parse (in the block that
    returns/continues on parse failure), then after a successful parse call
    this.isWrappedError(parsed) and, if true, call this.throwWrappedError(parsed)
    outside the catch so the thrown stream error propagates; only yield parsed when
    it passed the wrapped-error check. Ensure you update the logic around
    ChatCompletionChunk parsing, isWrappedError, throwWrappedError, and the
    generator yield accordingly.

In @server/src/providers/openai-compat.ts:

  • Around line 130-138: The current try/catch around JSON.parse swallows
    exceptions thrown by throwWrappedError(), so wrapped ProviderApiError never
    propagates; refactor the block in openai-compat.ts that parses
    ChatCompletionChunk so JSON parsing and wrapped-error handling are separated:
    first attempt JSON.parse(data) inside a try that only catches/parses malformed
    JSON and yields/continues on parse failure, but after a successful parse call
    this.isWrappedError(parsed) and, if true, call this.throwWrappedError(parsed)
    outside the parse catch so the error bubbles; reference the parsed variable,
    isWrappedError, and throwWrappedError to locate the code to change.

In @server/src/routes/proxy.ts:

  • Line 5: The file imports evaluateThreadProtection but still uses manual
    longcat/cooldown branches; replace the thread protection block (and the other 6
    hardcoded longcat branches) by calling evaluateThreadProtection({ platform:
    prefRow?.platform ?? '', kind: 'retryable', midStream: false, modelDbId:
    preferredModel }) and then apply its directives: if action.banProvider &&
    prefRow?.platform call addProviderModelsToSkipModels(skipModels,
    prefRow.platform) and log the reason, and if action.skipModel do
    skipModels.add(preferredModel); update the block around the existing
    skipModels/preferredModel/prefRow logic to use this pattern and remove the
    manual cooldown/longcat branching.
  • Around line 1207-1260: Replace the in-place cooldown loop with a call to the
    shared scanner: remove the manual logic that iterates stickySessionMap and
    computes activeCooldownModels (the entire Thread protection block) and instead
    call evaluateThreadProtection(...) to populate skipModels; pass in
    normalizedMessages and routingMode (for getSessionKey behavior), the
    stickySessionMap and any TTL/COOLDOWN constants (STICKY_TTL_MS,
    THREAD_COOLDOWN_MS), and the current preferredModel so the scanner can apply
    exhaustion protection; ensure evaluateThreadProtection returns or mutates
    skipModels (or adapt its return to be merged into skipModels) and remove the
    console logs in favor of the scanner’s logging.
  • Line 1347: Remove the stray extra semicolon after the response write call:
    locate the res.write('data: [DONE]\n\n');; statement (the res.write call) and
    delete the duplicate semicolon so it ends with a single semicolon; ensure no
    other trailing punctuation was accidentally removed.

In @server/src/services/threadProtection.ts:

  • Around line 20-23: The evaluateThreadProtection function is a placeholder;
    implement the scanner to read THREAD_PROTECTION_PLATFORMS (or fall back to a
    default map) and determine the ProtectionLevel for ctx.platform, then return
    actions per the decision matrix: 'off' => {banProvider:false, skipModel:false,
    clearStickyIfPinned:false, reason:'off'}, 'provider-ban' => {banProvider:true,
    skipModel:false, clearStickyIfPinned:true, reason:${ctx.platform} ${ctx.kind} error — provider-ban level}, and 'model-skip' => {banProvider:false,
    skipModel:true, clearStickyIfPinned:false, reason:${ctx.platform} ${ctx.kind} error — model-skip level}; add a helper getProtectionLevel(platform: string) to
    parse the env var (e.g. JSON or comma map) and return the configured level or
    defaults (default: longcat → provider-ban, others → model-skip).

Nitpick comments:
In @.roo/specs/generalized-thread-protection/design.md:

  • Around line 43-45: The code fence containing
    THREAD_PROTECTION_PLATFORMS="longcat:provider-ban,groq:model-skip" lacks a
    language identifier; update the fenced block that contains the
    THREAD_PROTECTION_PLATFORMS assignment to include a language tag (e.g., bash) so
    the fence opens with the bash identifier, ensuring the block around the
    THREAD_PROTECTION_PLATFORMS string is modified accordingly to satisfy
    markdownlint-cli2.

</details>

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: Organization UI

**Review profile**: CHILL

**Plan**: Pro Plus

**Run ID**: `15fc3aaa-cc25-4f42-b85e-2b2da7d3a8c0`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 233e03142895cd8c18d441461d5c9568186c7123 and 09bffb08a0cfa590a4de3513bf06af00e58be60b.

</details>

<details>
<summary>📒 Files selected for processing (15)</summary>

* `.roo/specs/generalized-thread-protection/design.md`
* `.roo/specs/generalized-thread-protection/requirements.md`
* `.roo/specs/wrapped-error-interception/design.md`
* `.roo/specs/wrapped-error-interception/requirements.md`
* `.roo/specs/wrapped-error-interception/tasks.md`
* `fix_streaming.py`
* `server/src/__tests__/routes/proxy-tools.test.ts`
* `server/src/__tests__/routes/stream-heartbeat-stall.test.ts`
* `server/src/providers/base.ts`
* `server/src/providers/cloudflare.ts`
* `server/src/providers/cohere.ts`
* `server/src/providers/google.ts`
* `server/src/providers/openai-compat.ts`
* `server/src/routes/proxy.ts`
* `server/src/services/threadProtection.ts`

</details>

</details>

Comment on lines +1 to +5
# Requirements: Generalized Thread Protection Scanner

## Problem Statement

The proxy route handler (`server/src/routes/proxy.ts`) contains 6+ hardcoded branches that special-case the `longcat`{ No newline at end of file

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Requirements document appears incomplete.

The file ends mid-sentence at line 5 with "special-case the longcat{". This should be completed or removed if it's unintentional.

Would you like me to help complete this requirements document based on the design.md file?

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/generalized-thread-protection/requirements.md around lines 1 - 5,
The requirements document ends mid-sentence ("special-case the `longcat`{") and
must be completed or the fragment removed; update the Requirements: Generalized
Thread Protection Scanner doc to finish the Problem Statement by describing that
the proxy route handler has 6+ hardcoded branches that special-case `longcat`,
then add clear objectives, scope, and acceptance criteria for the generalized
scanner (use the design.md as the source to draft the missing text), or delete
the unfinished sentence if unintentional—ensure mention of the proxy route
handler and the `longcat` special-case so reviewers can trace the requirement to
the implementation.

Comment on lines +108 to +120
```typescript
try {
const parsed = JSON.parse(data) as ChatCompletionChunk;
if (this.isWrappedError(parsed)) {
this.throwWrappedError(parsed);
}
yield parsed;
} catch {
// Skip malformed chunks
}
```

**Note**: The `catch` block already skips malformed chunks. The `throwWrappedError()` call throws before `yield`, so the generator terminates immediately. The `try/catch` around `JSON.parse` does NOT catch the `ProviderApiError` thrown by `throwWrappedError()` because that throw happens after successful parsing — it propagates out of the generator to the consumer in `proxy.ts`.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Incorrect claim: the catch block does catch throwWrappedError().

The note states the try/catch around JSON.parse does NOT catch the ProviderApiError because the throw happens after successful parsing. That is wrong — a try/catch catches every synchronous throw inside the try body regardless of where it occurs. As written in openai-compat.ts, cohere.ts, and cloudflare.ts, throwWrappedError() is called inside the try, so the empty catch {} swallows it and the wrapped error is silently skipped. The check must be moved outside the parse try/catch (as already done in google.ts). See the corresponding code comments.
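The scoping fix can be sketched as follows: only `JSON.parse` sits inside the `try`, and the wrapped-error check runs after it. This is a simplified illustration, not the providers' code. `parseChunks`, the `Chunk` shape, and the rule that any non-null `error` field marks a wrapped error are assumptions, and `ProviderApiError` is redeclared locally as a stand-in for the class used in the PR.

```typescript
// Local stand-in for the ProviderApiError class used in the PR.
class ProviderApiError extends Error {}

interface Chunk {
  error?: unknown;
  [key: string]: unknown;
}

// Hypothetical detector: any non-null `error` field counts as a wrapped error.
function isWrappedError(chunk: Chunk): boolean {
  return chunk.error !== undefined && chunk.error !== null;
}

function* parseChunks(lines: readonly string[]): Generator<Chunk> {
  for (const data of lines) {
    let parsed: unknown;
    try {
      // Only the parse is guarded, so only malformed chunks are skipped.
      parsed = JSON.parse(data);
    } catch {
      continue;
    }
    // Valid JSON that is not an object (null, numbers, strings) is skipped too.
    if (typeof parsed !== 'object' || parsed === null) continue;

    const chunk = parsed as Chunk;
    // Outside the try block: this throw reaches the consumer of the generator.
    if (isWrappedError(chunk)) {
      throw new ProviderApiError('Upstream returned a wrapped error');
    }
    yield chunk;
  }
}
```

Given the input lines `{"id":1}`, `not json`, and `{"error":{"message":"x"}}`, the generator yields the first chunk, skips the malformed one, and throws on the third. With the check inside the `try`, that third chunk would have been silently dropped.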

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/wrapped-error-interception/design.md around lines 108 - 120, The
try/catch around JSON.parse currently swallows any synchronous throws inside it
(including ProviderApiError thrown by throwWrappedError), so in
openai-compat.ts, cohere.ts and cloudflare.ts move the wrapped-error detection
and throw out of the parse try block: only JSON.parse and parsing-related
validations remain inside try/catch (to skip malformed chunks), then after
successful parse call this.isWrappedError(parsed) and, if true, call
this.throwWrappedError(parsed) so ProviderApiError propagates; mirror the safe
pattern used in google.ts and ensure generator yield only occurs after the
wrapped-error check.

Comment thread fix_streaming.py
Comment on lines +13 to +15
# Find the if(stream) line
stream_start = None
for i, line in{
(No newline at end of file)

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
python - <<'PY'
from pathlib import Path
import ast

path = Path("fix_streaming.py")
source = path.read_text()
try:
    ast.parse(source, filename=str(path))
    print("OK: syntax is valid")
except SyntaxError as e:
    print(f"SyntaxError: {e.msg} at line {e.lineno}, column {e.offset}")
    print(source.splitlines()[e.lineno - 1])
PY

Repository: vi70x3/freellmapi

Length of output: 133


🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "---- wc -l ----"
wc -l fix_streaming.py
echo "---- sed -n '8,20p' with numbers ----"
nl -ba fix_streaming.py | sed -n '8,20p'
echo "---- tail (last 5 lines) ----"
tail -n 5 fix_streaming.py | nl -ba

Repository: vi70x3/freellmapi

Length of output: 196


🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "---- cat -n (entire file) ----"
cat -n fix_streaming.py
echo "---- python: total lines + last line repr + newline presence ----"
python - <<'PY'
from pathlib import Path
p = Path("fix_streaming.py")
s = p.read_text()
print("len(text) =", len(s))
print("endswith_newline =", s.endswith("\n"))
lines = s.splitlines(True)  # keepends
print("line_count =", len(lines))
for i, line in enumerate(lines, start=1):
    if i >= len(lines)-3:
        print(f"line {i} repr: {line!r}")
PY

Repository: vi70x3/freellmapi

Length of output: 859


🏁 Script executed:

python - <<'PY'
from pathlib import Path
import ast
p = Path("fix_streaming.py")
ast.parse(p.read_text(), filename=str(p))
print("OK")
PY

Repository: vi70x3/freellmapi

Length of output: 398


Critical: fix_streaming.py is incomplete/invalid Python (ends mid-for loop).

  • fix_streaming.py stops at line 15 (for i, line in{), causing SyntaxError: '{' was never closed; finish the script so it fully parses (or remove the partial file) before merging.
🧰 Tools
🪛 Ruff (0.15.15)

[warning] 15-15: unexpected EOF while parsing

(invalid-syntax)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fix_streaming.py` around lines 13 - 15, The file fix_streaming.py currently
contains an incomplete for loop ("for i, line in{") causing a SyntaxError;
either finish the script or remove it. To fix, complete the iteration over the
source lines (replace the malformed "for i, line in{" with a proper iterable,
e.g., enumerate(lines) or similar), implement the logic that searches for the
"if(stream)" marker and sets/uses the stream_start variable, and ensure the loop
and file end with valid Python syntax (close blocks and handle cases where
"if(stream)" is not found). Alternatively, if this file is not needed, delete it
to avoid introducing a broken file into the repo.

Comment on lines +813 to +817
beforeEach(async () => {
(stickySessionMap as Map<any, any>).clear();
// Dynamic import to get the same module instance used by the running app
const { transientModelCooldowns: cooldowns } = await import('../../routes/proxy.js');
(cooldowns as Map<any, any>).clear();

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for transientModelCooldowns export in proxy.ts
rg -n 'export.*transientModelCooldowns|transientModelCooldowns.*=' server/src/routes/proxy.ts

Repository: vi70x3/freellmapi

Length of output: 43


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Confirm whether the symbol exists anywhere (definition/usage)
rg -n "transientModelCooldowns" server/src/routes/proxy.ts server/src/routes || true

# 2) Broader search in repo for any export or re-export patterns
rg -n "transientModelCooldowns" server/src || true

# 3) Check for export-block patterns that might not match the previous regex
rg -n "export\s*\{[^}]*transientModelCooldowns|export\s+default\s+.*transientModelCooldowns|transientModelCooldowns\s*:" server/src/routes/proxy.ts || true

# 4) Print the most relevant sections around where it is defined/exported (if found)
#    If rg above finds nothing, this will help locate any similarly named cooldown maps.
if rg -n "transientModelCooldowns" server/src/routes/proxy.ts >/dev/null 2>&1; then
  python3 - <<'PY'
import re, subprocess, os, textwrap, sys, pathlib, json
path="server/src/routes/proxy.ts"
out=subprocess.check_output(["rg","-n","transientModelCooldowns",path]).decode()
lines=out.strip().splitlines()
# Collect line numbers
lns=[]
for l in lines:
    m=re.match(r"(\d+):", l)
    if m: lns.append(int(m.group(1)))
# Expand contexts
for ln in sorted(set(lns)):
    start=max(1, ln-20)
    end=ln+20
    subprocess.run(["sed","-n",f"{start},{end}p",path],check=False)
PY
fi

Repository: vi70x3/freellmapi

Length of output: 203


🏁 Script executed:

#!/bin/bash
set -euo pipefail

rg -n "cooldown" server/src/routes/proxy.ts || true
rg -n "^export\s" server/src/routes/proxy.ts || true
rg -n "export\s*\{" server/src/routes/proxy.ts || true
rg -n "transient" server/src/routes/proxy.ts || true

Repository: vi70x3/freellmapi

Length of output: 684


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== occurrences of transientModelCooldowns in server/src =="
rg -n "transientModelCooldowns" server/src || true

echo "== cooldown-related identifiers in proxy.ts =="
rg -n "(Cooldowns|cooldowns|cooldownExclusions|cooldownWindow|cooldown.*Map|ModelCooldown)" server/src/routes/proxy.ts || true

echo "== export statements in proxy.ts =="
rg -n "^export (const|function|class|type)" server/src/routes/proxy.ts || true
rg -n "^export \{" server/src/routes/proxy.ts || true
rg -n "^export default" server/src/routes/proxy.ts || true

echo "== show around top-level export block (around line 1-260) =="
sed -n '1,260p' server/src/routes/proxy.ts

echo "== show around cooldown-related logic (around 1180-1320) =="
sed -n '1160,1320p' server/src/routes/proxy.ts

Repository: vi70x3/freellmapi

Length of output: 17075


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== list server/src/routes =="
ls -la server/src/routes || true

echo "== list any proxy.js/proxy.ts in server/src/routes =="
fd -a "proxy\.js$|proxy\.ts$" server/src/routes || true

echo "== search entire repo for transientModelCooldowns =="
rg -n "transientModelCooldowns" . || true

echo "== locate ratelimit service files =="
fd -a "ratelimit\.ts$|ratelimit\.js$" server/src/services || true

echo "== search ratelimit for transientModelCooldowns =="
rg -n "transientModelCooldowns" server/src/services || true

echo "== show ratelimit exports around suspected cooldown maps =="
# Print a small window around any exports from ratelimit
rg -n "^export" server/src/services/ratelimit.* 2>/dev/null || true

Repository: vi70x3/freellmapi

Length of output: 1619


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== proxy-tools.test.ts around the transientModelCooldowns import =="
sed -n '760,860p' server/src/__tests__/routes/proxy-tools.test.ts

echo "== ratelimit.ts contents (focused on cooldown implementation) =="
sed -n '1,220p' server/src/services/ratelimit.ts

echo "== find any exported cooldown Maps/objects (broader) =="
rg -n "cooldown|cooldowns|Cooldown|transient" server/src/services/ratelimit.ts server/src/routes/proxy.ts server/src/__tests__/routes/proxy-tools.test.ts || true

Repository: vi70x3/freellmapi

Length of output: 17048


🏁 Script executed:

#!/bin/bash
set -euo pipefail
sed -n '150,220p' server/src/routes/proxy.ts

Repository: vi70x3/freellmapi

Length of output: 2784


Fix: transientModelCooldowns is not exported from server/src/routes/proxy.ts.

The test imports { transientModelCooldowns } and immediately calls (cooldowns as Map<any, any>).clear(), but server/src/routes/proxy.ts only exports the sticky/session helpers (stickySessionMap, setStickyModel, clearStickyModel, etc.)—there’s no transientModelCooldowns export—so cooldowns would be undefined and the test will fail at runtime. Update proxy.ts to export/reset the intended cooldown Map, or change the test to clear the actual cooldown state used by the router.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/__tests__/routes/proxy-tools.test.ts` around lines 813 - 817, The
test fails because transientModelCooldowns is not exported from the proxy
router; either export the actual cooldown Map from the router module (e.g., add
and export a Map named transientModelCooldowns alongside stickySessionMap and
the set/clear helpers in proxy.ts) so the test can import and call .clear(), or
add a dedicated exported helper (e.g., clearTransientModelCooldowns) in proxy.ts
and update the test to call that helper instead; ensure the exported symbol is
the same runtime instance the router uses so clearing it affects the router
state.
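One way the dedicated-helper option could look is sketched below. This is an assumption-laden illustration: the Map's key and value types, `markTransientCooldown`, and `isCoolingDown` are hypothetical, and only the name `clearTransientModelCooldowns` comes from the suggestion above.

```typescript
// Hypothetical module-level state; the router's real cooldown storage may differ.
// Assumed shape: modelDbId -> expiry timestamp in milliseconds.
const transientModelCooldowns = new Map<number, number>();

// Hypothetical writer, included so the sketch has state to clear.
export function markTransientCooldown(
  modelDbId: number,
  ttlMs: number,
  now: number = Date.now(),
): void {
  transientModelCooldowns.set(modelDbId, now + ttlMs);
}

// Hypothetical reader: a model is cooling down until its expiry time is reached.
export function isCoolingDown(modelDbId: number, now: number = Date.now()): boolean {
  const expiresAt = transientModelCooldowns.get(modelDbId);
  return expiresAt !== undefined && now < expiresAt;
}

// Test hook: clears the same Map instance the functions above read and write.
export function clearTransientModelCooldowns(): void {
  transientModelCooldowns.clear();
}
```

A `beforeEach` could then call `clearTransientModelCooldowns()` directly instead of importing the Map and casting it, which keeps the Map itself private to the module.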

Comment on lines +135 to +149
protected throwWrappedError(body: unknown): void {
const obj = body as Record<string, unknown>;
const errPayload = obj.error;
const message = this.extractErrorMessage(body, 'Unknown wrapped error');
const error = new Error(
`${this.name} API error (wrapped in 200): ${message}`,
) as ProviderApiError;
error.status =
typeof errPayload === 'object' && errPayload !== null && 'code' in (errPayload as Record<string, unknown>)
? Number((errPayload as Record<string, unknown>).code)
: 200;
error.provider = this.name;
error.responseBody = body;
throw error;
}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

String-form wrapped error loses its message.

For the documented Format 2 ({"error": "Rate limit exceeded"}), extractErrorMessage(body, ...) returns the fallback: err.error is a string, so err.error?.message is undefined, falling through to 'Unknown wrapped error'. This violates FR-4 ("when the value is a string, the string itself must be used") and contradicts design Format 2. Handle string-valued error in extractErrorMessage:

🐛 Extract string-valued error messages
   protected extractErrorMessage(body: unknown, fallback: string): string {
     if (typeof body === 'string') return body || fallback;
     if (!body || typeof body !== 'object') return fallback;

-    const err = body as { error?: { message?: string }; errors?: Array<{ message?: string }>; message?: string };
+    const err = body as { error?: string | { message?: string }; errors?: Array<{ message?: string }>; message?: string };
+    if (typeof err.error === 'string') return err.error || fallback;
     return err.error?.message ?? err.errors?.[0]?.message ?? err.message ?? fallback;
   }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/providers/base.ts` around lines 135 - 149, The wrapped-error path
in throwWrappedError uses extractErrorMessage which currently ignores
string-valued error payloads (Format 2), causing the fallback message to be
used; update extractErrorMessage to detect when body.error (or the extracted
err.error) is a string and return that string as the message (instead of
attempting to read .message), then leave throwWrappedError as-is so
ProviderApiError contains the actual string message for formats like {"error":
"Rate limit exceeded"}; reference the extractErrorMessage function and the
throwWrappedError method to locate and change the logic.
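As a standalone illustration of the suggested change, the function below lifts `extractErrorMessage` out of `BaseProvider` so it can be exercised directly. The `ErrorBody` type alias is introduced here for readability and is not in the PR.

```typescript
// Hypothetical alias covering the payload shapes named in the suggestion above.
type ErrorBody = {
  error?: string | { message?: string };
  errors?: Array<{ message?: string }>;
  message?: string;
};

function extractErrorMessage(body: unknown, fallback: string): string {
  if (typeof body === 'string') return body || fallback;
  if (!body || typeof body !== 'object') return fallback;

  const err = body as ErrorBody;
  // Format 2: {"error": "Rate limit exceeded"} -> use the string itself.
  if (typeof err.error === 'string') return err.error || fallback;
  // Object-valued error, errors array, or top-level message, in that order.
  return err.error?.message ?? err.errors?.[0]?.message ?? err.message ?? fallback;
}

const fromString = extractErrorMessage({ error: 'Rate limit exceeded' }, 'Unknown wrapped error');
const fromObject = extractErrorMessage({ error: { message: 'Quota hit' } }, 'Unknown wrapped error');
```

An empty string in `error` still falls back, matching how the existing string-body branch treats empty input.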

Comment on lines 130 to 138
         try {
-          yield JSON.parse(data) as ChatCompletionChunk;
+          const parsed = JSON.parse(data) as ChatCompletionChunk;
+          if (this.isWrappedError(parsed)) {
+            this.throwWrappedError(parsed);
+          }
+          yield parsed;
         } catch {
           // Skip malformed chunks
         }

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical: wrapped stream errors are silently swallowed.

throwWrappedError() throws inside the try, and the catch {} (intended only for malformed JSON) catches and discards it. The ProviderApiError never propagates, so wrapped errors in streaming are skipped instead of aborting the stream (violates FR-6). Move the check outside the parse try/catch, mirroring google.ts.

🐛 Separate parsing from the wrapped-error check
-        try {
-          const parsed = JSON.parse(data) as ChatCompletionChunk;
-          if (this.isWrappedError(parsed)) {
-            this.throwWrappedError(parsed);
-          }
-          yield parsed;
-        } catch {
-          // Skip malformed chunks
-        }
+        let parsed: ChatCompletionChunk;
+        try {
+          parsed = JSON.parse(data) as ChatCompletionChunk;
+        } catch {
+          continue; // Skip malformed chunks
+        }
+        if (this.isWrappedError(parsed)) {
+          this.throwWrappedError(parsed);
+        }
+        yield parsed;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/providers/openai-compat.ts` around lines 130 - 138, The current
try/catch around JSON.parse swallows exceptions thrown by throwWrappedError(),
so wrapped ProviderApiError never propagates; refactor the block in
openai-compat.ts that parses ChatCompletionChunk so JSON parsing and
wrapped-error handling are separated: first attempt JSON.parse(data) inside a
try that only catches/parses malformed JSON and yields/continues on parse
failure, but after a successful parse call this.isWrappedError(parsed) and, if
true, call this.throwWrappedError(parsed) outside the parse catch so the error
bubbles; reference the parsed variable, isWrappedError, and throwWrappedError to
locate the code to change.
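To see how the separated pattern behaves from the consumer's side, here is a minimal sketch using a synchronous generator over already-extracted `data:` payloads. `parseSseData` and `isWrappedError` are simplified, hypothetical stand-ins for the provider methods, and a plain `Error` replaces `ProviderApiError`.

```typescript
// Simplified stand-in for BaseProvider.isWrappedError(); the real check may differ.
function isWrappedError(value: unknown): value is { error: unknown } {
  return typeof value === 'object' && value !== null && 'error' in value;
}

// Hypothetical generator: only JSON.parse is guarded, so wrapped errors escape.
function* parseSseData(payloads: string[]): Generator<unknown> {
  for (const data of payloads) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(data);
    } catch {
      continue; // skip malformed chunks only
    }
    if (isWrappedError(parsed)) {
      // Thrown outside the try, so it propagates to whoever is iterating.
      throw new Error(`API error (wrapped in 200): ${JSON.stringify(parsed.error)}`);
    }
    yield parsed;
  }
}

const stream = parseSseData([
  '{"id":1}',
  'not json',
  '{"error":"Rate limit exceeded"}',
  '{"id":2}',
]);
```

Chunks that arrive before the wrapped error are still delivered, the malformed chunk is skipped, and the wrapped error ends the generator, so the trailing chunk is never yielded.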

import { Router } from 'express';
import type { Request, Response } from 'express';
import { z } from 'zod';
import { evaluateThreadProtection } from '../services/threadProtection.js';

⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

Imported function is never used.

evaluateThreadProtection is imported but never called. According to the design spec (.roo/specs/generalized-thread-protection/design.md), this function should replace the 6+ hardcoded longcat branches in this file, including the thread protection block at lines 1207-1260. The current implementation still uses manual cooldown logic instead of calling this function.

🔧 Integration guidance

Based on the design doc, replace the manual thread protection block (lines 1207-1260) with:

const action = evaluateThreadProtection({
  platform: prefRow?.platform ?? '',
  kind: 'retryable',  // or appropriate kind based on context
  midStream: false,
  modelDbId: preferredModel,
});

if (action.banProvider && prefRow?.platform) {
  addProviderModelsToSkipModels(skipModels, prefRow.platform);
  console.log(`[ThreadProtection] ${action.reason}`);
}
if (action.skipModel) {
  skipModels.add(preferredModel);
}

This pattern should also replace the 6 other hardcoded longcat branches mentioned in the design doc.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/routes/proxy.ts` at line 5, The file imports
evaluateThreadProtection but still uses manual longcat/cooldown branches;
replace the thread protection block (and the other 6 hardcoded longcat branches)
by calling evaluateThreadProtection({ platform: prefRow?.platform ?? '', kind:
'retryable', midStream: false, modelDbId: preferredModel }) and then apply its
directives: if action.banProvider && prefRow?.platform call
addProviderModelsToSkipModels(skipModels, prefRow.platform) and log the reason,
and if action.skipModel do skipModels.add(preferredModel); update the block
around the existing skipModels/preferredModel/prefRow logic to use this pattern
and remove the manual cooldown/longcat branching.

Comment on lines +1207 to 1260
// ── Thread protection: dynamically exclude models actively used by other sessions ──
// If another sticky session has used a model within THREAD_COOLDOWN_MS, that model
// becomes a soft-exclusion candidate. Exhaustion protection ensures we never block
// all available models — preferring shared access over outright failure.
{
const currentSessionKey = getSessionKey(normalizedMessages, routingMode);
const activeCooldownModels = new Set<number>();
const threadNow = Date.now();

for (const [key, entry] of stickySessionMap) {
// Self-preservation: never exclude the current session's own pinned model
if (currentSessionKey && key === currentSessionKey) continue;

// Expired entries are irrelevant — the session has gone quiet
if (threadNow - entry.lastUsed > STICKY_TTL_MS) continue;

// Only consider entries within the cooldown window
if (threadNow - entry.lastUsed < THREAD_COOLDOWN_MS) {
activeCooldownModels.add(entry.modelDbId);
}
}

// Exhaustion protection: if cooldown would block ALL available models,
// clear the set and let the request through rather than failing outright.
if (activeCooldownModels.size > 0) {
const db = getDb();
const allEnabled = db.prepare('SELECT id FROM models WHERE enabled = 1').all() as Array<{ id: number }>;
const allEnabledIds = new Set(allEnabled.map(m => m.id));

// Remove the current preferred model from cooldown consideration
if (preferredModel !== undefined) {
activeCooldownModels.delete(preferredModel);
}

// Check if cooldown would exhaust all models
let wouldExhaustAll = true;
for (const id of allEnabledIds) {
if (!activeCooldownModels.has(id)) {
wouldExhaustAll = false;
break;
}
}

if (wouldExhaustAll) {
console.log(`[ThreadProtection] cooldown would exhaust all ${allEnabled.length} models — clearing cooldown exclusions`);
activeCooldownModels.clear();
} else {
for (const modelDbId of activeCooldownModels) {
skipModels.add(modelDbId);
}
console.log(`[ThreadProtection] excluding ${activeCooldownModels.size} model(s) from routing: [${[...activeCooldownModels].join(',')}]`);
}
}
}

🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Thread protection implementation should use the imported scanner.

This manual cooldown logic should be replaced with the generalized evaluateThreadProtection() scanner as specified in the design doc. The current implementation hardcodes the cooldown behavior instead of using the configurable protection-level system.

Refer to the integration guidance in the earlier comment on line 5.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/routes/proxy.ts` around lines 1207 - 1260, Replace the in-place
cooldown loop with a call to the shared scanner: remove the manual logic that
iterates stickySessionMap and computes activeCooldownModels (the entire Thread
protection block) and instead call evaluateThreadProtection(...) to populate
skipModels; pass in normalizedMessages and routingMode (for getSessionKey
behavior), the stickySessionMap and any TTL/COOLDOWN constants (STICKY_TTL_MS,
THREAD_COOLDOWN_MS), and the current preferredModel so the scanner can apply
exhaustion protection; ensure evaluateThreadProtection returns or mutates
skipModels (or adapt its return to be merged into skipModels) and remove the
console logs in favor of the scanner’s logging.
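For reference while refactoring, the cooldown scan and exhaustion protection in the block under review can be expressed as a pure function, which makes the behaviour easier to test in isolation. This is a sketch under assumptions: `computeCooldownExclusions` and `StickyEntry` are hypothetical names, the enabled model IDs are passed in rather than read from the database, and the separate `STICKY_TTL_MS` check is omitted on the assumption that the cooldown window is shorter than the TTL.

```typescript
// Hypothetical shape of a sticky-session entry, based on the fields used above.
interface StickyEntry {
  modelDbId: number;
  lastUsed: number; // ms epoch
}

function computeCooldownExclusions(
  sessions: Map<string, StickyEntry>,
  currentSessionKey: string | undefined,
  enabledModelIds: number[],
  preferredModel: number | undefined,
  now: number,
  cooldownMs: number,
): Set<number> {
  const cooling = new Set<number>();
  sessions.forEach((entry, key) => {
    // Self-preservation: never exclude the current session's own pinned model.
    if (currentSessionKey !== undefined && key === currentSessionKey) return;
    if (now - entry.lastUsed < cooldownMs) cooling.add(entry.modelDbId);
  });

  // The preferred model is removed from cooldown consideration.
  if (preferredModel !== undefined) cooling.delete(preferredModel);

  // Exhaustion protection: if every enabled model would be excluded, exclude none.
  const exhaustsAll = enabledModelIds.every((id) => cooling.has(id));
  return exhaustsAll ? new Set<number>() : cooling;
}

const now = 1_000_000;
const sessions = new Map<string, StickyEntry>([
  ['a', { modelDbId: 1, lastUsed: now - 1_000 }],
  ['b', { modelDbId: 2, lastUsed: now - 100_000 }],
]);
const excluded = computeCooldownExclusions(sessions, 'c', [1, 2, 3], undefined, now, 5_000);
```

The caller would add the returned IDs to `skipModels`; an empty set means either nothing is cooling down or exclusion would have blocked every enabled model.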

});
} else {
res.write(`data: ${JSON.stringify(payload)}\n\n`);
res.write('data: [DONE]\n\n');;

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove double semicolon.

Line 1347 has a double semicolon at the end of the statement.

🔧 Proposed fix
-                res.write('data: [DONE]\n\n');;
+                res.write('data: [DONE]\n\n');
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/routes/proxy.ts` at line 1347, Remove the stray extra semicolon
after the response write call: locate the res.write('data: [DONE]\n\n');;
statement (the res.write call) and delete the duplicate semicolon so it ends
with a single semicolon; ensure no other trailing punctuation was accidentally
removed.

Comment on lines +20 to +23
export function evaluateThreadProtection(_ctx: ErrorContext): ThreadProtectionAction {
// Placeholder implementation: no protection
return { banProvider: false, skipModel: false, clearStickyIfPinned: false, reason: 'off' };
}

⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

Implement the thread protection scanner logic.

The evaluateThreadProtection function is currently a placeholder that returns 'off' for all inputs. According to the design doc (.roo/specs/generalized-thread-protection/design.md), this function should:

  1. Load protection configuration from THREAD_PROTECTION_PLATFORMS env var (or use default map)
  2. Lookup the platform's protection level
  3. Return the appropriate action based on the decision matrix in the design doc

Without this implementation, the thread protection refactoring in proxy.ts cannot proceed.

💡 Implementation guidance

Based on the design doc's decision matrix:

export function evaluateThreadProtection(ctx: ErrorContext): ThreadProtectionAction {
  const level = getProtectionLevel(ctx.platform);
  
  if (level === 'off') {
    return { banProvider: false, skipModel: false, clearStickyIfPinned: false, reason: 'off' };
  }
  
  if (level === 'provider-ban') {
    return {
      banProvider: true,
      skipModel: false,
      clearStickyIfPinned: true,
      reason: `${ctx.platform} ${ctx.kind} error — provider-ban level`
    };
  }
  
  // level === 'model-skip'
  return {
    banProvider: false,
    skipModel: true,
    clearStickyIfPinned: false,
    reason: `${ctx.platform} ${ctx.kind} error — model-skip level`
  };
}

function getProtectionLevel(platform: string): ProtectionLevel {
  // Parse THREAD_PROTECTION_PLATFORMS env var or use default map
  // Default: longcat → provider-ban, others → model-skip
}

Would you like me to generate the full implementation based on the design spec?

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/services/threadProtection.ts` around lines 20 - 23, The
evaluateThreadProtection function is a placeholder; implement the scanner to
read THREAD_PROTECTION_PLATFORMS (or fall back to a default map) and determine
the ProtectionLevel for ctx.platform, then return actions per the decision
matrix: 'off' => {banProvider:false, skipModel:false, clearStickyIfPinned:false,
reason:'off'}, 'provider-ban' => {banProvider:true, skipModel:false,
clearStickyIfPinned:true, reason:`${ctx.platform} ${ctx.kind} error —
provider-ban level`}, and 'model-skip' => {banProvider:false, skipModel:true,
clearStickyIfPinned:false, reason:`${ctx.platform} ${ctx.kind} error —
model-skip level`}; add a helper getProtectionLevel(platform: string) to parse
the env var (e.g. JSON or comma map) and return the configured level or defaults
(default: longcat → provider-ban, others → model-skip).
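The `getProtectionLevel` helper is left as a stub in the guidance above. A possible shape is sketched below, with several assumptions: the env var format is taken to be comma-separated `platform:level` pairs (the design doc may specify something else), the raw value is passed in as a parameter instead of being read from the environment, and `parseProtectionMap` and `isProtectionLevel` are hypothetical helpers.

```typescript
type ProtectionLevel = 'off' | 'model-skip' | 'provider-ban';

// Defaults as described in the guidance: longcat -> provider-ban, others -> model-skip.
const DEFAULT_LEVELS: ReadonlyArray<[string, ProtectionLevel]> = [['longcat', 'provider-ban']];
const FALLBACK_LEVEL: ProtectionLevel = 'model-skip';

function isProtectionLevel(value: string): value is ProtectionLevel {
  return value === 'off' || value === 'model-skip' || value === 'provider-ban';
}

// Assumed format: "longcat:provider-ban,someplatform:off". Invalid pairs are ignored.
function parseProtectionMap(raw: string | undefined): Map<string, ProtectionLevel> {
  const levels = new Map<string, ProtectionLevel>(DEFAULT_LEVELS);
  if (!raw) return levels;
  for (const pair of raw.split(',')) {
    const [platform, level] = pair.split(':').map((part) => part.trim().toLowerCase());
    if (platform && level && isProtectionLevel(level)) {
      levels.set(platform, level);
    }
  }
  return levels;
}

// `raw` would typically be the value of THREAD_PROTECTION_PLATFORMS.
function getProtectionLevel(platform: string, raw?: string): ProtectionLevel {
  return parseProtectionMap(raw).get(platform.toLowerCase()) ?? FALLBACK_LEVEL;
}

const defaultLongcat = getProtectionLevel('longcat');
const overridden = getProtectionLevel('longcat', 'longcat:off');
```

Taking the raw string as an argument keeps the helper free of global state, so tests can cover overrides without mutating the process environment. A production version would likely parse the value once at startup instead of on every call.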

@vi70x3 vi70x3 closed this Jun 2, 2026