feat(router): add cache TTL gate, fix auto-refresh order, fix indentation, deduplicate RoutingMode type #12

Closed
vi70x3 wants to merge 10 commits into main from spec/recency-biased-thompson-sampling

Conversation

@vi70x3

@vi70x3 vi70x3 commented Jun 2, 2026

Collaborator

Implements the recency-biased-thompson-sampling spec.

Phase 2 changes to server/src/services/router.ts:

  • Add Cache TTL gate in refreshStatsCache() to skip redundant refreshes within the TTL window
  • Fix auto-refresh order in getAnalyticsScores() so cache refresh happens before the early-return guard
  • Fix modelId indentation in return type annotation and result.push() object literal
  • Deduplicate RoutingMode type by importing/re-exporting from @freellmapi/shared/types.js
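The first two bullets can be pictured with a small sketch. This is not the code from `router.ts`: the TTL value, `lastRefreshAt`, the injected `loadStats` callback, and the row shape are hypothetical names chosen for illustration. It only shows the two ideas named above: a timestamp gate that skips reloads inside the TTL window, and a refresh that runs before the early-return guard.

```typescript
// Hypothetical sketch of a TTL-gated cache refresh. Names and the TTL value
// are illustrative assumptions, not taken from server/src/services/router.ts.
type ModelStatsRow = { modelId: string; successes: number; total: number };

const STATS_CACHE_TTL_MS = 30_000; // assumed TTL window

let statsCache = new Map<string, ModelStatsRow>();
let lastRefreshAt: number | null = null; // epoch ms of the last refresh

function refreshStatsCache(loadStats: () => ModelStatsRow[], now: number = Date.now()): boolean {
  // Gate: skip the (possibly expensive) reload while the cache is still fresh.
  if (lastRefreshAt !== null && now - lastRefreshAt < STATS_CACHE_TTL_MS) {
    return false;
  }
  const next = new Map<string, ModelStatsRow>();
  for (const row of loadStats()) next.set(row.modelId, row);
  statsCache = next;
  lastRefreshAt = now;
  return true;
}

function getAnalyticsScores(loadStats: () => ModelStatsRow[], now: number = Date.now()) {
  // Refresh first, then apply the early-return guard, so an empty cache
  // gets a chance to be populated instead of short-circuiting.
  refreshStatsCache(loadStats, now);
  if (statsCache.size === 0) return [];
  return Array.from(statsCache.values()).map((s) => ({
    modelId: s.modelId,
    successRate: s.total > 0 ? s.successes / s.total : 0,
  }));
}
```

Passing `now` explicitly keeps the gate testable without waiting for real time to pass.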

Summary by CodeRabbit

  • New Features

    • Added "auto-fast" routing mode and model pool grouping (Fast, Balanced, Smart) across UI, plus PoolBadge/PoolSection components and new model picker.
    • Streaming now includes keep-alive heartbeats and stall detection for more resilient live responses.
  • Improvements

    • Better handling of wrapped provider errors and recency-weighted analytics for smarter model ranking.
    • Global transient cooldowns and generalized thread protection to improve availability during provider failures.

vi70x3 added 7 commits June 2, 2026 15:12
- Add ANALYTICS_WINDOW_DAYS constant for SQL decay divisor
- Extend ModelStats with rawSuccesses/rawTotal fields
- Rewrite SQL query with CTE-based time-decay weighting (julianday)
- Update statsCache population to use weighted columns
- Add Math.max(0.1, ...) guards in thompsonSampleScore, smartSampleScore,
  getAnalyticsScore, and getSmartAnalyticsScore to handle fractional weights
- Update getAnalyticsScores dashboard to display rawTotal counts
- Add modelId field to getAnalyticsScores return type for fallback route compat
- Add 'fast' to RoutingMode type and ModelPool enum (Fast/Balanced/Smart)
- Implement classifyModel() to categorize models by speed/intelligence ranks
- Implement fastSampleScore() with 3x speed weight, 1.5x TTFB factor, minimal intelligence
- Add pool-based routing in routeRequest: fast pool first, balanced pool borrowing fallback
- Add freellmapi/auto-fast model ID support in proxy routes
- Extend /v1/models endpoint with primaryPool field for all models
- Add unit tests for pool classification (T13-T16) and fast routing (T17-T18)
- Add integration tests for /v1/models primaryPool fields (T13-T17)
- Update README.md and CLAUDE.md documentation
- extractErrorMessage: handle string error payloads and object errors with message field
- throwWrappedError: guard against non-numeric error codes (NaN prevention)
- cloudflare/cohere/openai-compat: move isWrappedError check outside JSON.parse try/catch in SSE loops
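The last three commit notes describe wrapped-error handling. A minimal sketch of that idea follows, written as free functions rather than `BaseProvider` methods. The `ProviderApiError` class here is a local stand-in, and the 502 fallback status and the 400 to 599 range check are assumptions, not behavior confirmed by the PR.

```typescript
// Illustrative stand-ins for the helpers named in the commit notes.
// The real BaseProvider methods and ProviderApiError may differ.
class ProviderApiError extends Error {
  readonly status: number;
  constructor(message: string, status: number) {
    super(message);
    this.name = 'ProviderApiError';
    this.status = status;
  }
}

// A "wrapped" error is a payload with a root-level `error` field,
// possibly delivered with HTTP 200.
function isWrappedError(payload: unknown): payload is { error: unknown } {
  if (typeof payload !== 'object' || payload === null) return false;
  const candidate = (payload as { error?: unknown }).error;
  return candidate !== undefined && candidate !== null;
}

// Handles both string payloads and objects with a `message` field.
function extractErrorMessage(error: unknown): string {
  if (typeof error === 'string') return error;
  if (typeof error === 'object' && error !== null) {
    const message = (error as { message?: unknown }).message;
    if (typeof message === 'string') return message;
  }
  return 'Unknown provider error';
}

function throwWrappedError(payload: { error: unknown }): never {
  const err = payload.error;
  const rawCode =
    typeof err === 'object' && err !== null ? (err as { code?: unknown }).code : undefined;
  const code = Number(rawCode);
  // Non-numeric codes such as "rate_limited" become NaN; fall back to an
  // assumed 502 instead of propagating NaN as a status.
  const status = Number.isInteger(code) && code >= 400 && code <= 599 ? code : 502;
  throw new ProviderApiError(extractErrorMessage(err), status);
}
```

In an SSE loop the check would sit outside the `JSON.parse` try/catch, as the commit note says, so that the thrown `ProviderApiError` is not swallowed as a parse failure.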
@coderabbitai

coderabbitai Bot commented Jun 2, 2026


Walkthrough

Adds a third routing mode ("fast") and model-pool classification, implements recency-weighted analytics and fast-mode scoring, disables sticky sessions in balanced mode, introduces transient per-model cooldowns and a thread-protection scaffold, rewrites SSE streaming to add heartbeat/stall detection, detects wrapped provider errors on HTTP 200 responses, adds frontend pool UI components, and expands test coverage and specs.
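The recency-weighted analytics and the `Math.max(0.1, ...)` guards mentioned in the commit notes can be illustrated in plain TypeScript. The PR does this in SQL with `julianday`; the linear decay shape, the 7-day window, and the mapping of weighted counts to Beta parameters below are assumptions made for the sketch. A real Thompson sampler would draw from Beta(alpha, beta); the posterior mean is used here only to keep the example deterministic.

```typescript
// Sketch of time-decayed success stats with clamped Beta parameters.
// Decay shape, window length, and parameter mapping are assumptions.
const ANALYTICS_WINDOW_DAYS = 7; // assumed value of the decay divisor
const MS_PER_DAY = 24 * 60 * 60 * 1000;

type RequestOutcome = { timestampMs: number; success: boolean };

type ModelStats = {
  weightedSuccesses: number;
  weightedTotal: number;
  rawSuccesses: number; // unweighted counts, e.g. for dashboard display
  rawTotal: number;
};

// Linear decay: weight 1 for a request made now, 0 at or beyond the window.
function decayWeight(ageDays: number): number {
  return Math.min(1, Math.max(0, 1 - ageDays / ANALYTICS_WINDOW_DAYS));
}

function aggregateStats(outcomes: RequestOutcome[], nowMs: number): ModelStats {
  const stats: ModelStats = { weightedSuccesses: 0, weightedTotal: 0, rawSuccesses: 0, rawTotal: 0 };
  for (const o of outcomes) {
    const w = decayWeight((nowMs - o.timestampMs) / MS_PER_DAY);
    stats.weightedTotal += w;
    stats.rawTotal += 1;
    if (o.success) {
      stats.weightedSuccesses += w;
      stats.rawSuccesses += 1;
    }
  }
  return stats;
}

// Fractional weights can drive a count toward 0, and Beta parameters must be
// strictly positive, hence the 0.1 floor on both.
function betaParams(stats: ModelStats): { alpha: number; beta: number } {
  return {
    alpha: Math.max(0.1, stats.weightedSuccesses),
    beta: Math.max(0.1, stats.weightedTotal - stats.weightedSuccesses),
  };
}

function expectedSuccessRate(stats: ModelStats): number {
  const { alpha, beta } = betaParams(stats);
  return alpha / (alpha + beta);
}
```

Keeping `rawSuccesses` and `rawTotal` alongside the weighted sums lets a dashboard show real request counts while ranking uses the decayed values.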

Changes

Router, Proxy, and Provider Core Implementation

| Layer | File(s) | Summary |
| --- | --- | --- |
| Shared types: routing modes and model pools | `shared/types.ts` | Adds RoutingMode union and ModelPool enum to drive fast/balanced/smart routing and pool classification. |
| Router weighted analytics, pool classification, and fast-mode scoring | `server/src/services/router.ts`, `server/src/__tests__/services/router.test.ts` | Implements recency-weighted stats via SQL CTE, clamps Beta params with Math.max(0.1, ...), adds classifyModel, fastSampleScore, hasValidKeys, and tests for analytics, classification, and fast-mode routing. |
| Model catalog and fallback endpoints expose pool metadata | `server/src/routes/proxy.ts`, `server/src/routes/fallback.ts`, `server/src/__tests__/integration/full-flow.test.ts` | /models and fallback routes compute min/max ranks, classify models into pools, expose primaryPool, and include freellmapi/auto-fast virtual model. |
| Disable sticky sessions for balanced routing mode | `server/src/routes/proxy.ts`, `server/src/__tests__/routes/provider-session-ban.test.ts` | getSessionKey() returns '' for routingMode === 'balanced', causing sticky-related helpers to be no-ops; tests added/updated to assert balanced-mode skipping and preserve smart-mode sticky behavior. |
| Transient model cooldowns for concurrent failure mitigation | `server/src/routes/proxy.ts`, `server/src/__tests__/routes/transient-cooldown.test.ts` | Adds transientModelCooldowns Map and TRANSIENT_COOLDOWN_MS, injects active cooldowned modelDbIds into skipModels, clears preferred sticky model when cooled, and registers cooldowns on qualifying 5xx/connection failures; dedicated tests validate behavior. |
| Thread protection types and evaluator stub | `server/src/services/threadProtection.ts` | Adds ProtectionLevel, ErrorContext, ThreadProtectionAction, and a stubbed evaluateThreadProtection returning an "off" action for future implementation. |
| SSE heartbeat and stall detection for streaming responses | `server/src/routes/proxy.ts`, `server/src/__tests__/routes/stream-heartbeat-stall.test.ts` | Refactors streaming: sends SSE keep-alive comments, tracks lastChunkTimestamp, aborts stalled upstream via AbortController, emits pre-stream 504 or mid-stream error frames, and ensures cleanup on client disconnect; tests cover keep-alive, pre-/mid-stream stall, disconnect cleanup, and normal streaming. |
| Provider-layer wrapped error detection across adapters | `server/src/providers/base.ts`, `server/src/providers/{openai-compat,cohere,cloudflare,google}.ts` | BaseProvider gains isWrappedError() and throwWrappedError() and exposes extractErrorMessage() as protected; each adapter checks parsed JSON (and streamed chunks) for wrapped errors and throws ProviderApiError with inferred status when found. |
| Frontend pool badge and section components | `client/src/components/pool-badge.tsx`, `client/src/components/pool-section.tsx` | Adds PoolBadge and PoolSection React components and PoolType for pool-labeled UI blocks used by the playground and fallback pages. |
| Playground custom model picker grouped by pool | `client/src/pages/PlaygroundPage.tsx` | Replaces Select with a popover model picker that groups models by pool and adds picker open/close behavior and Escape handling. |
| Streaming block transformation helpers | `do_fix.py`, `fix.py`, `fix_streaming.py`, `new_streaming_block.txt` | Adds Python scripts and a streaming template to support the SSE streaming refactor and automated transformation/workflow. |
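The transient cooldown row above can be sketched as follows. The names `transientModelCooldowns`, `TRANSIENT_COOLDOWN_MS`, and `skipModels` come from the PR; the 30-second duration, the helper function names, and the prune-on-read strategy are assumptions for illustration only.

```typescript
// Sketch of a global transient cooldown map keyed by modelDbId.
// The duration and the helper names are illustrative assumptions.
const TRANSIENT_COOLDOWN_MS = 30_000; // assumed duration

// modelDbId -> epoch ms at which the cooldown expires
const transientModelCooldowns = new Map<number, number>();

// Called after a qualifying 5xx or connection failure.
function registerTransientCooldown(modelDbId: number, now: number = Date.now()): void {
  transientModelCooldowns.set(modelDbId, now + TRANSIENT_COOLDOWN_MS);
}

// Called at the start of routing: expired entries are pruned, active ones
// are injected into the per-request skip set.
function applyTransientCooldowns(skipModels: Set<number>, now: number = Date.now()): void {
  transientModelCooldowns.forEach((expiresAt, modelDbId) => {
    if (expiresAt <= now) {
      transientModelCooldowns.delete(modelDbId);
    } else {
      skipModels.add(modelDbId);
    }
  });
}
```

Because the map is module-level, a failure seen by one request affects routing for every concurrent request until the entry expires, which is the cross-session effect one reviewer raises further down.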

Specifications and Tasks

| Layer | File(s) | Summary |
| --- | --- | --- |
| Disable sticky on balanced mode specification | `.roo/specs/disable-sticky-on-auto/{design,requirements,tasks}.md` | Design, requirements, and tasks documenting the getSessionKey() gate and test updates to skip sticky ops in balanced mode. |
| Generalized thread protection scanner specification | `.roo/specs/generalized-thread-protection/{design,requirements,tasks}.md` | Spec for a provider-agnostic thread protection scanner and integration points replacing LongCat-specific branches. |
| Model pools display specification | `.roo/specs/model-pools-display/{design,requirements,tasks}.md` | Frontend and backend requirements and tasks for pool classification, badges, sections, and accessibility. |
| Owl Alpha + LongCat model-level routing specification | `.roo/specs/owl-alpha-longcat-model-routing/{design,requirements,tasks}.md` | Spec covering balanced exclusions, smart-mode preferences, sticky cooldowns for Owl Alpha and LongCat, and model-level skipping. |
| Recency-biased Thompson sampling specification | `.roo/specs/recency-biased-thompson-sampling/{design,requirements,tasks}.md` | Spec for time-decay weighting in analytics, Beta-parameter guards, and dashboard changes. |
| SSE heartbeat and stall protection specification | `.roo/specs/sse-stream-heartbeat-stall-protection/{design,requirements,tasks}.md` | Spec for SSE keep-alives, stall detection, cleanup ordering, and tests. |
| Transient model cooldown specification | `.roo/specs/transient-model-cooldown/{design,requirements,tasks}.md` | Spec for global transient cooldown Map, pruning, sticky override, registration rules, and acceptance criteria. |
| Wrapped provider error interception specification | `.roo/specs/wrapped-error-interception/{design,requirements,tasks}.md` | Spec for detecting root-level error in JSON payloads on HTTP 200 and failing fast in chat/stream flows. |

Public Documentation

| Layer | File(s) | Summary |
| --- | --- | --- |
| Documentation of routing modes and pool classification | `CLAUDE.md`, `README.md` | Documents new freellmapi/auto-fast routing mode and model pool classification behavior exposed via /v1/models and the UI. |

Estimated code review effort:
🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • vi70x3/freellmapi#9: Overlaps on disabling sticky-session behavior for balanced mode via getSessionKey() guard and related test adjustments.
  • vi70x3/freellmapi#8: Related to sticky-session and LongCat sticky-cooldown logic that this PR generalizes and gates by routing mode.
  • vi70x3/freellmapi#2: Also modifies sticky-session machinery; changes here affect how that machinery behaves when balanced mode disables sticky ops.

"I'm a rabbit in a routing maze,
Fast pools dash in bright, bold haze.
Balanced slips free — no sticky glue,
Heartbeats hum, stalled streams we rue.
Wrapped errors unwrapped, cooldowns keep watch — hooray!" 🐇


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces several major routing and stability enhancements, including a new fast routing mode (freellmapi/auto-fast) with model pool classification, dynamic thread protection to prevent concurrent session conflicts, SSE stream heartbeats with stall protection, and a shared transient model cooldown mechanism. It also excludes LongCat and Owl Alpha from balanced routing while adding smart preference for Owl Alpha. The review feedback is highly constructive, pointing out a critical 'Headers already sent' crash during pre-stream stalls, an unused import, a database query performance bottleneck under thread protection, a double semicolon syntax error, and minor indentation inconsistencies.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines 1543 to 1545
} catch (streamErr: any) {
cleanupStream();
if (streamStarted) {


high

If a pre-stream stall is detected, the stall handler sends a 504 response and ends the connection. However, if the generator subsequently throws an error, it will be caught by this catch block. Since streamStarted is false, the error is rethrown and propagates to the outer retry loop, which attempts to fall back to another provider and write to the already-ended response, causing a 'Headers already sent' crash. Checking streamAborted here and returning early prevents this crash.

        } catch (streamErr: any) {
          cleanupStream();
          if (streamAborted) {
            logRequest(route.platform, route.modelId, 'error', estimatedInputTokens, totalOutputTokens, Date.now() - start, ttfbMs, 'stream_stalled');
            return;
          }
          if (streamStarted) {

Comment on lines 4 to 6
import { z } from 'zod';
import { evaluateThreadProtection } from '../services/threadProtection.js';
import type { ChatCompletionChunk, ChatCompletionResponse, ChatMessage, ChatToolCall, ChatToolDefinition } from '@freellmapi/shared/types.js';


medium

The import of evaluateThreadProtection is unused in this file. The design spec for the 'Generalized Thread Protection Scanner' seems to have been implemented inline or left as a placeholder in threadProtection.ts. Consider removing this unused import to keep the codebase clean.

import { z } from 'zod';
import type { ChatCompletionChunk, ChatCompletionResponse, ChatMessage, ChatToolCall, ChatToolDefinition } from '@freellmapi/shared/types.js';

// clear the set and let the request through rather than failing outright.
if (activeCooldownModels.size > 0) {
const db = getDb();
const allEnabled = db.prepare('SELECT id FROM models WHERE enabled = 1').all() as Array<{ id: number }>;


medium

Querying the database for all enabled models on every request with active cooldowns can become a performance bottleneck under high concurrency. Consider caching the list of enabled model IDs or retrieving them from an in-memory cache/registry, as this list changes infrequently.
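One way to act on this suggestion is a small cached loader with explicit invalidation. This is a generic sketch, not code from the PR: `createCachedLoader`, the 5-second TTL in the usage note, and the fake query function are hypothetical, and the real fix might instead reuse an existing in-memory registry.

```typescript
// Generic cached loader: reloads at most once per TTL window and can be
// invalidated explicitly (for example when a model is enabled or disabled).
// All names here are hypothetical.
function createCachedLoader<T>(load: () => T, ttlMs: number, clock: () => number = Date.now) {
  let cached: { value: T; loadedAt: number } | null = null;
  return {
    get(): T {
      const now = clock();
      if (cached === null || now - cached.loadedAt >= ttlMs) {
        cached = { value: load(), loadedAt: now };
      }
      return cached.value;
    },
    invalidate(): void {
      cached = null;
    },
  };
}

// Stand-in for the `SELECT id FROM models WHERE enabled = 1` query.
let queryCount = 0;
function queryEnabledModelIds(): number[] {
  queryCount++;
  return [1, 2, 3];
}
```

Usage would look like `const enabledModelIds = createCachedLoader(queryEnabledModelIds, 5_000)`, with `enabledModelIds.get()` called on the request path and `enabledModelIds.invalidate()` called wherever the `enabled` flag is written.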

Comment thread server/src/routes/proxy.ts Outdated
});
} else {
res.write(`data: ${JSON.stringify(payload)}\n\n`);
res.write('data: [DONE]\n\n');;


medium

There is a double semicolon at the end of this line. Remove the extra semicolon to adhere to clean coding standards.

Suggested change
-res.write('data: [DONE]\n\n');;
+res.write('data: [DONE]\n\n');

Comment on lines 1556 to +1559
console.warn(`[Proxy] Mid-stream 5xx from ${route.platform} — skipping model ${route.modelId} only`);
skipModels.add(route.modelDbId);
}
}

// Generalized truncation detection for any provider (not just LongCat)
// Register global transient cooldown for any 5xx mid-stream error


medium

The indentation of these lines is inconsistent with the surrounding block. Adjust the indentation to match the 14-space statement indentation and 12-space closing brace indentation of the parent block.

Suggested change
console.warn(`[Proxy] Mid-stream 5xx from ${route.platform} — skipping model ${route.modelId} only`);
skipModels.add(route.modelDbId);
}
}
// Generalized truncation detection for any provider (not just LongCat)
// Register global transient cooldown for any 5xx mid-stream error
              // Register global transient cooldown for any 5xx mid-stream error
              transientModelCooldowns.set(route.modelDbId, Date.now() + TRANSIENT_COOLDOWN_MS);
              console.log(`[TransientCooldown] registered global cooldown for modelDbId=${route.modelDbId} (${TRANSIENT_COOLDOWN_MS / 1000}s)`);
            }

Comment on lines +1727 to +1730
}
}

if (isRetryableError(err)) {
if (isRetryableError(err)) {


medium

The indentation of these lines is off by one space. Adjust the indentation to align with the parent block.

        }
      }

      if (isRetryableError(err)) {

@kilo-code-bot

kilo-code-bot Bot commented Jun 2, 2026


Code Review Summary

Status: 2 Issues Found (existing) | Recommendation: Address before merge

Overview

| Severity | Count |
| --- | --- |
| WARNING | 2 |

Issue Details

| File | Line | Issue |
| --- | --- | --- |
| `server/src/routes/proxy.ts` | 1397 | Double semicolon: redundant trailing `;` |
| `server/src/routes/proxy.ts` | 1730 | Wrong indentation: `if (isRetryableError(err))` is indented at 6 spaces but should be at 3 spaces, placing it inside the wrong block scope |

Other Observations (not in diff)
  1. server/src/services/threadProtection.ts is a stub: evaluateThreadProtection() always returns { banProvider: false, skipModel: false, clearStickyIfPinned: false, reason: 'off' }. Meanwhile proxy.ts lines ~1254–1310 contain a fully implemented thread-protection block that bypasses sticky-model preferences when another session is actively using a model. That logic in proxy.ts will run, but the extracted service is a no-op placeholder.

  2. Fast mode bypass affects non-fast routing paths — The transient cooldown maps (transientModelCooldowns) and sticky-preference clearing added in proxy.ts run unconditionally (current session key is computed from routingMode, but the cooldown injection into skipModels at lines ~1224–1244 happens before mode checks, and the global transient cooldown at lines 1560–1561, 1733–1734 fires regardless of routing mode). In balanced/smart mode this means a model can be silently excluded from routing by another session's transient cooldown, which does not appear to be the intended scope.

  3. server/write_test.py and server/write_tests.py are scratch/staging files — These Python scripts appear to be local tooling used to write test files and are committed to the PR. They do not belong in the repository.

Files Reviewed (3 files)
  • server/src/routes/proxy.ts — 2 existing issues (indentation, semicolons)
  • server/src/services/router.ts — no new issues
  • server/src/services/threadProtection.ts — stub implementation observation


Reviewed by nemotron-3-super-120b-a12b-20230311:free · 924,387 tokens

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 20

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/src/routes/proxy.ts (1)

1479-1484: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

LongCat is still being banned at provider scope in the new error paths.

These branches still ban the whole longcat platform instead of skipping only route.modelDbId, so one failing LongCat request will also block future LongCat models for the session. That breaks the model-level contract described in this PR.

Also applies to: 1551-1554, 1579-1583, 1610-1613, 1711-1714, 1735-1738

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/routes/proxy.ts` around lines 1479 - 1484, The code currently
calls banPlatformFromSession(...) and addProviderModelsToSkipModels(...) for
LongCat branches, which bans the entire platform; instead, remove those two
calls and add the failing modelDbId to the session-level skip list so only
route.modelDbId is skipped. Locate the LongCat branches (the block checking
route.platform === 'longcat') and replace
banPlatformFromSession(normalizedMessages, routingMode, 'longcat',
route.modelDbId) and addProviderModelsToSkipModels(skipModels, 'longcat') with
logic that records route.modelDbId under the LongCat provider in skipModels
(e.g., push or set skipModels['longcat'] to include route.modelDbId or call an
existing addModel-level helper), ensuring you use the existing skipModels
variable and route.modelDbId so only that model is skipped for the session.
🧹 Nitpick comments (4)
.roo/specs/disable-sticky-on-auto/design.md (1)

11-17: ⚡ Quick win

Add language identifier to fenced code block.

The fenced code block should specify a language for proper syntax highlighting.

📝 Proposed fix
-```
+```typescript
 function stickyOp(messages, routingMode, ...) {
   const key = getSessionKey(messages, routingMode);
   if (!key) return <no-op value>;   // undefined, false, or early return
   ...operate on stickySessionMap using key...
 }

Based on learnings from static analysis: markdownlint-cli2 flagged this as missing a language specifier.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/disable-sticky-on-auto/design.md around lines 11 - 17, The fenced
code block containing the example for function stickyOp is missing a language
identifier; update the block to include a language (e.g., "typescript" or "js")
so syntax highlighters handle getSessionKey, stickyOp, and stickySessionMap
correctly—locate the block showing function stickyOp(messages, routingMode, ...)
and add the language specifier immediately after the opening backticks (for
example ```typescript).
.roo/specs/generalized-thread-protection/design.md (1)

43-45: ⚡ Quick win

Add language identifier to fenced code block.

The environment variable example block should specify a language (e.g., bash or shell) for proper syntax highlighting and markdown compliance.

📝 Proposed fix
-```
+```bash
 THREAD_PROTECTION_PLATFORMS="longcat:provider-ban,groq:model-skip"
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/generalized-thread-protection/design.md around lines 43 - 45,
Update the fenced code block that shows the environment variable
THREAD_PROTECTION_PLATFORMS so it includes a language identifier (e.g., use
bash or shell) for proper markdown highlighting; locate the block
containing THREAD_PROTECTION_PLATFORMS="longcat:provider-ban,groq:model-skip"
and change the opening fence to include the language token.


</details>

</blockquote></details>
<details>
<summary>fix.py (1)</summary><blockquote>

`4-8`: _⚡ Quick win_

**Remove or complete this helper script.**

Right now it only reads `server/src/routes/proxy.ts` and stops, so running it does nothing and leaves a misleading artifact in the repo.

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fix.py` around lines 4 - 8, This helper script is incomplete and should be
removed or finished; either delete fix.py entirely, or implement its intended
behavior: open('server/src/routes/proxy.ts', 'r') should read into lines, locate
the streaming block start by searching for the string "for await (const chunk"
and then find the matching block end (matching braces) and then perform the
intended action (e.g., print the start/end indices, extract the block, or modify
and write back) before closing—use the existing lines variable and proper file
write handling to persist changes if any.
```

</details>

</blockquote></details>
<details>
<summary>client/src/components/pool-badge.tsx (1)</summary><blockquote>

`7-7`: _⚡ Quick win_

**Inconsistent color naming in pool configurations.**

Each pool configuration mixes different color families in text and background:
- Fast: `bg-emerald-100` with `text-green-800` (emerald vs green)
- Balanced: `bg-slate-100` with `text-gray-700` (slate vs gray)  
- Smart: `bg-purple-100` with `text-indigo-700` (purple vs indigo)

While these may be intentional design choices, using consistent color families improves predictability and maintainability.




<details>
<summary>🎨 Proposed fix for color consistency</summary>

```diff
   fast: {
     icon: "⚡",
     label: "Fast",
-    className: "bg-emerald-100 text-green-800 dark:bg-emerald-900/30 dark:text-emerald-400",
+    className: "bg-emerald-100 text-emerald-800 dark:bg-emerald-900/30 dark:text-emerald-400",
   },
   balanced: {
     icon: "⚖️",
     label: "Balanced",
-    className: "bg-slate-100 text-gray-700 dark:bg-slate-800 dark:text-slate-300",
+    className: "bg-slate-100 text-slate-700 dark:bg-slate-800 dark:text-slate-300",
   },
   smart: {
     icon: "🧠",
     label: "Smart",
-    className: "bg-purple-100 text-indigo-700 dark:bg-purple-900/30 dark:text-purple-400",
+    className: "bg-purple-100 text-purple-700 dark:bg-purple-900/30 dark:text-purple-400",
   },
```
</details>


Also applies to: 12-12, 17-17

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/src/components/pool-badge.tsx` at line 7, The pool badge color classes
mix different Tailwind color families; update the className entries in the
PoolBadge/pool configuration for Fast, Balanced, and Smart so the text color
uses the same family as the background (e.g., bg-emerald-100 -> text-emerald-800
for Fast, bg-slate-100 -> text-slate-700 for Balanced, bg-purple-100 ->
text-purple-700 for Smart). Locate the className fields in
client/src/components/pool-badge.tsx (the PoolBadge or pool config entries
referenced on the three lines) and replace the mismatched text color tokens to
match each background family for consistency.
```

</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.roo/specs/generalized-thread-protection/requirements.md:

  • Line 5: The requirement line is truncated and contains a stray '{' so rewrite
    it into a single complete sentence that clearly states the problem and expected
    outcome: describe that the proxy route handler (the exported router/handler in
    proxy.ts) currently contains 6+ hardcoded branches that special-case the
    "longcat" route/host/prefix and specify what should be done instead (e.g.,
    remove hardcoded branches and replace with a configurable/generalized routing or
    policy mechanism), ensuring no stray punctuation and including the scope (which
    branches/behavior to change) and the acceptance criteria for the fix.

In @.roo/specs/model-pools-display/design.md:

  • Around line 109-132: The fenced ASCII diagrams (the triple-backtick blocks
    that start before the "Bandit routing" diagram and the similar blocks for
    "Balanced (5 models)" and "Smart (2 models)") must include a markdown language
    identifier; update each opening fence from ``` to ```text (or ```plain) so
    linters/renderers recognize them as plain text and the diagrams render
    consistently.

In @.roo/specs/model-pools-display/requirements.md:

  • Line 90: The document ends with a truncated heading "## Out of" — change that
    heading to "## Out of Scope" and populate it with explicit exclusions for this
    feature (e.g., what model-pool UI interactions, permissions, API endpoints, or
    integrations are not covered), ensuring the section clarifies boundaries for
    implementation and QA; update any table of contents or cross-references if
    present to reflect the completed heading.

In @.roo/specs/owl-alpha-longcat-model-routing/design.md:

  • Around line 5-46: The markdown diagrams use triple-fenced blocks without a
    language which triggers MD040; update each ASCII diagram/code block (the ones
    showing Balanced Router / Smart Router, the routeRequest() flow, and the
    handleChatCompletion() flow — e.g., the block containing "Balanced Router",
    "Smart Router", "routeRequest() in router.ts", and "handleChatCompletion() in
    proxy.ts") to include a fence language of text (use ```text) so markdownlint no
    longer flags them; apply the same change to the other listed blocks (the ones
    noted around lines 52-78, 82-92, 96-105).

In @.roo/specs/owl-alpha-longcat-model-routing/requirements.md:

  • Around line 122-126: The Markdown links in the requirements list are using
    ../server/... which resolves relative to
    .roo/specs/owl-alpha-longcat-model-routing/ instead of the repo root; update
    each link target to go up three levels to reference the true server files (e.g.,
    change ../server/src/services/router.ts to
    ../../../server/src/services/router.ts and similarly update
    ../server/src/routes/proxy.ts and ../server/src/db/index.ts references used for
    migrateModelsV15/migrateModelsV16 and the LongCat smart/sticky/ban logic).

In @.roo/specs/recency-biased-thompson-sampling/design.md:

  • Line 5: The relative source links in
    .roo/specs/recency-biased-thompson-sampling/design.md point to server files
    using paths like server/src/... which resolve under the spec folder; update each
    such link to use the repository-root relative prefix "../../../server/..." so
    they open correctly from the spec location (apply the same fix to the other
    occurrences noted around lines 31, 85, 115, 132, 138, and 229-230); search for
    markdown links referencing "server/src/..." in this file and replace them with
    "../../../server/src/..." to correct all references.

In @.roo/specs/wrapped-error-interception/tasks.md:

  • Around line 28-31: In OpenAICompatProvider.streamChatCompletion(), constrain
    the wrapped-error check to only the first parsed SSE payload: after parsing each
    chunk (assign JSON.parse(data) to a variable), check this.isWrappedError(parsed)
    and call this.throwWrappedError(parsed) only if no previous payload has been
    yielded yet (e.g., guard with a local flag like sawFirstPayload or check a
    stream-start state) so later parsed chunks do not abort an already-started
    downstream stream; then proceed to yield the parsed payload and set the flag.

In @client/src/components/pool-section.tsx:

  • Around line 5-30: The PoolSection component currently renders a static
    container but needs collapse/expand behavior; update PoolSection to maintain
    internal open state (e.g., useState) or accept a prop like defaultOpen, render a
    clickable header (wrap the PoolBadge + title in a button/div with onClick) that
    toggles that state, show/hide the children based on the state, add an accessible
    aria-expanded attribute on the header and visually indicate state with a
    chevron/icon, and preserve the existing className and styling when collapsed or
    expanded; reference the PoolSection component and PoolBadge for where to attach
    the toggle and children visibility logic.

In @do_fix.py:

  • Around line 1-7: The file do_fix.py contains an incomplete helper script
    (orphaned '{' at the top and a partial read into the variable content) and
    should be either removed or completed; to fix, either delete do_fix.py from the
    commit if it was temporary, or complete the script by removing the stray '{',
    add necessary imports and a proper main() implementation that opens
    server/src/routes/proxy.ts, reads content, performs the replacement of the
    streaming block with the Promise.race-based stall-detection snippet, writes the
    modified content back safely (use a temp file and atomic replace), handle
    exceptions and return non-zero on failure, and ensure the shebang and file
    permissions are correct so do_fix.py is a runnable, syntactically valid helper;
    locate the incomplete references to content and the
    open('server/src/routes/proxy.ts', 'r') call to implement or remove them.

In @fix_streaming.py:

  • Around line 12-21: The string assigned to the variable new_streaming in
    fix_streaming.py is left unterminated (starts with r''' but no closing '''),
    causing a SyntaxError; fix it by adding the missing closing triple-quote to
    terminate the raw triple-quoted string (or remove the stray snippet entirely) so
    new_streaming is a valid string literal and the module can be imported/executed.

In @new_streaming_block.txt:

  • Around line 21-28: The snippet for stallTimeout was truncated and left
    unclosed; restore the full implementation of stallTimeout so it returns a
    Promise that sets a timer via setTimeout to reject with an
    Error('Upstream stream stalled') augmented with { status: 504, type:
    'stream_timeout' } after MAX_STREAM_STALL_MS, and close all opened
    braces/parentheses; ensure the timer variable is declared and the promise body
    is properly terminated so stallTimeout and its inner setTimeout callback are
    syntactically complete (refer to stallTimeout and MAX_STREAM_STALL_MS to locate
    where to fix).

In @server/src/__tests__/routes/stream-heartbeat-stall.test.ts:

  • Around line 74-119: The test asserts a keep-alive comment during the 300ms
    gap, but the proxy implementation (streamStarted flag in
    server/src/routes/proxy.ts) only emits ": keep-alive\n\n" once streamStarted ===
    true; update the test in
    server/src/__tests__/routes/stream-heartbeat-stall.test.ts to stop expecting a
    keep-alive before the first chunk (remove or change the expect(raw).toContain(':
    keep-alive') assertion) or adjust it to assert keep-alives only after the first
    chunk/after streamStarted is true so the test matches the current behavior of
    streamStarted-controlled heartbeats.
  • Around line 12-18: The test helper currently injects only the unified API key
    via getUnifiedApiKey() when path.startsWith('/v1/'), which incorrectly allows
    unauthenticated access to /api/* routes; update the helper (the fetch wrapper
    used in these tests) to accept an explicit admin key or auth override and use
    getAdminApiKey() when the request path startsWith('/api/'), while continuing to
    use getUnifiedApiKey() for '/v1/'; alternatively add a parameter to the helper
    to pass custom headers so tests can seed the DB or set Authorization explicitly
    instead of implicitly baking unauthenticated /api/* access.

In @server/src/providers/openai-compat.ts:

  • Around line 131-143: Only check for and throw wrapped SSE errors before any
    chunk has been yielded: add a boolean flag (e.g., hasYielded) that is false
    until the first successful yield of parsed, and only call
    isWrappedError/throwWrappedError when hasYielded is false; if you detect a
    wrapped error before the first yield, cancel the underlying reader (call
    reader.cancel()) before calling throwWrappedError(parsed) so the upstream stream
    is aborted and the proxy can retry/fallback cleanly. Update the code around
    parsed, isWrappedError, throwWrappedError, and the reader to implement this
    gating and cancellation.

In @server/src/routes/proxy.ts:

  • Around line 1422-1425: The close listener is being added inside the retry loop
    which causes listener leaks; register it once outside the retry loop (or use
    req.once('close', ...) or remove any existing listener before adding) so
    cleanupStream is not re-registered on each attempt—move the req.on('close', ...)
    call (or switch to req.once or removeListener) so it executes a single time for
    the request and still calls cleanupStream on client disconnect.
  • Around line 1279-1307: The exhaustion check currently compares
    activeCooldownModels against all enabled models (using allEnabledIds) which is
    incorrect because routing mode and existing skipModels further restrict the
    actual routable candidates; change the check to compute the same candidate set
    that routeRequest() uses (apply enabled filter, current routing mode filters,
    existing skipModels and remove preferredModel) and then test whether
    activeCooldownModels would exhaust that specific candidate set; use the same
    variables/functions involved (activeCooldownModels, preferredModel, skipModels,
    routeRequest or the helper that builds candidate IDs) so the logic mirrors
    routing and avoids false negatives that lead to 429s.

In @server/src/services/router.ts:

  • Around line 345-370: The classifyModel function uses normalized thresholds
    which yield incorrect top-40% cutoffs for small catalogs; change it to compute
    pool membership from ordinal positions and total count instead: compute
    totalCount = maxSpeedRank - minSpeedRank + 1 (and similarly for intelligence),
    compute ordinalPosition = speedRank - minSpeedRank + 1 (and for intelligence),
    compute cutoff = Math.max(1, Math.ceil(totalCount * 0.4)), then classify as Fast
    if ordinalPosition <= cutoff, Smart if intelligence ordinal <= cutoff, else
    Balanced; preserve the early-return when there is no variation and handle edge
    cases (totalCount <= 0) in classifyModel and reference the ModelPool enum.
  • Around line 407-408: Call refreshStatsCache(getDb(), true) unconditionally
    before returning cached stats so the TTL-gated refresh runs even when statsCache
    is already set; specifically, invoke refreshStatsCache(getDb(), true)
    unconditionally (rather than only when !statsCache) and then keep the existing
    guard if (!statsCache) return []; this ensures statsCache and the TTL logic are
    honored and prevents serving stale scores while preserving the cheap TTL check
    inside refreshStatsCache.

In @server/write_test.py:

  • Around line 4-29: The file defines a JavaScript test payload in Python but
    currently contains an unterminated string and a machine-specific absolute path;
    fix the top-level variables 'path' and 'content' so the Python file parses:
    replace the hard-coded '/home/vi/freellmapi/...' assigned to the variable path
    with a repo-relative computed path (use __file__ / pathlib to resolve the
    project root and join the relative test path), and ensure the 'content' value is
    a properly closed Python list or a single properly quoted multi-line string so
    there are no unterminated string literals (verify brackets/quotes around the
    array are balanced and the last line completes the JS snippet such as the
    truncated "expect(() => route..." line). Ensure no other syntax errors remain so
    the script runs.

In @server/write_tests.py:

  • Around line 4-45: The generator file server/write_tests.py currently contains
    an unterminated triple-quoted string in the part1 variable (the test text stops
    mid-token like "const groqKey{"), causing a SyntaxError; close or properly
    terminate part1 (complete the test contents or split into smaller, properly
    quoted strings and concatenate) and ensure all embedded JS quotes/braces are
    correctly escaped if needed; also replace the hard-coded absolute output path
    stored in path with a repo-relative construction (use
    pathlib.Path(__file__).parent / 'src' / '__tests__' / 'services' /
    'router.test.ts' or os.path.join with repo root resolution) so the generator
    writes to a portable location.

Outside diff comments:
In @server/src/routes/proxy.ts:

  • Around line 1479-1484: The code currently calls banPlatformFromSession(...)
    and addProviderModelsToSkipModels(...) for LongCat branches, which bans the
    entire platform; instead, remove those two calls and add the failing modelDbId
    to the session-level skip list so only route.modelDbId is skipped. Locate the
    LongCat branches (the block checking route.platform === 'longcat') and replace
    banPlatformFromSession(normalizedMessages, routingMode, 'longcat',
    route.modelDbId) and addProviderModelsToSkipModels(skipModels, 'longcat') with
    logic that records route.modelDbId under the LongCat provider in skipModels
    (e.g., push or set skipModels['longcat'] to include route.modelDbId or call an
    existing addModel-level helper), ensuring you use the existing skipModels
    variable and route.modelDbId so only that model is skipped for the session.

Nitpick comments:
In @.roo/specs/disable-sticky-on-auto/design.md:

  • Around line 11-17: The fenced code block containing the example for function
    stickyOp is missing a language identifier; update the block to include a
    language (e.g., "typescript" or "js") so syntax highlighters handle
    getSessionKey, stickyOp, and stickySessionMap correctly—locate the block showing
    function stickyOp(messages, routingMode, ...) and add the language specifier
    immediately after the opening backticks (for example ```typescript).

In @.roo/specs/generalized-thread-protection/design.md:

  • Around line 43-45: Update the fenced code block that shows the environment
    variable THREAD_PROTECTION_PLATFORMS so it includes a language identifier (e.g.,
    use bash or shell) for proper markdown highlighting; locate the block
    containing THREAD_PROTECTION_PLATFORMS="longcat:provider-ban,groq:model-skip"
    and change the opening fence to include the language token.

In @client/src/components/pool-badge.tsx:

  • Line 7: The pool badge color classes mix different Tailwind color families;
    update the className entries in the PoolBadge/pool configuration for Fast,
    Balanced, and Smart so the text color uses the same family as the background
    (e.g., bg-emerald-100 -> text-emerald-800 for Fast, bg-slate-100 ->
    text-slate-700 for Balanced, bg-purple-100 -> text-purple-700 for Smart). Locate
    the className fields in client/src/components/pool-badge.tsx (the PoolBadge or
    pool config entries referenced on the three lines) and replace the mismatched
    text color tokens to match each background family for consistency.

In @fix.py:

  • Around line 4-8: This helper script is incomplete and should be removed or
    finished; either delete fix.py entirely, or implement its intended behavior:
    open('server/src/routes/proxy.ts', 'r') should read into lines, locate the
    streaming block start by searching for the string "for await (const chunk" and
    then find the matching block end (matching braces) and then perform the intended
    action (e.g., print the start/end indices, extract the block, or modify and
    write back) before closing—use the existing lines variable and proper file write
    handling to persist changes if any.

</details>


---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: Organization UI

**Review profile**: CHILL

**Plan**: Pro Plus

**Run ID**: `ebc2d52b-36c4-4d90-9c6a-9b952a33334b`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 233e03142895cd8c18d441461d5c9568186c7123 and 40abf0b8c67d25c3d5526a9345e88d01db0cca9e.

</details>

<details>
<summary>📒 Files selected for processing (51)</summary>

* `.roo/specs/disable-sticky-on-auto/design.md`
* `.roo/specs/disable-sticky-on-auto/requirements.md`
* `.roo/specs/disable-sticky-on-auto/tasks.md`
* `.roo/specs/generalized-thread-protection/design.md`
* `.roo/specs/generalized-thread-protection/requirements.md`
* `.roo/specs/generalized-thread-protection/tasks.md`
* `.roo/specs/model-pools-display/design.md`
* `.roo/specs/model-pools-display/requirements.md`
* `.roo/specs/model-pools-display/tasks.md`
* `.roo/specs/owl-alpha-longcat-model-routing/design.md`
* `.roo/specs/owl-alpha-longcat-model-routing/requirements.md`
* `.roo/specs/owl-alpha-longcat-model-routing/tasks.md`
* `.roo/specs/recency-biased-thompson-sampling/design.md`
* `.roo/specs/recency-biased-thompson-sampling/requirements.md`
* `.roo/specs/recency-biased-thompson-sampling/tasks.md`
* `.roo/specs/sse-stream-heartbeat-stall-protection/design.md`
* `.roo/specs/sse-stream-heartbeat-stall-protection/requirements.md`
* `.roo/specs/sse-stream-heartbeat-stall-protection/tasks.md`
* `.roo/specs/transient-model-cooldown/design.md`
* `.roo/specs/transient-model-cooldown/requirements.md`
* `.roo/specs/transient-model-cooldown/tasks.md`
* `.roo/specs/wrapped-error-interception/design.md`
* `.roo/specs/wrapped-error-interception/requirements.md`
* `.roo/specs/wrapped-error-interception/tasks.md`
* `CLAUDE.md`
* `README.md`
* `client/src/components/pool-badge.tsx`
* `client/src/components/pool-section.tsx`
* `do_fix.py`
* `fix.py`
* `fix_streaming.py`
* `fix{`
* `new_streaming_block.txt`
* `server/src/__tests__/integration/full-flow.test.ts`
* `server/src/__tests__/routes/provider-session-ban.test.ts`
* `server/src/__tests__/routes/proxy-tools.test.ts`
* `server/src/__tests__/routes/stream-heartbeat-stall.test.ts`
* `server/src/__tests__/routes/transient-cooldown.test.ts`
* `server/src/__tests__/services/router.test.ts`
* `server/src/providers/base.ts`
* `server/src/providers/cloudflare.ts`
* `server/src/providers/cohere.ts`
* `server/src/providers/google.ts`
* `server/src/providers/openai-compat.ts`
* `server/src/routes/fallback.ts`
* `server/src/routes/proxy.ts`
* `server/src/services/router.ts`
* `server/src/services/threadProtection.ts`
* `server/write_test.py`
* `server/write_tests.py`
* `shared/types.ts`

</details>

</details>



## Problem Statement

The proxy route handler (`server/src/routes/proxy.ts`) contains 6+ hardcoded branches that special-case the `longcat`{ No newline at end of file

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Finish the truncated problem statement.

This sentence ends mid-thought and includes a stray {, so the requirement is not readable enough to implement from.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/generalized-thread-protection/requirements.md at line 5, The
requirement line is truncated and contains a stray '{' so rewrite it into a
single complete sentence that clearly states the problem and expected outcome:
describe that the proxy route handler (the exported router/handler in proxy.ts)
currently contains 6+ hardcoded branches that special-case the "longcat"
route/host/prefix and specify what should be done instead (e.g., remove
hardcoded branches and replace with a configurable/generalized routing or policy
mechanism), ensuring no stray punctuation and including the scope (which
branches/behavior to change) and the acceptance criteria for the fix.

Comment on lines +109 to +132
```
┌─────────────────────────────────────────────────────────┐
│ Bandit routing │
├─────────────────────────────────────────────────────────┤
│ ⚡ Fast (3 models) │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Model 1 ... [ON] │ │
│ │ Model 2 ... [ON] │ │
│ │ Model 3 ... [ON] │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ ⚖️ Balanced (5 models) │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Model 4 ... [ON] │ │
│ │ ... │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ 🧠 Smart (2 models) │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Model 9 ... [ON] │ │
│ │ Model 10 ... [ON] │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
```

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add language identifiers to fenced code blocks for ASCII diagrams.

The ASCII layout diagrams should have a language identifier (e.g., text or plain) to satisfy markdown linting requirements and improve rendering consistency across different viewers.

📝 Proposed fix
-```
+```text
 ┌─────────────────────────────────────────────────────────┐
 │ Bandit routing                                          │

Apply the same change to the other two ASCII diagrams at lines 142 and 168.

Also applies to: 142-158, 168-177

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 109-109: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/model-pools-display/design.md around lines 109 - 132, The fenced
ASCII diagrams (the triple-backtick blocks that start before the "Bandit
routing" diagram and the similar blocks for "Balanced (5 models)" and "Smart (2
models)") must include a markdown language identifier; update each opening fence
from ``` to ```text (or ```plain) so linters/renderers recognize them as plain
text and the diagrams render consistently.

- Pool classification logic should match backend routing logic exactly
- No changes to the pool classification algorithm (only display changes)

## Out of No newline at end of file

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

File appears truncated at "Out of" heading.

The requirements document ends abruptly with an incomplete "## Out of" section heading. This likely should be "## Out of Scope" with content defining what is explicitly excluded from this feature.

✅ Suggested completion
-## Out of
+## Out of Scope
+
+- Changes to the pool classification algorithm itself
+- Backend routing logic modifications
+- Analytics calculation changes
+- Model catalog structure changes
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/model-pools-display/requirements.md at line 90, The document ends
with a truncated heading "## Out of" — change that heading to "## Out of Scope"
and populate it with explicit exclusions for this feature (e.g., what model-pool
UI interactions, permissions, API endpoints, or integrations are not covered),
ensuring the section clarifies boundaries for implementation and QA; update any
table of contents or cross-references if present to reflect the completed
heading.

Comment on lines +5 to +46
```
┌─────────────────────────────────────────────────────────────┐
│ Client Request │
│ model: "freellmapi/auto" | "freellmapi/auto-smart" │
└─────────────┬───────────────────────────────┬───────────────┘
│ │
▼ ▼
┌─────────────────────────┐ ┌─────────────────────────┐
│ Balanced Router │ │ Smart Router │
│ (auto) │ │ (auto-smart) │
│ │ │ │
│ - Excludes longcat/* │ │ - Prefers longcat/* │
│ - Excludes OR/owl-alpha│ │ and OR/owl-alpha │
│ - Normal bandit for │ │ when valid keys exist│
│ everything else │ │ - Applies sticky │
│ │ │ cooldown for both │
│ │ │ - Model-level banning │
│ │ │ on errors │
└─────────────┬───────────┘ └───────────┬─────────────┘
│ │
▼ ▼
┌─────────────────────────────────────────────────────────────┐
│ routeRequest() in router.ts │
│ │
│ 1. Build chain from fallback_config + models │
│ 2. Score via Thompson sampling │
│ 3. Apply balanced exclusions (REQ-1) │
│ 4. Apply smart preferences (REQ-2) │
│ 5. Apply sticky session pin │
│ 6. Iterate chain, find first model with valid key │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ handleChatCompletion() in proxy.ts │
│ │
│ - Sticky cooldown check for longcat + owl-alpha (REQ-3) │
│ - Model-level skipModels on 5xx/retryable (REQ-4, REQ-5) │
│ - Model-level skipModels on truncation (REQ-4, REQ-5) │
│ - Model-level skipModels on mid-stream errors (REQ-4, REQ-5)│
└─────────────────────────────────────────────────────────────┘
```

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add fence languages to these new code blocks.

These fences will trip markdownlint (MD040). Using text for the ASCII diagrams/flows is enough.

Also applies to: 52-78, 82-92, 96-105

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 5-5: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/owl-alpha-longcat-model-routing/design.md around lines 5 - 46,
The markdown diagrams use triple-fenced blocks without a language which triggers
MD040; update each ASCII diagram/code block (the ones showing Balanced Router /
Smart Router, the routeRequest() flow, and the handleChatCompletion() flow —
e.g., the block containing "Balanced Router", "Smart Router", "routeRequest() in
router.ts", and "handleChatCompletion() in proxy.ts") to include a fence
language of text (use ```text) so markdownlint no longer flags them; apply the
same change to the other listed blocks (the ones noted around lines 52-78,
82-92, 96-105).

Comment on lines +122 to +126
- Existing LongCat smart auto preference logic in [`server/src/services/router.ts`](../server/src/services/router.ts)
- Existing LongCat sticky cooldown logic in [`server/src/routes/proxy.ts`](../server/src/routes/proxy.ts)
- Existing provider-level ban logic in [`server/src/routes/proxy.ts`](../server/src/routes/proxy.ts)
- Owl Alpha model seeded in [`server/src/db/index.ts`](../server/src/db/index.ts) via `migrateModelsV15`
- LongCat model seeded in [`server/src/db/index.ts`](../server/src/db/index.ts) via `migrateModelsV16`

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the dependency links; they currently point to the wrong directory depth.

From .roo/specs/owl-alpha-longcat-model-routing/, ../server/... resolves to .roo/specs/server/..., not the repo root. These should go up to the root first (for example, ../../../server/...).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/owl-alpha-longcat-model-routing/requirements.md around lines 122
- 126, The Markdown links in the requirements list are using ../server/... which
resolves relative to .roo/specs/owl-alpha-longcat-model-routing/ instead of the
repo root; update each link target to go up three levels to reference the true
server files (e.g., change ../server/src/services/router.ts to
../../../server/src/services/router.ts and similarly update
../server/src/routes/proxy.ts and ../server/src/db/index.ts references used for
migrateModelsV15/migrateModelsV16 and the LongCat smart/sticky/ban logic).

Comment on lines +1422 to +1425
// Attach client-disconnect listener
req.on('close', () => {
cleanupStream();
});

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't register a new close listener on every retry attempt.

This code sits inside the retry loop, so repeated pre-stream failures will stack listeners on the same req object. Once a request crosses ~10 attempts, Node will start warning about listener leaks, and every disconnect will fan out through all prior handlers.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/routes/proxy.ts` around lines 1422 - 1425, The close listener is
being added inside the retry loop which causes listener leaks; register it once
outside the retry loop (or use req.once('close', ...) or remove any existing
listener before adding) so cleanupStream is not re-registered on each
attempt—move the req.on('close', ...) call (or switch to req.once or
removeListener) so it executes a single time for the request and still calls
cleanupStream on client disconnect.
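
A minimal sketch of the "register once" option, assuming the handler can be attached before the retry loop and pointed at the current attempt's cleanup function. `FakeRequest`, `runWithRetries`, and `currentCleanup` are hypothetical names used only for illustration; the real handler in `proxy.ts` works on the actual request object and its own `cleanupStream`.

```typescript
// Hypothetical stand-in for the request object; only the listener API is modeled.
type Listener = () => void;

class FakeRequest {
  private listeners: Listener[] = [];

  on(_event: 'close', fn: Listener): void {
    this.listeners.push(fn);
  }

  listenerCount(): number {
    return this.listeners.length;
  }

  emitClose(): void {
    for (const fn of this.listeners) fn();
  }
}

// Registers the close handler a single time, before the retry loop, and lets it
// call whichever cleanup function belongs to the attempt currently in flight.
function runWithRetries(
  req: FakeRequest,
  maxAttempts: number,
): { cleanups: number; listeners: number } {
  let cleanups = 0;
  let currentCleanup: (() => void) | null = null;

  req.on('close', () => {
    currentCleanup?.();
  });

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    // Each attempt swaps in its own cleanup instead of adding a listener.
    currentCleanup = () => {
      cleanups++;
    };
    // ... upstream call would go here; a pre-stream failure continues the loop ...
  }

  // Simulate the client disconnecting after the last attempt.
  req.emitClose();
  return { cleanups, listeners: req.listenerCount() };
}

const result = runWithRetries(new FakeRequest(), 12);
```

With this shape the listener count stays at one regardless of how many attempts run, and a disconnect triggers exactly one cleanup call.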

Comment on lines +345 to +370
export function classifyModel(
speedRank: number,
intelligenceRank: number,
minSpeedRank: number,
maxSpeedRank: number,
minIntelligenceRank: number,
maxIntelligenceRank: number,
): ModelPool {
const speedRange = maxSpeedRank - minSpeedRank;
const intelRange = maxIntelligenceRank - minIntelligenceRank;

// If no variation at all, default to Balanced
if (speedRange <= 0 && intelRange <= 0) return ModelPool.Balanced;

const normalizedSpeed = speedRange > 0
? (speedRank - minSpeedRank) / speedRange
: 0.5;
const normalizedIntel = intelRange > 0
? (intelligenceRank - minIntelligenceRank) / intelRange
: 0.5;

// Fast pool: top 40% fastest models (lowest normalized speed_rank)
if (normalizedSpeed <= 0.4) return ModelPool.Fast;
// Smart pool: top 40% smartest models (lowest normalized intelligence_rank)
if (normalizedIntel <= 0.4) return ModelPool.Smart;
return ModelPool.Balanced;

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

classifyModel() does not implement a true top-40% cutoff.

This normalized-rank check over-selects for common catalog sizes. With 6 models, normalized <= 0.4 marks ranks 1-3 as Fast/Smart, which is 50%, not 40%. That changes both pool exposure and fast-mode routing decisions. Please compute pool membership from ordinal position/count (or precomputed cutoff ranks) instead of a fixed normalized threshold.

As per coding guidelines, **/*router*.{ts,tsx,js,jsx}: Models are classified into Fast (top 40% speed), Smart (top 40% intelligence), or Balanced pools via the classifyModel() function in router.ts. Fast mode routes to Fast pool first, then borrows from Balanced pool

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/services/router.ts` around lines 345 - 370, The classifyModel
function uses normalized thresholds which yield incorrect top-40% cutoffs for
small catalogs; change it to compute pool membership from ordinal positions and
total count instead: compute totalCount = maxSpeedRank - minSpeedRank + 1 (and
similarly for intelligence), compute ordinalPosition = speedRank - minSpeedRank
+ 1 (and for intelligence), compute cutoff = Math.max(1, Math.ceil(totalCount *
0.4)), then classify as Fast if ordinalPosition <= cutoff, Smart if intelligence
ordinal <= cutoff, else Balanced; preserve the early-return when there is no
variation and handle edge cases (totalCount <= 0) in classifyModel and reference
the ModelPool enum.
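
The ordinal approach described in the prompt above could look roughly like this. It is a sketch, not the PR's code: it assumes ranks are contiguous integers (so `max - min + 1` equals the model count), and the local `ModelPool` enum, `topCutoff`, and `classifyByOrdinal` are hypothetical stand-ins.

```typescript
// Hypothetical local copy of the enum; the real one lives in the project.
enum ModelPool {
  Fast = 'fast',
  Balanced = 'balanced',
  Smart = 'smart',
}

// Number of ranks inside the top 40%, using the formula from the review prompt.
function topCutoff(totalCount: number): number {
  return Math.max(1, Math.ceil(totalCount * 0.4));
}

function classifyByOrdinal(
  speedRank: number,
  intelligenceRank: number,
  minSpeedRank: number,
  maxSpeedRank: number,
  minIntelligenceRank: number,
  maxIntelligenceRank: number,
): ModelPool {
  // Assumes contiguous integer ranks, so the span equals the model count.
  const speedCount = maxSpeedRank - minSpeedRank + 1;
  const intelCount = maxIntelligenceRank - minIntelligenceRank + 1;

  // No variation at all: default to Balanced, as the original function does.
  if (speedCount <= 1 && intelCount <= 1) return ModelPool.Balanced;

  const speedOrdinal = speedRank - minSpeedRank + 1;
  const intelOrdinal = intelligenceRank - minIntelligenceRank + 1;

  // A dimension without variation never qualifies on its own.
  if (speedCount > 1 && speedOrdinal <= topCutoff(speedCount)) return ModelPool.Fast;
  if (intelCount > 1 && intelOrdinal <= topCutoff(intelCount)) return ModelPool.Smart;
  return ModelPool.Balanced;
}
```

One caveat worth checking before adopting this: the rounding mode decides the outcome for small catalogs. With `Math.ceil`, a 6-model catalog still gets a cutoff of 3 ranks, the same result as the normalized check, while `Math.floor` would give 2. The source does not say which rounding is intended.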

Comment on lines +407 to 408
if (!statsCache) { refreshStatsCache(getDb(), true); }
if (!statsCache) return [];

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Call the TTL-gated refresh unconditionally here.

Once statsCache exists, this path never refreshes it again, so the dashboard can keep serving stale scores until some other route happens to touch refreshStatsCache(). The new TTL gate already makes the unconditional call cheap.

Suggested fix
-export function getAnalyticsScores(): Array<{
+export function getAnalyticsScores(): Array<{
   platform: string;
   modelName: string;
   modelId: string;
@@
-  if (!statsCache) { refreshStatsCache(getDb(), true); }
+  refreshStatsCache(getDb());
   if (!statsCache) return [];
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/services/router.ts` around lines 407 - 408, Call
refreshStatsCache(getDb(), true) unconditionally before returning cached stats
so the TTL-gated refresh runs even when statsCache is already set; specifically,
invoke refreshStatsCache(getDb(), true) unconditionally (rather than only when
!statsCache) and then keep the existing guard if (!statsCache) return []; this
ensures statsCache and the TTL logic are honored and prevents serving stale
scores while preserving the cheap TTL check inside refreshStatsCache.
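
To illustrate why the unconditional call is cheap, here is a simplified sketch of a TTL-gated refresh. Everything in it is an assumption for illustration: the TTL value, the cache shape, the `force` flag, and the use of an explicit `now` timestamp in place of the database handle. The real `refreshStatsCache` signature and the meaning of its second argument are not shown in the source.

```typescript
// Hypothetical TTL and cache shape; the real values live in router.ts.
const STATS_CACHE_TTL_MS = 60000;

interface StatsCache {
  loadedAt: number;
  scores: Map<string, number>;
}

let statsCache: StatsCache | null = null;
let refreshCount = 0; // counts actual rebuilds, for demonstration only

// Rebuilds the cache only when it is missing, expired, or a rebuild is forced.
function refreshStatsCache(now: number, force = false): void {
  if (!force && statsCache && now - statsCache.loadedAt < STATS_CACHE_TTL_MS) {
    return; // still fresh: skip the expensive query
  }
  refreshCount++;
  statsCache = { loadedAt: now, scores: new Map([['model-a', 0.5]]) };
}

function getAnalyticsScores(now: number): Array<[string, number]> {
  // Called on every request; the TTL gate turns most calls into a no-op.
  refreshStatsCache(now);
  if (!statsCache) return [];
  return Array.from(statsCache.scores.entries());
}
```

In this sketch, repeated calls inside the TTL window reuse the cached scores, and the first call at or after the window's end triggers one rebuild.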

Comment thread server/write_test.py
Comment on lines +4 to +29
path = '/home/vi/freellmapi/server/src/__tests__/services/router.test.ts'

content = [
"import { describe, it, expect, beforeAll, beforeEach } from 'vitest';",
"import { initDb, getDb } from '../../db/index.js';",
"import { encrypt } from '../../lib/crypto.js';",
"import { routeRequest, refreshStatsCache, getAnalyticsScores } from '../../services/router.js';",
"",
"describe('Router', () => {",
" beforeAll(() => {",
" process.env.ENCRYPTION_KEY = '0'.repeat(64);",
" initDb(':memory:');",
" });",
"",
" beforeEach(() => {",
" const db = getDb();",
" db.prepare('DELETE FROM api_keys').run();",
" const models = db.prepare('SELECT id, intelligence_rank FROM models ORDER BY intelligence_rank ASC').all() as any[];",
" const update = db.prepare('UPDATE fallback_config SET priority = ? WHERE model_db_id = ?');",
" for (let i = 0; i < models.length; i++) {",
" update.run(i + 1, models[i].id);",
" }",
" });",
"",
" it('should throw when no keys are configured', () => {",
" expect(() => route{ No newline at end of file

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
python - <<'PY'
from pathlib import Path
import ast

path = Path("server/write_test.py")
src = path.read_text()
try:
    ast.parse(src, filename=str(path))
    print(f"{path}: syntax OK")
except SyntaxError as e:
    print(f"{path}:{e.lineno}:{e.offset}: {e.msg}")
    raise
PY

Repository: vi70x3/freellmapi

Length of output: 521


🏁 Script executed:

#!/bin/bash
python3 - <<'PY'
from pathlib import Path
import ast

path = Path("server/write_test.py")
src = path.read_text(encoding="utf-8")
try:
    ast.parse(src, filename=str(path))
    print(f"{path}: syntax OK")
except SyntaxError as e:
    print(f"{path}:{e.lineno}:{e.offset}: {e.msg}")
    raise
PY

Repository: vi70x3/freellmapi

Length of output: 521


🏁 Script executed:

#!/bin/bash
python3 - <<'PY'
from pathlib import Path
import ast

path = Path("server/write_test.py")
src = path.read_text(encoding="utf-8")
ast.parse(src, filename=str(path))
print("syntax OK")
PY

Repository: vi70x3/freellmapi

Length of output: 426


Critical: Fix server/write_test.py so it’s valid Python and remove the hard-coded absolute path.

  • Fails to parse with SyntaxError: unterminated string literal at server/write_test.py:29, so the script can’t run as committed.
  • Replace the machine-specific absolute path (/home/vi/freellmapi/...) with a repo-relative path (e.g., computed from __file__ / project root).
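A minimal sketch of the repo-relative path fix, assuming the generator script sits inside the repository and its output lives somewhere below the script's own directory. The helper name `resolve_from_script` and the path segments in the usage comment are hypothetical; the real target path is elided in the finding above and is not reconstructed here.

```python
from pathlib import Path


def resolve_from_script(script_file: str, *relative_parts: str) -> Path:
    """Return an absolute path anchored at the directory containing script_file.

    Anchoring on the script's own location (rather than the current working
    directory or a developer's home directory) keeps the result stable across
    machines and across the directory the script is launched from.
    """
    script_dir = Path(script_file).resolve().parent
    return script_dir.joinpath(*relative_parts)


# Hypothetical usage inside the generator script (segments are placeholders):
# path = resolve_from_script(__file__, "src", "__tests__", "example.test.ts")
```

Anchoring on `__file__` instead of `Path.cwd()` is a deliberate choice here: the script then writes to the same place whether it is run from the repository root or from `server/`.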
🧰 Tools
🪛 Ruff (0.15.15)

[warning] 29-29: missing closing quote in string literal

(invalid-syntax)


[warning] 29-29: unexpected EOF while parsing

(invalid-syntax)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/write_test.py` around lines 4 - 29, The file defines a JavaScript test
payload in Python but currently contains an unterminated string and a
machine-specific absolute path; fix the top-level variables 'path' and 'content'
so the Python file parses: replace the hard-coded '/home/vi/freellmapi/...'
assigned to the variable path with a repo-relative computed path (use __file__ /
pathlib to resolve the project root and join the relative test path), and ensure
the 'content' value is a properly closed Python list or a single properly quoted
multi-line string so there are no unterminated string literals (verify
brackets/quotes around the array are balanced and the last line completes the JS
snippet such as the truncated "expect(() => route..." line). Ensure no other
syntax errors remain so the script runs.

Comment thread server/write_tests.py
Comment on lines +4 to +45
path = '/home/vi/freellmapi/server/src/__tests__/services/router.test.ts'

# Part 1: existing tests
part1 = """import { describe, it, expect, beforeAll, beforeEach } from 'vitest';
import { initDb, getDb } from '../../db/index.js';
import { encrypt } from '../../lib/crypto.js';
import { routeRequest, refreshStatsCache, getAnalyticsScores } from '../../services/router.js';

describe('Router', () => {
beforeAll(() => {
process.env.ENCRYPTION_KEY = '0'.repeat(64);
initDb(':memory:');
});

beforeEach(() => {
const db = getDb();
db.prepare('DELETE FROM api_keys').run();
const models = db.prepare('SELECT id, intelligence_rank FROM models ORDER BY intelligence_rank ASC').all() as any[];
const update = db.prepare('UPDATE fallback_config SET priority = ? WHERE model_db_id = ?');
for (let i = 0; i < models.length; i++) {
update.run(i + 1, models[i].id);
}
});

it('should throw when no keys are configured', () => {
expect(() => routeRequest()).toThrow(/exhausted/i);
});

it('should route to highest priority model with available key', () => {
const db = getDb();
const { encrypted, iv, authTag } = encrypt('test-groq-key');
db.prepare('INSERT INTO api_keys (platform, label, encrypted_key, iv, auth_tag, status, enabled) VALUES (?, ?, ?, ?, ?, ?, ?)').run('groq', 'test', encrypted, iv, authTag, 'healthy', 1);
const result = routeRequest();
expect(result.platform).toBe('groq');
expect(result.apiKey).toBe('test-groq-key');
});

it('should route to an available model when keys exist for multiple platforms', () => {
const db = getDb();
const googleKey = encrypt('test-google-key');
db.prepare('INSERT INTO api_keys (platform, label, encrypted_key, iv, auth_tag, status, enabled) VALUES (?, ?, ?, ?, ?, ?, ?)').run('google', 'test', googleKey.encrypted, googleKey.iv, googleKey.authTag, 'healthy', 1);
const groqKey{ No newline at end of file

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
python - <<'PY'
from pathlib import Path
import ast

path = Path("server/write_tests.py")
src = path.read_text()
try:
    ast.parse(src, filename=str(path))
    print(f"{path}: syntax OK")
except SyntaxError as e:
    print(f"{path}:{e.lineno}:{e.offset}: {e.msg}")
    raise
PY

Repository: vi70x3/freellmapi

Length of output: 613


🏁 Script executed:

#!/bin/bash
python - <<'PY'
from pathlib import Path
import ast

path = Path("server/write_tests.py")
src = path.read_text()
ast.parse(src, filename=str(path))
print(f"{path}: syntax OK")
PY

Repository: vi70x3/freellmapi

Length of output: 504


Critical: Fix server/write_tests.py generator (invalid Python + non-portable absolute path).

  • server/write_tests.py doesn’t parse: SyntaxError: unterminated triple-quoted string literal in part1 (part1 = """...), line 7 (detected at line 45).
  • Generator output path is hard-coded to /home/vi/freellmapi/server/src/__tests__/services/router.test.ts; switch to a repo-relative path or remove the generator.
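One way the generator could be restructured, sketched under assumptions: the test source is split into small parts that are each a complete, closed triple-quoted string, and a crude bracket check rejects truncated content (such as a fragment ending in `const groqKey{`) before anything is written. The names `PART_HEADER`, `PART_BODY`, `build_content`, `brackets_balanced`, and `write_generated` are hypothetical, and the embedded test body is a placeholder, not the PR's real test suite.

```python
from pathlib import Path

# Each part is a complete, closed triple-quoted string (placeholder content).
PART_HEADER = """import { describe, it, expect } from 'vitest';
"""

PART_BODY = """describe('Router', () => {
  it('placeholder', () => {
    expect(true).toBe(true);
  });
});
"""


def build_content(*parts: str) -> str:
    """Concatenate the parts into the final file content."""
    return "".join(parts)


def brackets_balanced(text: str) -> bool:
    """Crude heuristic: check that (), [] and {} nest correctly.

    It ignores string literals and comments, so it can misjudge sources that
    contain brackets inside strings; it is only meant to catch truncation.
    """
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in text:
        if ch in "([{":
            stack.append(ch)
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack


def write_generated(target: Path, content: str) -> None:
    """Write content to target, refusing obviously truncated output."""
    if not brackets_balanced(content):
        raise ValueError(f"refusing to write unbalanced content to {target}")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")


# Hypothetical usage, with a repo-relative target next to this script:
# target = Path(__file__).parent / "src" / "__tests__" / "services" / "router.test.ts"
# write_generated(target, build_content(PART_HEADER, PART_BODY))
```

Keeping each part short makes an unterminated literal easy to spot in review, and the bracket check gives a cheap guard against writing a half-finished test file.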
🧰 Tools
🪛 Ruff (0.15.15)

[warning] 7-45: missing closing quote in string literal

(invalid-syntax)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/write_tests.py` around lines 4 - 45, The generator file
server/write_tests.py currently contains an unterminated triple-quoted string in
the part1 variable (the test text stops mid-token like "const groqKey{"),
causing a SyntaxError; close or properly terminate part1 (complete the test
contents or split into smaller, properly quoted strings and concatenate) and
ensure all embedded JS quotes/braces are correctly escaped if needed; also
replace the hard-coded absolute output path stored in path with a repo-relative
construction (use pathlib.Path(__file__).parent / 'src' / '__tests__' /
'services' / 'router.test.ts' or os.path.join with repo root resolution) so the
generator writes to a portable location.

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
client/src/pages/PlaygroundPage.tsx (1)

274-285: 💤 Low value

Consider adding ARIA attributes for accessibility.

The custom dropdown works but lacks semantic hints for screen readers. Adding aria-expanded to the trigger and role="menu" to the dropdown would improve accessibility without significant effort.

♿ Suggested accessibility improvements
 <Button
   variant="outline"
   className="w-[260px] justify-between font-normal"
   onClick={() => setModelPickerOpen(v => !v)}
+  aria-expanded={modelPickerOpen}
+  aria-haspopup="menu"
 >
-<div className="absolute right-0 top-full mt-2 z-50 max-h-[70vh] overflow-y-auto w-[360px] space-y-3 rounded-lg border bg-background p-3 shadow-lg">
+<div role="menu" className="absolute right-0 top-full mt-2 z-50 max-h-[70vh] overflow-y-auto w-[360px] space-y-3 rounded-lg border bg-background p-3 shadow-lg">
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/src/pages/PlaygroundPage.tsx` around lines 274 - 285, The dropdown
trigger Button (onClick={() => setModelPickerOpen(v => !v)}) and the conditional
dropdown div (rendered when modelPickerOpen is true) lack ARIA semantics; update
the Button to include aria-expanded={modelPickerOpen} and aria-controls
referencing the dropdown's id, and add role="menu" (and an id) to the dropdown
container div so screen readers can detect its expanded state and treat the
container as a menu; ensure any interactive items inside the dropdown use
appropriate roles (e.g., role="menuitem") if present.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 56c2d9ac-e7b6-4dbc-90ed-56abb2c18f97

📥 Commits

Reviewing files that changed from the base of the PR and between 40abf0b and 134254c.

📒 Files selected for processing (2)
  • client/src/pages/PlaygroundPage.tsx
  • server/src/routes/proxy.ts

@vi70x3 vi70x3 closed this Jun 5, 2026
@vi70x4 vi70x4 deleted the spec/recency-biased-thompson-sampling branch June 7, 2026 00:30