feat(router): add cache TTL gate, fix auto-refresh order, fix indentation, deduplicate RoutingMode type by vi70x3 · Pull Request #12 · animaios/api-llm-localhost

vi70x3 · 2026-06-02T23:15:22Z

Implements the recency-biased-thompson-sampling spec.

Phase 2 changes to server/src/services/router.ts:

Add Cache TTL gate in refreshStatsCache() to skip redundant refreshes within the TTL window
Fix auto-refresh order in getAnalyticsScores() so cache refresh happens before the early-return guard
Fix modelId indentation in return type annotation and result.push() object literal
Deduplicate RoutingMode type by importing/re-exporting from @freellmapi/shared/types.js

Summary by CodeRabbit

New Features
- Added "auto-fast" routing mode and model pool grouping (Fast, Balanced, Smart) across UI, plus PoolBadge/PoolSection components and new model picker.
- Streaming now includes keep-alive heartbeats and stall detection for more resilient live responses.
Improvements
- Better handling of wrapped provider errors and recency-weighted analytics for smarter model ranking.
- Global transient cooldowns and generalized thread protection to improve availability during provider failures.

…alized thread protection scanner

- Add ANALYTICS_WINDOW_DAYS constant for SQL decay divisor - Extend ModelStats with rawSuccesses/rawTotal fields - Rewrite SQL query with CTE-based time-decay weighting (julianday) - Update statsCache population to use weighted columns - Add Math.max(0.1, ...) guards in thompsonSampleScore, smartSampleScore, getAnalyticsScore, and getSmartAnalyticsScore to handle fractional weights - Update getAnalyticsScores dashboard to display rawTotal counts - Add modelId field to getAnalyticsScores return type for fallback route compat

- Add 'fast' to RoutingMode type and ModelPool enum (Fast/Balanced/Smart) - Implement classifyModel() to categorize models by speed/intelligence ranks - Implement fastSampleScore() with 3x speed weight, 1.5x TTFB factor, minimal intelligence - Add pool-based routing in routeRequest: fast pool first, balanced pool borrowing fallback - Add freellmapi/auto-fast model ID support in proxy routes - Extend /v1/models endpoint with primaryPool field for all models - Add unit tests for pool classification (T13-T16) and fast routing (T17-T18) - Add integration tests for /v1/models primaryPool fields (T13-T17) - Update README.md and CLAUDE.md documentation

- extractErrorMessage: handle string error payloads and object errors with message field - throwWrappedError: guard against non-numeric error codes (NaN prevention) - cloudflare/cohere/openai-compat: move isWrappedError check outside JSON.parse try/catch in SSE loops

…tion, deduplicate RoutingMode type

coderabbitai · 2026-06-02T23:15:41Z

📝 Walkthrough

Walkthrough

Adds a third routing mode ("fast") and model-pool classification, implements recency-weighted analytics and fast-mode scoring, disables sticky sessions in balanced mode, introduces transient per-model cooldowns and a thread-protection scaffold, rewrites SSE streaming to add heartbeat/stall detection, detects wrapped provider errors on HTTP 200 responses, adds frontend pool UI components, and expands test coverage and specs.

Changes

Router, Proxy, and Provider Core Implementation

Layer / File(s)	Summary
Shared types: routing modes and model pools `shared/types.ts`	Adds `RoutingMode` union and `ModelPool` enum to drive fast/balanced/smart routing and pool classification.
Router weighted analytics, pool classification, and fast-mode scoring `server/src/services/router.ts`, `server/src/__tests__/services/router.test.ts`	Implements recency-weighted stats via SQL CTE, clamps Beta params with `Math.max(0.1, ...)`, adds `classifyModel`, `fastSampleScore`, `hasValidKeys`, and tests for analytics, classification, and fast-mode routing.
Model catalog and fallback endpoints expose pool metadata `server/src/routes/proxy.ts`, `server/src/routes/fallback.ts`, `server/src/__tests__/integration/full-flow.test.ts`	`/models` and fallback routes compute min/max ranks, classify models into pools, expose `primaryPool`, and include `freellmapi/auto-fast` virtual model.
Disable sticky sessions for balanced routing mode `server/src/routes/proxy.ts`, `server/src/__tests__/routes/provider-session-ban.test.ts`	`getSessionKey()` returns `''` for `routingMode === 'balanced'`, causing sticky-related helpers to be no-ops; tests added/updated to assert balanced-mode skipping and preserve smart-mode sticky behavior.
Transient model cooldowns for concurrent failure mitigation `server/src/routes/proxy.ts`, `server/src/__tests__/routes/transient-cooldown.test.ts`	Adds `transientModelCooldowns` Map and `TRANSIENT_COOLDOWN_MS`, injects active cooldowned modelDbIds into `skipModels`, clears preferred sticky model when cooled, and registers cooldowns on qualifying 5xx/connection failures; dedicated tests validate behavior.
Thread protection types and evaluator stub `server/src/services/threadProtection.ts`	Adds `ProtectionLevel`, `ErrorContext`, `ThreadProtectionAction`, and a stubbed `evaluateThreadProtection` returning an "off" action for future implementation.
SSE heartbeat and stall detection for streaming responses `server/src/routes/proxy.ts`, `server/src/__tests__/routes/stream-heartbeat-stall.test.ts`	Refactors streaming: sends SSE keep-alive comments, tracks `lastChunkTimestamp`, aborts stalled upstream via `AbortController`, emits pre-stream 504 or mid-stream error frames, and ensures cleanup on client disconnect; tests cover keep-alive, pre-/mid-stream stall, disconnect cleanup, and normal streaming.
Provider-layer wrapped error detection across adapters `server/src/providers/base.ts`, `server/src/providers/{openai-compat,cohere,cloudflare,google}.ts`	`BaseProvider` gains `isWrappedError()` and `throwWrappedError()` and exposes `extractErrorMessage()` as protected; each adapter checks parsed JSON (and streamed chunks) for wrapped errors and throws `ProviderApiError` with inferred status when found.
Frontend pool badge and section components `client/src/components/pool-badge.tsx`, `client/src/components/pool-section.tsx`	Adds `PoolBadge` and `PoolSection` React components and `PoolType` for pool-labeled UI blocks used by the playground and fallback pages.
Playground custom model picker grouped by pool `client/src/pages/PlaygroundPage.tsx`	Replaces Select with a popover model picker that groups models by pool and adds picker open/close behavior and Escape handling.
Streaming block transformation helpers `do_fix.py`, `fix.py`, `fix_streaming.py`, `new_streaming_block.txt`	Adds Python scripts and a streaming template to support the SSE streaming refactor and automated transformation/workflow.

Specifications and Tasks

Layer / File(s)	Summary
Disable sticky on balanced mode specification `.roo/specs/disable-sticky-on-auto/{design,requirements,tasks}.md`	Design, requirements, and tasks documenting the `getSessionKey()` gate and test updates to skip sticky ops in balanced mode.
Generalized thread protection scanner specification `.roo/specs/generalized-thread-protection/{design,requirements,tasks}.md`	Spec for a provider-agnostic thread protection scanner and integration points replacing LongCat-specific branches.
Model pools display specification `.roo/specs/model-pools-display/{design,requirements,tasks}.md`	Frontend and backend requirements and tasks for pool classification, badges, sections, and accessibility.
Owl Alpha + LongCat model-level routing specification `.roo/specs/owl-alpha-longcat-model-routing/{design,requirements,tasks}.md`	Spec covering balanced exclusions, smart-mode preferences, sticky cooldowns for Owl Alpha and LongCat, and model-level skipping.
Recency-biased Thompson sampling specification `.roo/specs/recency-biased-thompson-sampling/{design,requirements,tasks}.md`	Spec for time-decay weighting in analytics, Beta-parameter guards, and dashboard changes.
SSE heartbeat and stall protection specification `.roo/specs/sse-stream-heartbeat-stall-protection/{design,requirements,tasks}.md`	Spec for SSE keep-alives, stall detection, cleanup ordering, and tests.
Transient model cooldown specification `.roo/specs/transient-model-cooldown/{design,requirements,tasks}.md`	Spec for global transient cooldown Map, pruning, sticky override, registration rules, and acceptance criteria.
Wrapped provider error interception specification `.roo/specs/wrapped-error-interception/{design,requirements,tasks}.md`	Spec for detecting root-level `error` in JSON payloads on HTTP 200 and failing fast in chat/stream flows.

Public Documentation

Layer / File(s)	Summary
Documentation of routing modes and pool classification `CLAUDE.md`, `README.md`	Documents new `freellmapi/auto-fast` routing mode and model pool classification behavior exposed via `/v1/models` and the UI.

Estimated code review effort:
🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

vi70x3/freellmapi#9: Overlaps on disabling sticky-session behavior for balanced mode via getSessionKey() guard and related test adjustments.
vi70x3/freellmapi#8: Related to sticky-session and LongCat sticky-cooldown logic that this PR generalizes and gates by routing mode.
vi70x3/freellmapi#2: Also modifies sticky-session machinery; changes here affect how that machinery behaves when balanced mode disables sticky ops.

"I'm a rabbit in a routing maze,
Fast pools dash in bright, bold haze.
Balanced slips free — no sticky glue,
Heartbeats hum, stalled streams we rue.
Wrapped errors unwrapped, cooldowns keep watch — hooray!" 🐇

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch spec/recency-biased-thompson-sampling

⚔️ Resolve merge conflicts

Resolve merge conflict in branch spec/recency-biased-thompson-sampling

gemini-code-assist

Code Review

This pull request introduces several major routing and stability enhancements, including a new fast routing mode (freellmapi/auto-fast) with model pool classification, dynamic thread protection to prevent concurrent session conflicts, SSE stream heartbeats with stall protection, and a shared transient model cooldown mechanism. It also excludes LongCat and Owl Alpha from balanced routing while adding smart preference for Owl Alpha. The review feedback is highly constructive, pointing out a critical 'Headers already sent' crash during pre-stream stalls, an unused import, a database query performance bottleneck under thread protection, a double semicolon syntax error, and minor indentation inconsistencies.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-02T23:17:43Z

        } catch (streamErr: any) {
+          cleanupStream();
          if (streamStarted) {


If a pre-stream stall is detected, the stall handler sends a 504 response and ends the connection. However, if the generator subsequently throws an error, it will be caught by this catch block. Since streamStarted is false, the error will be rethrown and propagate to the outer retry loop, which will attempt to fallback to another provider and write to the already-ended response, causing a 'Headers already sent' crash. Checking streamAborted here and returning early prevents this crash.

} catch (streamErr: any) {\n cleanupStream();\n if (streamAborted) {\n logRequest(route.platform, route.modelId, 'error', estimatedInputTokens, totalOutputTokens, Date.now() - start, ttfbMs, 'stream_stalled');\n return;\n }\n if (streamStarted) {

gemini-code-assist · 2026-06-02T23:17:43Z

 import { z } from 'zod';
+import { evaluateThreadProtection } from '../services/threadProtection.js';
 import type { ChatCompletionChunk, ChatCompletionResponse, ChatMessage, ChatToolCall, ChatToolDefinition } from '@freellmapi/shared/types.js';


The import of evaluateThreadProtection is unused in this file. The design spec for the 'Generalized Thread Protection Scanner' seems to have been implemented inline or left as a placeholder in threadProtection.ts. Consider removing this unused import to keep the codebase clean.

import { z } from 'zod';\nimport type { ChatCompletionChunk, ChatCompletionResponse, ChatMessage, ChatToolCall, ChatToolDefinition } from '@freellmapi/shared/types.js';

gemini-code-assist · 2026-06-02T23:17:43Z

+    // clear the set and let the request through rather than failing outright.
+    if (activeCooldownModels.size > 0) {
+      const db = getDb();
+      const allEnabled = db.prepare('SELECT id FROM models WHERE enabled = 1').all() as Array<{ id: number }>;


Querying the database for all enabled models on every request with active cooldowns can become a performance bottleneck under high concurrency. Consider caching the list of enabled model IDs or retrieving them from an in-memory cache/registry, as this list changes infrequently.

gemini-code-assist · 2026-06-02T23:17:43Z

+                  });
+                } else {
+                  res.write(`data: ${JSON.stringify(payload)}\n\n`);
+                  res.write('data: [DONE]\n\n');;


There is a double semicolon at the end of this line. Remove the extra semicolon to adhere to clean coding standards.

Suggested change

res.write('data: [DONE]\n\n');;

res.write('data: [DONE]\\n\\n');

gemini-code-assist · 2026-06-02T23:17:43Z

                console.warn(`[Proxy] Mid-stream 5xx from ${route.platform} — skipping model ${route.modelId} only`);
                skipModels.add(route.modelDbId);
              }
-            }
-
-            // Generalized truncation detection for any provider (not just LongCat)
+             // Register global transient cooldown for any 5xx mid-stream error


The indentation of these lines is inconsistent with the surrounding block. Adjust the indentation to match the 14-space statement indentation and 12-space closing brace indentation of the parent block.

Suggested change

console.warn(`[Proxy] Mid-stream 5xx from ${route.platform} — skipping model ${route.modelId} only`);

skipModels.add(route.modelDbId);

}

}

// Generalized truncation detection for any provider (not just LongCat)

// Register global transient cooldown for any 5xx mid-stream error

// Register global transient cooldown for any 5xx mid-stream error\n transientModelCooldowns.set(route.modelDbId, Date.now() + TRANSIENT_COOLDOWN_MS);\n console.log(\`[TransientCooldown] registered global cooldown for modelDbId=\${route.modelDbId} (\${TRANSIENT_COOLDOWN_MS / 1000}s)\`);\n }

gemini-code-assist · 2026-06-02T23:17:43Z

+       }
+     }

-      if (isRetryableError(err)) {
+     if (isRetryableError(err)) {


The indentation of these lines is off by one space. Adjust the indentation to align with the parent block.

}\n }\n\n if (isRetryableError(err)) {

kilo-code-bot · 2026-06-02T23:21:24Z

Code Review Summary

Status: 2 Issues Found (existing) | Recommendation: Address before merge

Overview

Severity	Count
WARNING	2

Issue Details (click to expand)

File	Line	Issue
`server/src/routes/proxy.ts`	1397	Double semicolon — redundant trailing `;`
`server/src/routes/proxy.ts`	1730	Wrong indentation — `if (isRetryableError(err))` is indented at 6 spaces but should be at 3 spaces, placing it inside the wrong block scope

Other Observations (not in diff)

server/src/services/threadProtection.ts is a stub — evaluateThreadProtection() always returns { banProvider: false, skipModel: false, clearStickyIfPinned: false, reason: 'off' }. Meanwhile proxy.ts lines ~1254–1310 contain a fully implemented thread-protection block that bypasses sticky-model preferences when another session is actively using a model. That logic in proxy.ts will run but the extracted service is a no-op placeholder.
Fast mode bypass affects non-fast routing paths — The transient cooldown maps (transientModelCooldowns) and sticky-preference clearing added in proxy.ts run unconditionally (current session key is computed from routingMode, but the cooldown injection into skipModels at lines ~1224–1244 happens before mode checks, and the global transient cooldown at lines 1560–1561, 1733–1734 fires regardless of routing mode). In balanced/smart mode this means a model can be silently excluded from routing by another session's transient cooldown, which does not appear to be the intended scope.
server/write_test.py and server/write_tests.py are scratch/staging files — These Python scripts appear to be local tooling used to write test files and are committed to the PR. They do not belong in the repository.

Files Reviewed (3 files)

server/src/routes/proxy.ts — 2 existing issues (indentation, semicolons)
server/src/services/router.ts — no new issues
server/src/services/threadProtection.ts — stub implementation observation

Fix these issues in Kilo Cloud

_{Reviewed by nemotron-3-super-120b-a12b-20230311:free · 924,387 tokens}

…haustAll, remove double semicolons

…onent

coderabbitai

Actionable comments posted: 20

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

server/src/routes/proxy.ts (1)
1479-1484: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

LongCat is still being banned at provider scope in the new error paths.

These branches still ban the whole longcat platform instead of skipping only route.modelDbId, so one failing LongCat request will also block future LongCat models for the session. That breaks the model-level contract described in this PR.

Also applies to: 1551-1554, 1579-1583, 1610-1613, 1711-1714, 1735-1738
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/routes/proxy.ts` around lines 1479 - 1484, The code currently
calls banPlatformFromSession(...) and addProviderModelsToSkipModels(...) for
LongCat branches, which bans the entire platform; instead, remove those two
calls and add the failing modelDbId to the session-level skip list so only
route.modelDbId is skipped. Locate the LongCat branches (the block checking
route.platform === 'longcat') and replace
banPlatformFromSession(normalizedMessages, routingMode, 'longcat',
route.modelDbId) and addProviderModelsToSkipModels(skipModels, 'longcat') with
logic that records route.modelDbId under the LongCat provider in skipModels
(e.g., push or set skipModels['longcat'] to include route.modelDbId or call an
existing addModel-level helper), ensuring you use the existing skipModels
variable and route.modelDbId so only that model is skipped for the session.

🧹 Nitpick comments (4)

.roo/specs/disable-sticky-on-auto/design.md (1)
11-17: ⚡ Quick win

Add language identifier to fenced code block.

The fenced code block should specify a language for proper syntax highlighting.
📝 Proposed fix
-```
+```typescript
 function stickyOp(messages, routingMode, ...) {
   const key = getSessionKey(messages, routingMode);
   if (!key) return <no-op value>;   // undefined, false, or early return
   ...operate on stickySessionMap using key...
 }
Based on learnings from static analysis: markdownlint-cli2 flagged this as missing a language specifier.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/disable-sticky-on-auto/design.md around lines 11 - 17, The fenced
code block containing the example for function stickyOp is missing a language
identifier; update the block to include a language (e.g., "typescript" or "js")
so syntax highlighters handle getSessionKey, stickyOp, and stickySessionMap
correctly—locate the block showing function stickyOp(messages, routingMode, ...)
and add the language specifier immediately after the opening backticks (for
example ```typescript).
.roo/specs/generalized-thread-protection/design.md (1)
43-45: ⚡ Quick win

Add language identifier to fenced code block.

The environment variable example block should specify a language (e.g., bash or shell) for proper syntax highlighting and markdown compliance.
📝 Proposed fix
-```
+```bash
 THREAD_PROTECTION_PLATFORMS="longcat:provider-ban,groq:model-skip"
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/generalized-thread-protection/design.md around lines 43 - 45,
Update the fenced code block that shows the environment variable
THREAD_PROTECTION_PLATFORMS so it includes a language identifier (e.g., use
bash or shell) for proper markdown highlighting; locate the block
containing THREAD_PROTECTION_PLATFORMS="longcat:provider-ban,groq:model-skip"
and change the opening fence to include the language token.
</details>

</blockquote></details>
<details>
<summary>fix.py (1)</summary><blockquote>

`4-8`: _⚡ Quick win_

**Remove or complete this helper script.**

Right now it only reads `server/src/routes/proxy.ts` and stops, so running it does nothing and leaves a misleading artifact in the repo.

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fix.py` around lines 4 - 8, This helper script is incomplete and should be
removed or finished; either delete fix.py entirely, or implement its intended
behavior: open('server/src/routes/proxy.ts', 'r') should read into lines, locate
the streaming block start by searching for the string "for await (const chunk"
and then find the matching block end (matching braces) and then perform the
intended action (e.g., print the start/end indices, extract the block, or modify
and write back) before closing—use the existing lines variable and proper file
write handling to persist changes if any.
```

</details>

</blockquote></details>
<details>
<summary>client/src/components/pool-badge.tsx (1)</summary><blockquote>

`7-7`: _⚡ Quick win_

**Inconsistent color naming in pool configurations.**

Each pool configuration mixes different color families in text and background:
- Fast: `bg-emerald-100` with `text-green-800` (emerald vs green)
- Balanced: `bg-slate-100` with `text-gray-700` (slate vs gray)  
- Smart: `bg-purple-100` with `text-indigo-700` (purple vs indigo)

While these may be intentional design choices, using consistent color families improves predictability and maintainability.




<details>
<summary>🎨 Proposed fix for color consistency</summary>

```diff
   fast: {
     icon: "⚡",
     label: "Fast",
-    className: "bg-emerald-100 text-green-800 dark:bg-emerald-900/30 dark:text-emerald-400",
+    className: "bg-emerald-100 text-emerald-800 dark:bg-emerald-900/30 dark:text-emerald-400",
   },
   balanced: {
     icon: "⚖️",
     label: "Balanced",
-    className: "bg-slate-100 text-gray-700 dark:bg-slate-800 dark:text-slate-300",
+    className: "bg-slate-100 text-slate-700 dark:bg-slate-800 dark:text-slate-300",
   },
   smart: {
     icon: "🧠",
     label: "Smart",
-    className: "bg-purple-100 text-indigo-700 dark:bg-purple-900/30 dark:text-purple-400",
+    className: "bg-purple-100 text-purple-700 dark:bg-purple-900/30 dark:text-purple-400",
   },
```
</details>


Also applies to: 12-12, 17-17

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/src/components/pool-badge.tsx` at line 7, The pool badge color classes
mix different Tailwind color families; update the className entries in the
PoolBadge/pool configuration for Fast, Balanced, and Smart so the text color
uses the same family as the background (e.g., bg-emerald-100 -> text-emerald-800
for Fast, bg-slate-100 -> text-slate-700 for Balanced, bg-purple-100 ->
text-purple-700 for Smart). Locate the className fields in
client/src/components/pool-badge.tsx (the PoolBadge or pool config entries
referenced on the three lines) and replace the mismatched text color tokens to
match each background family for consistency.
```

</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.roo/specs/generalized-thread-protection/requirements.md:

Line 5: The requirement line is truncated and contains a stray '{' so rewrite
it into a single complete sentence that clearly states the problem and expected
outcome: describe that the proxy route handler (the exported router/handler in
proxy.ts) currently contains 6+ hardcoded branches that special-case the
"longcat" route/host/prefix and specify what should be done instead (e.g.,
remove hardcoded branches and replace with a configurable/generalized routing or
policy mechanism), ensuring no stray punctuation and including the scope (which
branches/behavior to change) and the acceptance criteria for the fix.

In @.roo/specs/model-pools-display/design.md:

Around line 109-132: The fenced ASCII diagrams (the triple-backtick blocks
that start before the "Bandit routing" diagram and the similar blocks for
"Balanced (5 models)" and "Smart (2 models)") must include a markdown language
identifier; update each opening fence from totext (or ```plain) so
linters/renderers recognize them as plain text and the diagrams render
consistently.

In @.roo/specs/model-pools-display/requirements.md:

Line 90: The document ends with a truncated heading "## Out of" — change that
heading to "## Out of Scope" and populate it with explicit exclusions for this
feature (e.g., what model-pool UI interactions, permissions, API endpoints, or
integrations are not covered), ensuring the section clarifies boundaries for
implementation and QA; update any table of contents or cross-references if
present to reflect the completed heading.

In @.roo/specs/owl-alpha-longcat-model-routing/design.md:

Around line 5-46: The markdown diagrams use triple-fenced blocks without a
language which triggers MD040; update each ASCII diagram/code block (the ones
showing Balanced Router / Smart Router, the routeRequest() flow, and the
handleChatCompletion() flow — e.g., the block containing "Balanced Router",
"Smart Router", "routeRequest() in router.ts", and "handleChatCompletion() in
proxy.ts") to include a fence language of text (use ```text) so markdownlint no
longer flags them; apply the same change to the other listed blocks (the ones
noted around lines 52-78, 82-92, 96-105).

In @.roo/specs/owl-alpha-longcat-model-routing/requirements.md:

Around line 122-126: The Markdown links in the requirements list are using
../server/... which resolves relative to
.roo/specs/owl-alpha-longcat-model-routing/ instead of the repo root; update
each link target to go up three levels to reference the true server files (e.g.,
change ../server/src/services/router.ts to
../../../server/src/services/router.ts and similarly update
../server/src/routes/proxy.ts and ../server/src/db/index.ts references used for
migrateModelsV15/migrateModelsV16 and the LongCat smart/sticky/ban logic).

In @.roo/specs/recency-biased-thompson-sampling/design.md:

Line 5: The relative source links in
.roo/specs/recency-biased-thompson-sampling/design.md point to server files
using paths like server/src/... which resolve under the spec folder; update each
such link to use the repository-root relative prefix "../../../server/..." so
they open correctly from the spec location (apply the same fix to the other
occurrences noted around lines 31, 85, 115, 132, 138, and 229-230); search for
markdown links referencing "server/src/..." in this file and replace them with
"../../../server/src/..." to correct all references.

In @.roo/specs/wrapped-error-interception/tasks.md:

Around line 28-31: In OpenAICompatProvider.streamChatCompletion(), constrain
the wrapped-error check to only the first parsed SSE payload: after parsing each
chunk (assign JSON.parse(data) to a variable), check this.isWrappedError(parsed)
and call this.throwWrappedError(parsed) only if no previous payload has been
yielded yet (e.g., guard with a local flag like sawFirstPayload or check a
stream-start state) so later parsed chunks do not abort an already-started
downstream stream; then proceed to yield the parsed payload and set the flag.

In @client/src/components/pool-section.tsx:

Around line 5-30: The PoolSection component currently renders a static
container but needs collapse/expand behavior; update PoolSection to maintain
internal open state (e.g., useState) or accept a prop like defaultOpen, render a
clickable header (wrap the PoolBadge + title in a button/div with onClick) that
toggles that state, show/hide the children based on the state, add an accessible
aria-expanded attribute on the header and visually indicate state with a
chevron/icon, and preserve the existing className and styling when collapsed or
expanded; reference the PoolSection component and PoolBadge for where to attach
the toggle and children visibility logic.

In @do_fix.py:

Around line 1-7: The file do_fix.py contains an incomplete helper script
(orphaned '{' at the top and a partial read into the variable content) and
should be either removed or completed; to fix, either delete do_fix.py from the
commit if it was temporary, or complete the script by removing the stray '{',
add necessary imports and a proper main() implementation that opens
server/src/routes/proxy.ts, reads content, performs the replacement of the
streaming block with the Promise.race-based stall-detection snippet, writes the
modified content back safely (use a temp file and atomic replace), handle
exceptions and return non-zero on failure, and ensure the shebang and file
permissions are correct so do_fix.py is a runnable, syntactically valid helper;
locate the incomplete references to content and the
open('server/src/routes/proxy.ts', 'r') call to implement or remove them.

In @fix_streaming.py:

Around line 12-21: The string assigned to the variable new_streaming in
fix_streaming.py is left unterminated (starts with r''' but no closing '''),
causing a SyntaxError; fix it by adding the missing closing triple-quote to
terminate the raw triple-quoted string (or remove the stray snippet entirely) so
new_streaming is a valid string literal and the module can be imported/executed.

In @new_streaming_block.txt:

Around line 21-28: The snippet for stallTimeout was truncated and left
unclosed; restore the full implementation of stallTimeout so it returns a
Promise that sets a timer via setTimeout to reject with an
Error('Upstream stream stalled') augmented with { status: 504, type:
'stream_timeout' after MAX_STREAM_STALL_MS, and close all opened
braces/parentheses; ensure the timer variable is declared and the promise body
is properly terminated so stallTimeout and its inner setTimeout callback are
syntactically complete (refer to stallTimeout and MAX_STREAM_STALL_MS to locate
where to fix).

In @server/src/__tests__/routes/stream-heartbeat-stall.test.ts:

Around line 74-119: The test asserts a keep-alive comment during the 300ms
gap, but the proxy implementation (streamStarted flag in
server/src/routes/proxy.ts) only emits ": keep-alive\n\n" once streamStarted ===
true; update the test in
server/src/tests/routes/stream-heartbeat-stall.test.ts to stop expecting a
keep-alive before the first chunk (remove or change the expect(raw).toContain(':
keep-alive') assertion) or adjust it to assert keep-alives only after the first
chunk/after streamStarted is true so the test matches the current behavior of
streamStarted-controlled heartbeats.

Around line 12-18: The test helper currently injects only the unified API key
via getUnifiedApiKey() when path.startsWith('/v1/'), which incorrectly allows
unauthenticated access to /api/* routes; update the helper (the fetch wrapper
used in these tests) to accept an explicit admin key or auth override and use
getAdminApiKey() when the request path startsWith('/api/'), while continuing to
use getUnifiedApiKey() for '/v1/'; alternatively add a parameter to the helper
to pass custom headers so tests can seed the DB or set Authorization explicitly
instead of implicitly baking unauthenticated /api/* access.

In @server/src/providers/openai-compat.ts:

Around line 131-143: Only check for and throw wrapped SSE errors before any
chunk has been yielded: add a boolean flag (e.g., hasYielded) that is false
until the first successful yield of parsed, and only call
isWrappedError/throwWrappedError when hasYielded is false; if you detect a
wrapped error before the first yield, cancel the underlying reader (call
reader.cancel()) before calling throwWrappedError(parsed) so the upstream stream
is aborted and the proxy can retry/fallback cleanly. Update the code around
parsed, isWrappedError, throwWrappedError, and the reader to implement this
gating and cancellation.

In @server/src/routes/proxy.ts:

Around line 1422-1425: The close listener is being added inside the retry loop
which causes listener leaks; register it once outside the retry loop (or use
req.once('close', ...) or remove any existing listener before adding) so
cleanupStream is not re-registered on each attempt—move the req.on('close', ...)
call (or switch to req.once or removeListener) so it executes a single time for
the request and still calls cleanupStream on client disconnect.

Around line 1279-1307: The exhaustion check currently compares
activeCooldownModels against all enabled models (using allEnabledIds) which is
incorrect because routing mode and existing skipModels further restrict the
actual routable candidates; change the check to compute the same candidate set
that routeRequest() uses (apply enabled filter, current routing mode filters,
existing skipModels and remove preferredModel) and then test whether
activeCooldownModels would exhaust that specific candidate set; use the same
variables/functions involved (activeCooldownModels, preferredModel, skipModels,
routeRequest or the helper that builds candidate IDs) so the logic mirrors
routing and avoids false negatives that lead to 429s.

In @server/src/services/router.ts:

Around line 345-370: The classifyModel function uses normalized thresholds
which yield incorrect top-40% cutoffs for small catalogs; change it to compute
pool membership from ordinal positions and total count instead: compute
totalCount = maxSpeedRank - minSpeedRank + 1 (and similarly for intelligence),
compute ordinalPosition = speedRank - minSpeedRank + 1 (and for intelligence),
compute cutoff = Math.max(1, Math.ceil(totalCount * 0.4)), then classify as Fast
if ordinalPosition <= cutoff, Smart if intelligence ordinal <= cutoff, else
Balanced; preserve the early-return when there is no variation and handle edge
cases (totalCount <= 0) in classifyModel and reference the ModelPool enum.

Around line 407-408: Call refreshStatsCache(getDb(), true) unconditionally
before returning cached stats so the TTL-gated refresh runs even when statsCache
is already set; specifically, invoke refreshStatsCache(getDb(), true)
unconditionally (rather than only when !statsCache) and then keep the existing
guard if (!statsCache) return []; this ensures statsCache and the TTL logic are
honored and prevents serving stale scores while preserving the cheap TTL check
inside refreshStatsCache.

In @server/write_test.py:

Around line 4-29: The file defines a JavaScript test payload in Python but
currently contains an unterminated string and a machine-specific absolute path;
fix the top-level variables 'path' and 'content' so the Python file parses:
replace the hard-coded '/home/vi/freellmapi/...' assigned to the variable path
with a repo-relative computed path (use file / pathlib to resolve the
project root and join the relative test path), and ensure the 'content' value is
a properly closed Python list or a single properly quoted multi-line string so
there are no unterminated string literals (verify brackets/quotes around the
array are balanced and the last line completes the JS snippet such as the
truncated "expect(() => route..." line). Ensure no other syntax errors remain so
the script runs.

In @server/write_tests.py:

Around line 4-45: The generator file server/write_tests.py currently contains
an unterminated triple-quoted string in the part1 variable (the test text stops
mid-token like "const groqKey{"), causing a SyntaxError; close or properly
terminate part1 (complete the test contents or split into smaller, properly
quoted strings and concatenate) and ensure all embedded JS quotes/braces are
correctly escaped if needed; also replace the hard-coded absolute output path
stored in path with a repo-relative construction (use
pathlib.Path(file).parent / 'src' / 'tests' / 'services' /
'router.test.ts' or os.path.join with repo root resolution) so the generator
writes to a portable location.

Outside diff comments:
In @server/src/routes/proxy.ts:

Around line 1479-1484: The code currently calls banPlatformFromSession(...)
and addProviderModelsToSkipModels(...) for LongCat branches, which bans the
entire platform; instead, remove those two calls and add the failing modelDbId
to the session-level skip list so only route.modelDbId is skipped. Locate the
LongCat branches (the block checking route.platform === 'longcat') and replace
banPlatformFromSession(normalizedMessages, routingMode, 'longcat',
route.modelDbId) and addProviderModelsToSkipModels(skipModels, 'longcat') with
logic that records route.modelDbId under the LongCat provider in skipModels
(e.g., push or set skipModels['longcat'] to include route.modelDbId or call an
existing addModel-level helper), ensuring you use the existing skipModels
variable and route.modelDbId so only that model is skipped for the session.

Nitpick comments:
In @.roo/specs/disable-sticky-on-auto/design.md:

Around line 11-17: The fenced code block containing the example for function
stickyOp is missing a language identifier; update the block to include a
language (e.g., "typescript" or "js") so syntax highlighters handle
getSessionKey, stickyOp, and stickySessionMap correctly—locate the block showing
function stickyOp(messages, routingMode, ...) and add the language specifier
immediately after the opening backticks (for example ```typescript).

In @.roo/specs/generalized-thread-protection/design.md:

Around line 43-45: Update the fenced code block that shows the environment
variable THREAD_PROTECTION_PLATFORMS so it includes a language identifier (e.g.,
use bash or shell) for proper markdown highlighting; locate the block
containing THREAD_PROTECTION_PLATFORMS="longcat:provider-ban,groq:model-skip"
and change the opening fence to include the language token.

In @client/src/components/pool-badge.tsx:

Line 7: The pool badge color classes mix different Tailwind color families;
update the className entries in the PoolBadge/pool configuration for Fast,
Balanced, and Smart so the text color uses the same family as the background
(e.g., bg-emerald-100 -> text-emerald-800 for Fast, bg-slate-100 ->
text-slate-700 for Balanced, bg-purple-100 -> text-purple-700 for Smart). Locate
the className fields in client/src/components/pool-badge.tsx (the PoolBadge or
pool config entries referenced on the three lines) and replace the mismatched
text color tokens to match each background family for consistency.

In @fix.py:

Around line 4-8: This helper script is incomplete and should be removed or
finished; either delete fix.py entirely, or implement its intended behavior:
open('server/src/routes/proxy.ts', 'r') should read into lines, locate the
streaming block start by searching for the string "for await (const chunk" and
then find the matching block end (matching braces) and then perform the intended
action (e.g., print the start/end indices, extract the block, or modify and
write back) before closing—use the existing lines variable and proper file write
handling to persist changes if any.
</details>

<details>
<summary>🪄 Autofix (Beta)</summary>

Fix all unresolved CodeRabbit comments on this PR:

- [ ]  Push a commit to this branch (recommended)
- [ ]  Create a new PR with the fixes

</details>

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: Organization UI

**Review profile**: CHILL

**Plan**: Pro Plus

**Run ID**: `ebc2d52b-36c4-4d90-9c6a-9b952a33334b`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 233e03142895cd8c18d441461d5c9568186c7123 and 40abf0b8c67d25c3d5526a9345e88d01db0cca9e.

</details>

<details>
<summary>📒 Files selected for processing (51)</summary>

* `.roo/specs/disable-sticky-on-auto/design.md`
* `.roo/specs/disable-sticky-on-auto/requirements.md`
* `.roo/specs/disable-sticky-on-auto/tasks.md`
* `.roo/specs/generalized-thread-protection/design.md`
* `.roo/specs/generalized-thread-protection/requirements.md`
* `.roo/specs/generalized-thread-protection/tasks.md`
* `.roo/specs/model-pools-display/design.md`
* `.roo/specs/model-pools-display/requirements.md`
* `.roo/specs/model-pools-display/tasks.md`
* `.roo/specs/owl-alpha-longcat-model-routing/design.md`
* `.roo/specs/owl-alpha-longcat-model-routing/requirements.md`
* `.roo/specs/owl-alpha-longcat-model-routing/tasks.md`
* `.roo/specs/recency-biased-thompson-sampling/design.md`
* `.roo/specs/recency-biased-thompson-sampling/requirements.md`
* `.roo/specs/recency-biased-thompson-sampling/tasks.md`
* `.roo/specs/sse-stream-heartbeat-stall-protection/design.md`
* `.roo/specs/sse-stream-heartbeat-stall-protection/requirements.md`
* `.roo/specs/sse-stream-heartbeat-stall-protection/tasks.md`
* `.roo/specs/transient-model-cooldown/design.md`
* `.roo/specs/transient-model-cooldown/requirements.md`
* `.roo/specs/transient-model-cooldown/tasks.md`
* `.roo/specs/wrapped-error-interception/design.md`
* `.roo/specs/wrapped-error-interception/requirements.md`
* `.roo/specs/wrapped-error-interception/tasks.md`
* `CLAUDE.md`
* `README.md`
* `client/src/components/pool-badge.tsx`
* `client/src/components/pool-section.tsx`
* `do_fix.py`
* `fix.py`
* `fix_streaming.py`
* `fix{`
* `new_streaming_block.txt`
* `server/src/__tests__/integration/full-flow.test.ts`
* `server/src/__tests__/routes/provider-session-ban.test.ts`
* `server/src/__tests__/routes/proxy-tools.test.ts`
* `server/src/__tests__/routes/stream-heartbeat-stall.test.ts`
* `server/src/__tests__/routes/transient-cooldown.test.ts`
* `server/src/__tests__/services/router.test.ts`
* `server/src/providers/base.ts`
* `server/src/providers/cloudflare.ts`
* `server/src/providers/cohere.ts`
* `server/src/providers/google.ts`
* `server/src/providers/openai-compat.ts`
* `server/src/routes/fallback.ts`
* `server/src/routes/proxy.ts`
* `server/src/services/router.ts`
* `server/src/services/threadProtection.ts`
* `server/write_test.py`
* `server/write_tests.py`
* `shared/types.ts`

</details>

</details>

coderabbitai · 2026-06-02T23:35:12Z

+
+## Problem Statement
+
+The proxy route handler (`server/src/routes/proxy.ts`) contains 6+ hardcoded branches that special-case the `longcat`{


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Finish the truncated problem statement.

This sentence ends mid-thought and includes a stray {, so the requirement is not readable enough to implement from.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/generalized-thread-protection/requirements.md at line 5, The requirement line is truncated and contains a stray '{' so rewrite it into a single complete sentence that clearly states the problem and expected outcome: describe that the proxy route handler (the exported router/handler in proxy.ts) currently contains 6+ hardcoded branches that special-case the "longcat" route/host/prefix and specify what should be done instead (e.g., remove hardcoded branches and replace with a configurable/generalized routing or policy mechanism), ensuring no stray punctuation and including the scope (which branches/behavior to change) and the acceptance criteria for the fix.

coderabbitai · 2026-06-02T23:35:12Z

+```
+┌─────────────────────────────────────────────────────────┐
+│ Bandit routing                                          │
+├─────────────────────────────────────────────────────────┤
+│ ⚡ Fast (3 models)                                      │
+│ ┌─────────────────────────────────────────────────────┐ │
+│ │ Model 1 ...                                    [ON] │ │
+│ │ Model 2 ...                                    [ON] │ │
+│ │ Model 3 ...                                    [ON] │ │
+│ └─────────────────────────────────────────────────────┘ │
+│                                                         │
+│ ⚖️ Balanced (5 models)                                  │
+│ ┌─────────────────────────────────────────────────────┐ │
+│ │ Model 4 ...                                    [ON] │ │
+│ │ ...                                                 │ │
+│ └─────────────────────────────────────────────────────┘ │
+│                                                         │
+│ 🧠 Smart (2 models)                                     │
+│ ┌─────────────────────────────────────────────────────┐ │
+│ │ Model 9 ...                                    [ON] │ │
+│ │ Model 10 ...                                   [ON] │ │
+│ └─────────────────────────────────────────────────────┘ │
+└─────────────────────────────────────────────────────────┘
+```


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add language identifiers to fenced code blocks for ASCII diagrams.

The ASCII layout diagrams should have a language identifier (e.g., text or plain) to satisfy markdown linting requirements and improve rendering consistency across different viewers.

📝 Proposed fix

-``` +```text ┌─────────────────────────────────────────────────────────┐ │ Bandit routing │

Apply the same change to the other two ASCII diagrams at lines 142 and 168.

Also applies to: 142-158, 168-177

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 109-109: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/model-pools-display/design.md around lines 109 - 132, The fenced ASCII diagrams (the triple-backtick blocks that start before the "Bandit routing" diagram and the similar blocks for "Balanced (5 models)" and "Smart (2 models)") must include a markdown language identifier; update each opening fence from ``` to ```text (or ```plain) so linters/renderers recognize them as plain text and the diagrams render consistently.

coderabbitai · 2026-06-02T23:35:12Z

+- Pool classification logic should match backend routing logic exactly
+- No changes to the pool classification algorithm (only display changes)
+
+## Out of


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

File appears truncated at "Out of" heading.

The requirements document ends abruptly with an incomplete "## Out of" section heading. This likely should be "## Out of Scope" with content defining what is explicitly excluded from this feature.

✅ Suggested completion

-## Out of +## Out of Scope + +- Changes to the pool classification algorithm itself +- Backend routing logic modifications +- Analytics calculation changes +- Model catalog structure changes

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

## Out of

## Out of Scope

- Changes to the pool classification algorithm itself

- Backend routing logic modifications

- Analytics calculation changes

- Model catalog structure changes

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/model-pools-display/requirements.md at line 90, The document ends with a truncated heading "## Out of" — change that heading to "## Out of Scope" and populate it with explicit exclusions for this feature (e.g., what model-pool UI interactions, permissions, API endpoints, or integrations are not covered), ensuring the section clarifies boundaries for implementation and QA; update any table of contents or cross-references if present to reflect the completed heading.

coderabbitai · 2026-06-02T23:35:12Z

+```
+┌─────────────────────────────────────────────────────────────┐
+│                     Client Request                          │
+│              model: "freellmapi/auto" | "freellmapi/auto-smart"  │
+└─────────────┬───────────────────────────────┬───────────────┘
+              │                               │
+              ▼                               ▼
+┌─────────────────────────┐     ┌─────────────────────────┐
+│   Balanced Router       │     │   Smart Router          │
+│   (auto)                │     │   (auto-smart)          │
+│                         │     │                         │
+│  - Excludes longcat/*   │     │  - Prefers longcat/*    │
+│  - Excludes OR/owl-alpha│     │    and OR/owl-alpha     │
+│  - Normal bandit for    │     │    when valid keys exist│
+│    everything else      │     │  - Applies sticky       │
+│                         │     │    cooldown for both    │
+│                         │     │  - Model-level banning  │
+│                         │     │    on errors            │
+└─────────────┬───────────┘     └───────────┬─────────────┘
+              │                               │
+              ▼                               ▼
+┌─────────────────────────────────────────────────────────────┐
+│              routeRequest() in router.ts                    │
+│                                                             │
+│  1. Build chain from fallback_config + models               │
+│  2. Score via Thompson sampling                             │
+│  3. Apply balanced exclusions (REQ-1)                       │
+│  4. Apply smart preferences (REQ-2)                         │
+│  5. Apply sticky session pin                                │
+│  6. Iterate chain, find first model with valid key          │
+└─────────────────────────────────────────────────────────────┘
+              │
+              ▼
+┌─────────────────────────────────────────────────────────────┐
+│              handleChatCompletion() in proxy.ts             │
+│                                                             │
+│  - Sticky cooldown check for longcat + owl-alpha (REQ-3)   │
+│  - Model-level skipModels on 5xx/retryable (REQ-4, REQ-5)  │
+│  - Model-level skipModels on truncation (REQ-4, REQ-5)     │
+│  - Model-level skipModels on mid-stream errors (REQ-4, REQ-5)│
+└─────────────────────────────────────────────────────────────┘
+```


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add fence languages to these new code blocks.

These fences will trip markdownlint (MD040). Using text for the ASCII diagrams/flows is enough.

Also applies to: 52-78, 82-92, 96-105

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 5-5: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/owl-alpha-longcat-model-routing/design.md around lines 5 - 46, The markdown diagrams use triple-fenced blocks without a language which triggers MD040; update each ASCII diagram/code block (the ones showing Balanced Router / Smart Router, the routeRequest() flow, and the handleChatCompletion() flow — e.g., the block containing "Balanced Router", "Smart Router", "routeRequest() in router.ts", and "handleChatCompletion() in proxy.ts") to include a fence language of text (use ```text) so markdownlint no longer flags them; apply the same change to the other listed blocks (the ones noted around lines 52-78, 82-92, 96-105).

coderabbitai · 2026-06-02T23:35:12Z

+- Existing LongCat smart auto preference logic in [`server/src/services/router.ts`](../server/src/services/router.ts)
+- Existing LongCat sticky cooldown logic in [`server/src/routes/proxy.ts`](../server/src/routes/proxy.ts)
+- Existing provider-level ban logic in [`server/src/routes/proxy.ts`](../server/src/routes/proxy.ts)
+- Owl Alpha model seeded in [`server/src/db/index.ts`](../server/src/db/index.ts) via `migrateModelsV15`
+- LongCat model seeded in [`server/src/db/index.ts`](../server/src/db/index.ts) via `migrateModelsV16`


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the dependency links; they currently point to the wrong directory depth.

From .roo/specs/owl-alpha-longcat-model-routing/, ../server/... resolves to .roo/specs/server/..., not the repo root. These should go up to the root first (for example, ../../../server/...).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/owl-alpha-longcat-model-routing/requirements.md around lines 122 - 126, The Markdown links in the requirements list are using ../server/... which resolves relative to .roo/specs/owl-alpha-longcat-model-routing/ instead of the repo root; update each link target to go up three levels to reference the true server files (e.g., change ../server/src/services/router.ts to ../../../server/src/services/router.ts and similarly update ../server/src/routes/proxy.ts and ../server/src/db/index.ts references used for migrateModelsV15/migrateModelsV16 and the LongCat smart/sticky/ban logic).

coderabbitai · 2026-06-02T23:35:12Z

+        // Attach client-disconnect listener
+        req.on('close', () => {
+          cleanupStream();
+        });


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't register a new close listener on every retry attempt.

This code sits inside the retry loop, so repeated pre-stream failures will stack listeners on the same req object. Once a request crosses ~10 attempts, Node will start warning about listener leaks, and every disconnect will fan out through all prior handlers.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/routes/proxy.ts` around lines 1422 - 1425, The close listener is being added inside the retry loop which causes listener leaks; register it once outside the retry loop (or use req.once('close', ...) or remove any existing listener before adding) so cleanupStream is not re-registered on each attempt—move the req.on('close', ...) call (or switch to req.once or removeListener) so it executes a single time for the request and still calls cleanupStream on client disconnect.

coderabbitai · 2026-06-02T23:35:12Z

+export function classifyModel(
+  speedRank: number,
+  intelligenceRank: number,
+  minSpeedRank: number,
+  maxSpeedRank: number,
+  minIntelligenceRank: number,
+  maxIntelligenceRank: number,
+): ModelPool {
+  const speedRange = maxSpeedRank - minSpeedRank;
+  const intelRange = maxIntelligenceRank - minIntelligenceRank;
+
+  // If no variation at all, default to Balanced
+  if (speedRange <= 0 && intelRange <= 0) return ModelPool.Balanced;
+
+  const normalizedSpeed = speedRange > 0
+    ? (speedRank - minSpeedRank) / speedRange
+    : 0.5;
+  const normalizedIntel = intelRange > 0
+    ? (intelligenceRank - minIntelligenceRank) / intelRange
+    : 0.5;
+
+  // Fast pool: top 40% fastest models (lowest normalized speed_rank)
+  if (normalizedSpeed <= 0.4) return ModelPool.Fast;
+  // Smart pool: top 40% smartest models (lowest normalized intelligence_rank)
+  if (normalizedIntel <= 0.4) return ModelPool.Smart;
+  return ModelPool.Balanced;


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

classifyModel() does not implement a true top-40% cutoff.

This normalized-rank check over-selects for common catalog sizes. With 6 models, normalized <= 0.4 marks ranks 1-3 as Fast/Smart, which is 50%, not 40%. That changes both pool exposure and fast-mode routing decisions. Please compute pool membership from ordinal position/count (or precomputed cutoff ranks) instead of a fixed normalized threshold.

As per coding guidelines, **/*router*.{ts,tsx,js,jsx}: Models are classified into Fast (top 40% speed), Smart (top 40% intelligence), or Balanced pools via the classifyModel() function in router.ts. Fast mode routes to Fast pool first, then borrows from Balanced pool

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/services/router.ts` around lines 345 - 370, The classifyModel function uses normalized thresholds which yield incorrect top-40% cutoffs for small catalogs; change it to compute pool membership from ordinal positions and total count instead: compute totalCount = maxSpeedRank - minSpeedRank + 1 (and similarly for intelligence), compute ordinalPosition = speedRank - minSpeedRank + 1 (and for intelligence), compute cutoff = Math.max(1, Math.ceil(totalCount * 0.4)), then classify as Fast if ordinalPosition <= cutoff, Smart if intelligence ordinal <= cutoff, else Balanced; preserve the early-return when there is no variation and handle edge cases (totalCount <= 0) in classifyModel and reference the ModelPool enum.

coderabbitai · 2026-06-02T23:35:12Z

+  if (!statsCache) { refreshStatsCache(getDb(), true); }
  if (!statsCache) return [];


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Call the TTL-gated refresh unconditionally here.

Once statsCache exists, this path never refreshes it again, so the dashboard can keep serving stale scores until some other route happens to touch refreshStatsCache(). The new TTL gate already makes the unconditional call cheap.

Suggested fix

-export function getAnalyticsScores(): Array<{ +export function getAnalyticsScores(): Array<{ platform: string; modelName: string; modelId: string; @@ - if (!statsCache) { refreshStatsCache(getDb(), true); } + refreshStatsCache(getDb()); if (!statsCache) return [];

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/services/router.ts` around lines 407 - 408, Call refreshStatsCache(getDb(), true) unconditionally before returning cached stats so the TTL-gated refresh runs even when statsCache is already set; specifically, invoke refreshStatsCache(getDb(), true) unconditionally (rather than only when !statsCache) and then keep the existing guard if (!statsCache) return []; this ensures statsCache and the TTL logic are honored and prevents serving stale scores while preserving the cheap TTL check inside refreshStatsCache.

coderabbitai · 2026-06-02T23:35:12Z

+path = '/home/vi/freellmapi/server/src/__tests__/services/router.test.ts'
+
+content = [
+    "import { describe, it, expect, beforeAll, beforeEach } from 'vitest';",
+    "import { initDb, getDb } from '../../db/index.js';",
+    "import { encrypt } from '../../lib/crypto.js';",
+    "import { routeRequest, refreshStatsCache, getAnalyticsScores } from '../../services/router.js';",
+    "",
+    "describe('Router', () => {",
+    "  beforeAll(() => {",
+    "    process.env.ENCRYPTION_KEY = '0'.repeat(64);",
+    "    initDb(':memory:');",
+    "  });",
+    "",
+    "  beforeEach(() => {",
+    "    const db = getDb();",
+    "    db.prepare('DELETE FROM api_keys').run();",
+    "    const models = db.prepare('SELECT id, intelligence_rank FROM models ORDER BY intelligence_rank ASC').all() as any[];",
+    "    const update = db.prepare('UPDATE fallback_config SET priority = ? WHERE model_db_id = ?');",
+    "    for (let i = 0; i < models.length; i++) {",
+    "      update.run(i + 1, models[i].id);",
+    "    }",
+    "  });",
+    "",
+    "  it('should throw when no keys are configured', () => {",
+    "    expect(() => route{


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash python - <<'PY' from pathlib import Path import ast path = Path("server/write_test.py") src = path.read_text() try: ast.parse(src, filename=str(path)) print(f"{path}: syntax OK") except SyntaxError as e: print(f"{path}:{e.lineno}:{e.offset}: {e.msg}") raise PY

Repository: vi70x3/freellmapi

Length of output: 521

🏁 Script executed:

#!/bin/bash python3 - <<'PY' from pathlib import Path import ast path = Path("server/write_test.py") src = path.read_text(encoding="utf-8") try: ast.parse(src, filename=str(path)) print(f"{path}: syntax OK") except SyntaxError as e: print(f"{path}:{e.lineno}:{e.offset}: {e.msg}") raise PY

Repository: vi70x3/freellmapi

Length of output: 521

🏁 Script executed:

#!/bin/bash python3 - <<'PY' from pathlib import Path import ast path = Path("server/write_test.py") src = path.read_text(encoding="utf-8") try: ast.parse(src, filename=str(path)) print(f"{path}: syntax OK") except SyntaxError as e: print(f"{path}:{e.lineno}:{e.offset}: {e.msg}") raise PY

Repository: vi70x3/freellmapi

Length of output: 521

🏁 Script executed:

#!/bin/bash python3 - <<'PY' from pathlib import Path import ast path = Path("server/write_test.py") src = path.read_text(encoding="utf-8") try: ast.parse(src, filename=str(path)) print(f"{path}: syntax OK") except SyntaxError as e: print(f"{path}:{e.lineno}:{e.offset}: {e.msg}") raise PY

Repository: vi70x3/freellmapi

Length of output: 521

🏁 Script executed:

#!/bin/bash python3 - <<'PY' from pathlib import Path import ast path = Path("server/write_test.py") src = path.read_text(encoding="utf-8") ast.parse(src, filename=str(path)) print("syntax OK") PY

Repository: vi70x3/freellmapi

Length of output: 426

🏁 Script executed:

#!/bin/bash python3 - <<'PY' from pathlib import Path import ast path = Path("server/write_test.py") src = path.read_text(encoding="utf-8") try: ast.parse(src, filename=str(path)) print(f"{path}: syntax OK") except SyntaxError as e: print(f"{path}:{e.lineno}:{e.offset}: {e.msg}") raise PY

Repository: vi70x3/freellmapi

Length of output: 521

Critical: Fix server/write_test.py so it’s valid Python and remove the hard-coded absolute path.

Fails to parse with SyntaxError: unterminated string literal at server/write_test.py:29, so the script can’t run as committed.

Replace the machine-specific absolute path (/home/vi/freellmapi/...) with a repo-relative path (e.g., computed from __file__ / project root).

🧰 Tools

🪛 Ruff (0.15.15)

[warning] 29-29: missing closing quote in string literal

(invalid-syntax)

[warning] 29-29: unexpected EOF while parsing

(invalid-syntax)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/write_test.py` around lines 4 - 29, The file defines a JavaScript test payload in Python but currently contains an unterminated string and a machine-specific absolute path; fix the top-level variables 'path' and 'content' so the Python file parses: replace the hard-coded '/home/vi/freellmapi/...' assigned to the variable path with a repo-relative computed path (use __file__ / pathlib to resolve the project root and join the relative test path), and ensure the 'content' value is a properly closed Python list or a single properly quoted multi-line string so there are no unterminated string literals (verify brackets/quotes around the array are balanced and the last line completes the JS snippet such as the truncated "expect(() => route..." line). Ensure no other syntax errors remain so the script runs.

coderabbitai · 2026-06-02T23:35:12Z

+path = '/home/vi/freellmapi/server/src/__tests__/services/router.test.ts'
+
+# Part 1: existing tests
+part1 = """import { describe, it, expect, beforeAll, beforeEach } from 'vitest';
+import { initDb, getDb } from '../../db/index.js';
+import { encrypt } from '../../lib/crypto.js';
+import { routeRequest, refreshStatsCache, getAnalyticsScores } from '../../services/router.js';
+
+describe('Router', () => {
+  beforeAll(() => {
+    process.env.ENCRYPTION_KEY = '0'.repeat(64);
+    initDb(':memory:');
+  });
+
+  beforeEach(() => {
+    const db = getDb();
+    db.prepare('DELETE FROM api_keys').run();
+    const models = db.prepare('SELECT id, intelligence_rank FROM models ORDER BY intelligence_rank ASC').all() as any[];
+    const update = db.prepare('UPDATE fallback_config SET priority = ? WHERE model_db_id = ?');
+    for (let i = 0; i < models.length; i++) {
+      update.run(i + 1, models[i].id);
+    }
+  });
+
+  it('should throw when no keys are configured', () => {
+    expect(() => routeRequest()).toThrow(/exhausted/i);
+  });
+
+  it('should route to highest priority model with available key', () => {
+    const db = getDb();
+    const { encrypted, iv, authTag } = encrypt('test-groq-key');
+    db.prepare('INSERT INTO api_keys (platform, label, encrypted_key, iv, auth_tag, status, enabled) VALUES (?, ?, ?, ?, ?, ?, ?)').run('groq', 'test', encrypted, iv, authTag, 'healthy', 1);
+    const result = routeRequest();
+    expect(result.platform).toBe('groq');
+    expect(result.apiKey).toBe('test-groq-key');
+  });
+
+  it('should route to an available model when keys exist for multiple platforms', () => {
+    const db = getDb();
+    const googleKey = encrypt('test-google-key');
+    db.prepare('INSERT INTO api_keys (platform, label, encrypted_key, iv, auth_tag, status, enabled) VALUES (?, ?, ?, ?, ?, ?, ?)').run('google', 'test', googleKey.encrypted, googleKey.iv, googleKey.authTag, 'healthy', 1);
+    const groqKey{


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash python - <<'PY' from pathlib import Path import ast path = Path("server/write_tests.py") src = path.read_text() try: ast.parse(src, filename=str(path)) print(f"{path}: syntax OK") except SyntaxError as e: print(f"{path}:{e.lineno}:{e.offset}: {e.msg}") raise PY

Repository: vi70x3/freellmapi

Length of output: 613

🏁 Script executed:

#!/bin/bash python - <<'PY' from pathlib import Path import ast path = Path("server/write_tests.py") src = path.read_text() try: ast.parse(src, filename=str(path)) print(f"{path}: syntax OK") except SyntaxError as e: print(f"{path}:{e.lineno}:{e.offset}: {e.msg}") raise PY

Repository: vi70x3/freellmapi

Length of output: 613

🏁 Script executed:

#!/bin/bash python - <<'PY' from pathlib import Path import ast path = Path("server/write_tests.py") src = path.read_text() try: ast.parse(src, filename=str(path)) print(f"{path}: syntax OK") except SyntaxError as e: print(f"{path}:{e.lineno}:{e.offset}: {e.msg}") raise PY

Repository: vi70x3/freellmapi

Length of output: 613

🏁 Script executed:

#!/bin/bash python - <<'PY' from pathlib import Path import ast path = Path("server/write_tests.py") src = path.read_text() try: ast.parse(src, filename=str(path)) print(f"{path}: syntax OK") except SyntaxError as e: print(f"{path}:{e.lineno}:{e.offset}: {e.msg}") raise PY

Repository: vi70x3/freellmapi

Length of output: 613

🏁 Script executed:

#!/bin/bash python - <<'PY' from pathlib import Path import ast path = Path("server/write_tests.py") src = path.read_text() try: ast.parse(src, filename=str(path)) print(f"{path}: syntax OK") except SyntaxError as e: print(f"{path}:{e.lineno}:{e.offset}: {e.msg}") raise PY

Repository: vi70x3/freellmapi

Length of output: 613

🏁 Script executed:

#!/bin/bash python - <<'PY' from pathlib import Path import ast path = Path("server/write_tests.py") src = path.read_text() ast.parse(src, filename=str(path)) print(f"{path}: syntax OK") PY

Repository: vi70x3/freellmapi

Length of output: 504

🏁 Script executed:

#!/bin/bash python - <<'PY' from pathlib import Path import ast path = Path("server/write_tests.py") src = path.read_text() ast.parse(src, filename=str(path)) print(f"{path}: syntax OK") PY

Repository: vi70x3/freellmapi

Length of output: 504

🏁 Script executed:

#!/bin/bash python - <<'PY' from pathlib import Path import ast path = Path("server/write_tests.py") src = path.read_text() try: ast.parse(src, filename=str(path)) print(f"{path}: syntax OK") except SyntaxError as e: print(f"{path}:{e.lineno}:{e.offset}: {e.msg}") raise PY

Repository: vi70x3/freellmapi

Length of output: 613

Critical: Fix server/write_tests.py generator (invalid Python + non-portable absolute path).

server/write_tests.py doesn’t parse: SyntaxError: unterminated triple-quoted string literal in part1 (part1 = """...), line 7 (detected at line 45).

Generator output path is hard-coded to /home/vi/freellmapi/server/src/__tests__/services/router.test.ts; switch to a repo-relative path or remove the generator.

🧰 Tools

🪛 Ruff (0.15.15)

[warning] 7-45: missing closing quote in string literal

(invalid-syntax)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/write_tests.py` around lines 4 - 45, The generator file server/write_tests.py currently contains an unterminated triple-quoted string in the part1 variable (the test text stops mid-token like "const groqKey{"), causing a SyntaxError; close or properly terminate part1 (complete the test contents or split into smaller, properly quoted strings and concatenate) and ensure all embedded JS quotes/braces are correctly escaped if needed; also replace the hard-coded absolute output path stored in path with a repo-relative construction (use pathlib.Path(__file__).parent / 'src' / '__tests__' / 'services' / 'router.test.ts' or os.path.join with repo root resolution) so the generator writes to a portable location.

coderabbitai

🧹 Nitpick comments (1)

client/src/pages/PlaygroundPage.tsx (1)

274-285: 💤 Low value

Consider adding ARIA attributes for accessibility.

The custom dropdown works but lacks semantic hints for screen readers. Adding aria-expanded to the trigger and role="menu" to the dropdown would improve accessibility without significant effort.

♿ Suggested accessibility improvements

 <Button
   variant="outline"
   className="w-[260px] justify-between font-normal"
   onClick={() => setModelPickerOpen(v => !v)}
+  aria-expanded={modelPickerOpen}
+  aria-haspopup="menu"
 >

-<div className="absolute right-0 top-full mt-2 z-50 max-h-[70vh] overflow-y-auto w-[360px] space-y-3 rounded-lg border bg-background p-3 shadow-lg">
+<div role="menu" className="absolute right-0 top-full mt-2 z-50 max-h-[70vh] overflow-y-auto w-[360px] space-y-3 rounded-lg border bg-background p-3 shadow-lg">

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/src/pages/PlaygroundPage.tsx` around lines 274 - 285, The dropdown
trigger Button (onClick={() => setModelPickerOpen(v => !v)}) and the conditional
dropdown div (rendered when modelPickerOpen is true) lack ARIA semantics; update
the Button to include aria-expanded={modelPickerOpen} and aria-controls
referencing the dropdown's id, and add role="menu" (and an id) to the dropdown
container div so screen readers can detect its expanded state and treat the
container as a menu; ensure any interactive items inside the dropdown use
appropriate roles (e.g., role="menuitem") if present.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@client/src/pages/PlaygroundPage.tsx`:
- Around line 274-285: The dropdown trigger Button (onClick={() =>
setModelPickerOpen(v => !v)}) and the conditional dropdown div (rendered when
modelPickerOpen is true) lack ARIA semantics; update the Button to include
aria-expanded={modelPickerOpen} and aria-controls referencing the dropdown's id,
and add role="menu" (and an id) to the dropdown container div so screen readers
can detect its expanded state and treat the container as a menu; ensure any
interactive items inside the dropdown use appropriate roles (e.g.,
role="menuitem") if present.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 56c2d9ac-e7b6-4dbc-90ed-56abb2c18f97

📥 Commits

Reviewing files that changed from the base of the PR and between 40abf0b and 134254c.

📒 Files selected for processing (2)

client/src/pages/PlaygroundPage.tsx
server/src/routes/proxy.ts

vi70x3 added 7 commits June 2, 2026 15:12

feat(providers): intercept wrapped error payloads on HTTP 200 responses

2fd34ba

feat(proxy): replace hardcoded LongCat/Owl Alpha cooldowns with gener…

09bffb0

…alized thread protection scanner

chore: temporary commit before switching branch

d4ea579

feat(router): add cache TTL gate, fix auto-refresh order, fix indenta…

be367a7

…tion, deduplicate RoutingMode type

gemini-code-assist Bot reviewed Jun 2, 2026

View reviewed changes

feat(client): add PoolSection component for displaying model pools

40abf0b

vi70x3 added 2 commits June 3, 2026 02:30

refactor(proxy): implement pre-stream stall architecture, fix wouldEx…

5dbecea

…haustAll, remove double semicolons

feat(playground): group model selector by pool using PoolSection comp…

134254c

…onent

coderabbitai Bot suggested changes Jun 2, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 2, 2026

View reviewed changes

vi70x3 closed this Jun 5, 2026

vi70x4 deleted the spec/recency-biased-thompson-sampling branch June 7, 2026 00:30

	res.write('data: [DONE]\n\n');;
	res.write('data: [DONE]\\n\\n');


		## Problem Statement

		The proxy route handler (`server/src/routes/proxy.ts`) contains 6+ hardcoded branches that special-case the `longcat`{ No newline at end of file

-## Out of
+## Out of Scope
+- Changes to the pool classification algorithm itself
+- Backend routing logic modifications
+- Analytics calculation changes
+- Model catalog structure changes

		if (!statsCache) { refreshStatsCache(getDb(), true); }
		if (!statsCache) return [];

Conversation

vi70x3 commented Jun 2, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

kilo-code-bot Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review Summary

Overview

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

vi70x3 commented Jun 2, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 2, 2026 •

edited

Loading

kilo-code-bot Bot commented Jun 2, 2026 •

edited

Loading