feat(router): add Owl Alpha + LongCat model-level routing by vi70x3 · Pull Request #9 · animaios/api-llm-localhost

vi70x3 · 2026-06-02T04:03:11Z

Implements the owl-alpha-longcat-model-routing spec.

Summary by CodeRabbit

New Features
- Smart routing now prefers LongCat and Owl Alpha at model level when valid keys exist; balanced auto excludes them.
- Added sticky cooldown behavior for Owl Alpha.
Bug Fixes
- Disabled sticky-session behavior for balanced auto routing.
- Error handling changed from provider/platform bans to model-level skips for LongCat and Owl Alpha.
Tests
- Added balanced-mode coverage and updated smart-mode routing tests.
Documentation
- Added design, requirements, and task specs for these behaviors.

- Add EXCLUDED_FROM_BALANCED set for LongCat platform exclusion
- Add EXCLUDED_MODELS_FROM_BALANCED map for Owl Alpha model exclusion
- Filter excluded models in balanced mode before scoring
- Extract hasValidKeys() helper for key capacity checks
- Add Owl Alpha smart preference check in smart mode
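
The balanced-mode exclusions in this commit message could look roughly like the sketch below. The names `EXCLUDED_FROM_BALANCED` and `EXCLUDED_MODELS_FROM_BALANCED` come from the commit message; the `ChainEntry` shape, the map's value type, and the `filterForBalanced` helper are assumptions made for illustration and may differ from the code in `router.ts`.

```typescript
// Hypothetical entry shape; the real router rows carry more fields.
interface ChainEntry {
  platform: string;
  model_id: string;
}

// Platforms excluded wholesale from balanced auto routing.
const EXCLUDED_FROM_BALANCED: ReadonlySet<string> = new Set(["longcat"]);

// Individual models excluded per platform (assumed shape: platform -> model ids).
const EXCLUDED_MODELS_FROM_BALANCED: ReadonlyMap<string, ReadonlySet<string>> = new Map([
  ["openrouter", new Set(["owl-alpha"])],
]);

// Drop excluded entries before scoring; other modes see the full chain.
function filterForBalanced(chain: ChainEntry[], routingMode: string): ChainEntry[] {
  if (routingMode !== "balanced") return chain;
  return chain.filter((e) => {
    if (EXCLUDED_FROM_BALANCED.has(e.platform)) return false;
    const models = EXCLUDED_MODELS_FROM_BALANCED.get(e.platform);
    return !(models !== undefined && models.has(e.model_id));
  });
}
```

Filtering before scoring means the excluded entries never influence the rank range that balanced mode derives from the chain.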

coderabbitai · 2026-06-02T04:03:21Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 90ec7e3e-b39a-4762-be9d-f94afb23bcde

📥 Commits

Reviewing files that changed from the base of the PR and between 73d8dda and 8ddc74f.

📒 Files selected for processing (2)

server/src/routes/proxy.ts
server/src/services/router.ts

🚧 Files skipped from review as they are similar to previous changes (2)

server/src/services/router.ts
server/src/routes/proxy.ts

📝 Walkthrough

This PR disables sticky-session pinning for balanced auto-routing via an early empty return from getSessionKey(), adds balanced-mode router exclusions and hasValidKeys() validation, implements smart-mode LongCat/Owl Alpha preference ordering, migrates provider-level bans to model-level skips, and updates tests accordingly.
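
As a minimal sketch of the early empty return described above: only the `getSessionKey()` name and the `''` result for balanced mode come from this PR. The parameter list, the key format, and the in-memory `stickyStore` are assumptions for illustration.

```typescript
type RoutingMode = "balanced" | "smart";

// Assumed signature: the real getSessionKey() inputs are not shown in this PR.
function getSessionKey(routingMode: RoutingMode, clientId: string): string {
  // Balanced auto routing never pins a session, so it gets no key at all.
  if (routingMode === "balanced") return "";
  return `${routingMode}:${clientId}`;
}

// Hypothetical in-memory sticky store: session key -> pinned model DB id.
const stickyStore = new Map<string, number>();

function pinModel(sessionKey: string, modelDbId: number): void {
  if (sessionKey === "") return; // empty key: nothing to pin
  stickyStore.set(sessionKey, modelDbId);
}

function pinnedModel(sessionKey: string): number | undefined {
  return sessionKey === "" ? undefined : stickyStore.get(sessionKey);
}
```

Because every downstream sticky operation is keyed on the session key, returning an empty key at the source disables lookups, pins, and ban checks without touching each call site.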

Changes

Sticky Session Disable and Model-Level Routing

| Layer / File(s) | Summary |
|---|---|
| Feature specs and design documentation (`.roo/specs/disable-sticky-on-auto/`, `.roo/specs/owl-alpha-longcat-model-routing/`) | Designs, requirements, and tasks describing disabling sticky for balanced mode and model-level routing/preferences for Owl Alpha and LongCat, including sticky cooldown and testing checklists. |
| Sticky session disable for balanced mode (`server/src/routes/proxy.ts`, `server/src/__tests__/routes/provider-session-ban.test.ts`) | `getSessionKey()` short-circuits to `''` when `routingMode === 'balanced'`, preventing downstream sticky operations; tests assert balanced mode ignores sticky entries, returns empty keys, forces ban checks false, and prevents entry creation. |
| Router model exclusions and key capacity validation (`server/src/services/router.ts`) | Adds balanced-mode exclusion constants for `longcat` and `openrouter/owl-alpha`; introduces `hasValidKeys()` to validate key eligibility (cooldown/capacity); computes a filtered chain for balanced routing and derives intelligence rank range from it. |
| Smart mode model preferences and sticky cooldown (`server/src/services/router.ts`) | Refactors smart-mode preference ordering: `longcat` moved to front only when `hasValidKeys()` succeeds; `openrouter/owl-alpha` repositioned to follow `longcat` (or appended); logs when owl-alpha preference activates; sticky cooldown extended to owl-alpha. |
| Proxy model-level skips and sticky clearing on errors (`server/src/routes/proxy.ts`) | Multiple truncation/5xx/mid-stream/retryable paths now add only the failing model to `skipModels` (by `route.modelDbId`) for LongCat and Owl Alpha, clear sticky pins when pinned to those models, and avoid provider-wide bans; adds owl-alpha sticky cooldown exclusion. |
| Test suite refactoring to smart mode and integration coverage (`server/src/__tests__/routes/provider-session-ban.test.ts`) | Migrated sticky-session tests to smart-mode (session keys, ban detection/creation, TTL expiry, reset behavior); added balanced-mode no-op suite; updated integration tests for ban lifecycle and skipModels; minor truncation sample tweak. |
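
The `hasValidKeys()` check summarized in the table can be sketched as follows. The three predicates and their argument order mirror the calls visible in the diff later in this thread; the injected `KeyChecks` object, the `listKeys` method, and the fifth parameter are assumptions added so the example is self-contained (the real helper takes four arguments and reads keys from the database).

```typescript
interface Limits {
  rpm: number | null;
  rpd: number | null;
  tpm: number | null;
  tpd: number | null;
}

interface KeyRow {
  id: number;
}

// Hypothetical dependency bundle standing in for the router's DB query and rate-limit helpers.
interface KeyChecks {
  listKeys(platform: string): KeyRow[];
  isOnCooldown(platform: string, modelId: string, keyId: number): boolean;
  canMakeRequest(platform: string, modelId: string, keyId: number, limits: Limits): boolean;
  canUseTokens(platform: string, modelId: string, keyId: number, tokens: number, limits: Limits): boolean;
}

// True when at least one key is off cooldown and has request and token capacity.
function hasValidKeys(
  platform: string,
  modelId: string,
  limits: Limits,
  estimatedTokens: number,
  checks: KeyChecks,
): boolean {
  return checks.listKeys(platform).some(
    (key) =>
      !checks.isOnCooldown(platform, modelId, key.id) &&
      checks.canMakeRequest(platform, modelId, key.id, limits) &&
      checks.canUseTokens(platform, modelId, key.id, estimatedTokens, limits),
  );
}
```

Extracting the check lets the LongCat and Owl Alpha preference branches share one definition of "has capacity" instead of duplicating the key query.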

Sequence Diagram

sequenceDiagram
  participant Client
  participant Proxy
  participant Router
  participant StickyStore
  participant SkipModels

  Client->>Proxy: POST request (routingMode)
  Proxy->>Proxy: getSessionKey(routingMode)
  alt routingMode == "balanced"
    Proxy-->>StickyStore: (no lookup — returns '')
  else
    Proxy->>StickyStore: lookup session key
  end
  Proxy->>Router: routeRequest(request, skipModels)
  Router->>SkipModels: apply balanced exclusions & sticky cooldown checks
  Router->>Router: hasValidKeys(platform, modelId) checks
  Router-->>Proxy: selected route
  Proxy->>StickyStore: set/clear sticky model if needed (skipped on balanced)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

vi70x3/freellmapi#2: Overlaps in proxy sticky-session behavior changes.
vi70x3/freellmapi#8: Related LongCat sticky-cooldown and preference logic.

Poem

🐰 Hopping through routes with a curious cheer,

Balanced lets go of the sticky-thread gear.
Owl and LongCat take ordered turns to play,
Smart mode prefers while balanced stays away.
Tests hum along as the routing finds its way.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately reflects the main change: implementing model-level routing for Owl Alpha and LongCat, which is documented across multiple spec files and implemented in router.ts and proxy.ts. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


gemini-code-assist

Code Review

This pull request disables sticky sessions on the balanced routing endpoint by returning an empty session key, and updates the smart routing endpoint to prefer both LongCat and Owl Alpha models when valid keys exist. It also introduces balanced-mode exclusions for these models and updates the test suite accordingly. The review feedback points out a potential issue in the smart preference sorting logic where the preferred openrouter/owl-alpha entry could be incorrectly placed behind a non-preferred longcat entry if the LongCat preference was not active but a longcat model happened to be sorted first by Thompson sampling, suggesting a more robust tracking of whether the preference was applied.


gemini-code-assist · 2026-06-02T04:04:29Z

  if (routingMode === 'smart') {
    const longcatEntries = sorted.filter(e => e.platform === 'longcat');
    if (longcatEntries.length > 0) {
-      // Check if any LongCat key passes rate-limit checks
-      const lcKeys = db.prepare(
-        'SELECT * FROM api_keys WHERE platform = ? AND enabled = 1 AND status != ?'
-      ).all('longcat', 'invalid') as KeyRow[];
-      if (lcKeys.length > 0) {
-        const sampleEntry = longcatEntries[0];
-        const lcLimits = {
-          rpm: sampleEntry.rpm_limit,
-          rpd: sampleEntry.rpd_limit,
-          tpm: sampleEntry.tpm_limit,
-          tpd: sampleEntry.tpd_limit,
-        };
-        const hasCapacity = lcKeys.some(key =>
-          !isOnCooldown(sampleEntry.platform, sampleEntry.model_id, key.id) &&
-          canMakeRequest(sampleEntry.platform, sampleEntry.model_id, key.id, lcLimits) &&
-          canUseTokens(sampleEntry.platform, sampleEntry.model_id, key.id, estimatedTokens, lcLimits)
-        );
-        if (hasCapacity) {
-          // Move all LongCat entries to front, preserving relative score order
-          const others = sorted.filter(e => e.platform !== 'longcat');
-          sorted.length = 0;
-          sorted.push(...longcatEntries, ...others);
+      const sampleEntry = longcatEntries[0];
+      const lcLimits = {
+        rpm: sampleEntry.rpm_limit,
+        rpd: sampleEntry.rpd_limit,
+        tpm: sampleEntry.tpm_limit,
+        tpd: sampleEntry.tpd_limit,
+      };
+      // T1.3: Use hasValidKeys() helper
+      if (hasValidKeys(sampleEntry.platform, sampleEntry.model_id, lcLimits, estimatedTokens)) {
+        // Move all LongCat entries to front, preserving relative score order
+        const others = sorted.filter(e => e.platform !== 'longcat');
+        sorted.length = 0;
+        sorted.push(...longcatEntries, ...others);
+      }
+    }
+
+    // T1.4: Owl Alpha smart preference — move openrouter/owl-alpha to front if any key has capacity
+    const owlAlphaEntry = sorted.find(e => e.platform === 'openrouter' && e.model_id === 'owl-alpha');
+    if (owlAlphaEntry) {
+      const oaLimits = {
+        rpm: owlAlphaEntry.rpm_limit,
+        rpd: owlAlphaEntry.rpd_limit,
+        tpm: owlAlphaEntry.tpm_limit,
+        tpd: owlAlphaEntry.tpd_limit,
+      };
+      if (hasValidKeys(owlAlphaEntry.platform, owlAlphaEntry.model_id, oaLimits, estimatedTokens)) {
+        // Remove owl-alpha from its current position
+        const owlIdx = sorted.indexOf(owlAlphaEntry);
+        if (owlIdx >= 0) {
+          sorted.splice(owlIdx, 1);
+        }
+        // Insert after LongCat entries (if any are at the front), preserving relative score order
+        const lastLongcatIdx = sorted.findIndex(e => e.platform !== 'longcat');
+        if (lastLongcatIdx === -1) {
+          // All entries are LongCat — append owl-alpha at the end
+          sorted.push(owlAlphaEntry);
+        } else {
+          // Insert right after the last LongCat entry
+          sorted.splice(lastLongcatIdx, 0, owlAlphaEntry);
        }
+        console.log('[Router] Owl Alpha preference active — moving openrouter/owl-alpha to front');
      }
    }
  }


The current logic for placing the preferred openrouter/owl-alpha entry uses sorted.findIndex(e => e.platform !== 'longcat') to find the insertion index. However, if LongCat was not preferred (e.g., due to no valid keys) but a longcat entry happened to be sorted at index 0 by chance of Thompson sampling, lastLongcatIdx will evaluate to 1. This incorrectly places the preferred owl-alpha entry at index 1 (behind a non-preferred longcat entry with no valid keys), leading to an unnecessary key validation check and fallback during routing.

Instead of scanning the array with findIndex, we can track whether the LongCat preference was actually applied (lcPreferred). If it was, we insert owl-alpha right after the preferred LongCat entries (at index longcatEntries.length). Otherwise, we insert it at the very front (index 0). This is more robust and avoids scanning the array.

  if (routingMode === 'smart') {
    let lcPreferred = false;
    const longcatEntries = sorted.filter(e => e.platform === 'longcat');
    if (longcatEntries.length > 0) {
      const sampleEntry = longcatEntries[0];
      const lcLimits = {
        rpm: sampleEntry.rpm_limit,
        rpd: sampleEntry.rpd_limit,
        tpm: sampleEntry.tpm_limit,
        tpd: sampleEntry.tpd_limit,
      };
      // T1.3: Use hasValidKeys() helper
      if (hasValidKeys(sampleEntry.platform, sampleEntry.model_id, lcLimits, estimatedTokens)) {
        // Move all LongCat entries to front, preserving relative score order
        const others = sorted.filter(e => e.platform !== 'longcat');
        sorted.length = 0;
        sorted.push(...longcatEntries, ...others);
        lcPreferred = true;
      }
    }

    // T1.4: Owl Alpha smart preference — move openrouter/owl-alpha to front if any key has capacity
    const owlAlphaEntry = sorted.find(e => e.platform === 'openrouter' && e.model_id === 'owl-alpha');
    if (owlAlphaEntry) {
      const oaLimits = {
        rpm: owlAlphaEntry.rpm_limit,
        rpd: owlAlphaEntry.rpd_limit,
        tpm: owlAlphaEntry.tpm_limit,
        tpd: owlAlphaEntry.tpd_limit,
      };
      if (hasValidKeys(owlAlphaEntry.platform, owlAlphaEntry.model_id, oaLimits, estimatedTokens)) {
        // Remove owl-alpha from its current position
        const owlIdx = sorted.indexOf(owlAlphaEntry);
        if (owlIdx >= 0) {
          sorted.splice(owlIdx, 1);
        }
        // Insert after LongCat entries if they were preferred, otherwise at the front
        const insertIdx = lcPreferred ? longcatEntries.length : 0;
        sorted.splice(insertIdx, 0, owlAlphaEntry);
        console.log('[Router] Owl Alpha preference active — moving openrouter/owl-alpha to front');
      }
    }
  }
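
To make the difference concrete, the following self-contained sketch compares the two placement strategies on a small chain. The helper names `insertByScan` and `insertByFlag`, the `Entry` shape, and the sample platform and model ids other than `longcat` and `openrouter/owl-alpha` are hypothetical; only the `findIndex` scan and the `lcPreferred` index rule are taken from the review above.

```typescript
interface Entry {
  platform: string;
  model_id: string;
}

// Original patch: insert owl-alpha before the first non-LongCat entry.
function insertByScan(sorted: Entry[], owl: Entry): Entry[] {
  const rest = sorted.filter((e) => e !== owl);
  const idx = rest.findIndex((e) => e.platform !== "longcat");
  if (idx === -1) rest.push(owl);
  else rest.splice(idx, 0, owl);
  return rest;
}

// Suggested fix: the position depends only on whether LongCat was actually preferred.
function insertByFlag(sorted: Entry[], owl: Entry, lcPreferred: boolean, longcatCount: number): Entry[] {
  const rest = sorted.filter((e) => e !== owl);
  rest.splice(lcPreferred ? longcatCount : 0, 0, owl);
  return rest;
}

// Scenario from the review: LongCat has no valid keys but was sampled into first place.
const lcEntry: Entry = { platform: "longcat", model_id: "lc-model" }; // hypothetical id
const otherEntry: Entry = { platform: "other-platform", model_id: "other-model" }; // hypothetical
const owlEntry: Entry = { platform: "openrouter", model_id: "owl-alpha" };
const sampled: Entry[] = [lcEntry, otherEntry, owlEntry];

const scanOrder = insertByScan(sampled, owlEntry).map((e) => e.model_id);
// scanOrder: ["lc-model", "owl-alpha", "other-model"], owl-alpha sits behind the unusable LongCat entry
const flagOrder = insertByFlag(sampled, owlEntry, false, 1).map((e) => e.model_id);
// flagOrder: ["owl-alpha", "lc-model", "other-model"], owl-alpha is tried first
```

When LongCat is preferred, both strategies agree, since the LongCat entries already occupy the front of the chain.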

- Add Owl Alpha sticky cooldown check alongside LongCat
- Change all LongCat error handling from provider-level to model-level banning
- Add identical model-level handling for Owl Alpha (openrouter/owl-alpha)
- Replace banPlatformFromSession + addProviderModelsToSkipModels with skipModels.add(route.modelDbId)
- Clear sticky preference when pinned to specific model on errors
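
The model-level handling described in this commit message might be sketched like this. `skipModels` and `route.modelDbId` appear in the PR; the `Route` shape, the numeric id type, the `sticky` map, and both function names are assumptions for illustration.

```typescript
// Hypothetical route shape; only modelDbId is named in the PR.
interface Route {
  platform: string;
  modelId: string;
  modelDbId: number;
}

// Assumed predicate: LongCat models and openrouter/owl-alpha get model-level handling.
function isModelLevelTarget(route: Route): boolean {
  return (
    route.platform === "longcat" ||
    (route.platform === "openrouter" && route.modelId === "owl-alpha")
  );
}

// On a retryable failure, skip only the failing model and drop a sticky pin that points at it.
function handleRetryableFailure(
  route: Route,
  skipModels: Set<number>,
  sticky: Map<string, number>,
  sessionKey: string,
): void {
  if (!isModelLevelTarget(route)) return; // other providers keep their existing handling (not shown)
  skipModels.add(route.modelDbId);
  if (sessionKey !== "" && sticky.get(sessionKey) === route.modelDbId) {
    sticky.delete(sessionKey);
  }
}
```

Skipping by model id leaves the provider's other models eligible for the retry, which a provider-wide ban would have excluded.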

coderabbitai

🧹 Nitpick comments (1)

server/src/services/router.ts (1)

572-578: 💤 Low value

Variable name lastLongcatIdx is misleading.

The variable actually holds the index of the first non-LongCat entry, not the last LongCat entry. The logic is correct, but the name could confuse future readers.

Suggested rename for clarity

-        const lastLongcatIdx = sorted.findIndex(e => e.platform !== 'longcat');
-        if (lastLongcatIdx === -1) {
+        const firstNonLongcatIdx = sorted.findIndex(e => e.platform !== 'longcat');
+        if (firstNonLongcatIdx === -1) {
           // All entries are LongCat — append owl-alpha at the end
           sorted.push(owlAlphaEntry);
         } else {
           // Insert right after the last LongCat entry
-          sorted.splice(lastLongcatIdx, 0, owlAlphaEntry);
+          sorted.splice(firstNonLongcatIdx, 0, owlAlphaEntry);
         }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/services/router.ts` around lines 572 - 578, The variable
lastLongcatIdx is misnamed because it actually stores the index of the first
non-LongCat element; rename the identifier (e.g., lastLongcatIdx ->
firstNonLongcatIdx) throughout the block in the function/method where sorted is
used and update the surrounding comments to match ("first non-LongCat" / "insert
right after the last LongCat" as appropriate) so the name accurately reflects
its semantics; ensure the conditional check (=== -1) and the push/splice logic
referencing that variable are updated to the new name as well.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 9942f308-b710-4fd0-94eb-7f1864071b4b

📥 Commits

Reviewing files that changed from the base of the PR and between a468601 and c98bb7d.

📒 Files selected for processing (9)

.roo/specs/disable-sticky-on-auto/design.md
.roo/specs/disable-sticky-on-auto/requirements.md
.roo/specs/disable-sticky-on-auto/tasks.md
.roo/specs/owl-alpha-longcat-model-routing/design.md
.roo/specs/owl-alpha-longcat-model-routing/requirements.md
.roo/specs/owl-alpha-longcat-model-routing/tasks.md
server/src/__tests__/routes/provider-session-ban.test.ts
server/src/routes/proxy.ts
server/src/services/router.ts

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@server/src/routes/proxy.ts`:
- Line 1234: The code calls skipModels.add(route.modelDbId) but route is only
declared later in the retry loop and is undefined here; replace that usage to
add the already-available DB id from preferredModel (i.e. call
skipModels.add(preferredModel) or skipModels.add(preferredModelModelDbId
depending on the shape of preferredModel) instead of route.modelDbId) so the
cooldown path doesn't throw a ReferenceError; update the skipModels.add call
near where preferredModel is in scope and remove reliance on the later-declared
route variable.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 12bec45e-da83-4ebc-9de7-7b71c080604a

📥 Commits

Reviewing files that changed from the base of the PR and between c98bb7d and 73d8dda.

📒 Files selected for processing (2)

.roo/specs/owl-alpha-longcat-model-routing/tasks.md
server/src/routes/proxy.ts

✅ Files skipped from review due to trivial changes (1)

.roo/specs/owl-alpha-longcat-model-routing/tasks.md

kilo-code-bot · 2026-06-02T04:28:16Z

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Files Reviewed (5 files)

.roo/specs/owl-alpha-longcat-model-routing/requirements.md
.roo/specs/owl-alpha-longcat-model-routing/tasks.md
server/src/__tests__/routes/provider-session-ban.test.ts
server/src/routes/proxy.ts
server/src/services/router.ts

Reviewed by nemotron-3-super-120b-a12b-20230311:free · 495,738 tokens

vi70x3 added 2 commits June 2, 2026 06:02

feat: disable sticky sessions for balanced mode

cb85bd7

gemini-code-assist Bot reviewed Jun 2, 2026

coderabbitai Bot reviewed Jun 2, 2026

coderabbitai Bot approved these changes Jun 2, 2026

coderabbitai Bot suggested changes Jun 2, 2026

Comment thread server/src/routes/proxy.ts Outdated

vi70x3 added 2 commits June 2, 2026 07:13

fix(router): correct Owl Alpha placement when LongCat is not preferred

33c76fd

fix(proxy): use preferredModel instead of route.modelDbId in Owl Alph…

8ddc74f

…a cooldown

coderabbitai Bot approved these changes Jun 2, 2026

vi70x3 merged commit 233e031 into main Jun 2, 2026
2 of 3 checks passed

vi70x3 deleted the spec/owl-alpha-longcat-model-routing branch June 2, 2026 04:41

vi70x3 merged 5 commits into main from spec/owl-alpha-longcat-model-routing
