feat(router): implement recency-biased Thompson sampling (T1-T9) by vi70x3 · Pull Request #11 · animaios/api-llm-localhost

vi70x3 · 2026-06-02T17:17:26Z

Implements the recency-biased Thompson sampling spec (T1-T9):

T1: Added ANALYTICS_WINDOW_DAYS constant for SQL decay divisor
T2: Extended ModelStats interface with rawSuccesses/rawTotal fields
T3: Rewrote SQL query with CTE-based time-decay weighting using julianday()
T4: Updated statsCache population to use weighted column names
T5-T8: Added Math.max(0.1, ...) guards in thompsonSampleScore(), smartSampleScore(), getAnalyticsScore(), and getSmartAnalyticsScore() to handle fractional weights safely
T9: Updated getAnalyticsScores() dashboard to display rawTotal counts
Added modelId field to getAnalyticsScores() return type for fallback route compatibility

TypeScript compilation verified clean with npx tsc --noEmit.

Summary by CodeRabbit

New Features
- SSE stream heartbeats and stall protection for clearer timeouts and healthier streaming.
- New auto-fast routing mode that prioritizes speed.
Improvements
- Wrapped-error detection to surface provider error payloads reliably.
- Temporary global model cooldowns to reduce cascading failures.
- Recency‑biased routing (time-decay) for better model selection.
- Balanced mode skips sticky-session pinning and refines Owl‑Alpha/LongCat routing behavior.
Tests
- Added/updated tests covering streaming, cooldowns, routing, and sticky-session behaviors.

…alized thread protection scanner

- Add ANALYTICS_WINDOW_DAYS constant for SQL decay divisor - Extend ModelStats with rawSuccesses/rawTotal fields - Rewrite SQL query with CTE-based time-decay weighting (julianday) - Update statsCache population to use weighted columns - Add Math.max(0.1, ...) guards in thompsonSampleScore, smartSampleScore, getAnalyticsScore, and getSmartAnalyticsScore to handle fractional weights - Update getAnalyticsScores dashboard to display rawTotal counts - Add modelId field to getAnalyticsScores return type for fallback route compat

coderabbitai · 2026-06-02T17:17:40Z

📝 Walkthrough

Walkthrough

PR adds wrapped-error detection in providers, recency-weighted analytics and Thompson sampling guards, fast routing/pool classification plus Owl Alpha/LongCat preferences, shared transient model cooldowns and thread-protection scaffolding, balanced-mode sticky-session bypass, SSE stream keep-alive and stall abort mechanics, and matching tests, docs, and helper scripts.

Changes

Multi-feature Routing, Analytics, and Streaming Enhancements

Layer / File(s)	Summary
Wrapped provider error interception `server/src/providers/base.ts`, `server/src/providers/openai-compat.ts`, `server/src/providers/cohere.ts`, `server/src/providers/cloudflare.ts`, `server/src/providers/google.ts`, `.roo/specs/wrapped-error-interception/*`	`BaseProvider` gains protected `isWrappedError()` and `throwWrappedError()` and providers check/throw on wrapped errors in JSON and SSE chunks.
Recency-weighted analytics and Thompson sampling `server/src/services/router.ts`, `server/src/__tests__/services/router.test.ts`, `.roo/specs/recency-biased-thompson-sampling/*`	`refreshStatsCache()` uses a time-decay SQL CTE, stats include weighted and raw counters, Bayesian/Thompson parameters are clamped with `Math.max(0.1, ...)`, and `getAnalyticsScores()` exposes `modelName`/`thompsonScore`.
Router model selection: balanced exclusions and smart preferences `server/src/services/router.ts`, `server/src/__tests__/services/router.test.ts`, `README.md`, `CLAUDE.md`, `.roo/specs/owl-alpha-longcat-model-routing/*`	Adds `fast` routing mode and `ModelPool` classification, balanced mode excludes LongCat and Owl Alpha, smart mode prefers LongCat and conditionally prefers Owl Alpha via `hasValidKeys()`.
Transient model cooldowns `server/src/routes/proxy.ts`, `server/src/__tests__/routes/transient-cooldown.test.ts`, `.roo/specs/transient-model-cooldown/*`	Adds exported `transientModelCooldowns` and `TRANSIENT_COOLDOWN_MS`; prunes/injects cooldowns into per-request `skipModels`, clears sticky preferred model when cooled, and registers cooldowns on retryable/mid-stream 5xx errors.
Thread protection scanner (scaffold) `server/src/services/threadProtection.ts`, `.roo/specs/generalized-thread-protection/*`	New types and `evaluateThreadProtection()` placeholder; proxy imports evaluator and generalizes sticky cooldown gating to use protection decisions.
Sticky session behavior by routing mode `server/src/__tests__/routes/provider-session-ban.test.ts`, `.roo/specs/disable-sticky-on-auto/*`	Balanced mode returns empty session key (no-oping sticky helpers); smart mode retains sticky behavior. Tests updated to assert balanced skips and smart behavior.
SSE stream heartbeat and stall detection `server/src/routes/proxy.ts`, `server/src/__tests__/routes/stream-heartbeat-stall.test.ts`, `.roo/specs/sse-stream-heartbeat-stall-protection/*`	Adds `streamKeepaliveConfig`, per-stream last-chunk tracking, periodic `: keep-alive` comments after headers, pre/mid-stream stall handling (504 pre-stream, error-frame mid-stream), idempotent cleanup on client close/error, and tests.
Tests and helper scripts `server/src/__tests__/*`, `do_fix.py`, `fix.py`, `fix_streaming.py`, `new_streaming_block.txt`, `server/write_test.py`, `server/write_tests.py`	Extensive new/updated tests for router, proxy, streaming, and cooldowns; small Python scripts for local streaming-block replacement and router test generation.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

vi70x3/freellmapi#9: Overlaps on Owl Alpha/LongCat routing preferences and model-level banning conversions.
vi70x3/freellmapi#8: Modifies proxy sticky-cooldown/skipModels behavior and thread-protection mechanics closely related to this PR.

Poem

🐰 With routes rebalanced, errors caught mid-flight,
And heartbeats thumping through the streaming night—
LongCat and Owl now prefer their rightful lanes,
While Thompson samples dance on decay's refrains.
Cooldowns shared and stalls now tamed with care, this rabbit hops through changes, light as air!

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch spec/recency-biased-thompson-sampling

⚔️ Resolve merge conflicts

Resolve merge conflict in branch spec/recency-biased-thompson-sampling

kilo-code-bot · 2026-06-02T17:19:28Z

+    const intelligenceRank = intelligenceMap.get(`${platform}:${modelName}`);
+    const thompsonScore = thompsonSampleScore(platform, modelName, intelligenceRank, minIntelligenceRank, maxIntelligenceRank);
    result.push({
+    modelId: modelName,


WARNING: Inconsistent indentation — modelId is indented at 4 spaces while the surrounding fields (platform, modelName, score, etc.) are at 6 spaces. This makes the object literal harder to scan. Should be 6 spaces to match the rest of the block.

kilo-code-bot · 2026-06-02T17:19:34Z

Code Review Summary

Status: 2 Issues Found | Recommendation: Address before merge

Overview

Severity	Count
CRITICAL	0
WARNING	2
SUGGESTION	0

Issue Details (click to expand)

WARNING

File	Line	Issue
`server/src/providers/base.ts`	116	TypeScript type error - missing closing brace in type assertion
`server/src/services/router.ts`	379	Inconsistent indentation — `modelId` is indented at 4 spaces while surrounding fields are at 6 spaces

Other Observations (not in diff)

Issues found in unchanged code that cannot receive inline comments:

File	Line	Issue
None

Files Reviewed (4 files)

server/src/providers/base.ts - 1 issues
server/src/providers/cloudflare.ts - 0 issues
server/src/providers/cohere.ts - 0 issues
server/src/providers/openai-compat.ts - 0 issues
server/src/services/router.ts - 1 issues (carried forward)

_{Reviewed by nemotron-3-super-120b-a12b-20230311:free · 260,483 tokens}

gemini-code-assist

Code Review

This pull request introduces several major enhancements to the routing and proxy layers, including disabling sticky sessions on the balanced endpoint, implementing recency-biased Thompson sampling with linear time-decay, adding SSE stream heartbeats and stall protection, introducing shared transient model cooldowns, and intercepting wrapped error payloads on HTTP 200 responses. The review feedback highlights critical issues that need to be addressed: a missing cache TTL check in refreshStatsCache() causing performance degradation, unreachable auto-refresh logic in getAnalyticsScores(), unsafe 504 handling inside a setInterval callback for pre-stream stalls, an incomplete exhaustion check in the thread protection logic that ignores already-skipped models, and an invalid test case seeding Unix timestamps instead of ISO-8601 strings.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-02T17:19:41Z


 export function refreshStatsCache(db: Database, force = false): void {
-  if (!force && statsCache && Date.now() - statsCacheTime < ANALYTICS_CACHE_TTL_MS) return;



The cache TTL check was removed during refactoring. This causes the heavy SQLite time-decay query to run on every single request, creating a major performance bottleneck. Re-add the early return check to ensure the cache is respected.

Suggested change

if (!force && statsCache && Date.now() - statsCacheTime < ANALYTICS_CACHE_TTL_MS) return;

gemini-code-assist · 2026-06-02T17:19:41Z

  if (!statsCache) return [];
+  if (!statsCache) { refreshStatsCache(getDb(), true); }


The check if (!statsCache) return []; is placed before the auto-refresh logic if (!statsCache) { refreshStatsCache(getDb(), true); }. This makes the auto-refresh logic completely unreachable, and the function will always return [] if the cache is uninitialized. Swap the order of these statements.

Suggested change

if (!statsCache) return [];

if (!statsCache) { refreshStatsCache(getDb(), true); }

if (!statsCache) { refreshStatsCache(getDb(), true); }

if (!statsCache) return [];

gemini-code-assist · 2026-06-02T17:19:41Z

+            } else {
+              // Pre-stream stall — no headers sent yet, response is still retryable
+              // Send a 504 so the client gets a proper error signal
+              try {
+                res.status(504).json({
+                  error: { message: `Upstream provider stalled before yielding any data from ${route.displayName}`, type: 'stream_timeout' },
+                });
+              } catch { /* Socket already gone */ }
+            }


Sending a 504 response directly from inside the setInterval callback does not stop the pending for await loop or the upstream fetch request, leading to leaked sockets and suspended generator execution. Additionally, because no error is thrown to the outer retry loop (and throwing inside setInterval would crash the process), pre-stream stalls do not trigger fallback to other healthy providers.

To fix this safely without crashing the process or leaking resources, consider wrapping the async generator in a timeout-wrapped iterator (using Promise.race on each chunk) so that the timeout error is thrown synchronously within the for await loop. This allows the error to be caught by the existing try/catch block, triggering proper cleanup, logging, and fallback retries.

gemini-code-assist · 2026-06-02T17:19:41Z

+      // Check if cooldown would exhaust all models
+      let wouldExhaustAll = true;
+      for (const id of allEnabledIds) {
+        if (!activeCooldownModels.has(id)) {
+          wouldExhaustAll = false;
+          break;
+        }
+      }


The exhaustion check only checks if activeCooldownModels covers all enabled models, ignoring any models already present in skipModels (e.g., from session bans or transient cooldowns). This can lead to false 'All models exhausted' errors when the combined set of skipped models covers all enabled models. Update the check to evaluate the union of both sets.

Suggested change

// Check if cooldown would exhaust all models

let wouldExhaustAll = true;

for (const id of allEnabledIds) {

if (!activeCooldownModels.has(id)) {

wouldExhaustAll = false;

break;

}

}

// Check if cooldown would exhaust all models

let wouldExhaustAll = true;

for (const id of allEnabledIds) {

if (!activeCooldownModels.has(id) && !skipModels.has(id)) {

wouldExhaustAll = false;

break;

}

}

gemini-code-assist · 2026-06-02T17:19:42Z

+      db.prepare('INSERT INTO requests (platform, model_id, status, input_tokens, output_tokens, latency_ms, error, created_at) VALUES (?,?,?,?,?,?,?,?)')
+        .run('test', 'old-model', 'success', 0, 0, 0, null, now - 14 * 24 * 60 * 60);
+      db.prepare('INSERT INTO requests (platform, model_id, status, input_tokens, output_tokens, latency_ms, error, created_at) VALUES (?,?,?,?,?,?,?,?)')
+        .run('test', 'new-model', 'success', 0, 0, 0, null, now - 1 * 24 * 60 * 60);


The test seeds created_at as a Unix timestamp (seconds), but the production code uses julianday(created_at) which expects an ISO-8601 string. This causes julianday to return null and makes the test invalid. Use ISO-8601 strings for the test requests.

Suggested change

db.prepare('INSERT INTO requests (platform, model_id, status, input_tokens, output_tokens, latency_ms, error, created_at) VALUES (?,?,?,?,?,?,?,?)')

.run('test', 'old-model', 'success', 0, 0, 0, null, now - 14 * 24 * 60 * 60);

db.prepare('INSERT INTO requests (platform, model_id, status, input_tokens, output_tokens, latency_ms, error, created_at) VALUES (?,?,?,?,?,?,?,?)')

.run('test', 'new-model', 'success', 0, 0, 0, null, now - 1 * 24 * 60 * 60);

const dateOffset = (days: number) => new Date(Date.now() - days * 24 * 60 * 60 * 1000).toISOString();

db.prepare('INSERT INTO requests (platform, model_id, status, input_tokens, output_tokens, latency_ms, error, created_at) VALUES (?,?,?,?,?,?,?,?)')

.run('test', 'old-model', 'success', 0, 0, 0, null, dateOffset(14));

db.prepare('INSERT INTO requests (platform, model_id, status, input_tokens, output_tokens, latency_ms, error, created_at) VALUES (?,?,?,?,?,?,?,?)')

.run('test', 'new-model', 'success', 0, 0, 0, null, dateOffset(1));

coderabbitai

Actionable comments posted: 17

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

server/src/services/router.ts (1)

334-335: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stale doc comment: window is 7 days, not 24 h.

The window constant (ANALYTICS_WINDOW_MS) is 7 days; update the comment to avoid misleading readers about the analytics horizon.

📝 Proposed fix

- * Returns current analytics scores for every (platform, model_id) pair seen in
- * the last 24 h. Used by the fallback dashboard to surface routing rationale.
+ * Returns current analytics scores for every (platform, model_id) pair seen in
+ * the analytics window (7 days). Used by the fallback dashboard to surface routing rationale.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/services/router.ts` around lines 334 - 335, Update the stale doc
comment that says "last 24 h" to reflect the actual analytics window of 7 days;
locate the comment above the function that returns analytics scores (the block
referencing ANALYTICS_WINDOW_MS and returning current analytics scores for every
(platform, model_id) pair) and change "24 h" to "7 days" (or reference
ANALYTICS_WINDOW_MS) so the documentation matches the constant and
implementation.

🧹 Nitpick comments (6)

server/write_tests.py (1)
1-45: ⚡ Quick win

Remove committed scaffolding scripts.

write_tests.py (and the related write_test.py) appear to be one-off generators that emit router.test.ts by string concatenation against a hardcoded absolute path (/home/vi/freellmapi/...). These look like temporary artifacts ("chore: temporary commit before switching branch") and shouldn't ship in the repo — the generated router.test.ts is the source of truth. The static-analysis "missing closing quote" warning is just the truncated diff; not a real defect.

Want me to open an issue to track removing these generator scripts (and do_fix.py/fix.py/fix_streaming.py/new_streaming_block.txt)?
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/write_tests.py` around lines 1 - 45, Remove the one-off test generator
scripts and any committed scaffolding: delete write_tests.py and write_test.py
(and also do_fix.py, fix.py, fix_streaming.py, new_streaming_block.txt if
present), leaving the generated router.test.ts as the canonical test file;
ensure no hardcoded absolute paths (e.g., '/home/vi/freellmapi/...') remain in
the repo, and if you want to keep a generator, move it out of source control or
update it to write relative paths and add it to .gitignore.
server/src/services/router.ts (1)
602-602: 💤 Low value

Unconditional console.log on every smart-mode Owl Alpha routing.

This fires on each request whenever Owl Alpha has capacity, adding noise to logs on the hot routing path. Consider gating behind a debug flag or removing it.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/services/router.ts` at line 602, Remove or gate the unconditional
console.log call so it doesn't spam on every Owl Alpha routing; replace the line
containing console.log('[Router] Owl Alpha preference active — moving
openrouter/owl-alpha to front') with a conditional debug/logger call (e.g.,
processLogger.debug(...) or if (process.env.DEBUG === 'true') console.log(...))
or remove it entirely, ensuring the change is made in the router routing code
that performs the Owl Alpha capacity check so the message only appears when
debug/verbose logging is enabled.
.roo/specs/generalized-thread-protection/requirements.md (1)
5-5: 💤 Low value

Requirements doc is incomplete/truncated.

Line 5 ends mid-sentence with a dangling `longcat`{ and the document contains no enumerated requirements. The corresponding threadProtection.ts is also only a placeholder, so this spec doesn't yet describe the intended scanner behavior.

Want me to draft the missing problem statement and requirements (R1–Rn) based on the existing hardcoded longcat branches and the ThreadProtectionAction shape?
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/generalized-thread-protection/requirements.md at line 5, The
requirements document is truncated and missing the problem statement and
enumerated requirements; update
.roo/specs/generalized-thread-protection/requirements.md to include a complete
problem statement plus clear R1–Rn requirements that reflect the existing
implementation: inspect server/src/routes/proxy.ts to enumerate the 6+ hardcoded
"longcat" branches and derive the behaviors they special-case, then use the
ThreadProtectionAction shape from threadProtection.ts to define the scanner's
inputs, outputs, and side-effects (e.g., detection rules, allowed actions,
escalation flow, and telemetry). Ensure the spec lists concrete requirements
(R1, R2, ...) mapping each hardcoded branch to the intended generalized rule and
include expected action types from ThreadProtectionAction so implementers can
replace the hardcodes with a generalized scanner.
server/src/__tests__/routes/transient-cooldown.test.ts (1)
80-98: ⚖️ Poor tradeoff

Tests re-implement production logic instead of exercising it.

Suites 2, 3, 5, and 6 inline-copy the cooldown pruning loop and (in Suite 5) the error-classification booleans rather than calling the actual functions exported from proxy.ts. As written, these assert a private re-implementation, so they will keep passing even if the production injection/classification logic diverges or regresses. Consider exporting and invoking the real pruning/classification helpers so the suite guards behavior rather than a copy of it.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/__tests__/routes/transient-cooldown.test.ts` around lines 80 - 98,
The test re-implements the cooldown pruning loop using transientModelCooldowns
and TRANSIENT_COOLDOWN_MS instead of calling the production helpers; update the
tests to import and invoke the real functions exported from proxy.ts (e.g., the
cooldown pruning helper such as pruneTransientCooldowns and the
error-classification helper such as isTransientError) rather than duplicating
the loop and booleans, and assert behavior based on those functions' outputs;
ensure transientModelCooldowns is set up as the test fixture and then call
pruneTransientCooldowns() (or the actual exported name) and use
isTransientError()/the exported classifier where suites 2,3,5,6 currently
inline-copy logic.
fix.py (1)
1-8: ⚡ Quick win

Remove or finish this temporary patch script before merge.

This file is currently an incomplete utility and reads like a checkpoint artifact. Please either complete it with a clear invocation contract or remove it to avoid dead maintenance surface.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fix.py` around lines 1 - 8, fix.py is an incomplete checkpoint script that
should not be merged as-is; either delete the file or finish it by implementing
a clear invocation contract (add a main() entry, argparse for input/output
paths, error handling, and explicit write/backup behavior) and fully implement
the logic that locates and replaces the streaming block (currently searching for
the marker "for await (const chunk{") in server/src/routes/proxy.ts, plus add
tests and README usage notes; ensure the final change removes this dead
maintenance surface or converts it into a well-tested utility with a documented
CLI before merging.
.roo/specs/sse-stream-heartbeat-stall-protection/design.md (1)
198-278: 💤 Low value

Consider streamlining the design document.

This section walks through multiple iterations of the stall-handling approach (lines 198–239 discuss one approach, then lines 240–278 present a revised "better approach," then lines 243–278 present a "final approach"). While the thought process is valuable, the multiple revisions make it harder for future readers to quickly understand the settled design. Consider moving the earlier approaches to an "Alternatives Considered" appendix and keeping only the final approach in the main flow.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/sse-stream-heartbeat-stall-protection/design.md around lines 198
- 278, The "Stall Before Headers Sent" section contains multiple draft
approaches (initial, "better", and "final") which should be consolidated: move
the earlier iterations into an "Alternatives Considered" appendix and keep only
the final approach in the main flow that throws a retryable error when
streamStarted === false; update the main text to describe just the settled
behavior (use of streamAborted, cleanupStream(), MAX_STREAM_STALL_MS, and the
thrown 504 error referencing route.displayName) and ensure the appendix
documents the other variants for historical context.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.roo/specs/generalized-thread-protection/design.md:
- Around line 43-45: The fenced code block containing
THREAD_PROTECTION_PLATFORMS="longcat:provider-ban,groq:model-skip" is missing a
language tag and triggers markdownlint MD040; update the fence to include the
dotenv language identifier (i.e., change the opening ``` to ```dotenv) so the
block reads as a dotenv snippet and satisfies the linter while preserving the
exact THREAD_PROTECTION_PLATFORMS content.

In @.roo/specs/owl-alpha-longcat-model-routing/design.md:
- Line 5: Several fenced code blocks in the document are missing language
identifiers which triggers MD040; update each triple-backtick block to include a
language tag (e.g., ```text) so they pass markdown lint and improve readability
by replacing occurrences of ``` with ```text for all code fences in the file.

In @.roo/specs/owl-alpha-longcat-model-routing/requirements.md:
- Around line 122-126: Update the broken Markdown links in requirements.md that
reference server files (mentions of server/src/services/router.ts,
server/src/routes/proxy.ts and server/src/db/index.ts and symbols
migrateModelsV15/migrateModelsV16) so they resolve from this document's
location: replace the incorrect relative paths with correct relative paths from
.roo/specs/owl-alpha-longcat-model-routing/requirements.md (or use repo-root
absolute links) and verify each link opens the target file and symbol names
(router.ts, proxy.ts, index.ts, migrateModelsV15, migrateModelsV16) correctly in
Markdown viewers.

In @.roo/specs/recency-biased-thompson-sampling/requirements.md:
- Line 13: The markdown link in requirements.md currently points to a
non-resolving source target for refreshStatsCache; update the link so it
correctly references the refreshStatsCache implementation (router.ts) by
replacing the broken target with a valid repo-relative or permalink to the file
and line or to a function-anchor (e.g., link directly to the refreshStatsCache
symbol) so readers can jump to the actual implementation.

In `@do_fix.py`:
- Around line 1-7: The file do_fix.py is an incomplete local utility (unclosed
block after the docstring) that shouldn't be committed; delete do_fix.py from
the commit or remove its contents so it no longer appears in the repo, and
ensure no references to it remain; locate the file named "do_fix.py" and either
remove the file entirely or restore it to a complete, intentional script
(closing the stray block and adding working code) before including it in the
repo.

In `@fix_streaming.py`:
- Around line 1-21: This file is a WIP utility with an unclosed string literal
(starts at the new_streaming raw string) and should not be committed; remove
fix_streaming.py from the PR (delete the file or revert its addition), ensure no
other modules reference symbols like new_streaming, ttfbMs, lastChunkTimestamp,
or heartbeatInterval, and if you need to keep it locally add it to .gitignore or
stash it so it isn’t re-added to the repo.

In `@new_streaming_block.txt`:
- Around line 1-28: The included patch/scaffold file new_streaming_block.txt
(and related local artifacts do_fix.py, fix.py, fix_streaming.py) is incomplete
and contains a dangling '{' inside the stallTimeout block (see stallTimeout and
the stray opening brace), so remove these files from the PR or move them out of
the repository; if any intended changes belong in real source, reapply them as a
complete patch in the correct source files (fix the truncated stallTimeout
implementation in the streaming logic before committing).

In `@server/src/__tests__/services/router.test.ts`:
- Around line 81-92: The test is asserting on the raw unweighted total returned
by getAnalyticsScores(), which won’t show recency preference and may omit
old-model because its request is outside the 7-day window; update the test to
insert both requests within the window (e.g., created_at within the last 7 days)
and assert on a recency-weighted metric returned by getAnalyticsScores() such as
successRate or thompsonScore (instead of total), e.g., verify new-model’s
successRate/thompsonScore is greater than old-model’s; keep calls to
refreshStatsCache(getDb()) and use scores.find(s => s.modelName === '...') to
locate entries.
- Around line 106-113: The test freezes stochastic behavior by calling
getAnalyticsScores() once and reusing the same thompsonScore values; change the
loop so scores are re-sampled each iteration (e.g., call getAnalyticsScores() or
a dedicated sampling function inside the for loop) and then pick the max from
that freshly computed array, ensuring the test exercises Thompson sampling's
randomness and validates probabilistic preference between model-a and model-b.

In `@server/src/providers/base.ts`:
- Around line 142-145: The assignment to error.status using Number((errPayload
as Record<string, unknown>).code) can produce NaN; update the logic in the block
that sets error.status (referencing error.status and errPayload) to parse and
validate the code value first—coerce to a numeric value, confirm
Number.isFinite(value) and value > 0 (and optionally Math.floor(value) === value
for integer-only statuses), and only then assign it; otherwise fall back to 200.
Ensure the guard handles non-object or missing code cases and avoids propagating
NaN into error.status.

In `@server/src/routes/proxy.ts`:
- Line 1376: Remove the stray double semicolon after the res.write call by
changing the statement that writes the SSE terminator (res.write('data:
[DONE]\n\n');;) to a single-semicolon termination; locate the res.write('data:
[DONE]\n\n') invocation in proxy route handling (the res.write call shown in the
diff) and ensure it ends with only one semicolon.
- Around line 1380-1388: The handler currently sends a 504 JSON response
directly when detecting a pre-stream stall (using res.status(504).json(...)),
which bypasses the outer retry loop; instead, change the behavior in that branch
to throw an Error (or custom Error) with a status property set to 504 and
include the same message referencing route.displayName so the outer catch can
detect status===504 and retry providers; remove the direct res.* call from that
branch and ensure the thrown error shape matches what the outer error handling
expects (status: 504, message) so MAX_RETRIES/fallback logic will be exercised
and update stream-heartbeat-stall.test.ts to assert retry behavior rather than
immediate 504.

In `@server/src/services/router.ts`:
- Around line 348-349: The early return prevents the on-demand refresh: when
statsCache is falsy the function immediately returns [] so the subsequent call
to refreshStatsCache(getDb(), true) never executes; remove or move the early
return and instead call refreshStatsCache(getDb(), true) when statsCache is
falsy (using statsCache check) and only return [] after awaiting or deciding not
to wait for refresh, ensuring the code around statsCache, refreshStatsCache(...)
and getDb() is reordered so the refresh path runs when the cache is missing.
- Around line 183-216: The SQL is broken for epoch-second timestamps: update the
query in router.ts (the weighted_requests CTE used by the stats calculation in
refreshStatsCache) to treat created_at as Unix epoch by using
julianday(created_at, 'unixepoch') for recency_weight and converting the WHERE
clause to compare equivalent datetimes (e.g., datetime(created_at, 'unixepoch')
>= ?) so rows inserted as epoch seconds are matched; ensure both the recency
calculation and the window predicate use the same unixepoch conversion rather
than plain julianday(created_at) or a mismatched ISO comparison.

In `@server/src/services/threadProtection.ts`:
- Around line 20-23: evaluateThreadProtection currently returns a placeholder
ThreadProtectionAction but is never invoked and its fields (banProvider,
skipModel, clearStickyIfPinned) are unused; update the request handling in
server/src/routes/proxy.ts to import and call
evaluateThreadProtection(ErrorContext) inside the proxy request flow (e.g.,
within the handler that currently uses hardcoded LongCat branches), then apply
the returned action flags: if banProvider true, trigger the provider ban flow;
if skipModel true, bypass or select an allowed model path; if
clearStickyIfPinned true, clear any pinned sticky context before processing.
Also either implement real thread-protection logic inside
evaluateThreadProtection or document/remove the placeholder so the function and
its ThreadProtectionAction fields are actually exercised by the proxy flow.

In `@server/write_test.py`:
- Line 4: The hardcoded machine-specific absolute path assigned to the variable
path in write_test.py should be replaced with a portable resolution (e.g., build
the test file path relative to the repository or current file using pathlib or
os.path.join and __file__) so CI and other developers can run it; update the
variable path (reference: variable name "path" in write_test.py) to compute a
relative path to src/__tests__/services/router.test.ts instead of the absolute
'/home/vi/...'.
- Line 29: Fix the unterminated string in the list named content by completing
or closing the quoted entry so the Python list literal parses; locate the
`content` array in server/write_test.py (the failing entry starts with "   
expect(() => route{) and either terminate the string with a matching quote and
any needed escape sequences or replace it with a properly quoted string literal
so the list is syntactically valid.

---

Outside diff comments:
In `@server/src/services/router.ts`:
- Around line 334-335: Update the stale doc comment that says "last 24 h" to
reflect the actual analytics window of 7 days; locate the comment above the
function that returns analytics scores (the block referencing
ANALYTICS_WINDOW_MS and returning current analytics scores for every (platform,
model_id) pair) and change "24 h" to "7 days" (or reference ANALYTICS_WINDOW_MS)
so the documentation matches the constant and implementation.

---

Nitpick comments:
In @.roo/specs/generalized-thread-protection/requirements.md:
- Line 5: The requirements document is truncated and missing the problem
statement and enumerated requirements; update
.roo/specs/generalized-thread-protection/requirements.md to include a complete
problem statement plus clear R1–Rn requirements that reflect the existing
implementation: inspect server/src/routes/proxy.ts to enumerate the 6+ hardcoded
"longcat" branches and derive the behaviors they special-case, then use the
ThreadProtectionAction shape from threadProtection.ts to define the scanner's
inputs, outputs, and side-effects (e.g., detection rules, allowed actions,
escalation flow, and telemetry). Ensure the spec lists concrete requirements
(R1, R2, ...) mapping each hardcoded branch to the intended generalized rule and
include expected action types from ThreadProtectionAction so implementers can
replace the hardcodes with a generalized scanner.

In @.roo/specs/sse-stream-heartbeat-stall-protection/design.md:
- Around line 198-278: The "Stall Before Headers Sent" section contains multiple
draft approaches (initial, "better", and "final") which should be consolidated:
move the earlier iterations into an "Alternatives Considered" appendix and keep
only the final approach in the main flow that throws a retryable error when
streamStarted === false; update the main text to describe just the settled
behavior (use of streamAborted, cleanupStream(), MAX_STREAM_STALL_MS, and the
thrown 504 error referencing route.displayName) and ensure the appendix
documents the other variants for historical context.

In `@fix.py`:
- Around line 1-8: fix.py is an incomplete checkpoint script that should not be
merged as-is; either delete the file or finish it by implementing a clear
invocation contract (add a main() entry, argparse for input/output paths, error
handling, and explicit write/backup behavior) and fully implement the logic that
locates and replaces the streaming block (currently searching for the marker
"for await (const chunk{") in server/src/routes/proxy.ts, plus add tests and
README usage notes; ensure the final change removes this dead maintenance
surface or converts it into a well-tested utility with a documented CLI before
merging.

In `@server/src/__tests__/routes/transient-cooldown.test.ts`:
- Around line 80-98: The test re-implements the cooldown pruning loop using
transientModelCooldowns and TRANSIENT_COOLDOWN_MS instead of calling the
production helpers; update the tests to import and invoke the real functions
exported from proxy.ts (e.g., the cooldown pruning helper such as
pruneTransientCooldowns and the error-classification helper such as
isTransientError) rather than duplicating the loop and booleans, and assert
behavior based on those functions' outputs; ensure transientModelCooldowns is
set up as the test fixture and then call pruneTransientCooldowns() (or the
actual exported name) and use isTransientError()/the exported classifier where
suites 2,3,5,6 currently inline-copy logic.

In `@server/src/services/router.ts`:
- Line 602: Remove or gate the unconditional console.log call so it doesn't spam
on every Owl Alpha routing; replace the line containing console.log('[Router]
Owl Alpha preference active — moving openrouter/owl-alpha to front') with a
conditional debug/logger call (e.g., processLogger.debug(...) or if
(process.env.DEBUG === 'true') console.log(...)) or remove it entirely, ensuring
the change is made in the router routing code that performs the Owl Alpha
capacity check so the message only appears when debug/verbose logging is
enabled.

In `@server/write_tests.py`:
- Around line 1-45: Remove the one-off test generator scripts and any committed
scaffolding: delete write_tests.py and write_test.py (and also do_fix.py,
fix.py, fix_streaming.py, new_streaming_block.txt if present), leaving the
generated router.test.ts as the canonical test file; ensure no hardcoded
absolute paths (e.g., '/home/vi/freellmapi/...') remain in the repo, and if you
want to keep a generator, move it out of source control or update it to write
relative paths and add it to .gitignore.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 8671dcfb-981d-4f90-897b-25ef8a0a3cab

📥 Commits

Reviewing files that changed from the base of the PR and between 233e031 and 45d2546.

📒 Files selected for processing (41)

.roo/specs/disable-sticky-on-auto/design.md
.roo/specs/disable-sticky-on-auto/requirements.md
.roo/specs/disable-sticky-on-auto/tasks.md
.roo/specs/generalized-thread-protection/design.md
.roo/specs/generalized-thread-protection/requirements.md
.roo/specs/generalized-thread-protection/tasks.md
.roo/specs/owl-alpha-longcat-model-routing/design.md
.roo/specs/owl-alpha-longcat-model-routing/requirements.md
.roo/specs/owl-alpha-longcat-model-routing/tasks.md
.roo/specs/recency-biased-thompson-sampling/design.md
.roo/specs/recency-biased-thompson-sampling/requirements.md
.roo/specs/recency-biased-thompson-sampling/tasks.md
.roo/specs/sse-stream-heartbeat-stall-protection/design.md
.roo/specs/sse-stream-heartbeat-stall-protection/requirements.md
.roo/specs/sse-stream-heartbeat-stall-protection/tasks.md
.roo/specs/transient-model-cooldown/design.md
.roo/specs/transient-model-cooldown/requirements.md
.roo/specs/transient-model-cooldown/tasks.md
.roo/specs/wrapped-error-interception/design.md
.roo/specs/wrapped-error-interception/requirements.md
.roo/specs/wrapped-error-interception/tasks.md
do_fix.py
fix.py
fix_streaming.py
fix{
new_streaming_block.txt
server/src/__tests__/routes/provider-session-ban.test.ts
server/src/__tests__/routes/proxy-tools.test.ts
server/src/__tests__/routes/stream-heartbeat-stall.test.ts
server/src/__tests__/routes/transient-cooldown.test.ts
server/src/__tests__/services/router.test.ts
server/src/providers/base.ts
server/src/providers/cloudflare.ts
server/src/providers/cohere.ts
server/src/providers/google.ts
server/src/providers/openai-compat.ts
server/src/routes/proxy.ts
server/src/services/router.ts
server/src/services/threadProtection.ts
server/write_test.py
server/write_tests.py

coderabbitai · 2026-06-02T17:29:43Z

+```
+THREAD_PROTECTION_PLATFORMS="longcat:provider-ban,groq:model-skip"
+```


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language tag to the fenced code block.

This fence is missing a language identifier and will trip markdownlint MD040.

🛠️ Suggested fix

-``` +```dotenv THREAD_PROTECTION_PLATFORMS="longcat:provider-ban,groq:model-skip"

</details>  <details> <summary>📝 Committable suggestion</summary> > ‼️ **IMPORTANT** > Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements. ```suggestion

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 43-43: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/generalized-thread-protection/design.md around lines 43 - 45, The fenced code block containing THREAD_PROTECTION_PLATFORMS="longcat:provider-ban,groq:model-skip" is missing a language tag and triggers markdownlint MD040; update the fence to include the dotenv language identifier (i.e., change the opening ``` to ```dotenv) so the block reads as a dotenv snippet and satisfies the linter while preserving the exact THREAD_PROTECTION_PLATFORMS content.

coderabbitai · 2026-06-02T17:29:44Z

+
+## Architecture Overview
+
+```


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add language identifiers to fenced code blocks (MD040).

These code fences are missing language tags, which will fail markdown lint and reduce readability.

Suggested fix

-``` +```text ... -``` +``` -``` +```text ... -``` +``` -``` +```text ... -``` +``` -``` +```text ... -``` +```

Also applies to: 52-52, 82-82, 96-96

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 5-5: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/owl-alpha-longcat-model-routing/design.md at line 5, Several fenced code blocks in the document are missing language identifiers which triggers MD040; update each triple-backtick block to include a language tag (e.g., ```text) so they pass markdown lint and improve readability by replacing occurrences of ``` with ```text for all code fences in the file.

coderabbitai · 2026-06-02T17:29:44Z

+- Existing LongCat smart auto preference logic in [`server/src/services/router.ts`](../server/src/services/router.ts)
+- Existing LongCat sticky cooldown logic in [`server/src/routes/proxy.ts`](../server/src/routes/proxy.ts)
+- Existing provider-level ban logic in [`server/src/routes/proxy.ts`](../server/src/routes/proxy.ts)
+- Owl Alpha model seeded in [`server/src/db/index.ts`](../server/src/db/index.ts) via `migrateModelsV15`
+- LongCat model seeded in [`server/src/db/index.ts`](../server/src/db/index.ts) via `migrateModelsV16`


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Dependency links appear incorrectly relative.

Lines 122–126 likely point to non-existent paths from this folder, so these references won’t open in markdown viewers.

Suggested fix

-- Existing LongCat smart auto preference logic in [`server/src/services/router.ts`](../server/src/services/router.ts) -- Existing LongCat sticky cooldown logic in [`server/src/routes/proxy.ts`](../server/src/routes/proxy.ts) -- Existing provider-level ban logic in [`server/src/routes/proxy.ts`](../server/src/routes/proxy.ts) -- Owl Alpha model seeded in [`server/src/db/index.ts`](../server/src/db/index.ts) via `migrateModelsV15` -- LongCat model seeded in [`server/src/db/index.ts`](../server/src/db/index.ts) via `migrateModelsV16` +- Existing LongCat smart auto preference logic in [`server/src/services/router.ts`](../../../server/src/services/router.ts) +- Existing LongCat sticky cooldown logic in [`server/src/routes/proxy.ts`](../../../server/src/routes/proxy.ts) +- Existing provider-level ban logic in [`server/src/routes/proxy.ts`](../../../server/src/routes/proxy.ts) +- Owl Alpha model seeded in [`server/src/db/index.ts`](../../../server/src/db/index.ts) via `migrateModelsV15` +- LongCat model seeded in [`server/src/db/index.ts`](../../../server/src/db/index.ts) via `migrateModelsV16`

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

- Existing LongCat smart auto preference logic in [`server/src/services/router.ts`](../server/src/services/router.ts)

- Existing LongCat sticky cooldown logic in [`server/src/routes/proxy.ts`](../server/src/routes/proxy.ts)

- Existing provider-level ban logic in [`server/src/routes/proxy.ts`](../server/src/routes/proxy.ts)

- Owl Alpha model seeded in [`server/src/db/index.ts`](../server/src/db/index.ts) via `migrateModelsV15`

- LongCat model seeded in [`server/src/db/index.ts`](../server/src/db/index.ts) via `migrateModelsV16`

- Existing LongCat smart auto preference logic in [`server/src/services/router.ts`](../../../server/src/services/router.ts)

- Existing LongCat sticky cooldown logic in [`server/src/routes/proxy.ts`](../../../server/src/routes/proxy.ts)

- Existing provider-level ban logic in [`server/src/routes/proxy.ts`](../../../server/src/routes/proxy.ts)

- Owl Alpha model seeded in [`server/src/db/index.ts`](../../../server/src/db/index.ts) via `migrateModelsV15`

- LongCat model seeded in [`server/src/db/index.ts`](../../../server/src/db/index.ts) via `migrateModelsV16`

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/owl-alpha-longcat-model-routing/requirements.md around lines 122 - 126, Update the broken Markdown links in requirements.md that reference server files (mentions of server/src/services/router.ts, server/src/routes/proxy.ts and server/src/db/index.ts and symbols migrateModelsV15/migrateModelsV16) so they resolve from this document's location: replace the incorrect relative paths with correct relative paths from .roo/specs/owl-alpha-longcat-model-routing/requirements.md (or use repo-root absolute links) and verify each link opens the target file and symbol names (router.ts, proxy.ts, index.ts, migrateModelsV15, migrateModelsV16) correctly in Markdown viewers.

coderabbitai · 2026-06-02T17:29:44Z

+
+### R-1: Linear Time-Decay Weighting
+
+The SQL query aggregating historical requests in [`refreshStatsCache()`](server/src/services/router.ts:174) must calculate a weight for each logged request based on its age. Newer requests must be assigned a weight closer to `1.0`, while requests approaching the limit of the analytics window (7 days) must decay toward `0.0`.


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix broken source link target in spec.

Line 13’s link path likely won’t resolve from this directory (server/src/...:174), so readers can’t jump to the referenced implementation line.

Suggested fix

-The SQL query aggregating historical requests in [`refreshStatsCache()`](server/src/services/router.ts:174) must calculate a weight for each logged request based on its age. +The SQL query aggregating historical requests in [`refreshStatsCache()`](../../../server/src/services/router.ts) must calculate a weight for each logged request based on its age.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

The SQL query aggregating historical requests in [`refreshStatsCache()`](server/src/services/router.ts:174) must calculate a weight for each logged request based on its age. Newer requests must be assigned a weight closer to `1.0`, while requests approaching the limit of the analytics window (7 days) must decay toward `0.0`.

The SQL query aggregating historical requests in [`refreshStatsCache()`](../../../server/src/services/router.ts) must calculate a weight for each logged request based on its age. Newer requests must be assigned a weight closer to `1.0`, while requests approaching the limit of the analytics window (7 days) must decay toward `0.0`.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/recency-biased-thompson-sampling/requirements.md at line 13, The markdown link in requirements.md currently points to a non-resolving source target for refreshStatsCache; update the link so it correctly references the refreshStatsCache implementation (router.ts) by replacing the broken target with a valid repo-relative or permalink to the file and line or to a function-anchor (e.g., link directly to the refreshStatsCache symbol) so readers can jump to the actual implementation.

coderabbitai · 2026-06-02T17:29:44Z

+#!/usr/bin/env python3
+"""Replace the streaming block in proxy.ts with Promise.race-based stall detection."""
+
+with open('server/src/routes/proxy.ts', 'r') as f:
+    content = f.read()
+
+{


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove work-in-progress utility script.

This Python script is incomplete (line 7 has an unclosed code block) and appears to be a local development utility that was accidentally committed. The script's docstring mentions "Promise.race-based stall detection," but the actual implementation in proxy.ts uses setInterval, suggesting this script predates or diverges from the final approach.

🧰 Tools

🪛 Ruff (0.15.15)

[warning] 7-7: unexpected EOF while parsing

(invalid-syntax)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@do_fix.py` around lines 1 - 7, The file do_fix.py is an incomplete local utility (unclosed block after the docstring) that shouldn't be committed; delete do_fix.py from the commit or remove its contents so it no longer appears in the repo, and ensure no references to it remain; locate the file named "do_fix.py" and either remove the file entirely or restore it to a complete, intentional script (closing the stray block and adding working code) before including it in the repo.

coderabbitai · 2026-06-02T17:29:45Z

  const rows = db.prepare(`
-    SELECT platform, model_id,
-      COUNT(*) as total,
-      SUM(CASE WHEN status = 'success' THEN 1 ELSE 0 END) as successes,
+    WITH weighted_requests AS (
+      SELECT 
+        platform, 
+        model_id,
+        status,
+        latency_ms,
+        output_tokens,
+        ttfb_ms,
+        MIN(1.0, MAX(0.0, 1.0 - (julianday('now') - julianday(created_at)) / ?)) as recency_weight
+      FROM requests
+      WHERE created_at >= ?
+    )
+    SELECT 
+      platform, 
+      model_id,
+      SUM(recency_weight) as total_weighted,
+      SUM(CASE WHEN status = 'success' THEN recency_weight ELSE 0 END) as successes_weighted,
+      COUNT(*) as raw_total,
+      SUM(CASE WHEN status = 'success' THEN 1 ELSE 0 END) as raw_successes,
      CASE
        WHEN SUM(CASE WHEN status = 'success' THEN latency_ms ELSE 0 END) > 0
        THEN SUM(CASE WHEN status = 'success' THEN output_tokens ELSE 0 END) * 1000.0
             / SUM(CASE WHEN status = 'success' THEN latency_ms ELSE 0 END)
        ELSE 0
      END as tok_per_sec,
      AVG(CASE WHEN status = 'success' AND ttfb_ms IS NOT NULL THEN ttfb_ms END) as avg_ttfb_ms
-    FROM requests
-    WHERE created_at >= ?
+    FROM weighted_requests
    GROUP BY platform, model_id
-  `).all(since) as Array<{
-    platform: string; model_id: string; total: number; successes: number;
+  `).all(ANALYTICS_WINDOW_DAYS, since) as Array<{
+    platform: string; model_id: string; total_weighted: number; successes_weighted: number;
+    raw_total: number; raw_successes: number;
    tok_per_sec: number; avg_ttfb_ms: number | null;
  }>;


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Find created_at column definition and how it's written for the requests table rg -nP "created_at" --type=ts -C2 -g '!**/__tests__/**' fd -e sql | xargs rg -nP "created_at|CREATE TABLE requests" -C2 rg -nP "INSERT INTO requests" --type=ts -C2 -g '!**/__tests__/**'

Repository: vi70x3/freellmapi

Length of output: 19477

🏁 Script executed:

#!/bin/bash set -euo pipefail # Inspect the test that inserts created_at and how "now" is computed sed -n '1,220p' server/src/__tests__/services/router.test.ts # Check how production inserts into requests (whether created_at is omitted and thus uses default datetime('now')) rg -n "INSERT INTO requests" -S server/src/routes/proxy.ts sed -n '1760,1845p' server/src/routes/proxy.ts # Check requests table schema sed -n '1,180p' server/src/db/index.ts # Also locate the API/analytics "since" param formation rg -n "const since|since =" server/src -S

Repository: vi70x3/freellmapi

Length of output: 16890

🏁 Script executed:

#!/bin/bash set -euo pipefail sed -n '1,260p' server/src/services/router.ts rg -n "refreshStatsCache|getAnalyticsScores|getSinceTimestamp|ANALYTICS_WINDOW" server/src/services/router.ts rg -n "created_at" server/src/services/router.ts

Repository: vi70x3/freellmapi

Length of output: 10753

🏁 Script executed:

#!/bin/bash set -euo pipefail # Search for unixepoch usage rg -n "unixepoch|julianday\$\\s*created_at\$|julianday\$'" server/src -S # Search for any schema/triggers/views referencing requests.created_at rg -n "CREATE\\s+(TRIGGER|VIEW|VIRTUAL)\\b|requests\\b.*created_at|created_at.*requests" server/src/db -S || true rg -n "requests\\b" server/src/db/index.ts server/src/db -S # Check router tests and any other tests inserting created_at rg -n "INSERT INTO requests \\(.*created_at\$" server/src/__tests__ -S

Repository: vi70x3/freellmapi

Length of output: 3091

🏁 Script executed:

#!/bin/bash set -euo pipefail # Check if sqlite3 CLI is available command -v sqlite3 >/dev/null 2>&1 && echo "sqlite3_cli_available=1" || echo "sqlite3_cli_available=0" # Run a couple of direct SQLite expressions to see how julianday() treats numeric values # Use an epoch-seconds-like value (~2026-01-01) for the "numeric created_at" case. EPOCH_SECS=1767225600 # 2026-01-01T00:00:00Z ISO="2026-01-01T00:00:00Z" sqlite3 -batch :memory: " SELECT julianday('now') AS jd_now, julianday($EPOCH_SECS) AS jd_from_epoch_seconds_number, julianday('$ISO') AS jd_from_iso_string, (julianday('now') - julianday($EPOCH_SECS)) AS delta_days_numeric, (julianday('now') - julianday('$ISO')) AS delta_days_iso; "

Repository: vi70x3/freellmapi

Length of output: 152

🏁 Script executed:

#!/bin/bash set -euo pipefail python3 - <<'PY' import sqlite3, datetime, time, math con = sqlite3.connect(':memory:') cur = con.cursor() # Ensure column is TEXT like in schema cur.execute("CREATE TABLE requests(created_at TEXT NOT NULL)") con.commit() # Pick a known epoch seconds epoch_secs = 1767225600 # 2026-01-01T00:00:00Z iso = "2026-01-01T00:00:00Z" # Insert numeric into TEXT column cur.execute("INSERT INTO requests(created_at) VALUES (?)", (epoch_secs,)) con.commit() row = cur.execute(""" SELECT julianday('now') AS jd_now, julianday(?) AS jd_from_numeric_arg, julianday(?) AS jd_from_iso_arg, julianday(created_at) AS jd_from_numeric_stored_in_text, created_at AS stored_value FROM requests; """, (epoch_secs, iso)).fetchone() print("jd_now:", row[0]) print("jd_from_numeric_arg:", row[1]) print("jd_from_iso_arg:", row[2]) print("jd_from_numeric_stored_in_text:", row[3]) print("stored_value:", row[4]) # Also compare WHERE created_at >= ? for both cases # Use since as ISO string (as production does) since_iso = datetime.datetime.utcnow().replace(microsecond=0).isoformat()+"Z" # Numeric compare target: store "0" and see coercion q = """ SELECT SUM(CASE WHEN created_at >= ? THEN 1 ELSE 0 END) AS cnt_ge_since_iso, typeof(created_at) AS created_at_type FROM requests; """ cnt = cur.execute(q, (since_iso,)).fetchone()[0] type_ = cur.execute(q, (since_iso,)).fetchone()[1] print("since_iso:", since_iso) print("cnt_ge_since_iso:", cnt) print("created_at_type:", type_) # Show what happens with unixepoch modifier row2 = cur.execute(""" SELECT julianday(created_at, 'unixepoch') AS jd_from_numeric_unixepoch_modifier FROM requests; """).fetchone() print("jd_from_numeric_unixepoch_modifier:", row2[0]) PY

Repository: vi70x3/freellmapi

Length of output: 323

Fix recency decay + windowing for Unix-epoch requests.created_at

refreshStatsCache passes since as an ISO-8601 string, but tests insert requests.created_at as Unix epoch seconds; with epoch-seconds-in-TEXT, julianday(created_at) evaluates to NULL (so recency_weight is wrong) and WHERE created_at >= ? returns no matching rows.

Update the SQL to use epoch-safe conversion (e.g., julianday(created_at,'unixepoch')), and make the window predicate compare in the same domain (e.g., datetime(created_at,'unixepoch') >= ?, or pass since as epoch seconds instead of ISO).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/services/router.ts` around lines 183 - 216, The SQL is broken for epoch-second timestamps: update the query in router.ts (the weighted_requests CTE used by the stats calculation in refreshStatsCache) to treat created_at as Unix epoch by using julianday(created_at, 'unixepoch') for recency_weight and converting the WHERE clause to compare equivalent datetimes (e.g., datetime(created_at, 'unixepoch') >= ?) so rows inserted as epoch seconds are matched; ensure both the recency calculation and the window predicate use the same unixepoch conversion rather than plain julianday(created_at) or a mismatched ISO comparison.

coderabbitai · 2026-06-02T17:29:45Z

  if (!statsCache) return [];
+  if (!statsCache) { refreshStatsCache(getDb(), true); }


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

On-demand cache refresh is unreachable — the early return [] defeats it.

Line 348 returns when statsCache is null, so the newly added refresh on line 349 can never run. The intended "refresh stats on-demand when the cache is missing" behavior (T9) is dead code, and the dashboard will return an empty array on the first call before any other code path has populated the cache.

🐛 Proposed fix

- if (!statsCache) return []; - if (!statsCache) { refreshStatsCache(getDb(), true); } + if (!statsCache) { refreshStatsCache(getDb(), true); } + if (!statsCache) return [];

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if (!statsCache) return [];

if (!statsCache) { refreshStatsCache(getDb(), true); }

if (!statsCache) { refreshStatsCache(getDb(), true); }

if (!statsCache) return [];

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/services/router.ts` around lines 348 - 349, The early return prevents the on-demand refresh: when statsCache is falsy the function immediately returns [] so the subsequent call to refreshStatsCache(getDb(), true) never executes; remove or move the early return and instead call refreshStatsCache(getDb(), true) when statsCache is falsy (using statsCache check) and only return [] after awaiting or deciding not to wait for refresh, ensuring the code around statsCache, refreshStatsCache(...) and getDb() is reordered so the refresh path runs when the cache is missing.

coderabbitai · 2026-06-02T17:29:45Z

+export function evaluateThreadProtection(_ctx: ErrorContext): ThreadProtectionAction {
+  // Placeholder implementation: no protection
+  return { banProvider: false, skipModel: false, clearStickyIfPinned: false, reason: 'off' };
+}


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Inspect how proxy.ts imports and uses evaluateThreadProtection / its action flags. rg -nP --type=ts -C4 'evaluateThreadProtection|ThreadProtectionAction|banProvider|skipModel|clearStickyIfPinned'

Repository: vi70x3/freellmapi

Length of output: 39415

🏁 Script executed:

#!/bin/bash # Find all calls to evaluateThreadProtection and where its returned flags are used. rg -n --type=ts 'evaluateThreadProtection\s*\(' server/src/routes/proxy.ts server/src/services server/src/routes # Inspect flag-property usage across the codebase (esp. in proxy.ts). rg -n --type=ts '\b(banProvider|skipModel|clearStickyIfPinned|ThreadProtectionAction)\b' server/src/routes/proxy.ts server/src/services

Repository: vi70x3/freellmapi

Length of output: 750

🏁 Script executed:

#!/bin/bash set -euo pipefail # 1) Any call sites anywhere in server/src rg -n --type=ts 'evaluateThreadProtection\s*\(' server/src # 2) Any usage of the action fields anywhere in server/src rg -n --type=ts '\b(banProvider|skipModel|clearStickyIfPinned)\b' server/src # 3) Where proxy.ts references thread protection (log labels, comments, etc.) rg -n --type=ts 'ThreadProtection|thread protection|evaluateThreadProtection' server/src/routes/proxy.ts

Repository: vi70x3/freellmapi

Length of output: 898

🏁 Script executed:

#!/bin/bash set -euo pipefail # Find any mention/call sites of evaluateThreadProtection across the repo rg -n --type=ts 'evaluateThreadProtection' server/src # Inspect the proxy.ts area where [ThreadProtection] logs occur sed -n '1240,1345p' server/src/routes/proxy.ts

Repository: vi70x3/freellmapi

Length of output: 4292

🏁 Script executed:

#!/bin/bash set -euo pipefail # Confirm no call sites in proxy.ts rg -n --type=ts 'evaluateThreadProtection\s*\(' server/src/routes/proxy.ts # Confirm only import in proxy.ts rg -n --type=ts 'evaluateThreadProtection' server/src/routes/proxy.ts # Confirm action flags aren’t used elsewhere rg -n --type=ts '\bbanProvider\b|\bskipModel\b|\bclearStickyIfPinned\b' server/src

Repository: vi70x3/freellmapi

Length of output: 43

Thread protection placeholder won’t disable anything right now
evaluateThreadProtection() is a placeholder returning “off”, but server/src/routes/proxy.ts only imports it and never calls it (no evaluateThreadProtection( usage), and the action fields banProvider/skipModel/clearStickyIfPinned aren’t referenced anywhere outside threadProtection.ts. LongCat protection continues to be handled by existing hardcoded proxy branches, so there’s no behavioral regression from this placeholder.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/services/threadProtection.ts` around lines 20 - 23, evaluateThreadProtection currently returns a placeholder ThreadProtectionAction but is never invoked and its fields (banProvider, skipModel, clearStickyIfPinned) are unused; update the request handling in server/src/routes/proxy.ts to import and call evaluateThreadProtection(ErrorContext) inside the proxy request flow (e.g., within the handler that currently uses hardcoded LongCat branches), then apply the returned action flags: if banProvider true, trigger the provider ban flow; if skipModel true, bypass or select an allowed model path; if clearStickyIfPinned true, clear any pinned sticky context before processing. Also either implement real thread-protection logic inside evaluateThreadProtection or document/remove the placeholder so the function and its ThreadProtectionAction fields are actually exercised by the proxy flow.

coderabbitai · 2026-06-02T17:29:45Z

+#!/usr/bin/env python3
+"""Write the complete router.test.ts file."""
+
+path = '/home/vi/freellmapi/server/src/__tests__/services/router.test.ts'


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid machine-specific absolute output paths.

Line 4 hardcodes a local filesystem path, so the generator will fail for other devs/CI environments.

Suggested fix

+from pathlib import Path + -path = '/home/vi/freellmapi/server/src/__tests__/services/router.test.ts' +path = Path(__file__).resolve().parent / "src/__tests__/services/router.test.ts"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/write_test.py` at line 4, The hardcoded machine-specific absolute path assigned to the variable path in write_test.py should be replaced with a portable resolution (e.g., build the test file path relative to the repository or current file using pathlib or os.path.join and __file__) so CI and other developers can run it; update the variable path (reference: variable name "path" in write_test.py) to compute a relative path to src/__tests__/services/router.test.ts instead of the absolute '/home/vi/...'.

coderabbitai · 2026-06-02T17:29:45Z

+    "  });",
+    "",
+    "  it('should throw when no keys are configured', () => {",
+    "    expect(() => route{


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail # Show relevant portion of the file around the reported line echo "---- server/write_test.py: lines 1-120 (numbered) ----" nl -ba server/write_test.py | sed -n '1,120p' echo "---- server/write_test.py: focus lines 15-45 (numbered) ----" nl -ba server/write_test.py | sed -n '15,45p' echo "---- rg patterns in server/write_test.py ----" rg -n "expect\$\\(\$ => route|route\\{|routeRequest|unterminated|string literal" server/write_test.py || true

Repository: vi70x3/freellmapi

Length of output: 177

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "---- server/write_test.py: lines 1-120 (numbered via cat -n) ----" cat -n server/write_test.py | sed -n '1,120p' echo "---- server/write_test.py: focus lines 15-45 (numbered via cat -n) ----" cat -n server/write_test.py | sed -n '15,45p' echo "---- rg patterns in server/write_test.py ----" rg -n "expect\$\\(\$ => route|route\\{|routeRequest" server/write_test.py || true

Repository: vi70x3/freellmapi

Length of output: 2505

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "---- line count ----" wc -l server/write_test.py echo "---- server/write_test.py: lines 20-45 ----" sed -n '20,45p' server/write_test.py | cat -n echo "---- server/write_test.py: tail (last 30 lines) ----" tail -n 30 server/write_test.py | cat -n

Repository: vi70x3/freellmapi

Length of output: 2179

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "---- py_compile (syntax check) ----" python -m py_compile server/write_test.py echo "---- tail with cat -n ----" tail -n 10 server/write_test.py | cat -n

Repository: vi70x3/freellmapi

Length of output: 252

Fix unterminated string literal in content array (line 29).

server/write_test.py fails to parse with SyntaxError: unterminated string literal (detected at line 29). Close/complete the quoted entry at line 29 so the Python list literal is syntactically valid.

🧰 Tools

🪛 Ruff (0.15.15)

[warning] 29-29: missing closing quote in string literal

(invalid-syntax)

[warning] 29-29: unexpected EOF while parsing

(invalid-syntax)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/write_test.py` at line 29, Fix the unterminated string in the list named content by completing or closing the quoted entry so the Python list literal parses; locate the `content` array in server/write_test.py (the failing entry starts with " expect(() => route{) and either terminate the string with a matching quote and any needed escape sequences or replace it with a properly quoted string literal so the list is syntactically valid.

- Add 'fast' to RoutingMode type and ModelPool enum (Fast/Balanced/Smart) - Implement classifyModel() to categorize models by speed/intelligence ranks - Implement fastSampleScore() with 3x speed weight, 1.5x TTFB factor, minimal intelligence - Add pool-based routing in routeRequest: fast pool first, balanced pool borrowing fallback - Add freellmapi/auto-fast model ID support in proxy routes - Extend /v1/models endpoint with primaryPool field for all models - Add unit tests for pool classification (T13-T16) and fast routing (T17-T18) - Add integration tests for /v1/models primaryPool fields (T13-T17) - Update README.md and CLAUDE.md documentation

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (2)

server/src/services/router.ts (2)

436-439: ⚠️ Potential issue | 🟡 Minor | 💤 Low value

modelId indentation is inconsistent with sibling fields.

modelId sits at 4 spaces while platform, modelName, etc. are at 6.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/services/router.ts` around lines 436 - 439, The object literal
pushed in result.push has inconsistent indentation for the modelId property;
align modelId with its sibling properties (platform, modelName) inside the same
object literal in the function that constructs the result array (look for
result.push({ modelId: modelName, platform, modelName, ... })). Adjust spacing
so modelId uses the same indentation level as platform and modelName.

406-407: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

On-demand refresh is unreachable — the return [] on Line 406 precedes the refresh on Line 407.

The early return executes first, so refreshStatsCache(getDb(), true) never runs and the dashboard returns [] on a cold cache.

🐛 Proposed fix

-  if (!statsCache) return [];
-  if (!statsCache) { refreshStatsCache(getDb(), true); }
+  if (!statsCache) refreshStatsCache(getDb(), true);
+  if (!statsCache) return [];

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/services/router.ts` around lines 406 - 407, The early return makes
the on-demand refresh unreachable; change the logic around statsCache so
refreshStatsCache(getDb(), true) is invoked when statsCache is falsy before
returning. Specifically, replace the two-check pattern with a single conditional
that calls refreshStatsCache(getDb(), true) and then returns [] (e.g., if
(!statsCache) { refreshStatsCache(getDb(), true); return []; }) so
refreshStatsCache is actually executed; ensure you update the block around
statsCache and remove the redundant check.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@server/src/services/router.ts`:
- Line 170: Remove the local type alias RoutingMode in
server/src/services/router.ts and instead re-export it from the shared types
module; specifically delete the line "export type RoutingMode = 'balanced' |
'smart' | 'fast';" and add an explicit re-export like "export type { RoutingMode
} from '`@freellmapi/shared/types.js`'" so existing imports (e.g., type
RoutingMode used by ../routes/proxy.ts) continue to work and the definition is
deduplicated.

---

Duplicate comments:
In `@server/src/services/router.ts`:
- Around line 436-439: The object literal pushed in result.push has inconsistent
indentation for the modelId property; align modelId with its sibling properties
(platform, modelName) inside the same object literal in the function that
constructs the result array (look for result.push({ modelId: modelName,
platform, modelName, ... })). Adjust spacing so modelId uses the same
indentation level as platform and modelName.
- Around line 406-407: The early return makes the on-demand refresh unreachable;
change the logic around statsCache so refreshStatsCache(getDb(), true) is
invoked when statsCache is falsy before returning. Specifically, replace the
two-check pattern with a single conditional that calls
refreshStatsCache(getDb(), true) and then returns [] (e.g., if (!statsCache) {
refreshStatsCache(getDb(), true); return []; }) so refreshStatsCache is actually
executed; ensure you update the block around statsCache and remove the redundant
check.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 70c02531-22d4-4634-8883-b25713001452

📥 Commits

Reviewing files that changed from the base of the PR and between 45d2546 and 5617fc3.

📒 Files selected for processing (7)

CLAUDE.md
README.md
server/src/__tests__/integration/full-flow.test.ts
server/src/__tests__/services/router.test.ts
server/src/routes/proxy.ts
server/src/services/router.ts
shared/types.ts

✅ Files skipped from review due to trivial changes (2)

CLAUDE.md
README.md

🚧 Files skipped from review as they are similar to previous changes (1)

server/src/routes/proxy.ts

coderabbitai · 2026-06-02T19:46:39Z

 }

-export type RoutingMode = 'balanced' | 'smart';
+export type RoutingMode = 'balanced' | 'smart' | 'fast';


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash rg -nP "RoutingMode" --type=ts -C2

Repository: vi70x3/freellmapi

Length of output: 5164

🏁 Script executed:

#!/bin/bash rg -n --type=ts "type RoutingMode|RoutingMode" server/src -S

Repository: vi70x3/freellmapi

Length of output: 1449

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "---- router.ts imports (top of file) ----" sed -n '1,120p' server/src/services/router.ts | nl -ba | head -n 120 echo echo "---- router.ts RoutingMode declaration ----" sed -n '140,220p' server/src/services/router.ts | nl -ba echo echo "---- proxy.ts imports (top of file) ----" sed -n '1,120p' server/src/routes/proxy.ts | nl -ba | head -n 120

Repository: vi70x3/freellmapi

Length of output: 164

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "---- router.ts imports (top of file) ----" sed -n '1,120p' server/src/services/router.ts | cat -n | head -n 120 echo echo "---- router.ts RoutingMode declaration ----" sed -n '140,220p' server/src/services/router.ts | cat -n echo echo "---- proxy.ts imports (top of file) ----" sed -n '1,140p' server/src/routes/proxy.ts | cat -n | head -n 140

Repository: vi70x3/freellmapi

Length of output: 16450

Deduplicate RoutingMode by re-exporting it from shared types

server/src/services/router.ts locally defines export type RoutingMode = 'balanced' | 'smart' | 'fast', while server/src/routes/proxy.ts imports type RoutingMode from ../services/router.js. Instead of maintaining a second independent definition, have router.ts re-export RoutingMode from @freellmapi/shared/types.js (there are no other RoutingMode imports from router.ts).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/services/router.ts` at line 170, Remove the local type alias RoutingMode in server/src/services/router.ts and instead re-export it from the shared types module; specifically delete the line "export type RoutingMode = 'balanced' | 'smart' | 'fast';" and add an explicit re-export like "export type { RoutingMode } from '`@freellmapi/shared/types.js`'" so existing imports (e.g., type RoutingMode used by ../routes/proxy.ts) continue to work and the definition is deduplicated.

- extractErrorMessage: handle string error payloads and object errors with message field - throwWrappedError: guard against non-numeric error codes (NaN prevention) - cloudflare/cohere/openai-compat: move isWrappedError check outside JSON.parse try/catch in SSE loops

kilo-code-bot · 2026-06-02T23:05:53Z


-    const err = body as { error?: { message?: string }; errors?: Array<{ message?: string }>; message?: string };
-    return err.error?.message ?? err.errors?.[0]?.message ?? err.message ?? fallback;
+    const err = body as { error?: unknown; errors?: Array<{ message?: string }>; message?: string };


WARNING: TypeScript type error - missing closing brace in type assertion

The type assertion for err has an incomplete type: Array<{ message?: string>; is missing a closing brace for the object type. This will cause a TypeScript compilation error.

Suggested change

const err = body as { error?: unknown; errors?: Array<{ message?: string }>; message?: string };

const err = body as { error?: unknown; errors?: Array<{ message?: string }>; message?: string };

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

server/src/providers/base.ts (1)

159-173: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Constrain wrapped code to a real HTTP status.

Number(codeVal) no longer produces NaN, but it still accepts values like "Infinity", 0, -1, or 429.5. proxy.ts only checks typeof status === 'number', so these still bypass the retry/rate-limit branches this helper is meant to trigger. Fall back to 200 unless the parsed value is a finite HTTP status.

Suggested fix

-      const num = Number(codeVal);
-      if (!isNaN(num)) {
-        statusCode = num;
-      }
+      const num = Number(codeVal);
+      if (Number.isInteger(num) && num >= 100 && num < 600) {
+        statusCode = num;
+      }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/providers/base.ts` around lines 159 - 173, The parsed
errPayload.code should be validated as a finite integer HTTP status before
assigning to error.status: replace the loose Number(codeVal) acceptance with
logic that converts codeVal to a Number, verifies Number.isFinite(num) and
Number.isInteger(num), and ensures the value is within the valid HTTP range
(100–599); only then set statusCode = num, otherwise leave statusCode as 200 so
proxy.ts's typeof number checks don't get bypassed by "Infinity", 0, negatives,
or non-integer values (referencing variables errPayload, codeVal, statusCode,
and error.status).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@server/src/providers/cloudflare.ts`:
- Around line 132-135: Before calling this.throwWrappedError(parsed)
cancel/abort the upstream SSE stream so the reader/body is closed; specifically,
when you detect a wrapped error via this.isWrappedError(parsed) call the stream
cancellation API (e.g., await reader.cancel() or this.abortController.abort())
and handle/ignore any cancellation errors, then invoke
this.throwWrappedError(parsed) — ensure the cancel/abort is awaited or completed
synchronously so the generator exits with the stream already closed.

---

Duplicate comments:
In `@server/src/providers/base.ts`:
- Around line 159-173: The parsed errPayload.code should be validated as a
finite integer HTTP status before assigning to error.status: replace the loose
Number(codeVal) acceptance with logic that converts codeVal to a Number,
verifies Number.isFinite(num) and Number.isInteger(num), and ensures the value
is within the valid HTTP range (100–599); only then set statusCode = num,
otherwise leave statusCode as 200 so proxy.ts's typeof number checks don't get
bypassed by "Infinity", 0, negatives, or non-integer values (referencing
variables errPayload, codeVal, statusCode, and error.status).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 61c0201f-eda8-48c9-802b-a3f625d518d9

📥 Commits

Reviewing files that changed from the base of the PR and between 5617fc3 and 1ac7c15.

📒 Files selected for processing (4)

server/src/providers/base.ts
server/src/providers/cloudflare.ts
server/src/providers/cohere.ts
server/src/providers/openai-compat.ts

🚧 Files skipped from review as they are similar to previous changes (2)

server/src/providers/openai-compat.ts
server/src/providers/cohere.ts

coderabbitai · 2026-06-02T23:07:47Z

+        // Check for wrapped errors outside of the JSON parse try/catch block
+        if (this.isWrappedError(parsed)) {
+          this.throwWrappedError(parsed);
        }


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Cancel the SSE reader before throwing the wrapped error.

The spec for this feature requires the stream to be aborted when a chunk is actually an error payload. Throwing here exits the generator, but it leaves the upstream reader/body open until the remote side closes it.

Suggested fix

// Check for wrapped errors outside of the JSON parse try/catch block if (this.isWrappedError(parsed)) { + await reader.cancel().catch(() => undefined); this.throwWrappedError(parsed); }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/providers/cloudflare.ts` around lines 132 - 135, Before calling this.throwWrappedError(parsed) cancel/abort the upstream SSE stream so the reader/body is closed; specifically, when you detect a wrapped error via this.isWrappedError(parsed) call the stream cancellation API (e.g., await reader.cancel() or this.abortController.abort()) and handle/ignore any cancellation errors, then invoke this.throwWrappedError(parsed) — ensure the cancel/abort is awaited or completed synchronously so the generator exits with the stream already closed.

vi70x3 added 4 commits June 2, 2026 15:12

feat(providers): intercept wrapped error payloads on HTTP 200 responses

2fd34ba

feat(proxy): replace hardcoded LongCat/Owl Alpha cooldowns with gener…

09bffb0

…alized thread protection scanner

chore: temporary commit before switching branch

d4ea579

kilo-code-bot Bot reviewed Jun 2, 2026

View reviewed changes

gemini-code-assist Bot reviewed Jun 2, 2026

View reviewed changes

coderabbitai Bot suggested changes Jun 2, 2026

View reviewed changes

kilo-code-bot Bot reviewed Jun 2, 2026

View reviewed changes

coderabbitai Bot suggested changes Jun 2, 2026

View reviewed changes

vi70x3 closed this Jun 2, 2026


		export function refreshStatsCache(db: Database, force = false): void {
		if (!force && statsCache && Date.now() - statsCacheTime < ANALYTICS_CACHE_TTL_MS) return;

		if (!statsCache) return [];
		if (!statsCache) { refreshStatsCache(getDb(), true); }


		### R-1: Linear Time-Decay Weighting

		The SQL query aggregating historical requests in [`refreshStatsCache()`](server/src/services/router.ts:174) must calculate a weight for each logged request based on its age. Newer requests must be assigned a weight closer to `1.0`, while requests approaching the limit of the analytics window (7 days) must decay toward `0.0`.

	const err = body as { error?: unknown; errors?: Array<{ message?: string }>; message?: string };
	const err = body as { error?: unknown; errors?: Array<{ message?: string }>; message?: string };

Conversation

vi70x3 commented Jun 2, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Uh oh!

kilo-code-bot Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

kilo-code-bot Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review Summary

Overview

WARNING

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

kilo-code-bot Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

vi70x3 commented Jun 2, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 2, 2026 •

edited

Loading

kilo-code-bot Bot commented Jun 2, 2026 •

edited

Loading