Skip to content

feat(router): implement recency-biased Thompson sampling (T1-T9)#11

Closed
vi70x3 wants to merge 6 commits into
mainfrom
spec/recency-biased-thompson-sampling
Closed

feat(router): implement recency-biased Thompson sampling (T1-T9)#11
vi70x3 wants to merge 6 commits into
mainfrom
spec/recency-biased-thompson-sampling

Conversation

@vi70x3

@vi70x3 vi70x3 commented Jun 2, 2026

Copy link
Copy Markdown
Collaborator

Implements the recency-biased Thompson sampling spec (T1-T9):

  • T1: Added ANALYTICS_WINDOW_DAYS constant for SQL decay divisor
  • T2: Extended ModelStats interface with rawSuccesses/rawTotal fields
  • T3: Rewrote SQL query with CTE-based time-decay weighting using julianday()
  • T4: Updated statsCache population to use weighted column names
  • T5-T8: Added Math.max(0.1, ...) guards in thompsonSampleScore(), smartSampleScore(), getAnalyticsScore(), and getSmartAnalyticsScore() to handle fractional weights safely
  • T9: Updated getAnalyticsScores() dashboard to display rawTotal counts
  • Added modelId field to getAnalyticsScores() return type for fallback route compatibility

TypeScript compilation verified clean with npx tsc --noEmit.

Summary by CodeRabbit

  • New Features

    • SSE stream heartbeats and stall protection for clearer timeouts and healthier streaming.
    • New auto-fast routing mode that prioritizes speed.
  • Improvements

    • Wrapped-error detection to surface provider error payloads reliably.
    • Temporary global model cooldowns to reduce cascading failures.
    • Recency‑biased routing (time-decay) for better model selection.
    • Balanced mode skips sticky-session pinning and refines Owl‑Alpha/LongCat routing behavior.
  • Tests

    • Added/updated tests covering streaming, cooldowns, routing, and sticky-session behaviors.

vi70x3 added 4 commits June 2, 2026 15:12
- Add ANALYTICS_WINDOW_DAYS constant for SQL decay divisor
- Extend ModelStats with rawSuccesses/rawTotal fields
- Rewrite SQL query with CTE-based time-decay weighting (julianday)
- Update statsCache population to use weighted columns
- Add Math.max(0.1, ...) guards in thompsonSampleScore, smartSampleScore,
  getAnalyticsScore, and getSmartAnalyticsScore to handle fractional weights
- Update getAnalyticsScores dashboard to display rawTotal counts
- Add modelId field to getAnalyticsScores return type for fallback route compat
@coderabbitai

coderabbitai Bot commented Jun 2, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

PR adds wrapped-error detection in providers, recency-weighted analytics and Thompson sampling guards, fast routing/pool classification plus Owl Alpha/LongCat preferences, shared transient model cooldowns and thread-protection scaffolding, balanced-mode sticky-session bypass, SSE stream keep-alive and stall abort mechanics, and matching tests, docs, and helper scripts.

Changes

Multi-feature Routing, Analytics, and Streaming Enhancements

Layer / File(s) Summary
Wrapped provider error interception
server/src/providers/base.ts, server/src/providers/openai-compat.ts, server/src/providers/cohere.ts, server/src/providers/cloudflare.ts, server/src/providers/google.ts, .roo/specs/wrapped-error-interception/*
BaseProvider gains protected isWrappedError() and throwWrappedError() and providers check/throw on wrapped errors in JSON and SSE chunks.
Recency-weighted analytics and Thompson sampling
server/src/services/router.ts, server/src/__tests__/services/router.test.ts, .roo/specs/recency-biased-thompson-sampling/*
refreshStatsCache() uses a time-decay SQL CTE, stats include weighted and raw counters, Bayesian/Thompson parameters are clamped with Math.max(0.1, ...), and getAnalyticsScores() exposes modelName/thompsonScore.
Router model selection: balanced exclusions and smart preferences
server/src/services/router.ts, server/src/__tests__/services/router.test.ts, README.md, CLAUDE.md, .roo/specs/owl-alpha-longcat-model-routing/*
Adds fast routing mode and ModelPool classification, balanced mode excludes LongCat and Owl Alpha, smart mode prefers LongCat and conditionally prefers Owl Alpha via hasValidKeys().
Transient model cooldowns
server/src/routes/proxy.ts, server/src/__tests__/routes/transient-cooldown.test.ts, .roo/specs/transient-model-cooldown/*
Adds exported transientModelCooldowns and TRANSIENT_COOLDOWN_MS; prunes/injects cooldowns into per-request skipModels, clears sticky preferred model when cooled, and registers cooldowns on retryable/mid-stream 5xx errors.
Thread protection scanner (scaffold)
server/src/services/threadProtection.ts, .roo/specs/generalized-thread-protection/*
New types and evaluateThreadProtection() placeholder; proxy imports evaluator and generalizes sticky cooldown gating to use protection decisions.
Sticky session behavior by routing mode
server/src/__tests__/routes/provider-session-ban.test.ts, .roo/specs/disable-sticky-on-auto/*
Balanced mode returns empty session key (no-oping sticky helpers); smart mode retains sticky behavior. Tests updated to assert balanced skips and smart behavior.
SSE stream heartbeat and stall detection
server/src/routes/proxy.ts, server/src/__tests__/routes/stream-heartbeat-stall.test.ts, .roo/specs/sse-stream-heartbeat-stall-protection/*
Adds streamKeepaliveConfig, per-stream last-chunk tracking, periodic : keep-alive comments after headers, pre/mid-stream stall handling (504 pre-stream, error-frame mid-stream), idempotent cleanup on client close/error, and tests.
Tests and helper scripts
server/src/__tests__/*, do_fix.py, fix.py, fix_streaming.py, new_streaming_block.txt, server/write_test.py, server/write_tests.py
Extensive new/updated tests for router, proxy, streaming, and cooldowns; small Python scripts for local streaming-block replacement and router test generation.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • vi70x3/freellmapi#9: Overlaps on Owl Alpha/LongCat routing preferences and model-level banning conversions.
  • vi70x3/freellmapi#8: Modifies proxy sticky-cooldown/skipModels behavior and thread-protection mechanics closely related to this PR.

Poem

🐰 With routes rebalanced, errors caught mid-flight,
And heartbeats thumping through the streaming night—
LongCat and Owl now prefer their rightful lanes,
While Thompson samples dance on decay's refrains.
Cooldowns shared and stalls now tamed with care, this rabbit hops through changes, light as air!

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch spec/recency-biased-thompson-sampling
⚔️ Resolve merge conflicts
  • Resolve merge conflict in branch spec/recency-biased-thompson-sampling

const intelligenceRank = intelligenceMap.get(`${platform}:${modelName}`);
const thompsonScore = thompsonSampleScore(platform, modelName, intelligenceRank, minIntelligenceRank, maxIntelligenceRank);
result.push({
modelId: modelName,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Inconsistent indentation — modelId is indented at 4 spaces while the surrounding fields (platform, modelName, score, etc.) are at 6 spaces. This makes the object literal harder to scan. Should be 6 spaces to match the rest of the block.

@kilo-code-bot

kilo-code-bot Bot commented Jun 2, 2026

Copy link
Copy Markdown

Code Review Summary

Status: 2 Issues Found | Recommendation: Address before merge

Overview

Severity Count
CRITICAL 0
WARNING 2
SUGGESTION 0
Issue Details (click to expand)

WARNING

File Line Issue
server/src/providers/base.ts 116 TypeScript type error - missing closing brace in type assertion
server/src/services/router.ts 379 Inconsistent indentation — modelId is indented at 4 spaces while surrounding fields are at 6 spaces
Other Observations (not in diff)

Issues found in unchanged code that cannot receive inline comments:

File Line Issue
None
Files Reviewed (4 files)
  • server/src/providers/base.ts - 1 issues
  • server/src/providers/cloudflare.ts - 0 issues
  • server/src/providers/cohere.ts - 0 issues
  • server/src/providers/openai-compat.ts - 0 issues
  • server/src/services/router.ts - 1 issues (carried forward)

Reviewed by nemotron-3-super-120b-a12b-20230311:free · 260,483 tokens

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces several major enhancements to the routing and proxy layers, including disabling sticky sessions on the balanced endpoint, implementing recency-biased Thompson sampling with linear time-decay, adding SSE stream heartbeats and stall protection, introducing shared transient model cooldowns, and intercepting wrapped error payloads on HTTP 200 responses. The review feedback highlights critical issues that need to be addressed: a missing cache TTL check in refreshStatsCache() causing performance degradation, unreachable auto-refresh logic in getAnalyticsScores(), unsafe 504 handling inside a setInterval callback for pre-stream stalls, an incomplete exhaustion check in the thread protection logic that ignores already-skipped models, and an invalid test case seeding Unix timestamps instead of ISO-8601 strings.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.


export function refreshStatsCache(db: Database, force = false): void {
if (!force && statsCache && Date.now() - statsCacheTime < ANALYTICS_CACHE_TTL_MS) return;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The cache TTL check was removed during refactoring. This causes the heavy SQLite time-decay query to run on every single request, creating a major performance bottleneck. Re-add the early return check to ensure the cache is respected.

Suggested change
if (!force && statsCache && Date.now() - statsCacheTime < ANALYTICS_CACHE_TTL_MS) return;

Comment on lines 348 to +349
if (!statsCache) return [];
if (!statsCache) { refreshStatsCache(getDb(), true); }

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The check if (!statsCache) return []; is placed before the auto-refresh logic if (!statsCache) { refreshStatsCache(getDb(), true); }. This makes the auto-refresh logic completely unreachable, and the function will always return [] if the cache is uninitialized. Swap the order of these statements.

Suggested change
if (!statsCache) return [];
if (!statsCache) { refreshStatsCache(getDb(), true); }
if (!statsCache) { refreshStatsCache(getDb(), true); }
if (!statsCache) return [];

Comment on lines +1380 to +1388
} else {
// Pre-stream stall — no headers sent yet, response is still retryable
// Send a 504 so the client gets a proper error signal
try {
res.status(504).json({
error: { message: `Upstream provider stalled before yielding any data from ${route.displayName}`, type: 'stream_timeout' },
});
} catch { /* Socket already gone */ }
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Sending a 504 response directly from inside the setInterval callback does not stop the pending for await loop or the upstream fetch request, leading to leaked sockets and suspended generator execution. Additionally, because no error is thrown to the outer retry loop (and throwing inside setInterval would crash the process), pre-stream stalls do not trigger fallback to other healthy providers.

To fix this safely without crashing the process or leaking resources, consider wrapping the async generator in a timeout-wrapped iterator (using Promise.race on each chunk) so that the timeout error is thrown synchronously within the for await loop. This allows the error to be caught by the existing try/catch block, triggering proper cleanup, logging, and fallback retries.

Comment on lines +1270 to +1277
// Check if cooldown would exhaust all models
let wouldExhaustAll = true;
for (const id of allEnabledIds) {
if (!activeCooldownModels.has(id)) {
wouldExhaustAll = false;
break;
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The exhaustion check only checks if activeCooldownModels covers all enabled models, ignoring any models already present in skipModels (e.g., from session bans or transient cooldowns). This can lead to false 'All models exhausted' errors when the combined set of skipped models covers all enabled models. Update the check to evaluate the union of both sets.

Suggested change
// Check if cooldown would exhaust all models
let wouldExhaustAll = true;
for (const id of allEnabledIds) {
if (!activeCooldownModels.has(id)) {
wouldExhaustAll = false;
break;
}
}
// Check if cooldown would exhaust all models
let wouldExhaustAll = true;
for (const id of allEnabledIds) {
if (!activeCooldownModels.has(id) && !skipModels.has(id)) {
wouldExhaustAll = false;
break;
}
}

Comment on lines +81 to +84
db.prepare('INSERT INTO requests (platform, model_id, status, input_tokens, output_tokens, latency_ms, error, created_at) VALUES (?,?,?,?,?,?,?,?)')
.run('test', 'old-model', 'success', 0, 0, 0, null, now - 14 * 24 * 60 * 60);
db.prepare('INSERT INTO requests (platform, model_id, status, input_tokens, output_tokens, latency_ms, error, created_at) VALUES (?,?,?,?,?,?,?,?)')
.run('test', 'new-model', 'success', 0, 0, 0, null, now - 1 * 24 * 60 * 60);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The test seeds created_at as a Unix timestamp (seconds), but the production code uses julianday(created_at) which expects an ISO-8601 string. This causes julianday to return null and makes the test invalid. Use ISO-8601 strings for the test requests.

Suggested change
db.prepare('INSERT INTO requests (platform, model_id, status, input_tokens, output_tokens, latency_ms, error, created_at) VALUES (?,?,?,?,?,?,?,?)')
.run('test', 'old-model', 'success', 0, 0, 0, null, now - 14 * 24 * 60 * 60);
db.prepare('INSERT INTO requests (platform, model_id, status, input_tokens, output_tokens, latency_ms, error, created_at) VALUES (?,?,?,?,?,?,?,?)')
.run('test', 'new-model', 'success', 0, 0, 0, null, now - 1 * 24 * 60 * 60);
const dateOffset = (days: number) => new Date(Date.now() - days * 24 * 60 * 60 * 1000).toISOString();
db.prepare('INSERT INTO requests (platform, model_id, status, input_tokens, output_tokens, latency_ms, error, created_at) VALUES (?,?,?,?,?,?,?,?)')
.run('test', 'old-model', 'success', 0, 0, 0, null, dateOffset(14));
db.prepare('INSERT INTO requests (platform, model_id, status, input_tokens, output_tokens, latency_ms, error, created_at) VALUES (?,?,?,?,?,?,?,?)')
.run('test', 'new-model', 'success', 0, 0, 0, null, dateOffset(1));

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 17

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/src/services/router.ts (1)

334-335: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stale doc comment: window is 7 days, not 24 h.

The window constant (ANALYTICS_WINDOW_MS) is 7 days; update the comment to avoid misleading readers about the analytics horizon.

📝 Proposed fix
- * Returns current analytics scores for every (platform, model_id) pair seen in
- * the last 24 h. Used by the fallback dashboard to surface routing rationale.
+ * Returns current analytics scores for every (platform, model_id) pair seen in
+ * the analytics window (7 days). Used by the fallback dashboard to surface routing rationale.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/services/router.ts` around lines 334 - 335, Update the stale doc
comment that says "last 24 h" to reflect the actual analytics window of 7 days;
locate the comment above the function that returns analytics scores (the block
referencing ANALYTICS_WINDOW_MS and returning current analytics scores for every
(platform, model_id) pair) and change "24 h" to "7 days" (or reference
ANALYTICS_WINDOW_MS) so the documentation matches the constant and
implementation.
🧹 Nitpick comments (6)
server/write_tests.py (1)

1-45: ⚡ Quick win

Remove committed scaffolding scripts.

write_tests.py (and the related write_test.py) appear to be one-off generators that emit router.test.ts by string concatenation against a hardcoded absolute path (/home/vi/freellmapi/...). These look like temporary artifacts ("chore: temporary commit before switching branch") and shouldn't ship in the repo — the generated router.test.ts is the source of truth. The static-analysis "missing closing quote" warning is just the truncated diff; not a real defect.

Want me to open an issue to track removing these generator scripts (and do_fix.py/fix.py/fix_streaming.py/new_streaming_block.txt)?

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/write_tests.py` around lines 1 - 45, Remove the one-off test generator
scripts and any committed scaffolding: delete write_tests.py and write_test.py
(and also do_fix.py, fix.py, fix_streaming.py, new_streaming_block.txt if
present), leaving the generated router.test.ts as the canonical test file;
ensure no hardcoded absolute paths (e.g., '/home/vi/freellmapi/...') remain in
the repo, and if you want to keep a generator, move it out of source control or
update it to write relative paths and add it to .gitignore.
server/src/services/router.ts (1)

602-602: 💤 Low value

Unconditional console.log on every smart-mode Owl Alpha routing.

This fires on each request whenever Owl Alpha has capacity, adding noise to logs on the hot routing path. Consider gating behind a debug flag or removing it.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/services/router.ts` at line 602, Remove or gate the unconditional
console.log call so it doesn't spam on every Owl Alpha routing; replace the line
containing console.log('[Router] Owl Alpha preference active — moving
openrouter/owl-alpha to front') with a conditional debug/logger call (e.g.,
processLogger.debug(...) or if (process.env.DEBUG === 'true') console.log(...))
or remove it entirely, ensuring the change is made in the router routing code
that performs the Owl Alpha capacity check so the message only appears when
debug/verbose logging is enabled.
.roo/specs/generalized-thread-protection/requirements.md (1)

5-5: 💤 Low value

Requirements doc is incomplete/truncated.

Line 5 ends mid-sentence with a dangling `longcat`{ and the document contains no enumerated requirements. The corresponding threadProtection.ts is also only a placeholder, so this spec doesn't yet describe the intended scanner behavior.

Want me to draft the missing problem statement and requirements (R1–Rn) based on the existing hardcoded longcat branches and the ThreadProtectionAction shape?

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/generalized-thread-protection/requirements.md at line 5, The
requirements document is truncated and missing the problem statement and
enumerated requirements; update
.roo/specs/generalized-thread-protection/requirements.md to include a complete
problem statement plus clear R1–Rn requirements that reflect the existing
implementation: inspect server/src/routes/proxy.ts to enumerate the 6+ hardcoded
"longcat" branches and derive the behaviors they special-case, then use the
ThreadProtectionAction shape from threadProtection.ts to define the scanner's
inputs, outputs, and side-effects (e.g., detection rules, allowed actions,
escalation flow, and telemetry). Ensure the spec lists concrete requirements
(R1, R2, ...) mapping each hardcoded branch to the intended generalized rule and
include expected action types from ThreadProtectionAction so implementers can
replace the hardcodes with a generalized scanner.
server/src/__tests__/routes/transient-cooldown.test.ts (1)

80-98: ⚖️ Poor tradeoff

Tests re-implement production logic instead of exercising it.

Suites 2, 3, 5, and 6 inline-copy the cooldown pruning loop and (in Suite 5) the error-classification booleans rather than calling the actual functions exported from proxy.ts. As written, these assert a private re-implementation, so they will keep passing even if the production injection/classification logic diverges or regresses. Consider exporting and invoking the real pruning/classification helpers so the suite guards behavior rather than a copy of it.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/__tests__/routes/transient-cooldown.test.ts` around lines 80 - 98,
The test re-implements the cooldown pruning loop using transientModelCooldowns
and TRANSIENT_COOLDOWN_MS instead of calling the production helpers; update the
tests to import and invoke the real functions exported from proxy.ts (e.g., the
cooldown pruning helper such as pruneTransientCooldowns and the
error-classification helper such as isTransientError) rather than duplicating
the loop and booleans, and assert behavior based on those functions' outputs;
ensure transientModelCooldowns is set up as the test fixture and then call
pruneTransientCooldowns() (or the actual exported name) and use
isTransientError()/the exported classifier where suites 2,3,5,6 currently
inline-copy logic.
fix.py (1)

1-8: ⚡ Quick win

Remove or finish this temporary patch script before merge.

This file is currently an incomplete utility and reads like a checkpoint artifact. Please either complete it with a clear invocation contract or remove it to avoid dead maintenance surface.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fix.py` around lines 1 - 8, fix.py is an incomplete checkpoint script that
should not be merged as-is; either delete the file or finish it by implementing
a clear invocation contract (add a main() entry, argparse for input/output
paths, error handling, and explicit write/backup behavior) and fully implement
the logic that locates and replaces the streaming block (currently searching for
the marker "for await (const chunk{") in server/src/routes/proxy.ts, plus add
tests and README usage notes; ensure the final change removes this dead
maintenance surface or converts it into a well-tested utility with a documented
CLI before merging.
.roo/specs/sse-stream-heartbeat-stall-protection/design.md (1)

198-278: 💤 Low value

Consider streamlining the design document.

This section walks through multiple iterations of the stall-handling approach (lines 198–239 discuss one approach, then lines 240–278 present a revised "better approach," then lines 243–278 present a "final approach"). While the thought process is valuable, the multiple revisions make it harder for future readers to quickly understand the settled design. Consider moving the earlier approaches to an "Alternatives Considered" appendix and keeping only the final approach in the main flow.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/sse-stream-heartbeat-stall-protection/design.md around lines 198
- 278, The "Stall Before Headers Sent" section contains multiple draft
approaches (initial, "better", and "final") which should be consolidated: move
the earlier iterations into an "Alternatives Considered" appendix and keep only
the final approach in the main flow that throws a retryable error when
streamStarted === false; update the main text to describe just the settled
behavior (use of streamAborted, cleanupStream(), MAX_STREAM_STALL_MS, and the
thrown 504 error referencing route.displayName) and ensure the appendix
documents the other variants for historical context.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.roo/specs/generalized-thread-protection/design.md:
- Around line 43-45: The fenced code block containing
THREAD_PROTECTION_PLATFORMS="longcat:provider-ban,groq:model-skip" is missing a
language tag and triggers markdownlint MD040; update the fence to include the
dotenv language identifier (i.e., change the opening ``` to ```dotenv) so the
block reads as a dotenv snippet and satisfies the linter while preserving the
exact THREAD_PROTECTION_PLATFORMS content.

In @.roo/specs/owl-alpha-longcat-model-routing/design.md:
- Line 5: Several fenced code blocks in the document are missing language
identifiers which triggers MD040; update each triple-backtick block to include a
language tag (e.g., ```text) so they pass markdown lint and improve readability
by replacing occurrences of ``` with ```text for all code fences in the file.

In @.roo/specs/owl-alpha-longcat-model-routing/requirements.md:
- Around line 122-126: Update the broken Markdown links in requirements.md that
reference server files (mentions of server/src/services/router.ts,
server/src/routes/proxy.ts and server/src/db/index.ts and symbols
migrateModelsV15/migrateModelsV16) so they resolve from this document's
location: replace the incorrect relative paths with correct relative paths from
.roo/specs/owl-alpha-longcat-model-routing/requirements.md (or use repo-root
absolute links) and verify each link opens the target file and symbol names
(router.ts, proxy.ts, index.ts, migrateModelsV15, migrateModelsV16) correctly in
Markdown viewers.

In @.roo/specs/recency-biased-thompson-sampling/requirements.md:
- Line 13: The markdown link in requirements.md currently points to a
non-resolving source target for refreshStatsCache; update the link so it
correctly references the refreshStatsCache implementation (router.ts) by
replacing the broken target with a valid repo-relative or permalink to the file
and line or to a function-anchor (e.g., link directly to the refreshStatsCache
symbol) so readers can jump to the actual implementation.

In `@do_fix.py`:
- Around line 1-7: The file do_fix.py is an incomplete local utility (unclosed
block after the docstring) that shouldn't be committed; delete do_fix.py from
the commit or remove its contents so it no longer appears in the repo, and
ensure no references to it remain; locate the file named "do_fix.py" and either
remove the file entirely or restore it to a complete, intentional script
(closing the stray block and adding working code) before including it in the
repo.

In `@fix_streaming.py`:
- Around line 1-21: This file is a WIP utility with an unclosed string literal
(starts at the new_streaming raw string) and should not be committed; remove
fix_streaming.py from the PR (delete the file or revert its addition), ensure no
other modules reference symbols like new_streaming, ttfbMs, lastChunkTimestamp,
or heartbeatInterval, and if you need to keep it locally add it to .gitignore or
stash it so it isn’t re-added to the repo.

In `@new_streaming_block.txt`:
- Around line 1-28: The included patch/scaffold file new_streaming_block.txt
(and related local artifacts do_fix.py, fix.py, fix_streaming.py) is incomplete
and contains a dangling '{' inside the stallTimeout block (see stallTimeout and
the stray opening brace), so remove these files from the PR or move them out of
the repository; if any intended changes belong in real source, reapply them as a
complete patch in the correct source files (fix the truncated stallTimeout
implementation in the streaming logic before committing).

In `@server/src/__tests__/services/router.test.ts`:
- Around line 81-92: The test is asserting on the raw unweighted total returned
by getAnalyticsScores(), which won’t show recency preference and may omit
old-model because its request is outside the 7-day window; update the test to
insert both requests within the window (e.g., created_at within the last 7 days)
and assert on a recency-weighted metric returned by getAnalyticsScores() such as
successRate or thompsonScore (instead of total), e.g., verify new-model’s
successRate/thompsonScore is greater than old-model’s; keep calls to
refreshStatsCache(getDb()) and use scores.find(s => s.modelName === '...') to
locate entries.
- Around line 106-113: The test freezes stochastic behavior by calling
getAnalyticsScores() once and reusing the same thompsonScore values; change the
loop so scores are re-sampled each iteration (e.g., call getAnalyticsScores() or
a dedicated sampling function inside the for loop) and then pick the max from
that freshly computed array, ensuring the test exercises Thompson sampling's
randomness and validates probabilistic preference between model-a and model-b.

In `@server/src/providers/base.ts`:
- Around line 142-145: The assignment to error.status using Number((errPayload
as Record<string, unknown>).code) can produce NaN; update the logic in the block
that sets error.status (referencing error.status and errPayload) to parse and
validate the code value first—coerce to a numeric value, confirm
Number.isFinite(value) and value > 0 (and optionally Math.floor(value) === value
for integer-only statuses), and only then assign it; otherwise fall back to 200.
Ensure the guard handles non-object or missing code cases and avoids propagating
NaN into error.status.

In `@server/src/routes/proxy.ts`:
- Line 1376: Remove the stray double semicolon after the res.write call by
changing the statement that writes the SSE terminator (res.write('data:
[DONE]\n\n');;) to a single-semicolon termination; locate the res.write('data:
[DONE]\n\n') invocation in proxy route handling (the res.write call shown in the
diff) and ensure it ends with only one semicolon.
- Around line 1380-1388: The handler currently sends a 504 JSON response
directly when detecting a pre-stream stall (using res.status(504).json(...)),
which bypasses the outer retry loop; instead, change the behavior in that branch
to throw an Error (or custom Error) with a status property set to 504 and
include the same message referencing route.displayName so the outer catch can
detect status===504 and retry providers; remove the direct res.* call from that
branch and ensure the thrown error shape matches what the outer error handling
expects (status: 504, message) so MAX_RETRIES/fallback logic will be exercised
and update stream-heartbeat-stall.test.ts to assert retry behavior rather than
immediate 504.

In `@server/src/services/router.ts`:
- Around line 348-349: The early return prevents the on-demand refresh: when
statsCache is falsy the function immediately returns [] so the subsequent call
to refreshStatsCache(getDb(), true) never executes; remove or move the early
return and instead call refreshStatsCache(getDb(), true) when statsCache is
falsy (using statsCache check) and only return [] after awaiting or deciding not
to wait for refresh, ensuring the code around statsCache, refreshStatsCache(...)
and getDb() is reordered so the refresh path runs when the cache is missing.
- Around line 183-216: The SQL is broken for epoch-second timestamps: update the
query in router.ts (the weighted_requests CTE used by the stats calculation in
refreshStatsCache) to treat created_at as Unix epoch by using
julianday(created_at, 'unixepoch') for recency_weight and converting the WHERE
clause to compare equivalent datetimes (e.g., datetime(created_at, 'unixepoch')
>= ?) so rows inserted as epoch seconds are matched; ensure both the recency
calculation and the window predicate use the same unixepoch conversion rather
than plain julianday(created_at) or a mismatched ISO comparison.

In `@server/src/services/threadProtection.ts`:
- Around line 20-23: evaluateThreadProtection currently returns a placeholder
ThreadProtectionAction but is never invoked and its fields (banProvider,
skipModel, clearStickyIfPinned) are unused; update the request handling in
server/src/routes/proxy.ts to import and call
evaluateThreadProtection(ErrorContext) inside the proxy request flow (e.g.,
within the handler that currently uses hardcoded LongCat branches), then apply
the returned action flags: if banProvider true, trigger the provider ban flow;
if skipModel true, bypass or select an allowed model path; if
clearStickyIfPinned true, clear any pinned sticky context before processing.
Also either implement real thread-protection logic inside
evaluateThreadProtection or document/remove the placeholder so the function and
its ThreadProtectionAction fields are actually exercised by the proxy flow.

In `@server/write_test.py`:
- Line 4: The hardcoded machine-specific absolute path assigned to the variable
path in write_test.py should be replaced with a portable resolution (e.g., build
the test file path relative to the repository or current file using pathlib or
os.path.join and __file__) so CI and other developers can run it; update the
variable path (reference: variable name "path" in write_test.py) to compute a
relative path to src/__tests__/services/router.test.ts instead of the absolute
'/home/vi/...'.
- Line 29: Fix the unterminated string in the list named content by completing
or closing the quoted entry so the Python list literal parses; locate the
`content` array in server/write_test.py (the failing entry starts with "   
expect(() => route{) and either terminate the string with a matching quote and
any needed escape sequences or replace it with a properly quoted string literal
so the list is syntactically valid.

---

Outside diff comments:
In `@server/src/services/router.ts`:
- Around line 334-335: Update the stale doc comment that says "last 24 h" to
reflect the actual analytics window of 7 days; locate the comment above the
function that returns analytics scores (the block referencing
ANALYTICS_WINDOW_MS and returning current analytics scores for every (platform,
model_id) pair) and change "24 h" to "7 days" (or reference ANALYTICS_WINDOW_MS)
so the documentation matches the constant and implementation.

---

Nitpick comments:
In @.roo/specs/generalized-thread-protection/requirements.md:
- Line 5: The requirements document is truncated and missing the problem
statement and enumerated requirements; update
.roo/specs/generalized-thread-protection/requirements.md to include a complete
problem statement plus clear R1–Rn requirements that reflect the existing
implementation: inspect server/src/routes/proxy.ts to enumerate the 6+ hardcoded
"longcat" branches and derive the behaviors they special-case, then use the
ThreadProtectionAction shape from threadProtection.ts to define the scanner's
inputs, outputs, and side-effects (e.g., detection rules, allowed actions,
escalation flow, and telemetry). Ensure the spec lists concrete requirements
(R1, R2, ...) mapping each hardcoded branch to the intended generalized rule and
include expected action types from ThreadProtectionAction so implementers can
replace the hardcodes with a generalized scanner.

In @.roo/specs/sse-stream-heartbeat-stall-protection/design.md:
- Around line 198-278: The "Stall Before Headers Sent" section contains multiple
draft approaches (initial, "better", and "final") which should be consolidated:
move the earlier iterations into an "Alternatives Considered" appendix and keep
only the final approach in the main flow that throws a retryable error when
streamStarted === false; update the main text to describe just the settled
behavior (use of streamAborted, cleanupStream(), MAX_STREAM_STALL_MS, and the
thrown 504 error referencing route.displayName) and ensure the appendix
documents the other variants for historical context.

In `@fix.py`:
- Around line 1-8: fix.py is an incomplete checkpoint script that should not be
merged as-is; either delete the file or finish it by implementing a clear
invocation contract (add a main() entry, argparse for input/output paths, error
handling, and explicit write/backup behavior) and fully implement the logic that
locates and replaces the streaming block (currently searching for the marker
"for await (const chunk{") in server/src/routes/proxy.ts, plus add tests and
README usage notes; ensure the final change removes this dead maintenance
surface or converts it into a well-tested utility with a documented CLI before
merging.

In `@server/src/__tests__/routes/transient-cooldown.test.ts`:
- Around line 80-98: The test re-implements the cooldown pruning loop using
transientModelCooldowns and TRANSIENT_COOLDOWN_MS instead of calling the
production helpers; update the tests to import and invoke the real functions
exported from proxy.ts (e.g., the cooldown pruning helper such as
pruneTransientCooldowns and the error-classification helper such as
isTransientError) rather than duplicating the loop and booleans, and assert
behavior based on those functions' outputs; ensure transientModelCooldowns is
set up as the test fixture and then call pruneTransientCooldowns() (or the
actual exported name) and use isTransientError()/the exported classifier where
suites 2,3,5,6 currently inline-copy logic.

In `@server/src/services/router.ts`:
- Line 602: Remove or gate the unconditional console.log call so it doesn't spam
on every Owl Alpha routing; replace the line containing console.log('[Router]
Owl Alpha preference active — moving openrouter/owl-alpha to front') with a
conditional debug/logger call (e.g., processLogger.debug(...) or if
(process.env.DEBUG === 'true') console.log(...)) or remove it entirely, ensuring
the change is made in the router routing code that performs the Owl Alpha
capacity check so the message only appears when debug/verbose logging is
enabled.

In `@server/write_tests.py`:
- Around line 1-45: Remove the one-off test generator scripts and any committed
scaffolding: delete write_tests.py and write_test.py (and also do_fix.py,
fix.py, fix_streaming.py, new_streaming_block.txt if present), leaving the
generated router.test.ts as the canonical test file; ensure no hardcoded
absolute paths (e.g., '/home/vi/freellmapi/...') remain in the repo, and if you
want to keep a generator, move it out of source control or update it to write
relative paths and add it to .gitignore.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 8671dcfb-981d-4f90-897b-25ef8a0a3cab

📥 Commits

Reviewing files that changed from the base of the PR and between 233e031 and 45d2546.

📒 Files selected for processing (41)
  • .roo/specs/disable-sticky-on-auto/design.md
  • .roo/specs/disable-sticky-on-auto/requirements.md
  • .roo/specs/disable-sticky-on-auto/tasks.md
  • .roo/specs/generalized-thread-protection/design.md
  • .roo/specs/generalized-thread-protection/requirements.md
  • .roo/specs/generalized-thread-protection/tasks.md
  • .roo/specs/owl-alpha-longcat-model-routing/design.md
  • .roo/specs/owl-alpha-longcat-model-routing/requirements.md
  • .roo/specs/owl-alpha-longcat-model-routing/tasks.md
  • .roo/specs/recency-biased-thompson-sampling/design.md
  • .roo/specs/recency-biased-thompson-sampling/requirements.md
  • .roo/specs/recency-biased-thompson-sampling/tasks.md
  • .roo/specs/sse-stream-heartbeat-stall-protection/design.md
  • .roo/specs/sse-stream-heartbeat-stall-protection/requirements.md
  • .roo/specs/sse-stream-heartbeat-stall-protection/tasks.md
  • .roo/specs/transient-model-cooldown/design.md
  • .roo/specs/transient-model-cooldown/requirements.md
  • .roo/specs/transient-model-cooldown/tasks.md
  • .roo/specs/wrapped-error-interception/design.md
  • .roo/specs/wrapped-error-interception/requirements.md
  • .roo/specs/wrapped-error-interception/tasks.md
  • do_fix.py
  • fix.py
  • fix_streaming.py
  • fix{
  • new_streaming_block.txt
  • server/src/__tests__/routes/provider-session-ban.test.ts
  • server/src/__tests__/routes/proxy-tools.test.ts
  • server/src/__tests__/routes/stream-heartbeat-stall.test.ts
  • server/src/__tests__/routes/transient-cooldown.test.ts
  • server/src/__tests__/services/router.test.ts
  • server/src/providers/base.ts
  • server/src/providers/cloudflare.ts
  • server/src/providers/cohere.ts
  • server/src/providers/google.ts
  • server/src/providers/openai-compat.ts
  • server/src/routes/proxy.ts
  • server/src/services/router.ts
  • server/src/services/threadProtection.ts
  • server/write_test.py
  • server/write_tests.py

Comment on lines +43 to +45
```
THREAD_PROTECTION_PLATFORMS="longcat:provider-ban,groq:model-skip"
```

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language tag to the fenced code block.

This fence is missing a language identifier and will trip markdownlint MD040.

🛠️ Suggested fix
-```
+```dotenv
 THREAD_PROTECTION_PLATFORMS="longcat:provider-ban,groq:model-skip"
</details>

<!-- suggestion_start -->

<details>
<summary>📝 Committable suggestion</summary>

> ‼️ **IMPORTANT**
> Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```suggestion

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 43-43: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/generalized-thread-protection/design.md around lines 43 - 45, The
fenced code block containing
THREAD_PROTECTION_PLATFORMS="longcat:provider-ban,groq:model-skip" is missing a
language tag and triggers markdownlint MD040; update the fence to include the
dotenv language identifier (i.e., change the opening ``` to ```dotenv) so the
block reads as a dotenv snippet and satisfies the linter while preserving the
exact THREAD_PROTECTION_PLATFORMS content.


## Architecture Overview

```

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add language identifiers to fenced code blocks (MD040).

These code fences are missing language tags, which will fail markdown lint and reduce readability.

Suggested fix
-```
+```text
...
-```
+```

-```
+```text
...
-```
+```

-```
+```text
...
-```
+```

-```
+```text
...
-```
+```

Also applies to: 52-52, 82-82, 96-96

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 5-5: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/owl-alpha-longcat-model-routing/design.md at line 5, Several
fenced code blocks in the document are missing language identifiers which
triggers MD040; update each triple-backtick block to include a language tag
(e.g., ```text) so they pass markdown lint and improve readability by replacing
occurrences of ``` with ```text for all code fences in the file.

Comment on lines +122 to +126
- Existing LongCat smart auto preference logic in [`server/src/services/router.ts`](../server/src/services/router.ts)
- Existing LongCat sticky cooldown logic in [`server/src/routes/proxy.ts`](../server/src/routes/proxy.ts)
- Existing provider-level ban logic in [`server/src/routes/proxy.ts`](../server/src/routes/proxy.ts)
- Owl Alpha model seeded in [`server/src/db/index.ts`](../server/src/db/index.ts) via `migrateModelsV15`
- LongCat model seeded in [`server/src/db/index.ts`](../server/src/db/index.ts) via `migrateModelsV16`

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Dependency links appear incorrectly relative.

Lines 122–126 likely point to non-existent paths from this folder, so these references won’t open in markdown viewers.

Suggested fix
-- Existing LongCat smart auto preference logic in [`server/src/services/router.ts`](../server/src/services/router.ts)
-- Existing LongCat sticky cooldown logic in [`server/src/routes/proxy.ts`](../server/src/routes/proxy.ts)
-- Existing provider-level ban logic in [`server/src/routes/proxy.ts`](../server/src/routes/proxy.ts)
-- Owl Alpha model seeded in [`server/src/db/index.ts`](../server/src/db/index.ts) via `migrateModelsV15`
-- LongCat model seeded in [`server/src/db/index.ts`](../server/src/db/index.ts) via `migrateModelsV16`
+- Existing LongCat smart auto preference logic in [`server/src/services/router.ts`](../../../server/src/services/router.ts)
+- Existing LongCat sticky cooldown logic in [`server/src/routes/proxy.ts`](../../../server/src/routes/proxy.ts)
+- Existing provider-level ban logic in [`server/src/routes/proxy.ts`](../../../server/src/routes/proxy.ts)
+- Owl Alpha model seeded in [`server/src/db/index.ts`](../../../server/src/db/index.ts) via `migrateModelsV15`
+- LongCat model seeded in [`server/src/db/index.ts`](../../../server/src/db/index.ts) via `migrateModelsV16`
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- Existing LongCat smart auto preference logic in [`server/src/services/router.ts`](../server/src/services/router.ts)
- Existing LongCat sticky cooldown logic in [`server/src/routes/proxy.ts`](../server/src/routes/proxy.ts)
- Existing provider-level ban logic in [`server/src/routes/proxy.ts`](../server/src/routes/proxy.ts)
- Owl Alpha model seeded in [`server/src/db/index.ts`](../server/src/db/index.ts) via `migrateModelsV15`
- LongCat model seeded in [`server/src/db/index.ts`](../server/src/db/index.ts) via `migrateModelsV16`
- Existing LongCat smart auto preference logic in [`server/src/services/router.ts`](../../../server/src/services/router.ts)
- Existing LongCat sticky cooldown logic in [`server/src/routes/proxy.ts`](../../../server/src/routes/proxy.ts)
- Existing provider-level ban logic in [`server/src/routes/proxy.ts`](../../../server/src/routes/proxy.ts)
- Owl Alpha model seeded in [`server/src/db/index.ts`](../../../server/src/db/index.ts) via `migrateModelsV15`
- LongCat model seeded in [`server/src/db/index.ts`](../../../server/src/db/index.ts) via `migrateModelsV16`
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/owl-alpha-longcat-model-routing/requirements.md around lines 122
- 126, Update the broken Markdown links in requirements.md that reference server
files (mentions of server/src/services/router.ts, server/src/routes/proxy.ts and
server/src/db/index.ts and symbols migrateModelsV15/migrateModelsV16) so they
resolve from this document's location: replace the incorrect relative paths with
correct relative paths from
.roo/specs/owl-alpha-longcat-model-routing/requirements.md (or use repo-root
absolute links) and verify each link opens the target file and symbol names
(router.ts, proxy.ts, index.ts, migrateModelsV15, migrateModelsV16) correctly in
Markdown viewers.


### R-1: Linear Time-Decay Weighting

The SQL query aggregating historical requests in [`refreshStatsCache()`](server/src/services/router.ts:174) must calculate a weight for each logged request based on its age. Newer requests must be assigned a weight closer to `1.0`, while requests approaching the limit of the analytics window (7 days) must decay toward `0.0`.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix broken source link target in spec.

Line 13’s link path likely won’t resolve from this directory (server/src/...:174), so readers can’t jump to the referenced implementation line.

Suggested fix
-The SQL query aggregating historical requests in [`refreshStatsCache()`](server/src/services/router.ts:174) must calculate a weight for each logged request based on its age.
+The SQL query aggregating historical requests in [`refreshStatsCache()`](../../../server/src/services/router.ts) must calculate a weight for each logged request based on its age.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
The SQL query aggregating historical requests in [`refreshStatsCache()`](server/src/services/router.ts:174) must calculate a weight for each logged request based on its age. Newer requests must be assigned a weight closer to `1.0`, while requests approaching the limit of the analytics window (7 days) must decay toward `0.0`.
The SQL query aggregating historical requests in [`refreshStatsCache()`](../../../server/src/services/router.ts) must calculate a weight for each logged request based on its age. Newer requests must be assigned a weight closer to `1.0`, while requests approaching the limit of the analytics window (7 days) must decay toward `0.0`.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/recency-biased-thompson-sampling/requirements.md at line 13, The
markdown link in requirements.md currently points to a non-resolving source
target for refreshStatsCache; update the link so it correctly references the
refreshStatsCache implementation (router.ts) by replacing the broken target with
a valid repo-relative or permalink to the file and line or to a function-anchor
(e.g., link directly to the refreshStatsCache symbol) so readers can jump to the
actual implementation.

Comment thread do_fix.py
Comment on lines +1 to +7
#!/usr/bin/env python3
"""Replace the streaming block in proxy.ts with Promise.race-based stall detection."""

with open('server/src/routes/proxy.ts', 'r') as f:
content = f.read()

{ No newline at end of file

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove work-in-progress utility script.

This Python script is incomplete (line 7 has an unclosed code block) and appears to be a local development utility that was accidentally committed. The script's docstring mentions "Promise.race-based stall detection," but the actual implementation in proxy.ts uses setInterval, suggesting this script predates or diverges from the final approach.

🧰 Tools
🪛 Ruff (0.15.15)

[warning] 7-7: unexpected EOF while parsing

(invalid-syntax)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@do_fix.py` around lines 1 - 7, The file do_fix.py is an incomplete local
utility (unclosed block after the docstring) that shouldn't be committed; delete
do_fix.py from the commit or remove its contents so it no longer appears in the
repo, and ensure no references to it remain; locate the file named "do_fix.py"
and either remove the file entirely or restore it to a complete, intentional
script (closing the stray block and adding working code) before including it in
the repo.

Comment on lines 183 to 216
const rows = db.prepare(`
SELECT platform, model_id,
COUNT(*) as total,
SUM(CASE WHEN status = 'success' THEN 1 ELSE 0 END) as successes,
WITH weighted_requests AS (
SELECT
platform,
model_id,
status,
latency_ms,
output_tokens,
ttfb_ms,
MIN(1.0, MAX(0.0, 1.0 - (julianday('now') - julianday(created_at)) / ?)) as recency_weight
FROM requests
WHERE created_at >= ?
)
SELECT
platform,
model_id,
SUM(recency_weight) as total_weighted,
SUM(CASE WHEN status = 'success' THEN recency_weight ELSE 0 END) as successes_weighted,
COUNT(*) as raw_total,
SUM(CASE WHEN status = 'success' THEN 1 ELSE 0 END) as raw_successes,
CASE
WHEN SUM(CASE WHEN status = 'success' THEN latency_ms ELSE 0 END) > 0
THEN SUM(CASE WHEN status = 'success' THEN output_tokens ELSE 0 END) * 1000.0
/ SUM(CASE WHEN status = 'success' THEN latency_ms ELSE 0 END)
ELSE 0
END as tok_per_sec,
AVG(CASE WHEN status = 'success' AND ttfb_ms IS NOT NULL THEN ttfb_ms END) as avg_ttfb_ms
FROM requests
WHERE created_at >= ?
FROM weighted_requests
GROUP BY platform, model_id
`).all(since) as Array<{
platform: string; model_id: string; total: number; successes: number;
`).all(ANALYTICS_WINDOW_DAYS, since) as Array<{
platform: string; model_id: string; total_weighted: number; successes_weighted: number;
raw_total: number; raw_successes: number;
tok_per_sec: number; avg_ttfb_ms: number | null;
}>;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find created_at column definition and how it's written for the requests table
rg -nP "created_at" --type=ts -C2 -g '!**/__tests__/**'
fd -e sql | xargs rg -nP "created_at|CREATE TABLE requests" -C2
rg -nP "INSERT INTO requests" --type=ts -C2 -g '!**/__tests__/**'

Repository: vi70x3/freellmapi

Length of output: 19477


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the test that inserts created_at and how "now" is computed
sed -n '1,220p' server/src/__tests__/services/router.test.ts

# Check how production inserts into requests (whether created_at is omitted and thus uses default datetime('now'))
rg -n "INSERT INTO requests" -S server/src/routes/proxy.ts
sed -n '1760,1845p' server/src/routes/proxy.ts

# Check requests table schema
sed -n '1,180p' server/src/db/index.ts

# Also locate the API/analytics "since" param formation
rg -n "const since|since =" server/src -S

Repository: vi70x3/freellmapi

Length of output: 16890


🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '1,260p' server/src/services/router.ts
rg -n "refreshStatsCache|getAnalyticsScores|getSinceTimestamp|ANALYTICS_WINDOW" server/src/services/router.ts
rg -n "created_at" server/src/services/router.ts

Repository: vi70x3/freellmapi

Length of output: 10753


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Search for unixepoch usage
rg -n "unixepoch|julianday\\(\\s*created_at\\)|julianday\\('" server/src -S

# Search for any schema/triggers/views referencing requests.created_at
rg -n "CREATE\\s+(TRIGGER|VIEW|VIRTUAL)\\b|requests\\b.*created_at|created_at.*requests" server/src/db -S || true
rg -n "requests\\b" server/src/db/index.ts server/src/db -S

# Check router tests and any other tests inserting created_at
rg -n "INSERT INTO requests \\(.*created_at\\)" server/src/__tests__ -S

Repository: vi70x3/freellmapi

Length of output: 3091


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check if sqlite3 CLI is available
command -v sqlite3 >/dev/null 2>&1 && echo "sqlite3_cli_available=1" || echo "sqlite3_cli_available=0"

# Run a couple of direct SQLite expressions to see how julianday() treats numeric values
# Use an epoch-seconds-like value (~2026-01-01) for the "numeric created_at" case.
EPOCH_SECS=1767225600  # 2026-01-01T00:00:00Z
ISO="2026-01-01T00:00:00Z"

sqlite3 -batch :memory: "
  SELECT
    julianday('now') AS jd_now,
    julianday($EPOCH_SECS) AS jd_from_epoch_seconds_number,
    julianday('$ISO') AS jd_from_iso_string,
    (julianday('now') - julianday($EPOCH_SECS)) AS delta_days_numeric,
    (julianday('now') - julianday('$ISO')) AS delta_days_iso;
"

Repository: vi70x3/freellmapi

Length of output: 152


🏁 Script executed:

#!/bin/bash
set -euo pipefail
python3 - <<'PY'
import sqlite3, datetime, time, math
con = sqlite3.connect(':memory:')
cur = con.cursor()

# Ensure column is TEXT like in schema
cur.execute("CREATE TABLE requests(created_at TEXT NOT NULL)")
con.commit()

# Pick a known epoch seconds
epoch_secs = 1767225600  # 2026-01-01T00:00:00Z
iso = "2026-01-01T00:00:00Z"

# Insert numeric into TEXT column
cur.execute("INSERT INTO requests(created_at) VALUES (?)", (epoch_secs,))
con.commit()

row = cur.execute("""
SELECT
  julianday('now') AS jd_now,
  julianday(?) AS jd_from_numeric_arg,
  julianday(?) AS jd_from_iso_arg,
  julianday(created_at) AS jd_from_numeric_stored_in_text,
  created_at AS stored_value
FROM requests;
""", (epoch_secs, iso)).fetchone()

print("jd_now:", row[0])
print("jd_from_numeric_arg:", row[1])
print("jd_from_iso_arg:", row[2])
print("jd_from_numeric_stored_in_text:", row[3])
print("stored_value:", row[4])

# Also compare WHERE created_at >= ? for both cases
# Use since as ISO string (as production does)
since_iso = datetime.datetime.utcnow().replace(microsecond=0).isoformat()+"Z"
# Numeric compare target: store "0" and see coercion
q = """
SELECT
  SUM(CASE WHEN created_at >= ? THEN 1 ELSE 0 END) AS cnt_ge_since_iso,
  typeof(created_at) AS created_at_type
FROM requests;
"""
cnt = cur.execute(q, (since_iso,)).fetchone()[0]
type_ = cur.execute(q, (since_iso,)).fetchone()[1]
print("since_iso:", since_iso)
print("cnt_ge_since_iso:", cnt)
print("created_at_type:", type_)

# Show what happens with unixepoch modifier
row2 = cur.execute("""
SELECT
  julianday(created_at, 'unixepoch') AS jd_from_numeric_unixepoch_modifier
FROM requests;
""").fetchone()
print("jd_from_numeric_unixepoch_modifier:", row2[0])
PY

Repository: vi70x3/freellmapi

Length of output: 323


Fix recency decay + windowing for Unix-epoch requests.created_at

  • refreshStatsCache passes since as an ISO-8601 string, but tests insert requests.created_at as Unix epoch seconds; with epoch-seconds-in-TEXT, julianday(created_at) evaluates to NULL (so recency_weight is wrong) and WHERE created_at >= ? returns no matching rows.
  • Update the SQL to use epoch-safe conversion (e.g., julianday(created_at,'unixepoch')), and make the window predicate compare in the same domain (e.g., datetime(created_at,'unixepoch') >= ?, or pass since as epoch seconds instead of ISO).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/services/router.ts` around lines 183 - 216, The SQL is broken for
epoch-second timestamps: update the query in router.ts (the weighted_requests
CTE used by the stats calculation in refreshStatsCache) to treat created_at as
Unix epoch by using julianday(created_at, 'unixepoch') for recency_weight and
converting the WHERE clause to compare equivalent datetimes (e.g.,
datetime(created_at, 'unixepoch') >= ?) so rows inserted as epoch seconds are
matched; ensure both the recency calculation and the window predicate use the
same unixepoch conversion rather than plain julianday(created_at) or a
mismatched ISO comparison.

Comment on lines 348 to +349
if (!statsCache) return [];
if (!statsCache) { refreshStatsCache(getDb(), true); }

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

On-demand cache refresh is unreachable — the early return [] defeats it.

Line 348 returns when statsCache is null, so the newly added refresh on line 349 can never run. The intended "refresh stats on-demand when the cache is missing" behavior (T9) is dead code, and the dashboard will return an empty array on the first call before any other code path has populated the cache.

🐛 Proposed fix
-  if (!statsCache) return [];
-  if (!statsCache) { refreshStatsCache(getDb(), true); }
+  if (!statsCache) { refreshStatsCache(getDb(), true); }
+  if (!statsCache) return [];
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (!statsCache) return [];
if (!statsCache) { refreshStatsCache(getDb(), true); }
if (!statsCache) { refreshStatsCache(getDb(), true); }
if (!statsCache) return [];
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/services/router.ts` around lines 348 - 349, The early return
prevents the on-demand refresh: when statsCache is falsy the function
immediately returns [] so the subsequent call to refreshStatsCache(getDb(),
true) never executes; remove or move the early return and instead call
refreshStatsCache(getDb(), true) when statsCache is falsy (using statsCache
check) and only return [] after awaiting or deciding not to wait for refresh,
ensuring the code around statsCache, refreshStatsCache(...) and getDb() is
reordered so the refresh path runs when the cache is missing.

Comment on lines +20 to +23
export function evaluateThreadProtection(_ctx: ErrorContext): ThreadProtectionAction {
// Placeholder implementation: no protection
return { banProvider: false, skipModel: false, clearStickyIfPinned: false, reason: 'off' };
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Inspect how proxy.ts imports and uses evaluateThreadProtection / its action flags.
rg -nP --type=ts -C4 'evaluateThreadProtection|ThreadProtectionAction|banProvider|skipModel|clearStickyIfPinned'

Repository: vi70x3/freellmapi

Length of output: 39415


🏁 Script executed:

#!/bin/bash
# Find all calls to evaluateThreadProtection and where its returned flags are used.
rg -n --type=ts 'evaluateThreadProtection\s*\(' server/src/routes/proxy.ts server/src/services server/src/routes

# Inspect flag-property usage across the codebase (esp. in proxy.ts).
rg -n --type=ts '\b(banProvider|skipModel|clearStickyIfPinned|ThreadProtectionAction)\b' server/src/routes/proxy.ts server/src/services

Repository: vi70x3/freellmapi

Length of output: 750


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Any call sites anywhere in server/src
rg -n --type=ts 'evaluateThreadProtection\s*\(' server/src

# 2) Any usage of the action fields anywhere in server/src
rg -n --type=ts '\b(banProvider|skipModel|clearStickyIfPinned)\b' server/src

# 3) Where proxy.ts references thread protection (log labels, comments, etc.)
rg -n --type=ts 'ThreadProtection|thread protection|evaluateThreadProtection' server/src/routes/proxy.ts

Repository: vi70x3/freellmapi

Length of output: 898


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find any mention/call sites of evaluateThreadProtection across the repo
rg -n --type=ts 'evaluateThreadProtection' server/src

# Inspect the proxy.ts area where [ThreadProtection] logs occur
sed -n '1240,1345p' server/src/routes/proxy.ts

Repository: vi70x3/freellmapi

Length of output: 4292


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Confirm no call sites in proxy.ts
rg -n --type=ts 'evaluateThreadProtection\s*\(' server/src/routes/proxy.ts

# Confirm only import in proxy.ts
rg -n --type=ts 'evaluateThreadProtection' server/src/routes/proxy.ts

# Confirm action flags aren’t used elsewhere
rg -n --type=ts '\bbanProvider\b|\bskipModel\b|\bclearStickyIfPinned\b' server/src

Repository: vi70x3/freellmapi

Length of output: 43


Thread protection placeholder won’t disable anything right now
evaluateThreadProtection() is a placeholder returning “off”, but server/src/routes/proxy.ts only imports it and never calls it (no evaluateThreadProtection( usage), and the action fields banProvider/skipModel/clearStickyIfPinned aren’t referenced anywhere outside threadProtection.ts. LongCat protection continues to be handled by existing hardcoded proxy branches, so there’s no behavioral regression from this placeholder.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/services/threadProtection.ts` around lines 20 - 23,
evaluateThreadProtection currently returns a placeholder ThreadProtectionAction
but is never invoked and its fields (banProvider, skipModel,
clearStickyIfPinned) are unused; update the request handling in
server/src/routes/proxy.ts to import and call
evaluateThreadProtection(ErrorContext) inside the proxy request flow (e.g.,
within the handler that currently uses hardcoded LongCat branches), then apply
the returned action flags: if banProvider true, trigger the provider ban flow;
if skipModel true, bypass or select an allowed model path; if
clearStickyIfPinned true, clear any pinned sticky context before processing.
Also either implement real thread-protection logic inside
evaluateThreadProtection or document/remove the placeholder so the function and
its ThreadProtectionAction fields are actually exercised by the proxy flow.

Comment thread server/write_test.py
#!/usr/bin/env python3
"""Write the complete router.test.ts file."""

path = '/home/vi/freellmapi/server/src/__tests__/services/router.test.ts'

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid machine-specific absolute output paths.

Line 4 hardcodes a local filesystem path, so the generator will fail for other devs/CI environments.

Suggested fix
+from pathlib import Path
+
-path = '/home/vi/freellmapi/server/src/__tests__/services/router.test.ts'
+path = Path(__file__).resolve().parent / "src/__tests__/services/router.test.ts"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/write_test.py` at line 4, The hardcoded machine-specific absolute path
assigned to the variable path in write_test.py should be replaced with a
portable resolution (e.g., build the test file path relative to the repository
or current file using pathlib or os.path.join and __file__) so CI and other
developers can run it; update the variable path (reference: variable name "path"
in write_test.py) to compute a relative path to
src/__tests__/services/router.test.ts instead of the absolute '/home/vi/...'.

Comment thread server/write_test.py
" });",
"",
" it('should throw when no keys are configured', () => {",
" expect(() => route{ No newline at end of file

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show relevant portion of the file around the reported line
echo "---- server/write_test.py: lines 1-120 (numbered) ----"
nl -ba server/write_test.py | sed -n '1,120p'

echo "---- server/write_test.py: focus lines 15-45 (numbered) ----"
nl -ba server/write_test.py | sed -n '15,45p'

echo "---- rg patterns in server/write_test.py ----"
rg -n "expect\\(\\(\\) => route|route\\{|routeRequest|unterminated|string literal" server/write_test.py || true

Repository: vi70x3/freellmapi

Length of output: 177


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "---- server/write_test.py: lines 1-120 (numbered via cat -n) ----"
cat -n server/write_test.py | sed -n '1,120p'

echo "---- server/write_test.py: focus lines 15-45 (numbered via cat -n) ----"
cat -n server/write_test.py | sed -n '15,45p'

echo "---- rg patterns in server/write_test.py ----"
rg -n "expect\\(\\(\\) => route|route\\{|routeRequest" server/write_test.py || true

Repository: vi70x3/freellmapi

Length of output: 2505


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "---- line count ----"
wc -l server/write_test.py

echo "---- server/write_test.py: lines 20-45 ----"
sed -n '20,45p' server/write_test.py | cat -n

echo "---- server/write_test.py: tail (last 30 lines) ----"
tail -n 30 server/write_test.py | cat -n

Repository: vi70x3/freellmapi

Length of output: 2179


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "---- py_compile (syntax check) ----"
python -m py_compile server/write_test.py

echo "---- tail with cat -n ----"
tail -n 10 server/write_test.py | cat -n

Repository: vi70x3/freellmapi

Length of output: 252


Fix unterminated string literal in content array (line 29).

server/write_test.py fails to parse with SyntaxError: unterminated string literal (detected at line 29). Close/complete the quoted entry at line 29 so the Python list literal is syntactically valid.

🧰 Tools
🪛 Ruff (0.15.15)

[warning] 29-29: missing closing quote in string literal

(invalid-syntax)


[warning] 29-29: unexpected EOF while parsing

(invalid-syntax)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/write_test.py` at line 29, Fix the unterminated string in the list
named content by completing or closing the quoted entry so the Python list
literal parses; locate the `content` array in server/write_test.py (the failing
entry starts with "    expect(() => route{) and either terminate the string with
a matching quote and any needed escape sequences or replace it with a properly
quoted string literal so the list is syntactically valid.

- Add 'fast' to RoutingMode type and ModelPool enum (Fast/Balanced/Smart)
- Implement classifyModel() to categorize models by speed/intelligence ranks
- Implement fastSampleScore() with 3x speed weight, 1.5x TTFB factor, minimal intelligence
- Add pool-based routing in routeRequest: fast pool first, balanced pool borrowing fallback
- Add freellmapi/auto-fast model ID support in proxy routes
- Extend /v1/models endpoint with primaryPool field for all models
- Add unit tests for pool classification (T13-T16) and fast routing (T17-T18)
- Add integration tests for /v1/models primaryPool fields (T13-T17)
- Update README.md and CLAUDE.md documentation

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
server/src/services/router.ts (2)

436-439: ⚠️ Potential issue | 🟡 Minor | 💤 Low value

modelId indentation is inconsistent with sibling fields.

modelId sits at 4 spaces while platform, modelName, etc. are at 6.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/services/router.ts` around lines 436 - 439, The object literal
pushed in result.push has inconsistent indentation for the modelId property;
align modelId with its sibling properties (platform, modelName) inside the same
object literal in the function that constructs the result array (look for
result.push({ modelId: modelName, platform, modelName, ... })). Adjust spacing
so modelId uses the same indentation level as platform and modelName.

406-407: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

On-demand refresh is unreachable — the return [] on Line 406 precedes the refresh on Line 407.

The early return executes first, so refreshStatsCache(getDb(), true) never runs and the dashboard returns [] on a cold cache.

🐛 Proposed fix
-  if (!statsCache) return [];
-  if (!statsCache) { refreshStatsCache(getDb(), true); }
+  if (!statsCache) refreshStatsCache(getDb(), true);
+  if (!statsCache) return [];
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/services/router.ts` around lines 406 - 407, The early return makes
the on-demand refresh unreachable; change the logic around statsCache so
refreshStatsCache(getDb(), true) is invoked when statsCache is falsy before
returning. Specifically, replace the two-check pattern with a single conditional
that calls refreshStatsCache(getDb(), true) and then returns [] (e.g., if
(!statsCache) { refreshStatsCache(getDb(), true); return []; }) so
refreshStatsCache is actually executed; ensure you update the block around
statsCache and remove the redundant check.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@server/src/services/router.ts`:
- Line 170: Remove the local type alias RoutingMode in
server/src/services/router.ts and instead re-export it from the shared types
module; specifically delete the line "export type RoutingMode = 'balanced' |
'smart' | 'fast';" and add an explicit re-export like "export type { RoutingMode
} from '`@freellmapi/shared/types.js`'" so existing imports (e.g., type
RoutingMode used by ../routes/proxy.ts) continue to work and the definition is
deduplicated.

---

Duplicate comments:
In `@server/src/services/router.ts`:
- Around line 436-439: The object literal pushed in result.push has inconsistent
indentation for the modelId property; align modelId with its sibling properties
(platform, modelName) inside the same object literal in the function that
constructs the result array (look for result.push({ modelId: modelName,
platform, modelName, ... })). Adjust spacing so modelId uses the same
indentation level as platform and modelName.
- Around line 406-407: The early return makes the on-demand refresh unreachable;
change the logic around statsCache so refreshStatsCache(getDb(), true) is
invoked when statsCache is falsy before returning. Specifically, replace the
two-check pattern with a single conditional that calls
refreshStatsCache(getDb(), true) and then returns [] (e.g., if (!statsCache) {
refreshStatsCache(getDb(), true); return []; }) so refreshStatsCache is actually
executed; ensure you update the block around statsCache and remove the redundant
check.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 70c02531-22d4-4634-8883-b25713001452

📥 Commits

Reviewing files that changed from the base of the PR and between 45d2546 and 5617fc3.

📒 Files selected for processing (7)
  • CLAUDE.md
  • README.md
  • server/src/__tests__/integration/full-flow.test.ts
  • server/src/__tests__/services/router.test.ts
  • server/src/routes/proxy.ts
  • server/src/services/router.ts
  • shared/types.ts
✅ Files skipped from review due to trivial changes (2)
  • CLAUDE.md
  • README.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • server/src/routes/proxy.ts

}

export type RoutingMode = 'balanced' | 'smart';
export type RoutingMode = 'balanced' | 'smart' | 'fast';

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -nP "RoutingMode" --type=ts -C2

Repository: vi70x3/freellmapi

Length of output: 5164


🏁 Script executed:

#!/bin/bash
rg -n --type=ts "type RoutingMode|RoutingMode" server/src -S

Repository: vi70x3/freellmapi

Length of output: 1449


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "---- router.ts imports (top of file) ----"
sed -n '1,120p' server/src/services/router.ts | nl -ba | head -n 120

echo
echo "---- router.ts RoutingMode declaration ----"
sed -n '140,220p' server/src/services/router.ts | nl -ba

echo
echo "---- proxy.ts imports (top of file) ----"
sed -n '1,120p' server/src/routes/proxy.ts | nl -ba | head -n 120

Repository: vi70x3/freellmapi

Length of output: 164


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "---- router.ts imports (top of file) ----"
sed -n '1,120p' server/src/services/router.ts | cat -n | head -n 120

echo
echo "---- router.ts RoutingMode declaration ----"
sed -n '140,220p' server/src/services/router.ts | cat -n

echo
echo "---- proxy.ts imports (top of file) ----"
sed -n '1,140p' server/src/routes/proxy.ts | cat -n | head -n 140

Repository: vi70x3/freellmapi

Length of output: 16450


Deduplicate RoutingMode by re-exporting it from shared types

server/src/services/router.ts locally defines export type RoutingMode = 'balanced' | 'smart' | 'fast', while server/src/routes/proxy.ts imports type RoutingMode from ../services/router.js. Instead of maintaining a second independent definition, have router.ts re-export RoutingMode from @freellmapi/shared/types.js (there are no other RoutingMode imports from router.ts).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/services/router.ts` at line 170, Remove the local type alias
RoutingMode in server/src/services/router.ts and instead re-export it from the
shared types module; specifically delete the line "export type RoutingMode =
'balanced' | 'smart' | 'fast';" and add an explicit re-export like "export type
{ RoutingMode } from '`@freellmapi/shared/types.js`'" so existing imports (e.g.,
type RoutingMode used by ../routes/proxy.ts) continue to work and the definition
is deduplicated.

- extractErrorMessage: handle string error payloads and object errors with message field
- throwWrappedError: guard against non-numeric error codes (NaN prevention)
- cloudflare/cohere/openai-compat: move isWrappedError check outside JSON.parse try/catch in SSE loops

const err = body as { error?: { message?: string }; errors?: Array<{ message?: string }>; message?: string };
return err.error?.message ?? err.errors?.[0]?.message ?? err.message ?? fallback;
const err = body as { error?: unknown; errors?: Array<{ message?: string }>; message?: string };

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: TypeScript type error - missing closing brace in type assertion

The type assertion for err has an incomplete type: Array<{ message?: string>; is missing a closing brace for the object type. This will cause a TypeScript compilation error.

Suggested change
const err = body as { error?: unknown; errors?: Array<{ message?: string }>; message?: string };
const err = body as { error?: unknown; errors?: Array<{ message?: string }>; message?: string };

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
server/src/providers/base.ts (1)

159-173: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Constrain wrapped code to a real HTTP status.

Number(codeVal) no longer produces NaN, but it still accepts values like "Infinity", 0, -1, or 429.5. proxy.ts only checks typeof status === 'number', so these still bypass the retry/rate-limit branches this helper is meant to trigger. Fall back to 200 unless the parsed value is a finite HTTP status.

Suggested fix
-      const num = Number(codeVal);
-      if (!isNaN(num)) {
-        statusCode = num;
-      }
+      const num = Number(codeVal);
+      if (Number.isInteger(num) && num >= 100 && num < 600) {
+        statusCode = num;
+      }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/providers/base.ts` around lines 159 - 173, The parsed
errPayload.code should be validated as a finite integer HTTP status before
assigning to error.status: replace the loose Number(codeVal) acceptance with
logic that converts codeVal to a Number, verifies Number.isFinite(num) and
Number.isInteger(num), and ensures the value is within the valid HTTP range
(100–599); only then set statusCode = num, otherwise leave statusCode as 200 so
proxy.ts's typeof number checks don't get bypassed by "Infinity", 0, negatives,
or non-integer values (referencing variables errPayload, codeVal, statusCode,
and error.status).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@server/src/providers/cloudflare.ts`:
- Around line 132-135: Before calling this.throwWrappedError(parsed)
cancel/abort the upstream SSE stream so the reader/body is closed; specifically,
when you detect a wrapped error via this.isWrappedError(parsed) call the stream
cancellation API (e.g., await reader.cancel() or this.abortController.abort())
and handle/ignore any cancellation errors, then invoke
this.throwWrappedError(parsed) — ensure the cancel/abort is awaited or completed
synchronously so the generator exits with the stream already closed.

---

Duplicate comments:
In `@server/src/providers/base.ts`:
- Around line 159-173: The parsed errPayload.code should be validated as a
finite integer HTTP status before assigning to error.status: replace the loose
Number(codeVal) acceptance with logic that converts codeVal to a Number,
verifies Number.isFinite(num) and Number.isInteger(num), and ensures the value
is within the valid HTTP range (100–599); only then set statusCode = num,
otherwise leave statusCode as 200 so proxy.ts's typeof number checks don't get
bypassed by "Infinity", 0, negatives, or non-integer values (referencing
variables errPayload, codeVal, statusCode, and error.status).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 61c0201f-eda8-48c9-802b-a3f625d518d9

📥 Commits

Reviewing files that changed from the base of the PR and between 5617fc3 and 1ac7c15.

📒 Files selected for processing (4)
  • server/src/providers/base.ts
  • server/src/providers/cloudflare.ts
  • server/src/providers/cohere.ts
  • server/src/providers/openai-compat.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • server/src/providers/openai-compat.ts
  • server/src/providers/cohere.ts

Comment on lines +132 to 135
// Check for wrapped errors outside of the JSON parse try/catch block
if (this.isWrappedError(parsed)) {
this.throwWrappedError(parsed);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Cancel the SSE reader before throwing the wrapped error.

The spec for this feature requires the stream to be aborted when a chunk is actually an error payload. Throwing here exits the generator, but it leaves the upstream reader/body open until the remote side closes it.

Suggested fix
         // Check for wrapped errors outside of the JSON parse try/catch block
         if (this.isWrappedError(parsed)) {
+          await reader.cancel().catch(() => undefined);
           this.throwWrappedError(parsed);
         }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/providers/cloudflare.ts` around lines 132 - 135, Before calling
this.throwWrappedError(parsed) cancel/abort the upstream SSE stream so the
reader/body is closed; specifically, when you detect a wrapped error via
this.isWrappedError(parsed) call the stream cancellation API (e.g., await
reader.cancel() or this.abortController.abort()) and handle/ignore any
cancellation errors, then invoke this.throwWrappedError(parsed) — ensure the
cancel/abort is awaited or completed synchronously so the generator exits with
the stream already closed.

@vi70x3 vi70x3 closed this Jun 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant