feat: generalized thread protection scanner integration by vi70x3 · Pull Request #13 · animaios/api-llm-localhost

vi70x3 · 2026-06-02T23:45:15Z

Implements the generalized-thread-protection spec.

Changes

Task 4a: Rules engine in `threadProtection.ts`

Replaced the mock placeholder in evaluateThreadProtection with the actual protection scanner
Implemented configuration parsing from THREAD_PROTECTION_PLATFORMS env var (comma-separated platform:level pairs)
Added default protection map preserving backward compatibility: longcat → provider-ban, all others → model-skip
Exported getProtectionLevel() for sticky cooldown generalization
Implemented the decision matrix mapping 5xx, truncation, retryable error contexts to ThreadProtectionAction rules

Task 4b: Integration into `proxy.ts`

Replaced 6 hardcoded route.platform === 'longcat' branches with explicit evaluateThreadProtection() calls:
1. Stream truncation detection (post-stream)
2. Mid-stream 5xx error
3. Mid-stream truncation error
4. Mid-stream retryable error
5. Non-stream 5xx error
6. Non-stream retryable error
Each replacement uses the scanner's banProvider, skipModel, and clearStickyIfPinned action flags
No remaining hardcoded longcat references in proxy.ts

Summary by Sourcery

Integrate a generalized thread protection and transient cooldown system into the proxy, enhance routing and analytics behavior, and add SSE stream heartbeat/stall protection and wrapped-error handling across providers.

Enhancements:

Introduce a generalized thread protection rules engine and wire it into proxy error handling paths to replace hardcoded longcat-specific logic with configurable provider/model ban and skip policies.
Add transient, cross-request model cooldown tracking and sticky-session interaction so models that recently failed are temporarily excluded from routing while allowing automatic recovery after expiry.
Refine routing behavior by excluding specific models from balanced mode, preferring selected models (e.g., LongCat and Owl Alpha) in smart mode when keys have capacity, and applying recency-weighted analytics for Thompson sampling.
Implement SSE stream heartbeat and stall detection in the chat proxy to keep long-lived streams alive with periodic comments and to terminate stalled upstream connections with structured timeout errors.
Extend provider base and implementations to detect wrapped error payloads returned with HTTP 200 responses and surface them as structured provider errors for existing retry and fallback logic.
Reorganize the fallback admin UI to group models by pool (fast, balanced, smart) with clearer pool metadata while preserving sorting and toggling behavior.

Documentation:

Add internal design and requirements documents describing wrapped-error interception, SSE heartbeat and stall protection, recency-biased Thompson sampling, transient model cooldowns, generalized thread protection, sticky-session behavior in balanced mode, and model routing for Owl Alpha and LongCat.

Tests:

Add extensive unit and integration tests covering provider session bans, transient cooldown behavior, thread protection rules, SSE heartbeat and stall handling, routing analytics, and fallback API pool metadata to ensure the new behaviors are correct and regressions are prevented.

Summary by CodeRabbit

Release Notes

New Features
- SSE streams now include automatic keep-alive heartbeats and stall detection to improve stability
- Added transient cooldown mechanism to automatically skip models experiencing temporary failures
- Improved error detection for wrapped error payloads returned with HTTP 200 status
- Fallback page now groups available models by speed tier for better discoverability
Improvements
- Analytics now use recency-weighted performance tracking for smarter model selection
- Error handling now skips individual models instead of entire providers on transient failures
- Balanced routing mode no longer applies sticky session pinning for improved fairness

…alized thread protection scanner

… longcat branches

coderabbitai · 2026-06-02T23:45:25Z

📝 Walkthrough

Walkthrough

This PR implements seven coordinated proxy routing and resilience features: wrapped-error interception in all provider implementations, a generalized thread-protection decision engine, recency-biased Thompson sampling analytics, transient global cooldowns for failed models, SSE heartbeat/stall protection, model-level routing for Owl Alpha and LongCat with mutual exclusions, and disabling sticky sessions in balanced mode. Includes comprehensive specification documents and new test coverage.

Changes

Multi-Feature Proxy Routing Enhancement

Layer / File(s)	Summary
Feature specifications and design documentation `.roo/specs/disable-sticky-on-auto/`, `.roo/specs/generalized-thread-protection/`, `.roo/specs/owl-alpha-longcat-model-routing/`, `.roo/specs/recency-biased-thompson-sampling/`, `.roo/specs/sse-stream-heartbeat-stall-protection/`, `.roo/specs/transient-model-cooldown/`, `.roo/specs/wrapped-error-interception/*`	Specification documents (design, requirements, tasks) for all seven features being implemented in this PR.
Provider-layer wrapped-error interception `server/src/providers/base.ts`, `server/src/providers/cloudflare.ts`, `server/src/providers/cohere.ts`, `server/src/providers/google.ts`, `server/src/providers/openai-compat.ts`	`BaseProvider` adds protected `isWrappedError()` and `throwWrappedError()` helpers and exposes `extractErrorMessage()`. All provider implementations add wrapped-error detection after JSON parsing in both streaming and non-streaming paths, throwing `ProviderApiError` when HTTP 200 wraps an error payload.
Generalized thread protection decision engine `server/src/services/threadProtection.ts`	New module parsing `THREAD_PROTECTION_PLATFORMS` env var into platform-to-protection-level mappings and exporting `evaluateThreadProtection()` function to compute provider-ban/model-skip/sticky-clear decisions based on error kind and configured protection level, replacing hardcoded LongCat logic.
Router analytics weighting and model preferences `server/src/services/router.ts`	`ModelStats` extends with `rawTotal` (unweighted) alongside recency-weighted `total`/`successes`. `refreshStatsCache()` applies time-decay weighting using `requests.created_at`. Balanced mode excludes `longcat` and Owl Alpha platforms. New `hasValidKeys()` helper validates key capacity/cooldown/limits. Smart-mode preference logic moves LongCat and Owl Alpha to front when valid keys exist. Dashboard `getAnalyticsScores()` switches reported `total` to `rawTotal` for actual request counts.
Proxy streaming resilience, heartbeats, cooldowns, and thread protection `server/src/routes/proxy.ts`	Adds `streamKeepaliveConfig` for configurable heartbeat/stall intervals; implements SSE keep-alive comments during idle periods and stall-detection (HTTP 504 pre-stream, SSE error frame mid-stream). Introduces `transientModelCooldowns` global map with lazy pruning and `skipModels` injection to isolate concurrent failures. Extends sticky-session cooldown to scan other active sessions within `THREAD_COOLDOWN_MS`. Integrates `evaluateThreadProtection()` for provider-ban/model-skip/sticky-clear decisions across truncation/5xx/retryable error paths (streaming and non-stream). Exports cooldown state for test access.
Client UI pool grouping and test infrastructure `client/src/pages/FallbackPage.tsx`, `server/src/__tests__/routes/stream-heartbeat-stall.test.ts`, `server/src/__tests__/routes/transient-cooldown.test.ts`, `server/src/__tests__/routes/provider-session-ban.test.ts`, `server/src/__tests__/routes/fallback.test.ts`, `server/src/__tests__/routes/proxy-tools.test.ts`, `server/src/__tests__/services/router.test.ts`, `server/write_test.py`, `server/write_tests.py`, `do_fix.py`, `fix.py`, `fix_streaming.py`, `new_streaming_block.txt`	`FallbackPage` renders models grouped by pool (fast/balanced/smart) via `PoolSection` instead of flat list. New test suites validate SSE heartbeat/stall scenarios, transient cooldown injection/registration/recovery/eligibility, provider-session-ban behavior in both balanced and smart modes, fallback entry pool enum validation, and router analytics/exclusion logic. Test generators and placeholder transformation scripts included.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

vi70x3/freellmapi#9: Implements the same balanced-mode sticky-session disablement feature (making getSessionKey() return empty string for balanced mode and updating provider-session-ban tests).
vi70x3/freellmapi#2: Changes sticky-session machinery in proxy (session-key computation and sticky model/key storage), directly interacting with the balanced-mode sticky disablement logic.
vi70x3/freellmapi#8: Modifies proxy sticky-session plumbing where sticky state (including LongCat cooldown/session-ban) is suppressed via session-key gating, impacted by this PR's balanced-mode empty-key guard.

🐰 Seven features hop into one PR so fine,
Wrapped errors caught, analytics align,
Heartbeats keep streams alive through the night,
Thread protection shines with recency's light,
Cooldowns prevent cascading crashes bright,
Balanced and smart routing get it right! ✨

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch spec/generalized-thread-protection

⚔️ Resolve merge conflicts

Resolve merge conflict in branch spec/generalized-thread-protection

sourcery-ai · 2026-06-02T23:45:27Z

Reviewer's Guide

Implements the generalized thread protection scanner and related routing resiliency features: centralizes provider/model ban decisions, adds transient model cooldowns and SSE heartbeat/stall protection, biases routing analytics toward recent performance, and adjusts router and UI behavior for LongCat/Owl Alpha and balanced vs smart pools.

File-Level Changes

Change	Details	Files
Introduce a configurable thread protection rules engine and wire it into all proxy error/truncation paths.	Add threadProtection service with env‑driven protection levels, defaulting longcat to provider‑ban and others to model‑skip, plus exported getProtectionLevel() helper. Replace all hardcoded longcat platform checks in streaming/non‑streaming 5xx, truncation, and retryable error handling with evaluateThreadProtection() calls that drive banProvider/skipModel/clearStickyIfPinned behavior. Generalize sticky cooldown behavior to use protection levels instead of special‑casing longcat so provider‑ban platforms get the same treatment automatically.	`server/src/services/threadProtection.ts` `server/src/routes/proxy.ts` `server/src/__tests__/routes/provider-session-ban.test.ts` `.roo/specs/generalized-thread-protection/design.md` `.roo/specs/generalized-thread-protection/tasks.md` `.roo/specs/generalized-thread-protection/requirements.md`
Add transient, cross‑request model cooldowns and integrate them with routing and sticky sessions.	Introduce transientModelCooldowns map and TRANSIENT_COOLDOWN_MS, export them for tests, and inject active cooldowns into skipModels with lazy pruning of expired entries. Clear preferredModel/preferredKeyId when the sticky model is on transient cooldown so pinned sessions don’t stick to a globally failing model. Register transient cooldowns for mid‑stream and non‑stream 5xx/retryable errors and ensure cooldowns combine with per‑session bans via addProviderModelsToSkipModels. Add an extensive transient‑cooldown test suite validating map behavior, pruning, registration conditions, interaction with skipModels and stickySessionMap, and error‑classification rules.	`server/src/routes/proxy.ts` `server/src/__tests__/routes/transient-cooldown.test.ts` `.roo/specs/transient-model-cooldown/design.md` `.roo/specs/transient-model-cooldown/requirements.md` `.roo/specs/transient-model-cooldown/tasks.md`
Implement SSE stream heartbeat and stall protection with configurable keepalive and stall thresholds.	Add streamKeepaliveConfig (KEEPALIVE_INTERVAL_MS, MAX_STREAM_STALL_MS) exported from proxy for test overrides. Wrap the streaming path with lastChunkTimestamp tracking, a heartbeat setInterval that sends : keep-alive comments once headers are sent, and stall detection that either emits a stream_timeout SSE error (mid‑stream) or a 504 JSON error (pre‑stream). Ensure cleanupStream() clears timers on normal completion, errors, and client disconnects, and guard the for‑await loop with streamAborted checks and special logging on stalled streams. Add dedicated stream-heartbeat-stall tests that stub upstream streaming behavior to exercise heartbeats, stalls before first chunk, mid‑stream stalls, client disconnect cleanup, and normal fast streams.	`server/src/routes/proxy.ts` `server/src/__tests__/routes/stream-heartbeat-stall.test.ts` `.roo/specs/sse-stream-heartbeat-stall-protection/design.md` `.roo/specs/sse-stream-heartbeat-stall-protection/requirements.md` `.roo/specs/sse-stream-heartbeat-stall-protection/tasks.md`
Adjust router analytics to be recency‑biased and expose model pool information for UI grouping.	Change router stats aggregation SQL to compute recency‑weighted total/success counts over the analytics window using julianday(), while also tracking rawTotal for display. Extend ModelStats with rawTotal, use weighted successes/total in rates but rawTotal in getAnalyticsScores(), and add safe Math.max guards around Beta parameters used in Thompson sampling. Introduce EXCLUDED_FROM_BALANCED and EXCLUDED_MODELS_FROM_BALANCED so balanced routing omits longcat and openrouter/owl-alpha, while smart routing prefers them when hasValidKeys() detects a viable key. Refactor key‑capacity checks into hasValidKeys() and update router tests; a write_test.py helper appears to regenerate router.test.ts but is not part of runtime logic. Expose pool (fast/balanced/smart) in fallback API responses and validate via new tests.	`server/src/services/router.ts` `server/src/__tests__/services/router.test.ts` `server/src/__tests__/routes/fallback.test.ts` `.roo/specs/recency-biased-thompson-sampling/design.md` `.roo/specs/recency-biased-thompson-sampling/requirements.md` `.roo/specs/recency-biased-thompson-sampling/tasks.md` `.roo/specs/owl-alpha-longcat-model-routing/design.md` `.roo/specs/owl-alpha-longcat-model-routing/requirements.md` `.roo/specs/owl-alpha-longcat-model-routing/tasks.md` `server/write_test.py` `fix_streaming.py` `fix.py` `do_fix.py` `new_streaming_block.txt` `server/write_tests.py`
Update proxy routing, sticky behavior, and tests to disable sticky sessions for balanced mode and clean up routing mode assumptions.	Change getSessionKey() semantics (via tests) so balanced mode returns an empty key, making all sticky session operations no‑ops for balanced; add explicit balanced‑mode tests for getStickyModel, isSessionBannedFromPlatform, banPlatformFromSession, and setStickyModel. Switch many test invocations from routingMode 'balanced' to 'smart' where sticky behavior is expected, ensuring tests now reflect sticky‑only semantics for smart routing. Document this behavior in disable-sticky-on-auto specs and tasks.	`server/src/routes/proxy.ts` `server/src/__tests__/routes/provider-session-ban.test.ts` `.roo/specs/disable-sticky-on-auto/design.md` `.roo/specs/disable-sticky-on-auto/requirements.md` `.roo/specs/disable-sticky-on-auto/tasks.md`
Improve wrapped‑error handling in providers so HTTP 200 responses carrying error payloads are normalized into ProviderApiError failures.	Make BaseProvider.extractErrorMessage protected and add isWrappedError()/throwWrappedError() helpers to inspect parsed JSON bodies for root‑level error fields and throw ProviderApiError with appropriate status and message. Call isWrappedError/throwWrappedError in chatCompletion() and streamChatCompletion() for OpenAICompat, Cohere, Cloudflare, and Google providers immediately after JSON.parse, before normal processing. Spec out requirements and tasks for wrapped error interception to document behavior and constraints.	`server/src/providers/base.ts` `server/src/providers/openai-compat.ts` `server/src/providers/cohere.ts` `server/src/providers/cloudflare.ts` `server/src/providers/google.ts` `.roo/specs/wrapped-error-interception/design.md` `.roo/specs/wrapped-error-interception/requirements.md` `.roo/specs/wrapped-error-interception/tasks.md`
Enhance the fallback admin UI to group models by pool and display pool metadata.	Extend FallbackEntry to include pool type and import PoolSection/PoolType UI components. Compute poolGroups from displayEntries, and render one table per pool wrapped in PoolSection with pool‑specific titles, keeping existing sort headers and ModelRow rendering per pool. Add tests ensuring GET /api/fallback returns valid ModelPool enum values in the pool field.	`client/src/pages/FallbackPage.tsx` `server/src/__tests__/routes/fallback.test.ts` `.roo/specs/owl-alpha-longcat-model-routing/design.md`

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time. You can also comment
@sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time exactly where you
want it. You can also comment @sourcery-ai summary on the pull request to
(re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull
request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
pull request to resolve all Sourcery comments. Useful if you've already
addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
request to dismiss all existing Sourcery reviews. Especially useful if you
want to start fresh with a new review - don't forget to comment
@sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

qodo-code-review · 2026-06-02T23:46:01Z

Review Summary by Qodo

Generalized thread protection, stream heartbeat, transient cooldowns, and model-level routing with wrapped error detection

✨ Enhancement 🧪 Tests 🐞 Bug fix

Walkthroughs

Description

• **Generalized thread protection scanner**: Replaced 6 hardcoded longcat platform checks with
  configurable evaluateThreadProtection() function that maps error contexts (5xx, truncation,
  retryable) to actions (ban provider, skip model, clear sticky) based on per-platform protection
  levels
• **Stream heartbeat and stall protection**: Added SSE keep-alive comments (15s interval) to prevent
  proxy timeouts and stall detection (45s threshold) to abort unresponsive streams with graceful error
  handling
• **Transient model cooldowns**: Implemented shared in-memory cooldown collection (15s window) for
  5xx and connection failures across concurrent requests to prevent thundering herd on degraded
  models, with sticky session override and exhaustion protection
• **Recency-weighted analytics**: Applied 7-day exponential decay formula to success/total counts in
  Thompson sampling to prioritize recent performance over historical data
• **Model-level routing for Owl Alpha and LongCat**: Balanced mode excludes both platforms entirely;
  smart mode prefers them when valid keys exist; replaced provider-level banning with model-level
  skipModels
• **Wrapped error detection**: Added detection of root-level error field in HTTP 200 responses
  across all providers (OpenAI, Cohere, Cloudflare, Google) with proper status code extraction
• **Fallback API enhancements**: Added pool property to fallback response with ModelPool enum
  validation and pool-based grouping in UI
• **Test coverage**: Added comprehensive test suites for transient cooldowns (40+ cases), stream
  heartbeat/stall (5 scenarios), balanced mode routing, and fallback pool validation; updated existing
  tests for balanced/smart mode distinction
• **Bug fix**: Fixed SQL INSERT statement syntax error in router tests (duplicate VALUES clause)

Diagram

flowchart LR
  A["Error Context<br/>5xx/truncation/retryable"] -->|evaluateThreadProtection| B["Protection Scanner<br/>threadProtection.ts"]
  B -->|per-platform rules| C["Action Decision<br/>ban/skip/clear"]
  C -->|applied to| D["Proxy Request<br/>proxy.ts"]
  
  E["Stream Active"] -->|15s idle| F["Heartbeat Comment<br/>SSE keep-alive"]
  E -->|45s stall| G["Stream Abort<br/>stream_timeout"]
  
  H["Model Failure<br/>5xx/timeout"] -->|register| I["Transient Cooldown<br/>15s window"]
  I -->|exclude from| J["Model Selection<br/>prevent herd"]
  
  K["Analytics Stats"] -->|7-day decay| L["Recency Weight<br/>prioritize recent"]
  L -->|Thompson sampling| M["Model Ranking"]
  
  N["HTTP 200 Response"] -->|check| O["Wrapped Error<br/>root error field"]
  O -->|throw| P["ProviderApiError<br/>proper status"]

File Changes

1. server/src/routes/proxy.ts ✨ Enhancement +261/-82

Generalized thread protection, stream heartbeat, and transient cooldowns

• Replaced 6 hardcoded longcat platform checks with generalized evaluateThreadProtection() calls
 across stream truncation, mid-stream 5xx, mid-stream truncation, mid-stream retryable, non-stream
 5xx, and non-stream retryable error contexts
• Added stream heartbeat and stall protection: KEEPALIVE_INTERVAL_MS (15s) sends SSE comments
 during idle periods, MAX_STREAM_STALL_MS (45s) aborts stalled streams with stream_timeout error
• Implemented transient model cooldowns (transientModelCooldowns Map, TRANSIENT_COOLDOWN_MS 15s)
 to share failure state across concurrent requests and prevent thundering herd on degraded models
• Added thread protection logic to exclude models actively used by other sticky sessions within
 cooldown window, with exhaustion protection to prevent blocking all available models

server/src/routes/proxy.ts

2. server/src/services/threadProtection.ts ✨ Enhancement +119/-0

New thread protection scanner with configurable platform rules

• New module implementing generalized thread protection scanner with configurable protection levels
 per platform
• Exports evaluateThreadProtection() function that maps error contexts (5xx, truncation,
 retryable) to actions (banProvider, skipModel, clearStickyIfPinned)
• Parses THREAD_PROTECTION_PLATFORMS env var for platform-specific protection levels; defaults to
 longcat:provider-ban and all others to model-skip for backward compatibility
• Implements decision matrix matching protection levels to error kinds with human-readable reason
 strings for logging

server/src/services/threadProtection.ts

3. server/src/__tests__/routes/provider-session-ban.test.ts 🧪 Tests +92/-48

Updated tests for balanced/smart mode routing distinction

• Updated routing mode parameter from 'balanced' to 'smart' across 40+ test cases to reflect new
 balanced/smart mode distinction
• Changed truncation detection test sample from 'conflict in response' to 'cut off' for more
 accurate pattern matching
• Added new test suite for balanced mode behavior: verifies sticky session operations are skipped
 when routingMode === 'balanced'

server/src/tests/routes/provider-session-ban.test.ts

View more (40)

4. server/src/__tests__/routes/transient-cooldown.test.ts 🧪 Tests +415/-0

New transient cooldown test suite with 40+ test cases

• New comprehensive test suite (415 lines) covering transient model cooldown functionality with 6
 test suites
• Tests cooldown map basics, injection/pruning logic, auto-recovery after expiry, sticky session
 override behavior, error classification eligibility, and integration with session bans
• Validates that only 5xx and connection failures (undefined status) trigger cooldown registration,
 while 429/401/403/400 errors do not

server/src/tests/routes/transient-cooldown.test.ts

5. server/src/__tests__/routes/stream-heartbeat-stall.test.ts 🧪 Tests +324/-0

New stream heartbeat and stall protection test suite

• New test suite (324 lines) for SSE stream heartbeat and stall protection with 5 test scenarios
• Tests heartbeat comment emission during idle periods, stream termination on stall detection,
 pre-stream stall 504 response, client disconnect cleanup, and normal streaming with heartbeat
 enabled
• Uses configurable streamKeepaliveConfig for test-friendly timing (100ms intervals, 500ms stall
 threshold)

server/src/tests/routes/stream-heartbeat-stall.test.ts

6. server/src/services/router.ts ✨ Enhancement +86/-32

Recency-weighted analytics and balanced/smart mode routing preferences

• Added recency weighting to analytics stats calculation: success/total counts now use 7-day
 exponential decay formula to prioritize recent performance over historical data
• Introduced EXCLUDED_FROM_BALANCED and EXCLUDED_MODELS_FROM_BALANCED sets to exclude LongCat
 platform and Owl Alpha model from balanced-mode auto-routing
• Implemented smart-mode preference logic for LongCat and Owl Alpha: moves these models to front of
 sorted list when capacity available
• Added hasValidKeys() helper function to check if a platform/model has any enabled keys with
 available capacity

server/src/services/router.ts

7. server/src/__tests__/services/router.test.ts 🐞 Bug fix +3/-36

Router test updates and syntax fixes

• Updated test imports to include refreshStatsCache and getAnalyticsScores functions
• Simplified test setup by removing redundant comments and consolidating key insertion logic
• Fixed SQL INSERT statement syntax error (duplicate VALUES clause)

server/src/tests/services/router.test.ts

8. server/src/providers/base.ts ✨ Enhancement +28/-1

Protected error extraction and wrapped error detection

• Changed extractErrorMessage() from private to protected for subclass access
• Added isWrappedError() method to detect root-level error field in JSON responses (catches
 upstream providers returning error payloads with HTTP 200 status)
• Added throwWrappedError() method to throw ProviderApiError from wrapped error payloads with
 proper status code extraction

server/src/providers/base.ts

9. server/src/providers/openai-compat.ts ✨ Enhancement +10/-1

Wrapped error detection for OpenAI-compatible provider

• Added wrapped error detection in non-streaming response path: checks isWrappedError() after JSON
 parsing and throws via throwWrappedError()
• Added wrapped error detection in streaming path: checks each parsed chunk for wrapped errors
 before yielding

server/src/providers/openai-compat.ts

10. server/src/providers/cohere.ts ✨ Enhancement +10/-1

Wrapped error detection for Cohere provider
• Added wrapped error detection in non-streaming response path after JSON parsing
• Added wrapped error detection in streaming chunk processing loop
server/src/providers/cohere.ts

11. server/src/providers/cloudflare.ts ✨ Enhancement +10/-1

Wrapped error detection for Cloudflare provider
• Added wrapped error detection in non-streaming response path after JSON parsing
• Added wrapped error detection in streaming chunk processing loop
server/src/providers/cloudflare.ts

12. server/src/providers/google.ts ✨ Enhancement +10/-0

Wrapped error detection for Google provider

• Added wrapped error detection in non-streaming response path after JSON parsing
• Added wrapped error detection in streaming chunk processing loop before candidate extraction

server/src/providers/google.ts

13. server/src/__tests__/routes/fallback.test.ts 🧪 Tests +11/-0

Fallback API tests for pool property validation

• Added import for ModelPool enum from shared types
• Added test to verify fallback API response includes speedRank and pool properties
• Added test to validate all returned pool values are valid ModelPool enum values (Fast,
 Balanced, Smart)

server/src/tests/routes/fallback.test.ts

14. server/src/__tests__/routes/proxy-tools.test.ts 🧪 Tests +3/-2

Proxy tools test cleanup for transient cooldowns
• Added import for transientModelCooldowns from proxy module
• Updated beforeEach hook to clear both stickySessionMap and transientModelCooldowns before
 each test
server/src/tests/routes/proxy-tools.test.ts

15. .roo/specs/sse-stream-heartbeat-stall-protection/design.md 📝 Documentation +330/-0

Design spec for stream heartbeat and stall protection

• New 330-line design specification for SSE stream heartbeat and stall protection feature
• Documents architecture with mermaid flowchart, implementation details for
 constants/state/cleanup/interval/listeners, interaction with existing code paths, and comprehensive
 edge case matrix
• Specifies pre-stream vs mid-stream stall handling, integration with Responses API, and file
 modification requirements

.roo/specs/sse-stream-heartbeat-stall-protection/design.md

16. .roo/specs/transient-model-cooldown/design.md 📝 Documentation +197/-0

Design spec for transient model cooldown circuit breaker

• New 197-line design specification for shared temporary cooldowns feature
• Documents architecture with mermaid flowchart, data structure design, 4 integration points
 (injection, sticky override, failure registration, mid-stream errors), and error classification
 matrix
• Includes test strategy, export requirements, and risk mitigation table

.roo/specs/transient-model-cooldown/design.md

17. .roo/specs/generalized-thread-protection/requirements.md 📝 Documentation +5/-0

Requirements stub for thread protection spec
• New requirements document stub (incomplete, 5 lines) for generalized thread protection scanner
 spec
.roo/specs/generalized-thread-protection/requirements.md

18. fix.py Miscellaneous +8/-0

Incomplete utility script for streaming block replacement
• New Python script stub (incomplete, 8 lines) for replacing streaming block in proxy.ts with
 Promise.race-based stall detection
fix.py

19. .roo/specs/wrapped-error-interception/design.md Design documentation +337/-0

Wrapped error detection on HTTP 200 responses

• Introduces a two-step detection mechanism for wrapped error payloads on HTTP 200 responses
• Adds isWrappedError() predicate and throwWrappedError() helper to BaseProvider
• Specifies integration points in four provider implementations (OpenAI, Cohere, Cloudflare, Google)
• Documents error detection flow, wrapped error formats, and edge cases with detailed examples

.roo/specs/wrapped-error-interception/design.md

20. .roo/specs/recency-biased-thompson-sampling/design.md ✨ Enhancement +238/-0

Time-decay weighting for Thompson sampling aggregation

• Replaces flat request aggregation with linear time-decay weighting in SQL queries
• Introduces recency_weight calculation using julianday() for age-based decay over 7-day window
• Extends ModelStats interface with rawSuccesses and rawTotal fields for dashboard
 transparency
• Adds Math.max(0.1, ...) safety guards to Beta distribution parameters

.roo/specs/recency-biased-thompson-sampling/design.md

21. .roo/specs/sse-stream-heartbeat-stall-protection/requirements.md Requirements documentation +132/-0

SSE stream heartbeat and stall detection requirements

• Specifies SSE keep-alive heartbeats (15-second interval) to prevent intermediate proxy timeouts
• Defines stall detection mechanism (45-second threshold) with graceful stream termination
• Requires client-disconnect cleanup and heartbeat write failure handling
• Documents pre-stream and mid-stream error behaviors with structured error frames

.roo/specs/sse-stream-heartbeat-stall-protection/requirements.md

22. .roo/specs/owl-alpha-longcat-model-routing/design.md Design documentation +184/-0

Model-level routing for Owl Alpha and LongCat

• Implements model-level routing for Owl Alpha and LongCat with balanced/smart mode differentiation
• Balanced mode excludes both longcat platform and openrouter/owl-alpha model entirely
• Smart mode prefers both models when valid keys exist, using same capacity validation logic
• Replaces provider-level banning with model-level skipModels for both platforms

.roo/specs/owl-alpha-longcat-model-routing/design.md

23. .roo/specs/owl-alpha-longcat-model-routing/tasks.md Tasks documentation +116/-0

Implementation tasks for model-level routing

• Phase 1: Router exclusion constants and balanced mode filtering in routeRequest()
• Phase 2: Proxy sticky cooldown checks and model-level error handling for both platforms
• Phase 3: Test cases for balanced exclusion, smart preference, sticky cooldown, and model-level
 banning
• All tasks marked complete with specific line references and implementation details

.roo/specs/owl-alpha-longcat-model-routing/tasks.md

24. client/src/pages/FallbackPage.tsx ✨ Enhancement +49/-30

Pool-based grouping for fallback model display

• Adds PoolType import and pool field to FallbackEntry interface
• Introduces PoolSection component wrapper for grouping models by pool type
• Reorganizes table display to show separate sections for fast, balanced, and smart pools
• Maintains existing sort and filter functionality within each pool group

client/src/pages/FallbackPage.tsx

25. .roo/specs/owl-alpha-longcat-model-routing/requirements.md Requirements documentation +126/-0

Requirements for model-level routing behavior

• Specifies exclusion of LongCat and Owl Alpha from balanced auto routing (REQ-1)
• Requires smart auto preference when valid non-cooldown keys exist (REQ-2)
• Defines sticky session cooldown protection for both platforms (REQ-3)
• Mandates model-level banning migration for LongCat and Owl Alpha (REQ-4, REQ-5)

.roo/specs/owl-alpha-longcat-model-routing/requirements.md

26. .roo/specs/generalized-thread-protection/design.md Design documentation +152/-0

Generalized thread protection scanner module

• Creates new threadProtection.ts scanner module with configurable per-platform protection rules
• Replaces 6 hardcoded longcat branches with dynamic evaluateThreadProtection() calls
• Implements decision matrix mapping error contexts (5xx, truncation, retryable) to actions (ban,
 skip, clear)
• Generalizes sticky cooldown check to work with any provider-ban level platform

.roo/specs/generalized-thread-protection/design.md

27. .roo/specs/disable-sticky-on-auto/design.md Design documentation +97/-0

Disable sticky sessions for balanced auto endpoint

• Single-point guard in getSessionKey() returns empty string for balanced mode
• Cascades through all sticky functions which already handle empty keys gracefully
• No changes needed to router or other proxy functions
• Includes flow diagram and edge case analysis

.roo/specs/disable-sticky-on-auto/design.md

28. .roo/specs/wrapped-error-interception/requirements.md Requirements documentation +53/-0

Requirements for wrapped error detection

• Requires detection of root-level error field in HTTP 200 responses across all providers
• Specifies ProviderApiError throwing with proper status/provider/responseBody fields
• Mandates handling both string and object error formats with code extraction
• Requires streaming mode support with first-chunk error detection

.roo/specs/wrapped-error-interception/requirements.md

29. .roo/specs/wrapped-error-interception/tasks.md Tasks documentation +65/-0

Implementation tasks for wrapped error detection

• 13 implementation tasks covering isWrappedError() and throwWrappedError() methods
• Tasks for adding checks in all four provider implementations (OpenAI, Cohere, Cloudflare, Google)
• Visibility change for extractErrorMessage() from private to protected
• TypeScript compilation and test verification tasks

.roo/specs/wrapped-error-interception/tasks.md

30. .roo/specs/recency-biased-thompson-sampling/tasks.md Tasks documentation +17/-0

Implementation tasks for recency-biased sampling

• 13 tasks for implementing time-decay weighting in SQL and TypeScript
• Tasks include SQL query rewrite, interface extension, safety guards, and dashboard updates
• Test cases for outage sensitivity, fractional evaluation, and clock drift safety
• Existing test suite verification

.roo/specs/recency-biased-thompson-sampling/tasks.md

31. .roo/specs/recency-biased-thompson-sampling/requirements.md Requirements documentation +76/-0

Requirements for recency-biased Thompson sampling

• Specifies linear time-decay formula with MIN/MAX bounds for clock drift protection
• Requires Math.max(0.1, ...) guards for Beta distribution parameter safety
• Mandates use of standard SQLite julianday() function for portability
• Includes test cases for outage sensitivity and fractional evaluation

.roo/specs/recency-biased-thompson-sampling/requirements.md

32. .roo/specs/transient-model-cooldown/requirements.md Requirements documentation +38/-0

Requirements for transient model cooldowns

• Introduces shared in-memory cooldown collection for transient failures across concurrent requests
• Specifies 15-second global cooldown window for 5xx and connection timeout errors
• Requires sticky session precedence override when preferred model is on cooldown
• Mandates auto-expiry and pruning without manual intervention

.roo/specs/transient-model-cooldown/requirements.md

33. .roo/specs/disable-sticky-on-auto/requirements.md Requirements documentation +44/-0

Requirements for disabling sticky on balanced endpoint

• Requires disabling sticky model/key pinning for balanced routing mode (REQ-1, REQ-2)
• Mandates no session-level platform bans for balanced mode (REQ-3)
• Specifies sticky sessions remain fully active for smart mode (REQ-4)
• Requires per-request retry skip logic to work for both modes (REQ-5)

.roo/specs/disable-sticky-on-auto/requirements.md

34. .roo/specs/sse-stream-heartbeat-stall-protection/tasks.md Tasks documentation +20/-0

Implementation tasks for stream heartbeat protection

• 10 implementation tasks for heartbeat interval, stall detection, and cleanup logic
• Tasks include constant definitions, state variable setup, and client-disconnect handling
• Comprehensive unit tests for heartbeat emission, stall detection, and edge cases
• Existing test suite regression verification

.roo/specs/sse-stream-heartbeat-stall-protection/tasks.md

35. .roo/specs/transient-model-cooldown/tasks.md Tasks documentation +16/-0

Implementation tasks for transient cooldowns

• 8 implementation tasks for transient cooldown Map and injection logic
• Tasks include pre-routing cooldown injection, sticky session override, and error registration
• Unit tests for cooldown injection, expiry pruning, and sticky override behavior
• Existing test suite regression verification

.roo/specs/transient-model-cooldown/tasks.md

36. server/write_tests.py Miscellaneous +45/-0

Python utility for writing router tests

• Python script to write router test file in parts to avoid truncation
• Includes test setup with database initialization and encryption key configuration
• Defines basic test cases for key configuration and routing logic
• File appears incomplete (truncated at end)

server/write_tests.py

37. .roo/specs/generalized-thread-protection/tasks.md Tasks documentation +12/-0

Implementation tasks for thread protection scanner

• 8 tasks for implementing generalized thread protection scanner
• Task 1-3: Rename constant and remove hardcoded LongCat/Owl Alpha blocks
• Task 4: Insert generalized scanner at same location
• Tasks 5-8: Verify execution order, create tests, and run regression suite

.roo/specs/generalized-thread-protection/tasks.md

38. .roo/specs/disable-sticky-on-auto/tasks.md Tasks documentation +16/-0

Implementation tasks for disabling sticky on auto

• 4 tasks for disabling sticky sessions on balanced endpoint
• Task 1: Single code change in getSessionKey() to return empty string for balanced mode
• Task 2: Add test cases verifying balanced mode skips sticky operations
• Tasks 3-4: Run existing tests and manual smoke test

.roo/specs/disable-sticky-on-auto/tasks.md

39. server/write_test.py Miscellaneous +29/-0

Python utility for router test generation

• Python script to write complete router test file
• Includes test setup and basic test cases
• File appears incomplete (truncated at end)

server/write_test.py

40. new_streaming_block.txt Miscellaneous +28/-0

New streaming block implementation reference

• Defines new streaming block structure with heartbeat and stall detection state variables
• Includes cleanupStream() function and stallTimeout() helper
• Shows SSE headers setup and stream state initialization
• File appears incomplete (truncated at end)

new_streaming_block.txt

41. fix_streaming.py Miscellaneous +21/-0

Python utility for streaming block replacement

• Python script to replace streaming block in proxy.ts with Promise.race-based stall detection
• Reads before/after parts and applies new streaming implementation
• File appears incomplete (truncated at end)

fix_streaming.py

42. do_fix.py Miscellaneous +7/-0

Python utility stub for proxy fix

• Python script stub for fixing streaming block in proxy.ts
• Reads proxy.ts content but implementation is incomplete

do_fix.py

43. fix{ Additional files +0/-0

...

fix{

qodo-code-review · 2026-06-02T23:46:02Z

Code Review by Qodo

🐞 Bugs (3) 📘 Rule violations (0) 🔗 Cross-repo conflicts (0)

Context used

✅ Compliance rules (platform): 6 rules

1. Stall doesn't stop upstream 🐞 Bug ☼ Reliability

Description

In the streaming path of handleChatCompletion, the stall timer ends the HTTP response but does not
abort/cancel the provider stream async iterator, so the for await can remain blocked indefinitely
if the upstream stops yielding chunks. This can leak request handlers and reduce server capacity
during upstream stalls.

Code

server/src/routes/proxy.ts[R1351-1417]

Evidence

The stall handler sets a flag and ends the response, but the only break condition (`if
(streamAborted) break;) runs after a chunk is received; when no further chunks arrive, the for
await` remains blocked waiting on the generator.

server/src/routes/proxy.ts[1351-1418]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The stream stall handler sets `streamAborted` and closes the client response, but it does not interrupt the blocked `for await` loop waiting on the upstream async iterator. If the upstream never yields again, the handler remains stuck and never returns.

## Issue Context
This occurs in the streaming branch where a `setInterval` detects stalls and closes the response. The `streamAborted` check is only evaluated inside the loop body, which is unreachable when the iterator is stalled.

## Fix Focus Areas
- server/src/routes/proxy.ts[1351-1418]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

2. Stray debug scripts committed 🐞 Bug ⚙ Maintainability

Description

The PR adds multiple incomplete local fix/debug artifacts (e.g., do_fix.py, fix.py,
fix_streaming.py, new_streaming_block.txt) including hardcoded local paths and unterminated code
blocks. These files add maintenance/security risk and should not ship in the repository.

Code

do_fix.py[R1-7]

Evidence

These files contain incomplete code (e.g., dangling {), hardcoded local filesystem paths, and
partial TypeScript embedded in text/scripts, indicating they are development artifacts rather than
production code.

do_fix.py[1-7]
fix_streaming.py[1-21]
new_streaming_block.txt[1-28]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Several newly added files appear to be local, partially-written helper scripts / scratch buffers (broken syntax, hardcoded absolute paths, unfinished blocks). They should be removed from the PR (and optionally added to `.gitignore` if they’re generated during development).

## Issue Context
Examples include root-level Python scripts and a stray text file containing partial TypeScript snippets.

## Fix Focus Areas
- do_fix.py[1-7]
- fix.py[1-8]
- fix_streaming.py[1-21]
- new_streaming_block.txt[1-28]
- server/write_test.py[1-29]
- server/write_tests.py[1-45]
- fix{[1-1]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

3. Cooldown guard checks wrong set 🐞 Bug ≡ Correctness

Description

The thread-cooldown exhaustion protection compares cooldown exclusions against `models WHERE
enabled=1 rather than the actual routable fallback chain used by routeRequest`. If enabled models
exist outside fallback_config, the guard may fail to trigger even when cooldown exclusions exhaust
all routable options, leading to avoidable routing failures.

Code

server/src/routes/proxy.ts[R1258-1276]

Evidence

proxy.ts computes wouldExhaustAll from all enabled models, while router.ts only routes across
fallback_config JOIN models (and then filters further by routing mode). This mismatch means the
guard may not represent the true routable set.

server/src/routes/proxy.ts[1258-1287]
server/src/services/router.ts[506-524]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The cooldown “exhaustion protection” determines whether to clear cooldown exclusions by querying all enabled model IDs from `models`. Routing, however, only considers models in `fallback_config` (and additional mode-based filtering), so the exhaustion check can be miscalibrated.

## Issue Context
If `models.enabled=1` includes entries not present in `fallback_config` (or otherwise unroutable), `wouldExhaustAll` can be computed as false while the real routing chain is fully excluded, and `routeRequest()` will still throw.

## Fix Focus Areas
- server/src/routes/proxy.ts[1258-1287]
- server/src/services/router.ts[506-524]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

gemini-code-assist

Code Review

This pull request introduces several major features to improve routing stability and performance, including disabling sticky sessions in balanced mode, implementing a generalized thread protection scanner, adding Owl Alpha model-level routing, applying recency-biased Thompson sampling, adding SSE stream heartbeat and stall protection, and introducing transient model cooldowns and wrapped error interception. The frontend fallback page is also updated to group models by pool. Feedback on the changes highlights critical syntax errors in the SQL query for recency-biased sampling, a malformed SQL query and truncation issue in the router test file, and a minor double semicolon typo in the proxy route handler.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-02T23:47:24Z

+      COUNT(*) as raw_total,
+      SUM(MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0)))) as total,
+      SUM(CASE WHEN status = 'success'
+        THEN MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0)))
+        ELSE 0 END) as successes,


The SQL query contains syntax errors due to extra closing parentheses at the end of the total and successes weighted sum expressions. Specifically, 7.0)))) has an extra ) and 7.0))) in the THEN clause also has an extra ). This will cause SQLite syntax errors and crash the stats cache refresh.

Suggested change

COUNT(*) as raw_total,

SUM(MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0)))) as total,

SUM(CASE WHEN status = 'success'

THEN MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0)))

ELSE 0 END) as successes,

COUNT(*) as raw_total,

SUM(MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0))) as total,

SUM(CASE WHEN status = 'success'

THEN MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0))

ELSE 0 END) as successes,

gemini-code-assist · 2026-06-02T23:47:24Z

      INSERT INTO api_keys (platform, label, encrypted_key, iv, auth_tag, status, enabled)
+      VALUES (?,
      VALUES (?, ?, ?, ?, ?, ?, ?)
    `).run('groq', 'test', encrypted, iv, authTag, 'healthy', 1);



The SQL query inside this test has duplicate/malformed VALUES clauses, which will cause a syntax error when running the tests.

db.prepare(` INSERT INTO api_keys (platform, label, encrypted_key, iv, auth_tag, status, enabled) VALUES (?, ?, ?, ?, ?, ?, ?) `).run('groq', 'test', encrypted, iv, authTag, 'healthy', 1);

gemini-code-assist · 2026-06-02T23:47:24Z

  it('should skip disabled keys', () => {
    const db = getDb();
-
    const googleKey = encrypt('test-google-key');
    db.prepare(`
      INSERT INTO api_keys (platform, label, encrypted_key, iv, auth_tag, status, enabled)
      VALUES (?, ?, ?, ?, ?, ?, ?)
    `).run('google', 'disabled', googleKey.encrypted, googleKey.iv, googleKey.authTag, 'healthy', 0);
-
-    const groqKey = encrypt('test-groq-key');
-    db.prepare(`
-      INSERT INTO api_keys (platform, label, encrypted_key, iv, auth_tag, status, enabled)
-      VALUES (?, ?, ?, ?, ?, ?, ?)
-    `).run('groq', 'test', groqKey.encrypted, groqKey.iv, groqKey.authTag, 'healthy', 1);
-
-    const result = routeRequest();
-    expect(result.platform).toBe('groq');
-  });
-
-  it('should skip invalid keys', () => {
-    const db = getDb();
-
-    const invalidKey = encrypt('invalid-key');
-    db.prepare(`
-      INSERT INTO api_keys (platform, label, encrypted_key, iv, auth_tag, status, enabled)
-      VALUES (?, ?, ?, ?, ?, ?, ?)
-    `).run('google', 'invalid', invalidKey.encrypted, invalidKey.iv, invalidKey.authTag, 'invalid', 1);
-
-    const groqKey = encrypt('test-groq-key');
-    db.prepare(`
-      INSERT INTO api_keys (platform, label, encrypted_key, iv, auth_tag, status, enabled)
-      VALUES (?, ?, ?, ?, ?, ?, ?)
-    `).run('groq', 'test', groqKey.encrypted, groqKey.iv, groqKey.authTag, 'healthy', 1);
-
-    const result = routeRequest();
-    expect(result.platform).toBe('groq');
-  });
-});
+    const groqKey = encrypt


The test file is truncated at the end, leaving the should skip disabled keys test incomplete and missing the should skip invalid keys test, which prevents the test suite from compiling or running.

it('should skip disabled keys', () => { const db = getDb(); const googleKey = encrypt('test-google-key'); db.prepare(` INSERT INTO api_keys (platform, label, encrypted_key, iv, auth_tag, status, enabled) VALUES (?, ?, ?, ?, ?, ?, ?) `).run('google', 'disabled', googleKey.encrypted, googleKey.iv, googleKey.authTag, 'healthy', 0); const groqKey = encrypt('test-groq-key'); db.prepare(` INSERT INTO api_keys (platform, label, encrypted_key, iv, auth_tag, status, enabled) VALUES (?, ?, ?, ?, ?, ?, ?) `).run('groq', 'test', groqKey.encrypted, groqKey.iv, groqKey.authTag, 'healthy', 1); const result = routeRequest(); expect(result.platform).toBe('groq'); }); it('should skip invalid keys', () => { const db = getDb(); const invalidKey = encrypt('invalid-key'); db.prepare(` INSERT INTO api_keys (platform, label, encrypted_key, iv, auth_tag, status, enabled) VALUES (?, ?, ?, ?, ?, ?, ?) `).run('google', 'invalid', invalidKey.encrypted, invalidKey.iv, invalidKey.authTag, 'invalid', 1); const groqKey = encrypt('test-groq-key'); db.prepare(` INSERT INTO api_keys (platform, label, encrypted_key, iv, auth_tag, status, enabled) VALUES (?, ?, ?, ?, ?, ?, ?) `).run('groq', 'test', groqKey.encrypted, groqKey.iv, groqKey.authTag, 'healthy', 1); const result = routeRequest(); expect(result.platform).toBe('groq'); }); });

gemini-code-assist · 2026-06-02T23:47:24Z

+                  });
+                } else {
+                  res.write(`data: ${JSON.stringify(payload)}\n\n`);
+                  res.write('data: [DONE]\n\n');;


Minor syntax issue: there is a double semicolon at the end of this line.

Suggested change

res.write('data: [DONE]\n\n');;

res.write('data: [DONE]\n\n');

sourcery-ai

Hey - I've found 7 issues, and left some high level feedback:

There are several temporary helper scripts and files (e.g. write_test.py, fix_streaming.py, fix.py, do_fix.py, new_streaming_block.txt, server/write_tests.py) checked into the repo; these look like local tooling and should either be removed from the PR or moved under a proper tooling directory and wired into the build/test process if they are meant to be permanent.
The changes around server/src/__tests__/services/router.test.ts and the related write_test.py helper appear to leave the test file in an incomplete/broken state (e.g. a truncated INSERT INTO api_keys statement and an unfinished expect(() => route...), which will prevent the test suite from running; please fix the test file content directly instead of relying on the generator script, or remove the script if it’s not meant to be committed.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- There are several temporary helper scripts and files (e.g. `write_test.py`, `fix_streaming.py`, `fix.py`, `do_fix.py`, `new_streaming_block.txt`, `server/write_tests.py`) checked into the repo; these look like local tooling and should either be removed from the PR or moved under a proper tooling directory and wired into the build/test process if they are meant to be permanent.
- The changes around `server/src/__tests__/services/router.test.ts` and the related `write_test.py` helper appear to leave the test file in an incomplete/broken state (e.g. a truncated `INSERT INTO api_keys` statement and an unfinished `expect(() => route...`), which will prevent the test suite from running; please fix the test file content directly instead of relying on the generator script, or remove the script if it’s not meant to be committed.

## Individual Comments

### Comment 1
<location path="server/src/services/router.ts" line_range="184-185" />
<code_context>
-      COUNT(*) as total,
-      SUM(CASE WHEN status = 'success' THEN 1 ELSE 0 END) as successes,
+      COUNT(*) as raw_total,
+      SUM(MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0)))) as total,
+      SUM(CASE WHEN status = 'success'
+        THEN MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0)))
+        ELSE 0 END) as successes,
</code_context>
<issue_to_address>
**issue (bug_risk):** Fix mismatched parentheses in recency-weighted stats SQL expression.

The `SUM(MAX(0, MIN(...))))` call has an extra closing parenthesis, which will cause a SQLite syntax error. It should be `SUM(MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0))) as total` (one closing paren each for `MIN`, `MAX`, and `SUM`). Please also re-check the parentheses in the `successes` expression to ensure they match your intended nesting.
</issue_to_address>

### Comment 2
<location path="server/src/routes/proxy.ts" line_range="19" />
<code_context>
 const stickySessionMap = new Map<string, { modelDbId: number; keyId?: number; bannedPlatforms?: Set<string>; lastUsed: number }>();
 const STICKY_TTL_MS = 30 * 60 * 1000; // 30 min session TTL
-const LONGCAT_STICKY_COOLDOWN_MS = 3 * 60 * 1000; // 3 min — bypass sticky preference for LongCat if session was used within this window
+const THREAD_COOLDOWN_MS = 3 * 60 * 1000; // 3 min — bypass sticky preference for LongCat if session was used within this window
+// Stream heartbeat & stall protection config — exported for test overrides
+export const streamKeepaliveConfig = {
</code_context>
<issue_to_address>
**nitpick:** Update the THREAD_COOLDOWN_MS comment to match the generalized implementation.

The `THREAD_COOLDOWN_MS` comment still calls out LongCat specifically, but the new logic applies a cooldown across platforms based on recent sticky usage. Please update the wording to describe a generic cross-session thread cooldown instead of LongCat-only behavior.
</issue_to_address>

### Comment 3
<location path="fix_streaming.py" line_range="2-11" />
<code_context>
+#!/usr/bin/env python3
+"""Replace the streaming block in proxy.ts with Promise.race-based stall detection."""
+
+with open('server/src/routes/proxy.ts', 'r') as f:
</code_context>
<issue_to_address>
**issue:** Remove or relocate helper scripts and partial patches that don’t belong in the runtime tree.

Files like `fix_streaming.py`, `fix.py`, `do_fix.py`, `server/write_test.py`, `server/write_tests.py`, and `new_streaming_block.txt` appear to be ad-hoc tooling or partial patches (and `do_fix.py` is syntactically invalid). Leaving them under `server/` can confuse readers and interfere with tooling that scans or executes everything there. Please either remove them or move them into a dedicated tooling/examples directory and ensure only complete, valid scripts remain in the runtime tree.
</issue_to_address>

### Comment 4
<location path="server/src/__tests__/services/router.test.ts" line_range="78-52" />
<code_context>
-    expect(result.platform).toBe('groq');
-  });
-
-  it('should skip invalid keys', () => {
-    const db = getDb();
-
-    const invalidKey = encrypt('invalid-key');
-    db.prepare(`
-      INSERT INTO api_keys (platform, label, encrypted_key, iv, auth_tag, status, enabled)
-      VALUES (?, ?, ?, ?, ?, ?, ?)
-    `).run('google', 'invalid', invalidKey.encrypted, invalidKey.iv, invalidKey.authTag, 'invalid', 1);
-
-    const groqKey = encrypt('test-groq-key');
-    db.prepare(`
-      INSERT INTO api_keys (platform, label, encrypted_key, iv, auth_tag, status, enabled)
-      VALUES (?, ?, ?, ?, ?, ?, ?)
-    `).run('groq', 'test', groqKey.encrypted, groqKey.iv, groqKey.authTag, 'healthy', 1);
-
-    const result = routeRequest();
-    expect(result.platform).toBe('groq');
-  });
</code_context>
<issue_to_address>
**issue (testing):** Reintroduce or replace the removed "skip invalid keys" routing test

The previous `"should skip invalid keys"` test was removed, but we still need coverage that `status = 'invalid'` keys are excluded from routing. Please either:

- Reintroduce a test that inserts both an `invalid` and a `healthy` key and asserts the invalid key/platform is never selected, or
- Add a test that verifies `hasValidKeys` returns `false` when only invalid keys exist for a platform/model, causing `routeRequest` to fall back to another platform.

This ensures regressions in key-status filtering (e.g. mistakenly including `invalid` keys) are caught by tests.
</issue_to_address>

### Comment 5
<location path="server/src/__tests__/services/router.test.ts" line_range="12-15" />
<code_context>
     app = createApp();
   });

-  beforeEach(() => {
+  beforeEach(async () => {
     (stickySessionMap as Map<any, any>).clear();
</code_context>
<issue_to_address>
**suggestion (testing):** Add tests for new balanced/smart routing behavior (LongCat/Owl Alpha and recency bias)

The router now includes several untested behaviors (recency-weighted stats, `rawTotal`, `EXCLUDED_FROM_BALANCED` / `EXCLUDED_MODELS_FROM_BALANCED`, LongCat/Owl Alpha handling via `hasValidKeys`). To exercise this logic, add tests that:

- Verify that in `routingMode = 'balanced'`, LongCat and Owl Alpha are excluded when other eligible models exist, but remain selectable in `routingMode = 'smart'`.
- Seed Owl Alpha plus other smart-eligible models, and assert that with `routingMode = 'smart'` and `hasValidKeys(owl-alpha) === true`, `routeRequest` prefers Owl Alpha.
- Insert synthetic requests with varied `created_at` timestamps, refresh the stats cache, and assert that `getAnalyticsScores` exposes `total = rawTotal` while the internal `ModelStats.total` is recency-weighted.

This will ensure the new routing behavior is properly covered by tests.
</issue_to_address>

### Comment 6
<location path="server/src/__tests__/routes/transient-cooldown.test.ts" line_range="20-27" />
<code_context>
+  (stickySessionMap as Map<any, any>).clear();
+}
+
+describe('Transient model cooldown functionality', () => {
+  let app: Express;
+
+  beforeAll(() => {
+    process.env.ENCRYPTION_KEY = '0'.repeat(64);
+    initDb(':memory:');
+    app = createApp();
+  });
+
+  beforeEach(() => {
</code_context>
<issue_to_address>
**suggestion (testing):** Complement low-level transient cooldown tests with at least one proxy integration test

The current tests validate the cooldown map behavior in isolation, but there’s no end-to-end coverage that a real `/v1/chat/completions` request actually triggers and respects transient cooldowns. Please add an integration test (e.g., alongside `proxy-tools.test.ts`) that:

1. Mocks a provider to return a retryable/5xx error for a known `modelDbId`,
2. Asserts that `transientModelCooldowns` is populated for that ID, and
3. Sends a second request and verifies (via DB or logs) that this model appears in `skipModels` and is not chosen by the router.

This will confirm the cooldown logic is exercised in the real routing path, not just in unit-level helpers.

Suggested implementation:

```typescript
} from '../../routes/proxy.js';
import request from 'supertest';

function clearCooldownMap() {
  (transientModelCooldowns as Map<any, any>).clear();
}

function clearStickyMap() {
  (stickySessionMap as Map<any, any>).clear();
}

describe('Transient model cooldown functionality', () => {
  let app: Express;

  beforeAll(() => {
    process.env.ENCRYPTION_KEY = '0'.repeat(64);
    initDb(':memory:');
    app = createApp();
  });

  beforeEach(() => {
    clearCooldownMap();
    clearStickyMap();
    const db = getDb();
    db.prepare('DELETE FROM api_keys').run();
    db.prepare('DELETE FROM requests').run();
  });

  afterEach(() => {
    clearCooldownMap();
    clearStickyMap();
  });

  it('applies transient cooldowns end-to-end for chat completions', async () => {
    const db = getDb();

    // Seed an API key and a transient model; align this with your actual schema.
    const apiKey = 'test-api-key';
    db.prepare(
      `
      INSERT INTO api_keys (api_key, label, is_active)
      VALUES (?, 'test-key', 1)
      `
    ).run(apiKey);

    // This assumes you have some notion of provider + model tables; adjust to match your schema.
    // The important part is obtaining a concrete modelDbId used by the router and cooldown map.
    const providerStmt = db.prepare(
      `
      INSERT INTO providers (name, type, base_url, is_active)
      VALUES (?, 'test', 'http://localhost:9999', 1)
      `
    );
    const providerResult = providerStmt.run('test-provider');
    const providerId = providerResult.lastInsertRowid as number;

    const modelStmt = db.prepare(
      `
      INSERT INTO models (provider_id, name, is_active, is_transient)
      VALUES (?, ?, 1, 1)
      `
    );
    const modelResult = modelStmt.run(providerId, 'test-transient-model');
    const modelDbId = modelResult.lastInsertRowid as number;

    // Mock the underlying provider call so that the first attempt returns a retryable error.
    // Replace this with the same mocking strategy used in proxy-tools.test.ts (e.g., nock, jest.spyOn, etc.).
    const originalFetch = (global as any).fetch;
    const fetchMock = jest.fn()
      // First call for this model: retryable server error.
      .mockResolvedValueOnce(
        new Response(JSON.stringify({ error: 'temporary failure' }), {
          status: 502,
          headers: { 'Content-Type': 'application/json' },
        }) as any,
      )
      // Second call (if the router mistakenly uses this model again) – would succeed.
      .mockResolvedValue(
        new Response(JSON.stringify({ choices: [{ message: { content: 'ok' } }] }), {
          status: 200,
          headers: { 'Content-Type': 'application/json' },
        }) as any,
      );
    (global as any).fetch = fetchMock;

    try {
      // First request: should hit the model, get a retryable error, and populate cooldowns.
      const firstRes = await request(app)
        .post('/v1/chat/completions')
        .set('Authorization', `Bearer ${apiKey}`)
        .send({
          model: 'test-transient-model',
          messages: [{ role: 'user', content: 'hello' }],
        });

      expect(firstRes.status).toBeGreaterThanOrEqual(500);
      expect(firstRes.status).toBeLessThan(600);
      expect(transientModelCooldowns.has(modelDbId)).toBe(true);

      // Capture logs or DB state around routing to assert skipModels includes this model.
      const logSpy = jest.spyOn(console, 'debug').mockImplementation(() => {});

      // Second request: router should now treat this model as cooled down and skip it.
      const secondRes = await request(app)
        .post('/v1/chat/completions')
        .set('Authorization', `Bearer ${apiKey}`)
        .send({
          model: 'test-transient-model',
          messages: [{ role: 'user', content: 'hello again' }],
        });

      // Depending on the presence of fallback models, this may succeed with a different model
      // or fail with "no available models". The key is that the cooled-down model is skipped.
      expect(secondRes.status).not.toBe(502);

      const logsCombined = logSpy.mock.calls.map(args => args.join(' ')).join('\n');
      // Adjust this assertion to match the actual log/message structure where skipModels is recorded.
      expect(logsCombined).toContain('skipModels');
      expect(logsCombined).toContain(String(modelDbId));

      logSpy.mockRestore();
    } finally {
      (global as any).fetch = originalFetch;
    }

```

`).

```xml
<file_operations>
<file_operation operation="edit" file_path="server/src/__tests__/routes/transient-cooldown.test.ts">
<<<<<<< SEARCH
} from '../../routes/proxy.js';

function clearCooldownMap() {
  (transientModelCooldowns as Map<any, any>).clear();
}

function clearStickyMap() {
  (stickySessionMap as Map<any, any>).clear();
}

describe('Transient model cooldown functionality', () => {
  let app: Express;

  beforeAll(() => {
    process.env.ENCRYPTION_KEY = '0'.repeat(64);
    initDb(':memory:');
    app = createApp();
  });

  beforeEach(() => {
    clearCooldownMap();
    clearStickyMap();
    const db = getDb();
    db.prepare('DELETE FROM api_keys').run();
    db.prepare('DELETE FROM requests').run();
  });

  afterEach(() => {
    clearCooldownMap();
    clearStickyMap();
=======
} from '../../routes/proxy.js';
import request from 'supertest';

function clearCooldownMap() {
  (transientModelCooldowns as Map<any, any>).clear();
}

function clearStickyMap() {
  (stickySessionMap as Map<any, any>).clear();
}

describe('Transient model cooldown functionality', () => {
  let app: Express;

  beforeAll(() => {
    process.env.ENCRYPTION_KEY = '0'.repeat(64);
    initDb(':memory:');
    app = createApp();
  });

  beforeEach(() => {
    clearCooldownMap();
    clearStickyMap();
    const db = getDb();
    db.prepare('DELETE FROM api_keys').run();
    db.prepare('DELETE FROM requests').run();
  });

  afterEach(() => {
    clearCooldownMap();
    clearStickyMap();
  });

  it('applies transient cooldowns end-to-end for chat completions', async () => {
    const db = getDb();

    // Seed an API key and a transient model; align this with your actual schema.
    const apiKey = 'test-api-key';
    db.prepare(
      `
      INSERT INTO api_keys (api_key, label, is_active)
      VALUES (?, 'test-key', 1)
      `
    ).run(apiKey);

    // This assumes you have some notion of provider + model tables; adjust to match your schema.
    // The important part is obtaining a concrete modelDbId used by the router and cooldown map.
    const providerStmt = db.prepare(
      `
      INSERT INTO providers (name, type, base_url, is_active)
      VALUES (?, 'test', 'http://localhost:9999', 1)
      `
    );
    const providerResult = providerStmt.run('test-provider');
    const providerId = providerResult.lastInsertRowid as number;

    const modelStmt = db.prepare(
      `
      INSERT INTO models (provider_id, name, is_active, is_transient)
      VALUES (?, ?, 1, 1)
      `
    );
    const modelResult = modelStmt.run(providerId, 'test-transient-model');
    const modelDbId = modelResult.lastInsertRowid as number;

    // Mock the underlying provider call so that the first attempt returns a retryable error.
    // Replace this with the same mocking strategy used in proxy-tools.test.ts (e.g., nock, jest.spyOn, etc.).
    const originalFetch = (global as any).fetch;
    const fetchMock = jest.fn()
      // First call for this model: retryable server error.
      .mockResolvedValueOnce(
        new Response(JSON.stringify({ error: 'temporary failure' }), {
          status: 502,
          headers: { 'Content-Type': 'application/json' },
        }) as any,
      )
      // Second call (if the router mistakenly uses this model again) – would succeed.
      .mockResolvedValue(
        new Response(JSON.stringify({ choices: [{ message: { content: 'ok' } }] }), {
          status: 200,
          headers: { 'Content-Type': 'application/json' },
        }) as any,
      );
    (global as any).fetch = fetchMock;

    try {
      // First request: should hit the model, get a retryable error, and populate cooldowns.
      const firstRes = await request(app)
        .post('/v1/chat/completions')
        .set('Authorization', `Bearer ${apiKey}`)
        .send({
          model: 'test-transient-model',
          messages: [{ role: 'user', content: 'hello' }],
        });

      expect(firstRes.status).toBeGreaterThanOrEqual(500);
      expect(firstRes.status).toBeLessThan(600);
      expect(transientModelCooldowns.has(modelDbId)).toBe(true);

      // Capture logs or DB state around routing to assert skipModels includes this model.
      const logSpy = jest.spyOn(console, 'debug').mockImplementation(() => {});

      // Second request: router should now treat this model as cooled down and skip it.
      const secondRes = await request(app)
        .post('/v1/chat/completions')
        .set('Authorization', `Bearer ${apiKey}`)
        .send({
          model: 'test-transient-model',
          messages: [{ role: 'user', content: 'hello again' }],
        });

      // Depending on the presence of fallback models, this may succeed with a different model
      // or fail with "no available models". The key is that the cooled-down model is skipped.
      expect(secondRes.status).not.toBe(502);

      const logsCombined = logSpy.mock.calls.map(args => args.join(' ')).join('\n');
      // Adjust this assertion to match the actual log/message structure where skipModels is recorded.
      expect(logsCombined).toContain('skipModels');
      expect(logsCombined).toContain(String(modelDbId));

      logSpy.mockRestore();
    } finally {
      (global as any).fetch = originalFetch;
    }
>>>>>>> REPLACE
</file_operation>
</file_operations>

<additional_changes>
1. **Align DB seeding with your schema/helpers**:
   - Replace the `INSERT INTO providers` and `INSERT INTO models` statements with the actual table/column names used by your router.
   - If you already have helpers (e.g., in `proxy-tools.test.ts`) to create providers/models and return their DB IDs, use those instead of raw SQL here and remove the explicit inserts in this test.

2. **Use the same HTTP client mocking strategy as the rest of your tests**:
   - If the proxy uses `axios`, a custom HTTP client, or an SDK instead of `fetch`, adjust the mocking accordingly (e.g., `jest.spyOn(httpClient, 'post')`, `nock`, etc.).
   - Mirror the exact way a provider failure is simulated in your existing proxy tests so that the router records a retryable error and triggers cooldown logic.

3. **Match how `skipModels` is surfaced (logs or DB)**:
   - If `skipModels` is persisted in the `requests` table, assert against the last row instead of logs, for example:
     - `const lastRequest = db.prepare('SELECT * FROM requests ORDER BY id DESC LIMIT 1').get();`
     - Inspect a `skip_models` or similar column to confirm it contains `modelDbId`.
   - If you use a structured logger instead of `console.debug`, spy on that logger instance and adjust the expectation from `logsCombined` to the appropriate logged fields.

4. **Ensure `modelDbId` matches the key used by `transientModelCooldowns`**:
   - If cooldowns are keyed by something other than the models table PK (e.g., a join table ID or a provider-model mapping ID), update the insert/select logic in the test to obtain the exact ID used by the router and cooldown map.
</issue_to_address>

### Comment 7
<location path="server/src/__tests__/routes/stream-heartbeat-stall.test.ts" line_range="70-79" />
<code_context>
+  it('emits SSE keep-alive comments during idle periods', async () => {
</code_context>
<issue_to_address>
**suggestion (testing):** Make heartbeat/stall tests deterministic using fake timers to avoid flakiness

These tests depend on real `setTimeout` delays and the global `streamKeepaliveConfig`, which can be flaky under slow CI or load. Within this suite, consider `vi.useFakeTimers()` and driving behavior via `vi.advanceTimersByTime(...)` instead of real-time waits, so the heartbeat/stall scenarios run quickly and deterministically.

Suggested implementation:

```typescript
  afterEach(() => {
    // Restore original config values
    streamKeepaliveConfig.KEEPALIVE_INTERVAL_MS = origKeepaliveInterval;
    streamKeepaliveConfig.MAX_STREAM_STALL_MS = origMaxStall;
    // Ensure we always reset timers and mocks between tests so heartbeat/stall tests
    // don't depend on real time and stay deterministic.
    vi.useRealTimers();
    vi.restoreAllMocks();
  });

  it('emits SSE keep-alive comments during idle periods', async () => {
    // Use fake timers so we can deterministically drive heartbeat/idle behavior
    // via vi.advanceTimersByTime(...) instead of relying on real setTimeout delays.
    vi.useFakeTimers();

    const origFetch = global.fetch;
    const encoder = new TextEncoder();

```

To fully implement the suggestion and make the test deterministic, you should also:
1. Ensure the mocked provider implementation in this test (and related heartbeat/stall tests) uses `setTimeout`-based delays rather than `await new Promise(...)` with real-time sleeps.
2. Replace any `await new Promise(resolve => setTimeout(resolve, X))` or similar real-time waits in this test suite with `vi.advanceTimersByTime(X)` (or `vi.runAllTimers()`), so the test progresses via virtual time instead of wall-clock time.
3. If there are other heartbeat/stall tests in this file, consider applying the same `vi.useFakeTimers()` pattern to them (either at the individual `it(...)` level or via a `beforeEach` in the surrounding `describe`), and drive their timing assertions using `vi.advanceTimersByTime(...)`.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

sourcery-ai · 2026-06-02T23:47:37Z

+      SUM(MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0)))) as total,
+      SUM(CASE WHEN status = 'success'


issue (bug_risk): Fix mismatched parentheses in recency-weighted stats SQL expression.

The SUM(MAX(0, MIN(...)))) call has an extra closing parenthesis, which will cause a SQLite syntax error. It should be SUM(MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0))) as total (one closing paren each for MIN, MAX, and SUM). Please also re-check the parentheses in the successes expression to ensure they match your intended nesting.

sourcery-ai · 2026-06-02T23:47:37Z

 const stickySessionMap = new Map<string, { modelDbId: number; keyId?: number; bannedPlatforms?: Set<string>; lastUsed: number }>();
 const STICKY_TTL_MS = 30 * 60 * 1000; // 30 min session TTL
-const LONGCAT_STICKY_COOLDOWN_MS = 3 * 60 * 1000; // 3 min — bypass sticky preference for LongCat if session was used within this window
+const THREAD_COOLDOWN_MS = 3 * 60 * 1000; // 3 min — bypass sticky preference for LongCat if session was used within this window


nitpick: Update the THREAD_COOLDOWN_MS comment to match the generalized implementation.

The THREAD_COOLDOWN_MS comment still calls out LongCat specifically, but the new logic applies a cooldown across platforms based on recent sticky usage. Please update the wording to describe a generic cross-session thread cooldown instead of LongCat-only behavior.

sourcery-ai · 2026-06-02T23:47:37Z

+"""Replace the streaming block in proxy.ts with Promise.race-based stall detection."""
+
+# Read the before and after parts
+with open('/tmp/before.ts', 'r') as f:
+    before = f.read()
+
+with open('/tmp/after.ts', 'r') as f:
+    after = f.read()  # starts with "} else {"
+
+# The new streaming block


issue: Remove or relocate helper scripts and partial patches that don’t belong in the runtime tree.

Files like fix_streaming.py, fix.py, do_fix.py, server/write_test.py, server/write_tests.py, and new_streaming_block.txt appear to be ad-hoc tooling or partial patches (and do_fix.py is syntactically invalid). Leaving them under server/ can confuse readers and interfere with tooling that scans or executes everything there. Please either remove them or move them into a dedicated tooling/examples directory and ensure only complete, valid scripts remain in the runtime tree.

sourcery-ai · 2026-06-02T23:47:37Z

    const groqKey = encrypt('test-groq-key');
    db.prepare(`
      INSERT INTO api_keys (platform, label, encrypted_key, iv, auth_tag, status, enabled)
      VALUES (?, ?, ?, ?, ?, ?, ?)


issue (testing): Reintroduce or replace the removed "skip invalid keys" routing test

The previous "should skip invalid keys" test was removed, but we still need coverage that status = 'invalid' keys are excluded from routing. Please either:

Reintroduce a test that inserts both an invalid and a healthy key and asserts the invalid key/platform is never selected, or

Add a test that verifies hasValidKeys returns false when only invalid keys exist for a platform/model, causing routeRequest to fall back to another platform.

This ensures regressions in key-status filtering (e.g. mistakenly including invalid keys) are caught by tests.

sourcery-ai · 2026-06-02T23:47:37Z

  beforeEach(() => {
    const db = getDb();
    db.prepare('DELETE FROM api_keys').run();
-    // Reset fallback order to intelligence ranking


suggestion (testing): Add tests for new balanced/smart routing behavior (LongCat/Owl Alpha and recency bias)

The router now includes several untested behaviors (recency-weighted stats, rawTotal, EXCLUDED_FROM_BALANCED / EXCLUDED_MODELS_FROM_BALANCED, LongCat/Owl Alpha handling via hasValidKeys). To exercise this logic, add tests that:

Verify that in routingMode = 'balanced', LongCat and Owl Alpha are excluded when other eligible models exist, but remain selectable in routingMode = 'smart'.

Seed Owl Alpha plus other smart-eligible models, and assert that with routingMode = 'smart' and hasValidKeys(owl-alpha) === true, routeRequest prefers Owl Alpha.

Insert synthetic requests with varied created_at timestamps, refresh the stats cache, and assert that getAnalyticsScores exposes total = rawTotal while the internal ModelStats.total is recency-weighted.

This will ensure the new routing behavior is properly covered by tests.

sourcery-ai · 2026-06-02T23:47:37Z

+describe('Transient model cooldown functionality', () => {
+  let app: Express;
+
+  beforeAll(() => {
+    process.env.ENCRYPTION_KEY = '0'.repeat(64);
+    initDb(':memory:');
+    app = createApp();
+  });


suggestion (testing): Complement low-level transient cooldown tests with at least one proxy integration test

The current tests validate the cooldown map behavior in isolation, but there’s no end-to-end coverage that a real /v1/chat/completions request actually triggers and respects transient cooldowns. Please add an integration test (e.g., alongside proxy-tools.test.ts) that:

Mocks a provider to return a retryable/5xx error for a known modelDbId,

Asserts that transientModelCooldowns is populated for that ID, and

Sends a second request and verifies (via DB or logs) that this model appears in skipModels and is not chosen by the router.

This will confirm the cooldown logic is exercised in the real routing path, not just in unit-level helpers.

Suggested implementation:

} from '../../routes/proxy.js'; import request from 'supertest'; function clearCooldownMap() { (transientModelCooldowns as Map<any, any>).clear(); } function clearStickyMap() { (stickySessionMap as Map<any, any>).clear(); } describe('Transient model cooldown functionality', () => { let app: Express; beforeAll(() => { process.env.ENCRYPTION_KEY = '0'.repeat(64); initDb(':memory:'); app = createApp(); }); beforeEach(() => { clearCooldownMap(); clearStickyMap(); const db = getDb(); db.prepare('DELETE FROM api_keys').run(); db.prepare('DELETE FROM requests').run(); }); afterEach(() => { clearCooldownMap(); clearStickyMap(); }); it('applies transient cooldowns end-to-end for chat completions', async () => { const db = getDb(); // Seed an API key and a transient model; align this with your actual schema. const apiKey = 'test-api-key'; db.prepare( ` INSERT INTO api_keys (api_key, label, is_active) VALUES (?, 'test-key', 1) ` ).run(apiKey); // This assumes you have some notion of provider + model tables; adjust to match your schema. // The important part is obtaining a concrete modelDbId used by the router and cooldown map. const providerStmt = db.prepare( ` INSERT INTO providers (name, type, base_url, is_active) VALUES (?, 'test', 'http://localhost:9999', 1) ` ); const providerResult = providerStmt.run('test-provider'); const providerId = providerResult.lastInsertRowid as number; const modelStmt = db.prepare( ` INSERT INTO models (provider_id, name, is_active, is_transient) VALUES (?, ?, 1, 1) ` ); const modelResult = modelStmt.run(providerId, 'test-transient-model'); const modelDbId = modelResult.lastInsertRowid as number; // Mock the underlying provider call so that the first attempt returns a retryable error. // Replace this with the same mocking strategy used in proxy-tools.test.ts (e.g., nock, jest.spyOn, etc.). const originalFetch = (global as any).fetch; const fetchMock = jest.fn() // First call for this model: retryable server error. .mockResolvedValueOnce( new Response(JSON.stringify({ error: 'temporary failure' }), { status: 502, headers: { 'Content-Type': 'application/json' }, }) as any, ) // Second call (if the router mistakenly uses this model again) – would succeed. .mockResolvedValue( new Response(JSON.stringify({ choices: [{ message: { content: 'ok' } }] }), { status: 200, headers: { 'Content-Type': 'application/json' }, }) as any, ); (global as any).fetch = fetchMock; try { // First request: should hit the model, get a retryable error, and populate cooldowns. const firstRes = await request(app) .post('/v1/chat/completions') .set('Authorization', `Bearer ${apiKey}`) .send({ model: 'test-transient-model', messages: [{ role: 'user', content: 'hello' }], }); expect(firstRes.status).toBeGreaterThanOrEqual(500); expect(firstRes.status).toBeLessThan(600); expect(transientModelCooldowns.has(modelDbId)).toBe(true); // Capture logs or DB state around routing to assert skipModels includes this model. const logSpy = jest.spyOn(console, 'debug').mockImplementation(() => {}); // Second request: router should now treat this model as cooled down and skip it. const secondRes = await request(app) .post('/v1/chat/completions') .set('Authorization', `Bearer ${apiKey}`) .send({ model: 'test-transient-model', messages: [{ role: 'user', content: 'hello again' }], }); // Depending on the presence of fallback models, this may succeed with a different model // or fail with "no available models". The key is that the cooled-down model is skipped. expect(secondRes.status).not.toBe(502); const logsCombined = logSpy.mock.calls.map(args => args.join(' ')).join('\n'); // Adjust this assertion to match the actual log/message structure where skipModels is recorded. expect(logsCombined).toContain('skipModels'); expect(logsCombined).toContain(String(modelDbId)); logSpy.mockRestore(); } finally { (global as any).fetch = originalFetch; }

`).

<file_operations> <file_operation operation="edit" file_path="server/src/__tests__/routes/transient-cooldown.test.ts"> <<<<<<< SEARCH } from '../../routes/proxy.js'; function clearCooldownMap() { (transientModelCooldowns as Map<any, any>).clear(); } function clearStickyMap() { (stickySessionMap as Map<any, any>).clear(); } describe('Transient model cooldown functionality', () => { let app: Express; beforeAll(() => { process.env.ENCRYPTION_KEY = '0'.repeat(64); initDb(':memory:'); app = createApp(); }); beforeEach(() => { clearCooldownMap(); clearStickyMap(); const db = getDb(); db.prepare('DELETE FROM api_keys').run(); db.prepare('DELETE FROM requests').run(); }); afterEach(() => { clearCooldownMap(); clearStickyMap(); ======= } from '../../routes/proxy.js'; import request from 'supertest'; function clearCooldownMap() { (transientModelCooldowns as Map<any, any>).clear(); } function clearStickyMap() { (stickySessionMap as Map<any, any>).clear(); } describe('Transient model cooldown functionality', () => { let app: Express; beforeAll(() => { process.env.ENCRYPTION_KEY = '0'.repeat(64); initDb(':memory:'); app = createApp(); }); beforeEach(() => { clearCooldownMap(); clearStickyMap(); const db = getDb(); db.prepare('DELETE FROM api_keys').run(); db.prepare('DELETE FROM requests').run(); }); afterEach(() => { clearCooldownMap(); clearStickyMap(); }); it('applies transient cooldowns end-to-end for chat completions', async () => { const db = getDb(); // Seed an API key and a transient model; align this with your actual schema. const apiKey = 'test-api-key'; db.prepare( ` INSERT INTO api_keys (api_key, label, is_active) VALUES (?, 'test-key', 1) ` ).run(apiKey); // This assumes you have some notion of provider + model tables; adjust to match your schema. // The important part is obtaining a concrete modelDbId used by the router and cooldown map. const providerStmt = db.prepare( ` INSERT INTO providers (name, type, base_url, is_active) VALUES (?, 'test', 'http://localhost:9999', 1) ` ); const providerResult = providerStmt.run('test-provider'); const providerId = providerResult.lastInsertRowid as number; const modelStmt = db.prepare( ` INSERT INTO models (provider_id, name, is_active, is_transient) VALUES (?, ?, 1, 1) ` ); const modelResult = modelStmt.run(providerId, 'test-transient-model'); const modelDbId = modelResult.lastInsertRowid as number; // Mock the underlying provider call so that the first attempt returns a retryable error. // Replace this with the same mocking strategy used in proxy-tools.test.ts (e.g., nock, jest.spyOn, etc.). const originalFetch = (global as any).fetch; const fetchMock = jest.fn() // First call for this model: retryable server error. .mockResolvedValueOnce( new Response(JSON.stringify({ error: 'temporary failure' }), { status: 502, headers: { 'Content-Type': 'application/json' }, }) as any, ) // Second call (if the router mistakenly uses this model again) – would succeed. .mockResolvedValue( new Response(JSON.stringify({ choices: [{ message: { content: 'ok' } }] }), { status: 200, headers: { 'Content-Type': 'application/json' }, }) as any, ); (global as any).fetch = fetchMock; try { // First request: should hit the model, get a retryable error, and populate cooldowns. const firstRes = await request(app) .post('/v1/chat/completions') .set('Authorization', `Bearer ${apiKey}`) .send({ model: 'test-transient-model', messages: [{ role: 'user', content: 'hello' }], }); expect(firstRes.status).toBeGreaterThanOrEqual(500); expect(firstRes.status).toBeLessThan(600); expect(transientModelCooldowns.has(modelDbId)).toBe(true); // Capture logs or DB state around routing to assert skipModels includes this model. const logSpy = jest.spyOn(console, 'debug').mockImplementation(() => {}); // Second request: router should now treat this model as cooled down and skip it. const secondRes = await request(app) .post('/v1/chat/completions') .set('Authorization', `Bearer ${apiKey}`) .send({ model: 'test-transient-model', messages: [{ role: 'user', content: 'hello again' }], }); // Depending on the presence of fallback models, this may succeed with a different model // or fail with "no available models". The key is that the cooled-down model is skipped. expect(secondRes.status).not.toBe(502); const logsCombined = logSpy.mock.calls.map(args => args.join(' ')).join('\n'); // Adjust this assertion to match the actual log/message structure where skipModels is recorded. expect(logsCombined).toContain('skipModels'); expect(logsCombined).toContain(String(modelDbId)); logSpy.mockRestore(); } finally { (global as any).fetch = originalFetch; } >>>>>>> REPLACE </file_operation> </file_operations> <additional_changes> 1. **Align DB seeding with your schema/helpers**: - Replace the `INSERT INTO providers` and `INSERT INTO models` statements with the actual table/column names used by your router. - If you already have helpers (e.g., in `proxy-tools.test.ts`) to create providers/models and return their DB IDs, use those instead of raw SQL here and remove the explicit inserts in this test. 2. **Use the same HTTP client mocking strategy as the rest of your tests**: - If the proxy uses `axios`, a custom HTTP client, or an SDK instead of `fetch`, adjust the mocking accordingly (e.g., `jest.spyOn(httpClient, 'post')`, `nock`, etc.). - Mirror the exact way a provider failure is simulated in your existing proxy tests so that the router records a retryable error and triggers cooldown logic. 3. **Match how `skipModels` is surfaced (logs or DB)**: - If `skipModels` is persisted in the `requests` table, assert against the last row instead of logs, for example: - `const lastRequest = db.prepare('SELECT * FROM requests ORDER BY id DESC LIMIT 1').get();` - Inspect a `skip_models` or similar column to confirm it contains `modelDbId`. - If you use a structured logger instead of `console.debug`, spy on that logger instance and adjust the expectation from `logsCombined` to the appropriate logged fields. 4. **Ensure `modelDbId` matches the key used by `transientModelCooldowns`**: - If cooldowns are keyed by something other than the models table PK (e.g., a join table ID or a provider-model mapping ID), update the insert/select logic in the test to obtain the exact ID used by the router and cooldown map.

sourcery-ai · 2026-06-02T23:47:37Z

+  it('emits SSE keep-alive comments during idle periods', async () => {
+    const origFetch = global.fetch;
+    const encoder = new TextEncoder();
+
+    // Mock provider that delays first chunk by 300ms (longer than KEEPALIVE_INTERVAL_MS=100)
+    vi.spyOn(global, 'fetch').mockImplementation(async (url, init) => {
+      const urlStr = typeof url === 'string' ? url : url.toString();
+      if (urlStr.startsWith('http://127.0.0.1') || urlStr.startsWith('http://localhost')) {
+        return origFetch(url, init);
+      }


suggestion (testing): Make heartbeat/stall tests deterministic using fake timers to avoid flakiness

These tests depend on real setTimeout delays and the global streamKeepaliveConfig, which can be flaky under slow CI or load. Within this suite, consider vi.useFakeTimers() and driving behavior via vi.advanceTimersByTime(...) instead of real-time waits, so the heartbeat/stall scenarios run quickly and deterministically.

Suggested implementation:

afterEach(() => { // Restore original config values streamKeepaliveConfig.KEEPALIVE_INTERVAL_MS = origKeepaliveInterval; streamKeepaliveConfig.MAX_STREAM_STALL_MS = origMaxStall; // Ensure we always reset timers and mocks between tests so heartbeat/stall tests // don't depend on real time and stay deterministic. vi.useRealTimers(); vi.restoreAllMocks(); }); it('emits SSE keep-alive comments during idle periods', async () => { // Use fake timers so we can deterministically drive heartbeat/idle behavior // via vi.advanceTimersByTime(...) instead of relying on real setTimeout delays. vi.useFakeTimers(); const origFetch = global.fetch; const encoder = new TextEncoder();

To fully implement the suggestion and make the test deterministic, you should also:

Ensure the mocked provider implementation in this test (and related heartbeat/stall tests) uses setTimeout-based delays rather than await new Promise(...) with real-time sleeps.

Replace any await new Promise(resolve => setTimeout(resolve, X)) or similar real-time waits in this test suite with vi.advanceTimersByTime(X) (or vi.runAllTimers()), so the test progresses via virtual time instead of wall-clock time.

If there are other heartbeat/stall tests in this file, consider applying the same vi.useFakeTimers() pattern to them (either at the individual it(...) level or via a beforeEach in the surrounding describe), and drive their timing assertions using vi.advanceTimersByTime(...).

qodo-code-review · 2026-06-02T23:53:41Z

+        // Set up heartbeat and stall monitor
+        heartbeatInterval = setInterval(() => {
+          const now = Date.now();
+
+          if (now - lastChunkTimestamp > streamKeepaliveConfig.MAX_STREAM_STALL_MS) {
+            // Stall detected — terminate the stream
+            console.warn(`[Proxy] Stream stalled for ${now - lastChunkTimestamp}ms — aborting socket`);
+            streamAborted = true;
+            cleanupStream();
+
+            if (streamStarted) {
+              // Mid-stream stall — write error frame and close
+              const payload = { error: { message: 'Upstream stream stalled', type: 'stream_timeout' } };
+              try {
+                if (responseStreamContext) {
+                  writeResponseStreamEvent(res, {
+                    type: 'response.failed',
+                    response: {
+                      id: responseStreamContext.responseId,
+                      status: 'failed',
+                      error: payload.error,
+                    },
+                  });
+                } else {
+                  res.write(`data: ${JSON.stringify(payload)}\n\n`);
+                  res.write('data: [DONE]\n\n');;
+                }
+                res.end();
+              } catch { /* Socket already gone */ }
+            } else {
+              // Pre-stream stall — no headers sent yet, response is still retryable
+              // Send a 504 so the client gets a proper error signal
+              try {
+                res.status(504).json({
+                  error: { message: `Upstream provider stalled before yielding any data from ${route.displayName}`, type: 'stream_timeout' },
+                });
+              } catch { /* Socket already gone */ }
+            }
+          } else if (streamStarted) {
+            // Write an SSE comment to keep the socket alive across intermediate proxies
+            // Only write after SSE headers have been sent (streamStarted === true)
+            try {
+              res.write(': keep-alive\n\n');
+            } catch {
+              // Client disconnected — clean up
+              cleanupStream();
+            }
+          }
+        }, streamKeepaliveConfig.KEEPALIVE_INTERVAL_MS);
+
+        // Attach client-disconnect listener
+        req.on('close', () => {
+          cleanupStream();
+        });
+
        try {
          const gen = route.provider.streamChatCompletion(
            route.apiKey, normalizedMessages, route.modelId,
            { temperature, max_tokens, top_p, tools, tool_choice, parallel_tool_calls },
          );

          for await (const chunk of gen) {
+            // Update chunk timestamp to reset stall timer
+            lastChunkTimestamp = Date.now();
+
+            // If stall handler already terminated the stream, break out
+            if (streamAborted) break;


1. Stall doesn't stop upstream 🐞 Bug ☼ Reliability

In the streaming path of handleChatCompletion, the stall timer ends the HTTP response but does not abort/cancel the provider stream async iterator, so the for await can remain blocked indefinitely if the upstream stops yielding chunks. This can leak request handlers and reduce server capacity during upstream stalls.

Agent Prompt

## Issue description The stream stall handler sets `streamAborted` and closes the client response, but it does not interrupt the blocked `for await` loop waiting on the upstream async iterator. If the upstream never yields again, the handler remains stuck and never returns. ## Issue Context This occurs in the streaming branch where a `setInterval` detects stalls and closes the response. The `streamAborted` check is only evaluated inside the loop body, which is unreachable when the iterator is stalled. ## Fix Focus Areas - server/src/routes/proxy.ts[1351-1418]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

qodo-code-review · 2026-06-02T23:53:41Z

+    // Exhaustion protection: if cooldown would block ALL available models,
+    // clear the set and let the request through rather than failing outright.
+    if (activeCooldownModels.size > 0) {
+      const db = getDb();
+      const allEnabled = db.prepare('SELECT id FROM models WHERE enabled = 1').all() as Array<{ id: number }>;
+      const allEnabledIds = new Set(allEnabled.map(m => m.id));
+
+      // Remove the current preferred model from cooldown consideration
+      if (preferredModel !== undefined) {
+        activeCooldownModels.delete(preferredModel);
+      }
+
+      // Check if cooldown would exhaust all models
+      let wouldExhaustAll = true;
+      for (const id of allEnabledIds) {
+        if (!activeCooldownModels.has(id)) {
+          wouldExhaustAll = false;
+          break;
+        }


2. Cooldown guard checks wrong set 🐞 Bug ≡ Correctness

The thread-cooldown exhaustion protection compares cooldown exclusions against `models WHERE enabled=1 rather than the actual routable fallback chain used by routeRequest`. If enabled models exist outside fallback_config, the guard may fail to trigger even when cooldown exclusions exhaust all routable options, leading to avoidable routing failures.

Agent Prompt

## Issue description The cooldown “exhaustion protection” determines whether to clear cooldown exclusions by querying all enabled model IDs from `models`. Routing, however, only considers models in `fallback_config` (and additional mode-based filtering), so the exhaustion check can be miscalibrated. ## Issue Context If `models.enabled=1` includes entries not present in `fallback_config` (or otherwise unroutable), `wouldExhaustAll` can be computed as false while the real routing chain is fully excluded, and `routeRequest()` will still throw. ## Fix Focus Areas - server/src/routes/proxy.ts[1258-1287] - server/src/services/router.ts[506-524]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

qodo-code-review · 2026-06-02T23:53:41Z

+#!/usr/bin/env python3
+"""Replace the streaming block in proxy.ts with Promise.race-based stall detection."""
+
+with open('server/src/routes/proxy.ts', 'r') as f:
+    content = f.read()
+
+{


3. Stray debug scripts committed 🐞 Bug ⚙ Maintainability

The PR adds multiple incomplete local fix/debug artifacts (e.g., do_fix.py, fix.py, fix_streaming.py, new_streaming_block.txt) including hardcoded local paths and unterminated code blocks. These files add maintenance/security risk and should not ship in the repository.

Agent Prompt

## Issue description Several newly added files appear to be local, partially-written helper scripts / scratch buffers (broken syntax, hardcoded absolute paths, unfinished blocks). They should be removed from the PR (and optionally added to `.gitignore` if they’re generated during development). ## Issue Context Examples include root-level Python scripts and a stray text file containing partial TypeScript snippets. ## Fix Focus Areas - do_fix.py[1-7] - fix.py[1-8] - fix_streaming.py[1-21] - new_streaming_block.txt[1-28] - server/write_test.py[1-29] - server/write_tests.py[1-45] - fix{[1-1]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

coderabbitai

Actionable comments posted: 19

🧹 Nitpick comments (18)

.roo/specs/disable-sticky-on-auto/design.md (1)
11-17: 💤 Low value

Add language specifier to code block.

The fenced code block should specify a language identifier for proper syntax highlighting.
📝 Proposed fix
-```
+```typescript
 function stickyOp(messages, routingMode, ...) {
   const key = getSessionKey(messages, routingMode);
   if (!key) return <no-op value>;   // undefined, false, or early return
   ...operate on stickySessionMap using key...
 }
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/disable-sticky-on-auto/design.md around lines 11 - 17, The fenced
code block containing the stickyOp example lacks a language identifier; update
the block fence to include a language (e.g., add "typescript" after the opening
...)" and referencing getSessionKey and stickySessionMap is written as a
language-specified fenced block (for example ```typescript) to enable proper
syntax highlighting.
.roo/specs/generalized-thread-protection/design.md (1)
43-46: 💤 Low value

Add language specifier to code block.

The fenced code block should specify a language identifier for proper syntax highlighting (e.g., bash or shell).
📝 Proposed fix
-```
+```bash
 THREAD_PROTECTION_PLATFORMS="longcat:provider-ban,groq:model-skip"
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/generalized-thread-protection/design.md around lines 43 - 46, The
fenced code block containing
THREAD_PROTECTION_PLATFORMS="longcat:provider-ban,groq:model-skip" needs a
language identifier for proper syntax highlighting; update the triple-backtick
fence to include a language such as bash or shell (e.g., change tobash)
around the existing THREAD_PROTECTION_PLATFORMS line so the block is rendered
with shell/bash highlighting.
</details>

</blockquote></details>
<details>
<summary>.roo/specs/owl-alpha-longcat-model-routing/design.md (1)</summary><blockquote>

`5-46`: _💤 Low value_

**Add language specifiers to code blocks.**

Multiple fenced code blocks are missing language identifiers, which impacts syntax highlighting and readability. The blocks at lines 5-46, 52-78, 82-93, and 96-105 should specify their content type (e.g., `text` for ASCII diagrams or `typescript` for code pseudocode).



<details>
<summary>📝 Suggested languages for each block</summary>

- Lines 5-46: Use `text` for the ASCII architecture diagram
- Lines 52-78: Use `text` for the smart preference flow diagram  
- Lines 82-93: Use `text` for the sticky cooldown flow diagram
- Lines 96-105: Use `text` for the error handling flow diagram

Example fix for the first block:
```diff
-```
+```text
 ┌─────────────────────────────────────────────────────────────┐
 │                     Client Request                          │
```
</details>


Also applies to: 52-78, 82-93, 96-105

<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/owl-alpha-longcat-model-routing/design.md around lines 5 - 46,
The fenced code blocks in the design document lack language specifiers; update
each ASCII diagram/code block to include appropriate languages (e.g., use
text for ASCII diagrams and typescript for pseudocode) so syntax
highlighting and readability improve—specifically add text to the top architecture diagram (the Client Request / Balanced Router / Smart Router block referenced alongside routeRequest() in router.ts), add text to the smart
preference and sticky cooldown flow diagrams, and add text or typescript
for the error handling flow/pseudocode near handleChatCompletion() in proxy.ts;
ensure each existing triple-backtick fence is changed to include the chosen
language.
</details>

</blockquote></details>
<details>
<summary>.roo/specs/sse-stream-heartbeat-stall-protection/design.md (1)</summary><blockquote>

`112-121`: _💤 Low value_

**Heartbeat write failure cleanup may cause race condition.**

Lines 115-119 call `cleanupStream()` inside the `catch` block when the heartbeat write fails. However, this runs inside the `setInterval` callback. If the client disconnect also fires `req.on('close')` simultaneously, both paths will call `cleanupStream()`, which is idempotent. This is safe, but consider adding a note that the idempotent design explicitly handles this race.

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/sse-stream-heartbeat-stall-protection/design.md around lines 112
- 121, The heartbeat write failure catch inside the setInterval that calls
cleanupStream() can race with the req.on('close') handler; make this explicit by
documenting that cleanupStream() is idempotent and designed to be safe when
invoked concurrently from both the setInterval catch and the request 'close'
event. Add a short comment near streamStarted/res.write and the setInterval
callback referencing req.on('close') and stating cleanupStream() is idempotent
and safe to call multiple times to prevent confusion for future readers.
```

</details>

</blockquote></details>
<details>
<summary>.roo/specs/recency-biased-thompson-sampling/design.md (1)</summary><blockquote>

`82-82`: _💤 Low value_

**Clarify whether speed/TTFB weighting is deferred by design.**

The table states that `tok_per_sec` and `avg_ttfb_ms` remain unweighted, with the rationale that "Speed/TTFB are quality metrics that don't typically change suddenly." However, speed degradation can be a leading indicator of provider issues. Consider documenting whether this is a deliberate simplification for the initial implementation or a permanent design choice.

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/recency-biased-thompson-sampling/design.md at line 82, Clarify
that leaving tok_per_sec and avg_ttfb_ms unweighted is a deliberate
simplification for the initial implementation: update the design text in
recency-biased-thompson-sampling to state that tok_per_sec and avg_ttfb_ms are
intentionally kept unweighted for rollout simplicity, include a brief rationale
(they're relatively stable but can be early warning signals), and add a short
"Future work" note that weighting or anomaly-triggered adjustments for
tok_per_sec and avg_ttfb_ms may be introduced later with monitoring/thresholds.
```

</details>

</blockquote></details>
<details>
<summary>.roo/specs/transient-model-cooldown/design.md (1)</summary><blockquote>

`33-36`: _💤 Low value_

**Consider adding a maximum Map size limit.**

While the lazy pruning strategy is efficient, the design mentions "Worst case: N entries where N = number of models." In a system with hundreds of models experiencing simultaneous transient failures, this could grow unexpectedly. Consider adding a defensive maximum size check (e.g., prune the oldest entries if size exceeds 100) or document the acceptable upper bound based on expected model count.

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/transient-model-cooldown/design.md around lines 33 - 36, The
transientModelCooldowns Map can grow unbounded; add a defensive max-size limit
(e.g., MAX_TRANSIENT_COOLDOWNS = 100) and enforce it wherever entries are added
(reference transientModelCooldowns and TRANSIENT_COOLDOWN_MS), pruning the
oldest entries by expiryTimestamp when size exceeds the limit; update the design
notes to document the chosen upper bound and the pruning behavior (remove lowest
expiryTimestamp entries until size <= MAX_TRANSIENT_COOLDOWNS).
```

</details>

</blockquote></details>
<details>
<summary>.roo/specs/wrapped-error-interception/tasks.md (1)</summary><blockquote>

`63-65`: _⚡ Quick win_

**Missing explicit test task for wrapped error scenarios.**

Task 13 says "run all tests" to "verify no regressions," but there's no task to *create* tests specifically for wrapped error detection (e.g., test that HTTP 200 with `{"error": {...}}` throws ProviderApiError, test that wrapped 429 triggers cooldown). Consider adding a task to create unit tests for the new `isWrappedError()` and `throwWrappedError()` methods.

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/wrapped-error-interception/tasks.md around lines 63 - 65, Add an
explicit test creation task that verifies wrapped-error behavior: write unit
tests covering isWrappedError() and throwWrappedError() to assert that an HTTP
200 response with body {"error": {...}} causes throwWrappedError() to throw a
ProviderApiError and that a wrapped 429 response triggers the cooldown path;
locate tests near existing provider/proxy tests and add cases for
isWrappedError() true/false, throwWrappedError() raising correct error type and
metadata (including retry-after/cooldown), and ensure they run in the existing
test suite invoked by "npm test".
```

</details>

</blockquote></details>
<details>
<summary>.roo/specs/wrapped-error-interception/design.md (2)</summary><blockquote>

`46-56`: _💤 Low value_

**Array false-positive acknowledged but not mitigated.**

The `isWrappedError()` check allows arrays (`{"error": ["something"]}`) to pass because `typeof [] === 'object'`. Line 315 (EC-3) acknowledges this results in "Unknown wrapped error" message and states it's "acceptable because array-format errors are extremely rare." Consider whether adding `&& !Array.isArray((body as Record<string, unknown>).error)` would be worth the extra guard, or explicitly document why array errors shouldn't be detected.

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/wrapped-error-interception/design.md around lines 46 - 56, The
isWrappedError(body: unknown) function currently treats body.error arrays as
valid because typeof [] === 'object'; update isWrappedError to reject array-form
errors by adding a check like && !Array.isArray((body as Record<string,
unknown>).error) so that {"error": ["..."]} does not pass, or alternatively add
a short comment in the isWrappedError implementation explaining why array-form
errors are intentionally allowed; reference the isWrappedError method and the
(body as Record<string, unknown>).error property when making the change.
```

</details>

---

`318-322`: _⚡ Quick win_

**Mid-stream error behavior may confuse clients.**

EC-5 notes that if an error chunk arrives mid-stream (after valid chunks), the client receives a partial response followed by termination. While the doc states "this is the best possible behavior," it doesn't mention whether the stream should emit a specific error event/frame to signal the truncation, or whether clients might interpret this as a network failure rather than a provider error.

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/wrapped-error-interception/design.md around lines 318 - 322, The
documentation misses guidance on signaling mid-stream wrapped errors to clients;
update the EC-5 section to specify that when isWrappedError() causes the stream
generator to throw a ProviderApiError, the implementation should emit a terminal
error frame/event (or standardized SSE "event: error" chunk) before closing so
clients can distinguish provider errors from network failures. Mention the
involved symbols: the stream parser that uses isWrappedError(), the thrown
ProviderApiError from the async generator, and the consumer/retry loop in
proxy.ts; state that the generator should attempt to push a final error
chunk/event describing the wrapped error (including code/message) immediately
prior to terminating the stream so clients can detect provider-originated
truncation rather than a transport error.
```

</details>

</blockquote></details>
<details>
<summary>.roo/specs/transient-model-cooldown/requirements.md (3)</summary><blockquote>

`25-25`: _⚡ Quick win_

**Consider guidance on observability for "all models exhausted" scenarios.**

R-6 accepts the existing behavior when all models are on cooldown, but doesn't mention whether such scenarios should be logged, metered, or alerted. Since this represents a complete service degradation, consider adding a note recommending instrumentation to track when the proxy enters this state.

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/transient-model-cooldown/requirements.md at line 25, Add a short
observability note to R-6 recommending that the system emits logs, metrics, and
alerts when routeRequest() returns the "All models exhausted" outcome: specify
logging a structured error event with context (timestamp, caller, model pool
state), incrementing a meter/counter (e.g., all_models_exhausted_total) and
recording a tagged gauge or histogram for cooldown durations, and suggest
creating an alerting rule on sustained or high-rate occurrences so operators are
notified of complete service degradation.
```

</details>

---

`29-29`: _💤 Low value_

**Memory bounds and concurrent registration not addressed.**

The "Out of scope" section doesn't mention race conditions when multiple concurrent requests register cooldowns for the same model, or memory bounds if the cooldown Map grows unbounded with many failing models. While these may be acceptable omissions for an initial implementation, consider noting them as future considerations.

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/transient-model-cooldown/requirements.md at line 29, Update the
requirements.md "Out of scope" section to explicitly call out two future
concerns: (1) race conditions when concurrent requests attempt to register a
cooldown for the same model (referencing the module-level Map used in proxy.ts
and the cooldown registration on 5xx/connection failures and sticky session
override logic that writes to it), and (2) unbounded memory growth of the
cooldown Map (the Map that tracks skipModels/cooldowns) and the need for
retention/eviction or size limits; mention these as future considerations to be
addressed if this feature is expanded.
```

</details>

---

`10-10`: _⚡ Quick win_

**Clarify "non-auth" error exclusion criteria.**

R-1 mentions excluding "non-auth, retryable errors" but doesn't explicitly define which status codes qualify as auth errors. While `5xx` is clearly specified as the trigger, consider adding a note clarifying that `401` and `403` (or similar auth-related errors) should *not* trigger cooldowns, to prevent ambiguity during implementation.

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/transient-model-cooldown/requirements.md at line 10, Clarify R-1
by explicitly stating which HTTP status codes are considered auth errors and
must be excluded from cooldowns: treat 401 and 403 as
authentication/authorization errors that must NOT trigger the transient model
cooldown; continue to treat 5xx and connection timeouts as retryable failures
that DO trigger cooldowns. Update the "transient-model-cooldown" requirement
text to add a sentence like "Authentication/authorization errors (e.g., HTTP 401
and 403) are excluded and should not cause a model to be placed into the
cooldown set," so implementers of the shared in-memory disabled-model collection
and logic in the proxy (the component described in R-1) know to check and skip
401/403 before adding model IDs.
```

</details>

</blockquote></details>
<details>
<summary>.roo/specs/wrapped-error-interception/requirements.md (1)</summary><blockquote>

`49-49`: _💤 Low value_

**"errors" array format excluded but used by some providers.**

The out-of-scope section (line 49) excludes detecting `errors` array format, but some providers (e.g., Google) use `{"error": {"errors": [...]}}` where the `errors` key is nested inside the root `error` object. The current design handles this via `extractErrorMessage()`, but it may be worth noting that nested `errors` arrays *are* handled, only top-level `errors` keys are excluded.

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/wrapped-error-interception/requirements.md at line 49, Clarify
the out-of-scope note to state that top-level "errors" array formats are
intentionally excluded, but nested "errors" arrays (e.g., {"error": {"errors":
[...]}}) are handled by the existing extractErrorMessage() logic; update the
sentence around the out-of-scope wording to reference extractErrorMessage() and
explicitly note that nested errors arrays inside the root "error" object are
supported while only top-level "errors" keys remain out-of-scope.
```

</details>

</blockquote></details>
<details>
<summary>.roo/specs/transient-model-cooldown/tasks.md (1)</summary><blockquote>

`5-10`: _💤 Low value_

**Line number references may become stale.**

Tasks T-3 through T-6 reference specific line numbers in `proxy.ts` (e.g., line 1179, 1245, 1570, 1392). These are helpful for initial implementation but will become outdated if the code changes before these tasks are completed. Consider adding landmark comments or function names alongside line numbers to improve resilience.

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/transient-model-cooldown/tasks.md around lines 5 - 10, Declare
and export a module-level Map named transientModelCooldowns and constant
TRANSIENT_COOLDOWN_MS in proxy.ts, then in handleChatCompletion() (after
skipModels init and before the retry loop) iterate transientModelCooldowns to
prune expired entries and add active cooldown modelDbIds to skipModels;
immediately after the existing session-ban platform check clear preferredModel
and preferredKeyId if preferredModel is on cooldown; in the retry-loop catch
block (where errStatus is checked) when errStatus is 5xx or undefined and
isRetryableError(err) is true set transientModelCooldowns.set(route.modelDbId,
Date.now() + TRANSIENT_COOLDOWN_MS) and add to local skipModels; likewise in the
streaming error handler when streamErrStatus is ban-eligible set the same
cooldown and add to skipModels; add short landmark comments (e.g., "// COOLDOWN:
prune/inject", "// COOLDOWN: sticky override", "// COOLDOWN: register on retry
error", "// COOLDOWN: register on stream error") next to these locations so
future edits can find them even if line numbers shift.
```

</details>

</blockquote></details>
<details>
<summary>new_streaming_block.txt (1)</summary><blockquote>

`1-28`: _💤 Low value_

**Consider removing this file from the PR.**

This appears to be a code snippet or development artifact rather than production code or documentation. The incomplete syntax at line 28 suggests it's a work-in-progress fragment. If it's meant as documentation, consider moving it to a proper markdown file with context, or removing it if it's no longer needed.

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@new_streaming_block.txt` around lines 1 - 28, This file appears to be a WIP
fragment with incomplete syntax (unclosed block at the stallTimeout promise) and
should be removed from the PR or moved into proper documentation; either delete
new_streaming_block.txt from the commit, or convert it to a markdown doc with
context and a complete, valid code example. If you intend to keep code, ensure
the stream-related helper symbols (cleanupStream, stallTimeout, streamStarted,
heartbeatInterval) are placed in a proper source file with all braces and
promise handlers closed and tests/typing validated before committing.
```

</details>

</blockquote></details>
<details>
<summary>server/src/routes/proxy.ts (2)</summary><blockquote>

`1699-1720`: _⚡ Quick win_

**Consider registering transient cooldown for non-stream 5xx errors.**

Mid-stream 5xx errors register a transient cooldown (line 1547), but non-stream 5xx errors do not. This inconsistency means a model that fails with a 5xx error before streaming won't be temporarily excluded from other concurrent requests. Since 5xx errors often indicate transient infrastructure issues, adding a cooldown here would improve resilience.



<details>
<summary>🔄 Proposed enhancement</summary>

Add transient cooldown registration after line 1719:

```diff
            preferredKeyId = undefined;
          }
        }
+       // Register global transient cooldown for this 5xx error
+       transientModelCooldowns.set(route.modelDbId, Date.now() + TRANSIENT_COOLDOWN_MS);
+       console.log(`[TransientCooldown] registered global cooldown for modelDbId=${route.modelDbId} (${TRANSIENT_COOLDOWN_MS / 1000}s)`);
      }
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/routes/proxy.ts` around lines 1699 - 1720, The 5xx non-stream
branch should also register the same transient cooldown used for mid-stream
5xxs; after the evaluateThreadProtection block (the one that may call
banPlatformFromSession, addProviderModelsToSkipModels, or skipModels.add) add a
call to the transient-cooldown registration used around the mid-stream handling
(the same function invoked at the mid-stream 5xx handling near line ~1547) with
the same parameters (normalizedMessages, routingMode, route.platform,
route.modelDbId) so the failing provider/model is temporarily excluded from
other concurrent requests.
```

</details>

---

`1596-1634`: _⚡ Quick win_

**Consider registering transient cooldown for mid-stream retryable errors.**

Non-stream retryable errors register a transient cooldown (lines 1724-1725), but mid-stream retryable errors do not. This inconsistency means a model that fails mid-stream with a retryable error won't be temporarily excluded from other concurrent requests, potentially causing cascading failures.



<details>
<summary>🔄 Proposed enhancement</summary>

Add transient cooldown registration after line 1607:

```diff
              if (action.skipModel) {
                console.warn(`[Proxy] Mid-stream retryable error from ${route.platform} — skipping model ${route.modelId} (${action.reason})`);
                skipModels.add(route.modelDbId);
              }
+             // Register global transient cooldown to prevent concurrent requests from hitting the same failing model
+             transientModelCooldowns.set(route.modelDbId, Date.now() + TRANSIENT_COOLDOWN_MS);
+             console.log(`[TransientCooldown] registered global cooldown for modelDbId=${route.modelDbId} (${TRANSIENT_COOLDOWN_MS / 1000}s)`);
              if (action.clearStickyIfPinned && preferredModel) {
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/routes/proxy.ts` around lines 1596 - 1634, The mid-stream
retryable error branch should also register the same transient cooldown as the
non-stream retryable branch to temporarily exclude the failing model/provider
from other concurrent requests; locate the transient-cooldown registration call
used in the non-stream retryable error handling (the call that registers a
cooldown for route.platform, route.modelDbId and routingMode) and add that exact
call into this mid-stream branch (inside the isRetryableStreamError block, after
evaluateThreadProtection and before writing/ending the response), so that
banPlatformFromSession/addProviderModelsToSkipModels/skipModels/preferredModel
logic is followed by the transient cooldown registration and then the existing
response/cleanup and logRequest call.
```

</details>

</blockquote></details>
<details>
<summary>server/src/__tests__/routes/transient-cooldown.test.ts (1)</summary><blockquote>

`78-178`: _⚖️ Poor tradeoff_

**Tests re-implement production logic instead of exercising it.**

The pruning/injection loop here (and similarly the sticky-override block at Lines 241-247 and the 5xx eligibility condition at Lines 321, 328, etc.) is a hand-copied duplicate of the logic in `proxy.ts`. These tests validate the copy, so a regression in the actual `proxy.ts` implementation would not be caught. Where feasible, drive the assertions through an exported helper (e.g. the cooldown-injection/skipModels routine) rather than reproducing its body inline.

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/__tests__/routes/transient-cooldown.test.ts` around lines 78 -
178, Tests duplicate the production pruning/injection loop instead of exercising
the implementation; replace the inline loop over transientModelCooldowns with
calls to the actual exported helper in proxy.ts (e.g. export and use a function
like injectTransientCooldowns or buildSkipModels that accepts
transientModelCooldowns and returns/updates skipModels), then assert on its
return/state rather than re-implementing the logic; similarly refactor the other
duplicated blocks (the sticky-override routine and the 5xx eligibility checks)
to call their respective exported helpers from proxy.ts so tests validate real
code paths (refer to transientModelCooldowns, skipModels, and proxy.ts helper
names when making the changes).
```

</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.roo/specs/generalized-thread-protection/requirements.md:

Around line 1-5: The requirements.md file is truncated mid-sentence; finish
the document by completing the Problem Statement (referencing the proxy route
handler in server/src/routes/proxy.ts and the existing special-casing of
"longcat"), then add a clear numbered List of Requirements with measurable
Acceptance Criteria (e.g., remove hardcoded branches, implement configurable
rules, pass unit/integration tests, performance and security checks), an Out of
Scope section stating what will not be changed (e.g., unrelated routes, legacy
clients), and a Dependencies section listing impacted components
(server/src/routes/proxy.ts, any config files, tests, and deployment steps);
ensure each requirement includes an acceptance test or success metric and that
terminology matches identifiers like "longcat" and "proxy route handler" so
reviewers can map requirements to code.

In @.roo/specs/recency-biased-thompson-sampling/requirements.md:

Line 15: Replace the hardcoded divisor 7.0 in the recency weight formula with
a SQL parameter placeholder (?) so the formula reads MIN(1.0, MAX(0.0, 1.0 -
(julianday('now') - julianday(created_at)) / ?)); ensure you add a short note
stating that ? is ANALYTICS_WINDOW_DAYS (derived from ANALYTICS_WINDOW_MS) to
match the design and parameterization described in the document.

In @.roo/specs/sse-stream-heartbeat-stall-protection/design.md:

Around line 243-278: The stall handler currently throws an error inside the
setInterval callback which won’t propagate to the outer try/catch; instead, in
the interval callback set streamAborted = true and call cleanupStream(), and if
streamStarted write the error frame and res.end() but do NOT throw; after the
for-await loop (where the stream is consumed) run cleanupStream() and then if
(streamAborted && !streamStarted) throw the retryable error (Object.assign(new
Error(...), { status: 504 })) so the outer retry/catch sees it; use the existing
symbols streamAborted, streamStarted, cleanupStream(), writeResponseStreamEvent,
res.end and the post-loop logic to implement this control flow change.

In @.roo/specs/transient-model-cooldown/tasks.md:

Line 9: The retry-loop catch currently sets transientModelCooldowns for any
isRetryableError(err) (which includes 429); update the catch in
server/src/routes/proxy.ts so the global
transientModelCooldowns.set(route.modelDbId, Date.now() + TRANSIENT_COOLDOWN_MS)
and adding to local skipModels only happens when isRetryableError(err) is true
AND errStatus is either undefined (connection failure) or a 5xx status (e.g.
errStatus >=500 && errStatus <600); ensure 429 (status 429) does not trigger the
cooldown by guarding with that errStatus check and keep using
TRANSIENT_COOLDOWN_MS and route.modelDbId as the identifiers.

In @.roo/specs/wrapped-error-interception/design.md:

Around line 72-75: The current assignment to error.status converts
errPayload.code with Number(...) without validation, which can produce NaN for
non-numeric codes; update the logic where error.status is set (the block
referencing errPayload and error.status) to parse the code into a numeric value,
check isNaN on that parsed result, and use a safe fallback (e.g., 200) when
parsing fails or the value is not a finite number, ensuring error.status is
always a valid number.

In @.roo/specs/wrapped-error-interception/requirements.md:

Line 32: Update FR-5 to explicitly handle non-numeric error.code values by
stating that only numeric codes will be used as the ProviderApiError.status and
all other values (including string codes like "INVALID_ARGUMENT", null,
undefined, or Number-like strings that parse to NaN) must be ignored and result
in the default status of 200; reference the error.code key, the
ProviderApiError status field, and clarify that implementations should
validate via a numeric check (e.g., finite integer) before assigning status.

Line 33: Update the requirement FR-6 to match the design in EC-5 by changing
its scope from "first SSE chunk" to "any SSE chunk": state that if any SSE chunk
contains a root-level error field (whether first or mid-stream) the stream
must be aborted and a ProviderApiError thrown, and add a short note
referencing EC-5 behavior for consistency with the mid-stream error handling
described there.

In @.roo/specs/wrapped-error-interception/tasks.md:

Line 15: When setting error.status from errPayload.code (the code path that
currently uses Number(errPayload.code)), validate the numeric conversion before
assigning: convert the code to a Number, check whether that conversion is a
valid number (not NaN), and only then use it; otherwise fall back to 200. Update
the assignment that sets status (the logic referencing errPayload and its code
property) to perform this conversion-and-check so non-numeric provider codes
like "RATE_LIMIT" do not produce NaN.

In @do_fix.py:

Around line 1-7: Delete the accidental incomplete one-off script file
do_fix.py (and also remove sibling artifacts fix.py and fix_streaming.py) from
the PR—these are non-parsable codegen artifacts (unexpected EOF in do_fix.py)
and unrelated to the feature; ensure you remove them from the commit or move
them outside the repo so only relevant source files (e.g.,
server/src/routes/proxy.ts) remain in the changeset.

In @server/src/__tests__/services/router.test.ts:

Around line 29-33: The INSERT into api_keys in the db.prepare(...) call has a
duplicate leftover "VALUES (?" line making the SQL invalid; replace the two
VALUES clauses with a single VALUES clause containing seven placeholders
matching the seven listed columns and ensure the .run(...) call on that same
statement supplies exactly seven parameters (encrypted, iv, authTag, 'healthy',
1 plus the platform and label) so db.prepare(...).run(...) is consistent with
the api_keys columns.

In @server/src/providers/base.ts:

Around line 142-145: The assignment to error.status can become NaN when
errPayload.code is a non-numeric string; update the logic around the
error.status assignment in base.ts (where errPayload and error.status are set)
to parse the code only if it yields a finite number (e.g., use Number(...) then
check Number.isFinite or isFinite) and otherwise set error.status to 200; ensure
you still detect the presence of 'code' in errPayload before attempting
conversion and fall back to 200 for non-finite or missing values so downstream
status checks remain reliable.

In @server/src/providers/cloudflare.ts:

Around line 123-131: The current try/catch around JSON.parse in the streaming
path swallows errors thrown by throwWrappedError(parsed); move the wrapped-error
detection outside the parse-specific catch so only JSON parse failures are
ignored: attempt JSON.parse(data) inside a narrow try/catch that on failure
skips the chunk, but after a successful parse call this.isWrappedError(parsed)
and, if true, call this.throwWrappedError(parsed) (so the thrown error bubbles
up), otherwise yield parsed; adjust the logic in the same generator where
ChatCompletionChunk is parsed to mirror the pattern used in google.ts.

In @server/src/providers/cohere.ts:

Around line 114-122: The current try/catch swallows errors thrown by
throwWrappedError(parsed); refactor the block so JSON.parse is wrapped in its
own try/catch that only skips malformed chunks, then after a successful parse
call this.isWrappedError(parsed) and if true call
this.throwWrappedError(parsed); otherwise yield parsed — locate and update the
code around isWrappedError, throwWrappedError and the parsed handling to mirror
the parse-only try/catch pattern used in google.ts.

In @server/src/providers/openai-compat.ts:

Around line 130-138: The current try/catch around JSON.parse in
openai-compat.ts swallows exceptions thrown by throwWrappedError(parsed); change
the flow so parsing is done inside a small try/catch that only handles
JSON.parse errors (skip malformed chunks), then after a successful parse call
this.isWrappedError(parsed) and, if true, call this.throwWrappedError(parsed)
outside the parse-only try block so the wrapped-error is not caught and
discarded; update the block containing JSON.parse, ChatCompletionChunk,
isWrappedError, throwWrappedError, and the yield parsed accordingly.

In @server/src/routes/proxy.ts:

Line 1376: The double semicolon after the res.write call is extraneous; in the
function handling the streaming response where res.write('data: [DONE]\n\n');;
appears, remove the extra semicolon so the line is a single statement
(res.write('data: [DONE]\n\n');). This will clean up the syntax without changing
behavior—locate the res.write invocation in the proxy route handler and delete
the trailing duplicate semicolon.

Around line 1546-1548: The comment above the transient cooldown registration
is misleading because the block only runs for ban-eligible 5xx statuses; update
the comment near transientModelCooldowns.set(route.modelDbId, ...) to accurately
state that this registers a global transient cooldown only for ban-eligible 5xx
mid-stream errors (i.e., when streamErrStatus passes isBanEligibleStatus),
referencing TRANSIENT_COOLDOWN_MS and route.modelDbId to keep context; do not
change logic, only correct the comment text to reflect the actual condition.

In @server/src/services/router.ts:

Around line 183-187: The SQL has mismatched parentheses in the
recency-weighted aggregates: in the SELECT expressions for "total" and
"successes" remove the extra closing parenthesis so MAX(0, MIN(1.0, 1.0 -
(julianday('now') - julianday(created_at)) / 7.0)) is balanced; update the
SUM(...) and SUM(CASE ... THEN ... ELSE 0 END) calls accordingly (symbols to
locate: raw_total, total, successes, created_at, julianday, MAX, MIN, SUM, CASE)
so db.prepare() in refreshStatsCache() will no longer throw.

In @server/write_test.py:

Line 29: The file contains an incomplete/truncated string literal "expect(()
=> route{ in write_test.py which causes a syntax error; fix by either completing
the test generation logic (close the string and finish the test expression) or
remove the development artifact line; locate the occurrence of the truncated
literal in write_test.py (search for expect(() => route{) and either complete
the test statement in the test function that builds route assertions or delete
the line if the test is not needed.

In @server/write_tests.py:

Around line 1-45: The file write_tests.py is a throwaway generator with a
hardcoded absolute path (path variable) and an unterminated triple-quoted string
(part1) causing syntax errors; remove this file (and related scratch scripts
like write_test.py/do_fix.py/fix*.py) from the repository, add them to
.gitignore, and instead commit the intended test file directly (the test content
currently in the part1 string should be placed into
server/src/tests/services/router.test.ts as a regular file); ensure no
references to the removed generator remain in the repo or CI.

Nitpick comments:
In @.roo/specs/disable-sticky-on-auto/design.md:

Around line 11-17: The fenced code block containing the stickyOp example lacks
a language identifier; update the block fence to include a language (e.g., add
"typescript" after the opening ```), so the snippet beginning with "function
stickyOp(messages, routingMode, ...)" and referencing getSessionKey and
stickySessionMap is written as a language-specified fenced block (for example
In @.roo/specs/generalized-thread-protection/design.md:
- Around line 43-46: The fenced code block containing
THREAD_PROTECTION_PLATFORMS="longcat:provider-ban,groq:model-skip" needs a
language identifier for proper syntax highlighting; update the triple-backtick
fence to include a language such as bash or shell (e.g., change ``` to ```bash)
around the existing THREAD_PROTECTION_PLATFORMS line so the block is rendered
with shell/bash highlighting.

In @.roo/specs/owl-alpha-longcat-model-routing/design.md:
- Around line 5-46: The fenced code blocks in the design document lack language
specifiers; update each ASCII diagram/code block to include appropriate
languages (e.g., use ```text for ASCII diagrams and ```typescript for
pseudocode) so syntax highlighting and readability improve—specifically add
```text to the top architecture diagram (the Client Request / Balanced Router /
Smart Router block referenced alongside routeRequest() in router.ts), add
```text to the smart preference and sticky cooldown flow diagrams, and add
```text or ```typescript for the error handling flow/pseudocode near
handleChatCompletion() in proxy.ts; ensure each existing triple-backtick fence
is changed to include the chosen language.

In @.roo/specs/recency-biased-thompson-sampling/design.md:
- Line 82: Clarify that leaving tok_per_sec and avg_ttfb_ms unweighted is a
deliberate simplification for the initial implementation: update the design text
in recency-biased-thompson-sampling to state that tok_per_sec and avg_ttfb_ms
are intentionally kept unweighted for rollout simplicity, include a brief
rationale (they're relatively stable but can be early warning signals), and add
a short "Future work" note that weighting or anomaly-triggered adjustments for
tok_per_sec and avg_ttfb_ms may be introduced later with monitoring/thresholds.

In @.roo/specs/sse-stream-heartbeat-stall-protection/design.md:
- Around line 112-121: The heartbeat write failure catch inside the setInterval
that calls cleanupStream() can race with the req.on('close') handler; make this
explicit by documenting that cleanupStream() is idempotent and designed to be
safe when invoked concurrently from both the setInterval catch and the request
'close' event. Add a short comment near streamStarted/res.write and the
setInterval callback referencing req.on('close') and stating cleanupStream() is
idempotent and safe to call multiple times to prevent confusion for future
readers.

In @.roo/specs/transient-model-cooldown/design.md:
- Around line 33-36: The transientModelCooldowns Map can grow unbounded; add a
defensive max-size limit (e.g., MAX_TRANSIENT_COOLDOWNS = 100) and enforce it
wherever entries are added (reference transientModelCooldowns and
TRANSIENT_COOLDOWN_MS), pruning the oldest entries by expiryTimestamp when size
exceeds the limit; update the design notes to document the chosen upper bound
and the pruning behavior (remove lowest expiryTimestamp entries until size <=
MAX_TRANSIENT_COOLDOWNS).

In @.roo/specs/transient-model-cooldown/requirements.md:
- Line 25: Add a short observability note to R-6 recommending that the system
emits logs, metrics, and alerts when routeRequest() returns the "All models
exhausted" outcome: specify logging a structured error event with context
(timestamp, caller, model pool state), incrementing a meter/counter (e.g.,
all_models_exhausted_total) and recording a tagged gauge or histogram for
cooldown durations, and suggest creating an alerting rule on sustained or
high-rate occurrences so operators are notified of complete service degradation.
- Line 29: Update the requirements.md "Out of scope" section to explicitly call
out two future concerns: (1) race conditions when concurrent requests attempt to
register a cooldown for the same model (referencing the module-level Map used in
proxy.ts and the cooldown registration on 5xx/connection failures and sticky
session override logic that writes to it), and (2) unbounded memory growth of
the cooldown Map (the Map that tracks skipModels/cooldowns) and the need for
retention/eviction or size limits; mention these as future considerations to be
addressed if this feature is expanded.
- Line 10: Clarify R-1 by explicitly stating which HTTP status codes are
considered auth errors and must be excluded from cooldowns: treat 401 and 403 as
authentication/authorization errors that must NOT trigger the transient model
cooldown; continue to treat 5xx and connection timeouts as retryable failures
that DO trigger cooldowns. Update the "transient-model-cooldown" requirement
text to add a sentence like "Authentication/authorization errors (e.g., HTTP 401
and 403) are excluded and should not cause a model to be placed into the
cooldown set," so implementers of the shared in-memory disabled-model collection
and logic in the proxy (the component described in R-1) know to check and skip
401/403 before adding model IDs.

In @.roo/specs/transient-model-cooldown/tasks.md:
- Around line 5-10: Declare and export a module-level Map named
transientModelCooldowns and constant TRANSIENT_COOLDOWN_MS in proxy.ts, then in
handleChatCompletion() (after skipModels init and before the retry loop) iterate
transientModelCooldowns to prune expired entries and add active cooldown
modelDbIds to skipModels; immediately after the existing session-ban platform
check clear preferredModel and preferredKeyId if preferredModel is on cooldown;
in the retry-loop catch block (where errStatus is checked) when errStatus is 5xx
or undefined and isRetryableError(err) is true set
transientModelCooldowns.set(route.modelDbId, Date.now() + TRANSIENT_COOLDOWN_MS)
and add to local skipModels; likewise in the streaming error handler when
streamErrStatus is ban-eligible set the same cooldown and add to skipModels; add
short landmark comments (e.g., "// COOLDOWN: prune/inject", "// COOLDOWN: sticky
override", "// COOLDOWN: register on retry error", "// COOLDOWN: register on
stream error") next to these locations so future edits can find them even if
line numbers shift.

In @.roo/specs/wrapped-error-interception/design.md:
- Around line 46-56: The isWrappedError(body: unknown) function currently treats
body.error arrays as valid because typeof [] === 'object'; update isWrappedError
to reject array-form errors by adding a check like && !Array.isArray((body as
Record<string, unknown>).error) so that {"error": ["..."]} does not pass, or
alternatively add a short comment in the isWrappedError implementation
explaining why array-form errors are intentionally allowed; reference the
isWrappedError method and the (body as Record<string, unknown>).error property
when making the change.
- Around line 318-322: The documentation misses guidance on signaling mid-stream
wrapped errors to clients; update the EC-5 section to specify that when
isWrappedError() causes the stream generator to throw a ProviderApiError, the
implementation should emit a terminal error frame/event (or standardized SSE
"event: error" chunk) before closing so clients can distinguish provider errors
from network failures. Mention the involved symbols: the stream parser that uses
isWrappedError(), the thrown ProviderApiError from the async generator, and the
consumer/retry loop in proxy.ts; state that the generator should attempt to push
a final error chunk/event describing the wrapped error (including code/message)
immediately prior to terminating the stream so clients can detect
provider-originated truncation rather than a transport error.

In @.roo/specs/wrapped-error-interception/requirements.md:
- Line 49: Clarify the out-of-scope note to state that top-level "errors" array
formats are intentionally excluded, but nested "errors" arrays (e.g., {"error":
{"errors": [...]}}) are handled by the existing extractErrorMessage() logic;
update the sentence around the out-of-scope wording to reference
extractErrorMessage() and explicitly note that nested errors arrays inside the
root "error" object are supported while only top-level "errors" keys remain
out-of-scope.

In @.roo/specs/wrapped-error-interception/tasks.md:
- Around line 63-65: Add an explicit test creation task that verifies
wrapped-error behavior: write unit tests covering isWrappedError() and
throwWrappedError() to assert that an HTTP 200 response with body {"error":
{...}} causes throwWrappedError() to throw a ProviderApiError and that a wrapped
429 response triggers the cooldown path; locate tests near existing
provider/proxy tests and add cases for isWrappedError() true/false,
throwWrappedError() raising correct error type and metadata (including
retry-after/cooldown), and ensure they run in the existing test suite invoked by
"npm test".

In `@new_streaming_block.txt`:
- Around line 1-28: This file appears to be a WIP fragment with incomplete
syntax (unclosed block at the stallTimeout promise) and should be removed from
the PR or moved into proper documentation; either delete new_streaming_block.txt
from the commit, or convert it to a markdown doc with context and a complete,
valid code example. If you intend to keep code, ensure the stream-related helper
symbols (cleanupStream, stallTimeout, streamStarted, heartbeatInterval) are
placed in a proper source file with all braces and promise handlers closed and
tests/typing validated before committing.

In `@server/src/__tests__/routes/transient-cooldown.test.ts`:
- Around line 78-178: Tests duplicate the production pruning/injection loop
instead of exercising the implementation; replace the inline loop over
transientModelCooldowns with calls to the actual exported helper in proxy.ts
(e.g. export and use a function like injectTransientCooldowns or buildSkipModels
that accepts transientModelCooldowns and returns/updates skipModels), then
assert on its return/state rather than re-implementing the logic; similarly
refactor the other duplicated blocks (the sticky-override routine and the 5xx
eligibility checks) to call their respective exported helpers from proxy.ts so
tests validate real code paths (refer to transientModelCooldowns, skipModels,
and proxy.ts helper names when making the changes).

In `@server/src/routes/proxy.ts`:
- Around line 1699-1720: The 5xx non-stream branch should also register the same
transient cooldown used for mid-stream 5xxs; after the evaluateThreadProtection
block (the one that may call banPlatformFromSession,
addProviderModelsToSkipModels, or skipModels.add) add a call to the
transient-cooldown registration used around the mid-stream handling (the same
function invoked at the mid-stream 5xx handling near line ~1547) with the same
parameters (normalizedMessages, routingMode, route.platform, route.modelDbId) so
the failing provider/model is temporarily excluded from other concurrent
requests.
- Around line 1596-1634: The mid-stream retryable error branch should also
register the same transient cooldown as the non-stream retryable branch to
temporarily exclude the failing model/provider from other concurrent requests;
locate the transient-cooldown registration call used in the non-stream retryable
error handling (the call that registers a cooldown for route.platform,
route.modelDbId and routingMode) and add that exact call into this mid-stream
branch (inside the isRetryableStreamError block, after evaluateThreadProtection
and before writing/ending the response), so that
banPlatformFromSession/addProviderModelsToSkipModels/skipModels/preferredModel
logic is followed by the transient cooldown registration and then the existing
response/cleanup and logRequest call.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)

Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 64ee865b-4c52-4b11-b495-98a8ad28943f

📥 Commits

Reviewing files that changed from the base of the PR and between 233e031 and 24c1c80.

📒 Files selected for processing (43)

.roo/specs/disable-sticky-on-auto/design.md

.roo/specs/disable-sticky-on-auto/requirements.md

.roo/specs/disable-sticky-on-auto/tasks.md

.roo/specs/generalized-thread-protection/design.md

.roo/specs/generalized-thread-protection/requirements.md

.roo/specs/generalized-thread-protection/tasks.md

.roo/specs/owl-alpha-longcat-model-routing/design.md

.roo/specs/owl-alpha-longcat-model-routing/requirements.md

.roo/specs/owl-alpha-longcat-model-routing/tasks.md

.roo/specs/recency-biased-thompson-sampling/design.md

.roo/specs/recency-biased-thompson-sampling/requirements.md

.roo/specs/recency-biased-thompson-sampling/tasks.md

.roo/specs/sse-stream-heartbeat-stall-protection/design.md

.roo/specs/sse-stream-heartbeat-stall-protection/requirements.md

.roo/specs/sse-stream-heartbeat-stall-protection/tasks.md

.roo/specs/transient-model-cooldown/design.md

.roo/specs/transient-model-cooldown/requirements.md

.roo/specs/transient-model-cooldown/tasks.md

.roo/specs/wrapped-error-interception/design.md

.roo/specs/wrapped-error-interception/requirements.md

.roo/specs/wrapped-error-interception/tasks.md

client/src/pages/FallbackPage.tsx

do_fix.py

fix.py

fix_streaming.py

fix{

new_streaming_block.txt

server/src/__tests__/routes/fallback.test.ts

server/src/__tests__/routes/provider-session-ban.test.ts

server/src/__tests__/routes/proxy-tools.test.ts

server/src/__tests__/routes/stream-heartbeat-stall.test.ts

server/src/__tests__/routes/transient-cooldown.test.ts

server/src/__tests__/services/router.test.ts

server/src/providers/base.ts

server/src/providers/cloudflare.ts

server/src/providers/cohere.ts

server/src/providers/google.ts

server/src/providers/openai-compat.ts

server/src/routes/proxy.ts

server/src/services/router.ts

server/src/services/threadProtection.ts

server/write_test.py

server/write_tests.py

coderabbitai · 2026-06-02T23:58:36Z

+# Requirements: Generalized Thread Protection Scanner
+
+## Problem Statement
+
+The proxy route handler (`server/src/routes/proxy.ts`) contains 6+ hardcoded branches that special-case the `longcat`{


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Requirements document is incomplete.

The file appears truncated at line 5, cutting off mid-sentence. The requirements section is missing, which is critical for understanding the feature scope and acceptance criteria.

The file ends with:

The proxy route handler (`server/src/routes/proxy.ts`) contains 6+ hardcoded branches that special-case the `longcat`{

This looks like an incomplete save or file corruption. Please complete the requirements document with:

Problem statement (appears to be started)

List of requirements with acceptance criteria

Out of scope items

Dependencies (if any)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/generalized-thread-protection/requirements.md around lines 1 - 5, The requirements.md file is truncated mid-sentence; finish the document by completing the Problem Statement (referencing the proxy route handler in server/src/routes/proxy.ts and the existing special-casing of "longcat"), then add a clear numbered List of Requirements with measurable Acceptance Criteria (e.g., remove hardcoded branches, implement configurable rules, pass unit/integration tests, performance and security checks), an Out of Scope section stating what will not be changed (e.g., unrelated routes, legacy clients), and a Dependencies section listing impacted components (server/src/routes/proxy.ts, any config files, tests, and deployment steps); ensure each requirement includes an acceptance test or success metric and that terminology matches identifiers like "longcat" and "proxy route handler" so reviewers can map requirements to code.

coderabbitai · 2026-06-02T23:58:36Z

+
+The SQL query aggregating historical requests in [`refreshStatsCache()`](server/src/services/router.ts:174) must calculate a weight for each logged request based on its age. Newer requests must be assigned a weight closer to `1.0`, while requests approaching the limit of the analytics window (7 days) must decay toward `0.0`.
+
+**Formula**: `MIN(1.0, MAX(0.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0))`


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Inconsistency: Hardcoded 7.0 conflicts with design parameterization.

This requirement shows the formula with a hardcoded 7.0 divisor, but the design document (line 81) explicitly states that the window days should be "passed as SQL parameter ?" to avoid hardcoding and keep it coupled with ANALYTICS_WINDOW_MS. Update this line to use ? instead of 7.0 to match the design.

📝 Proposed fix

-**Formula**: `MIN(1.0, MAX(0.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0))` +**Formula**: `MIN(1.0, MAX(0.0, 1.0 - (julianday('now') - julianday(created_at)) / ?))`

And add a note: "where ? is ANALYTICS_WINDOW_DAYS derived from ANALYTICS_WINDOW_MS."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/recency-biased-thompson-sampling/requirements.md at line 15, Replace the hardcoded divisor 7.0 in the recency weight formula with a SQL parameter placeholder (?) so the formula reads MIN(1.0, MAX(0.0, 1.0 - (julianday('now') - julianday(created_at)) / ?)); ensure you add a short note stating that ? is ANALYTICS_WINDOW_DAYS (derived from ANALYTICS_WINDOW_MS) to match the design and parameterization described in the document.

coderabbitai · 2026-06-02T23:58:36Z

+**Final approach**: When `streamStarted === false` on stall detection, throw a retryable error so the outer retry loop can attempt a fallback provider:
+
+```typescript
+if (now - lastChunkTimestamp > MAX_STREAM_STALL_MS) {
+  console.warn(`[Proxy] Stream stalled for ${now - lastChunkTimestamp}ms — aborting socket`);
+  streamAborted = true;
+  cleanupStream();
+
+  if (streamStarted) {
+    // Mid-stream stall — cannot retry, write error frame and end
+    const payload = { error: { message: 'Upstream stream stalled', type: 'stream_timeout' } };
+    try {
+      if (responseStreamContext) {
+        writeResponseStreamEvent(res, {
+          type: 'response.failed',
+          response: {
+            id: responseStreamContext.responseId,
+            status: 'failed',
+            error: payload.error,
+          },
+        });
+      } else {
+        res.write(`data: ${JSON.stringify(payload)}\n\n`);
+        res.write('data: [DONE]\n\n');
+      }
+      res.end();
+    } catch { /* Socket already gone */ }
+  } else {
+    // Pre-stream stall — no headers sent yet, response is still retryable
+    // Throw an error to fall through to the outer retry/502 handler
+    throw Object.assign(
+      new Error(`Upstream provider stalled before yielding any data from ${route.displayName}`),
+      { status: 504 },
+    );
+  }
+}


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Pre-stream stall error propagation may not work as designed.

The design proposes throwing an error from within the setInterval callback to trigger the outer retry loop (lines 273-277). However, errors thrown inside setInterval callbacks do NOT propagate to the outer try/catch block—they trigger uncaughtException or are silently swallowed depending on the environment.

The current design shows the stall handler throwing an error, but this won't reach the catch (streamErr) block at line 1392. Instead, the streamAborted flag approach (already in place) should be used: set the flag, call cleanupStream(), but don't throw. The for-await loop will break on streamAborted, and the post-loop code should detect streamAborted && !streamStarted and throw the retryable error there.

🔧 Recommended control flow fix

Update the design to clarify that the throw happens in the post-loop check, not in the setInterval callback:

In the stall handler (inside setInterval):

Set streamAborted = true

Call cleanupStream()

If streamStarted, write error frame and res.end()

If !streamStarted, do NOT throw—just return from the callback

After the for-await loop (lines 171-178):

cleanupStream(); if (streamAborted) { if (!streamStarted) { // Pre-stream stall - throw to retry throw Object.assign( new Error('Upstream provider stalled before yielding data'), { status: 504 } ); } // Mid-stream stall - already handled in interval callback logRequest(..., 'stream_stalled'); return; }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/sse-stream-heartbeat-stall-protection/design.md around lines 243 - 278, The stall handler currently throws an error inside the setInterval callback which won’t propagate to the outer try/catch; instead, in the interval callback set streamAborted = true and call cleanupStream(), and if streamStarted write the error frame and res.end() but do NOT throw; after the for-await loop (where the stream is consumed) run cleanupStream() and then if (streamAborted && !streamStarted) throw the retryable error (Object.assign(new Error(...), { status: 504 })) so the outer retry/catch sees it; use the existing symbols streamAborted, streamStarted, cleanupStream(), writeResponseStreamEvent, res.end and the post-loop logic to implement this control flow change.

coderabbitai · 2026-06-02T23:58:36Z

+- [ ] **T-2**: Export `transientModelCooldowns` and `TRANSIENT_COOLDOWN_MS` from [`proxy.ts`](server/src/routes/proxy.ts:170) in the existing export block for test access
+- [ ] **T-3**: Add pre-routing cooldown injection logic inside [`handleChatCompletion()`](server/src/routes/proxy.ts:1061) — after `skipModels` initialization at [line 1179](server/src/routes/proxy.ts:1179) and before the retry loop at [line 1245](server/src/routes/proxy.ts:1245). Iterate `transientModelCooldowns`, prune expired entries, and add active cooldowns to `skipModels`
+- [ ] **T-4**: Add sticky session override logic — after cooldown injection, check if `preferredModel` is on global cooldown and clear `preferredModel`/`preferredKeyId` if so. Place this after the existing session-ban platform check at [line 1195](server/src/routes/proxy.ts:1195)
+- [ ] **T-5**: Register global cooldown in the retry loop catch block at [line 1570](server/src/routes/proxy.ts:1570) — when `errStatus` is `5xx` or `undefined` (connection failure) and `isRetryableError(err)` is true, set `transientModelCooldowns.set(route.modelDbId, Date.now() + TRANSIENT_COOLDOWN_MS)` and add to local `skipModels`


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Description: Verify if isRetryableError treats 429 as retryable rg -n -C5 'function isRetryableError|const isRetryableError' --type=ts --glob '**/proxy.ts'

Repository: vi70x3/freellmapi

Length of output: 922

🏁 Script executed:

#!/bin/bash set -euo pipefail # 1) Inspect the cooldown registration site referenced by the spec (around line 1570) sed -n '1550,1605p' server/src/routes/proxy.ts # 2) Locate where transientModelCooldowns.set is used and inspect surrounding logic rg -n "transientModelCooldowns\.set" server/src/routes/proxy.ts -n rg -n "skipModels" server/src/routes/proxy.ts -n # Print a wider window around the first occurrence of transientModelCooldowns.set first=$(rg -n "transientModelCooldowns\.set" server/src/routes/proxy.ts | head -n1 | cut -d: -f1) start=$((first-40)) end=$((first+80)) sed -n "${start},${end}p" server/src/routes/proxy.ts # 3) Read the spec file to see what T-7 asserts (line 13 mentioned in the comment) sed -n '1,60p' .roo/specs/transient-model-cooldown/tasks.md

Repository: vi70x3/freellmapi

Length of output: 14681

🏁 Script executed:

#!/bin/bash set -euo pipefail # Retry-loop catch block around the spec-referenced line (~1570) sed -n '1500,1650p' server/src/routes/proxy.ts # Also inspect the retry loop region a bit earlier for how errStatus is computed sed -n '1400,1505p' server/src/routes/proxy.ts # Show full isRetryableError + getErrorStatus (for context of what "429" maps to) rg -n "function getErrorStatus|const getErrorStatus|function isRetryableError|const isRetryableError" server/src/routes/proxy.ts sed -n '480,580p' server/src/routes/proxy.ts

Repository: vi70x3/freellmapi

Length of output: 18130

🏁 Script executed:

#!/bin/bash set -euo pipefail # Show the second transient cooldown set site (often the retry-loop one) # Use line numbers from the earlier grep output: one at 1547 (mid-stream), one at 1724 (likely retry loop). sed -n '1680,1760p' server/src/routes/proxy.ts # Find the retry-loop catch block by searching for "errStatus" and "isRetryableError(err)". rg -n "errStatus\s*=|isRetryableError$err$|transientModelCooldowns\.set" server/src/routes/proxy.ts # Show the retry loop section around where errStatus is computed. # (Pick a window around the first "errStatus" occurrence.) first_err_status=$(rg -n "errStatus\s*=" server/src/routes/proxy.ts | head -n1 | cut -d: -f1) start=$((first_err_status-80)) end=$((first_err_status+120)) sed -n "${start},${end}p" server/src/routes/proxy.ts

Repository: vi70x3/freellmapi

Length of output: 13713

🏁 Script executed:

#!/bin/bash set -euo pipefail # Show tasks.md with line numbers to locate T-5 and T-7 expectations precisely nl -ba .roo/specs/transient-model-cooldown/tasks.md | sed -n '1,120p'

Repository: vi70x3/freellmapi

Length of output: 104

🏁 Script executed:

#!/bin/bash set -euo pipefail cat -n .roo/specs/transient-model-cooldown/tasks.md | sed -n '1,140p'

Repository: vi70x3/freellmapi

Length of output: 2609

Fix spec/code mismatch: retry-loop transient cooldown currently applies to 429

isRetryableError(err) treats status === 429 as retryable.

In server/src/routes/proxy.ts, the retry-loop catch registers transientModelCooldowns.set(...) under if (isRetryableError(err)) only (no errStatus 5xx/undefined guard), so 429 will register a transient cooldown.

This contradicts T-5’s stated “only 5xx/undefined” condition and T-7’s “exclude 429” expectation.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/transient-model-cooldown/tasks.md at line 9, The retry-loop catch currently sets transientModelCooldowns for any isRetryableError(err) (which includes 429); update the catch in server/src/routes/proxy.ts so the global transientModelCooldowns.set(route.modelDbId, Date.now() + TRANSIENT_COOLDOWN_MS) and adding to local skipModels only happens when isRetryableError(err) is true AND errStatus is either undefined (connection failure) or a 5xx status (e.g. errStatus >=500 && errStatus <600); ensure 429 (status 429) does not trigger the cooldown by guarding with that errStatus check and keep using TRANSIENT_COOLDOWN_MS and route.modelDbId as the identifiers.

coderabbitai · 2026-06-02T23:58:36Z

+  error.status =
+    typeof errPayload === 'object' && errPayload !== null && 'code' in (errPayload as Record<string, unknown>)
+      ? Number((errPayload as Record<string, unknown>).code)
+      : 200;


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Potential NaN from non-numeric error codes.

Line 73-74 extracts error.code and calls Number(code) without validating that code is numeric. If a provider returns {"error": {"code": "RATE_LIMIT"}}, this would result in status: NaN. Consider adding a validation or fallback: const codeNum = Number(code); status = isNaN(codeNum) ? 200 : codeNum.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/wrapped-error-interception/design.md around lines 72 - 75, The current assignment to error.status converts errPayload.code with Number(...) without validation, which can produce NaN for non-numeric codes; update the logic where error.status is set (the block referencing errPayload and error.status) to parse the code into a numeric value, check isNaN on that parsed result, and use a safe fallback (e.g., 200) when parsing fails or the value is not a finite number, ensuring error.status is always a valid number.

coderabbitai · 2026-06-02T23:58:37Z

+                  });
+                } else {
+                  res.write(`data: ${JSON.stringify(payload)}\n\n`);
+                  res.write('data: [DONE]\n\n');;


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove extra semicolon.

Line 1376 has a double semicolon which is unnecessary.

🔧 Proposed fix

- res.write('data: [DONE]\n\n');; + res.write('data: [DONE]\n\n');

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

res.write('data: [DONE]\n\n');;

res.write('data: [DONE]\n\n');

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/routes/proxy.ts` at line 1376, The double semicolon after the res.write call is extraneous; in the function handling the streaming response where res.write('data: [DONE]\n\n');; appears, remove the extra semicolon so the line is a single statement (res.write('data: [DONE]\n\n');). This will clean up the syntax without changing behavior—locate the res.write invocation in the proxy route handler and delete the trailing duplicate semicolon.

coderabbitai · 2026-06-02T23:58:37Z

+             // Register global transient cooldown for any 5xx mid-stream error
+             transientModelCooldowns.set(route.modelDbId, Date.now() + TRANSIENT_COOLDOWN_MS);
+             console.log(`[TransientCooldown] registered global cooldown for modelDbId=${route.modelDbId} (${TRANSIENT_COOLDOWN_MS / 1000}s)`);


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Comment is misleading.

The comment states "Register global transient cooldown for any 5xx mid-stream error" but this code only executes for ban-eligible 5xx errors (500, 502, 503, 504) because it's inside the if (streamErrStatus && isBanEligibleStatus(streamErrStatus)) block (line 1527).

📝 Proposed fix

- // Register global transient cooldown for any 5xx mid-stream error + // Register global transient cooldown for this ban-eligible 5xx mid-stream error transientModelCooldowns.set(route.modelDbId, Date.now() + TRANSIENT_COOLDOWN_MS);

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/routes/proxy.ts` around lines 1546 - 1548, The comment above the transient cooldown registration is misleading because the block only runs for ban-eligible 5xx statuses; update the comment near transientModelCooldowns.set(route.modelDbId, ...) to accurately state that this registers a global transient cooldown only for ban-eligible 5xx mid-stream errors (i.e., when streamErrStatus passes isBanEligibleStatus), referencing TRANSIENT_COOLDOWN_MS and route.modelDbId to keep context; do not change logic, only correct the comment text to reflect the actual condition.

coderabbitai · 2026-06-02T23:58:37Z

+      COUNT(*) as raw_total,
+      SUM(MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0)))) as total,
+      SUM(CASE WHEN status = 'success'
+        THEN MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0)))
+        ELSE 0 END) as successes,


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

SQL syntax error: extra closing parenthesis in both weighted aggregates.

Both new recency-weighted expressions have one extra ):

Line 184 total: SUM(MAX(0, MIN(1.0, … / 7.0)))) has 6 opening vs 7 closing parens.

Lines 185‑187 successes: SUM(CASE … THEN MAX(0, MIN(1.0, … / 7.0))) ELSE 0 END) also has one extra ).

db.prepare() will throw at runtime, breaking refreshStatsCache() and therefore all routing.

🐛 Proposed fix

- SUM(MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0)))) as total, + SUM(MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0))) as total, SUM(CASE WHEN status = 'success' - THEN MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0))) + THEN MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0)) ELSE 0 END) as successes,

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

COUNT(*) as raw_total,

SUM(MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0)))) as total,

SUM(CASE WHEN status = 'success'

THEN MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0)))

ELSE 0 END) as successes,

COUNT(*) as raw_total,

SUM(MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0))) as total,

SUM(CASE WHEN status = 'success'

THEN MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0))

ELSE 0 END) as successes,

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/services/router.ts` around lines 183 - 187, The SQL has mismatched parentheses in the recency-weighted aggregates: in the SELECT expressions for "total" and "successes" remove the extra closing parenthesis so MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0)) is balanced; update the SUM(...) and SUM(CASE ... THEN ... ELSE 0 END) calls accordingly (symbols to locate: raw_total, total, successes, created_at, julianday, MAX, MIN, SUM, CASE) so db.prepare() in refreshStatsCache() will no longer throw.

coderabbitai · 2026-06-02T23:58:37Z

+    "  });",
+    "",
+    "  it('should throw when no keys are configured', () => {",
+    "    expect(() => route{


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Fix incomplete string literal.

Line 29 has an incomplete string that will cause a Python syntax error when the script is executed. The string "expect(() => route{" is missing a closing quote and appears to be truncated.

If this script is still needed, complete the test generation logic. If it's a development artifact, consider removing it from the PR.

🧰 Tools

🪛 Ruff (0.15.15)

[warning] 29-29: missing closing quote in string literal

(invalid-syntax)

[warning] 29-29: unexpected EOF while parsing

(invalid-syntax)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/write_test.py` at line 29, The file contains an incomplete/truncated string literal "expect(() => route{ in write_test.py which causes a syntax error; fix by either completing the test generation logic (close the string and finish the test expression) or remove the development artifact line; locate the occurrence of the truncated literal in write_test.py (search for expect(() => route{) and either complete the test statement in the test function that builds route assertions or delete the line if the test is not needed.

coderabbitai · 2026-06-02T23:58:37Z

+import os
+
+# Write the test file in parts to avoid truncation
+path = '/home/vi/freellmapi/server/src/__tests__/services/router.test.ts'
+
+# Part 1: existing tests
+part1 = """import { describe, it, expect, beforeAll, beforeEach } from 'vitest';
+import { initDb, getDb } from '../../db/index.js';
+import { encrypt } from '../../lib/crypto.js';
+import { routeRequest, refreshStatsCache, getAnalyticsScores } from '../../services/router.js';
+
+describe('Router', () => {
+  beforeAll(() => {
+    process.env.ENCRYPTION_KEY = '0'.repeat(64);
+    initDb(':memory:');
+  });
+
+  beforeEach(() => {
+    const db = getDb();
+    db.prepare('DELETE FROM api_keys').run();
+    const models = db.prepare('SELECT id, intelligence_rank FROM models ORDER BY intelligence_rank ASC').all() as any[];
+    const update = db.prepare('UPDATE fallback_config SET priority = ? WHERE model_db_id = ?');
+    for (let i = 0; i < models.length; i++) {
+      update.run(i + 1, models[i].id);
+    }
+  });
+
+  it('should throw when no keys are configured', () => {
+    expect(() => routeRequest()).toThrow(/exhausted/i);
+  });
+
+  it('should route to highest priority model with available key', () => {
+    const db = getDb();
+    const { encrypted, iv, authTag } = encrypt('test-groq-key');
+    db.prepare('INSERT INTO api_keys (platform, label, encrypted_key, iv, auth_tag, status, enabled) VALUES (?, ?, ?, ?, ?, ?, ?)').run('groq', 'test', encrypted, iv, authTag, 'healthy', 1);
+    const result = routeRequest();
+    expect(result.platform).toBe('groq');
+    expect(result.apiKey).toBe('test-groq-key');
+  });
+
+  it('should route to an available model when keys exist for multiple platforms', () => {
+    const db = getDb();
+    const googleKey = encrypt('test-google-key');
+    db.prepare('INSERT INTO api_keys (platform, label, encrypted_key, iv, auth_tag, status, enabled) VALUES (?, ?, ?, ?, ?, ?, ?)').run('google', 'test', googleKey.encrypted, googleKey.iv, googleKey.authTag, 'healthy', 1);
+    const groqKey{


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove this code-generation artifact from the repo.

write_tests.py (along with the related write_test.py/do_fix.py/fix*.py scratch scripts) is a throwaway generator, not part of the application. It hardcodes a machine-specific absolute path (/home/vi/freellmapi/…) and the part1 triple-quoted string is left unterminated, so the file is not even valid Python (confirmed by the static-analysis invalid-syntax hint). These artifacts shouldn't be tracked — author the test file directly instead.

Want me to open an issue to remove these generator scripts and add them to .gitignore?

🧰 Tools

🪛 Ruff (0.15.15)

[warning] 7-45: missing closing quote in string literal

(invalid-syntax)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/write_tests.py` around lines 1 - 45, The file write_tests.py is a throwaway generator with a hardcoded absolute path (path variable) and an unterminated triple-quoted string (part1) causing syntax errors; remove this file (and related scratch scripts like write_test.py/do_fix.py/fix*.py) from the repository, add them to .gitignore, and instead commit the intended test file directly (the test content currently in the part1 string should be placed into server/src/__tests__/services/router.test.ts as a regular file); ensure no references to the removed generator remain in the repo or CI.

BUG-05: Abort upstream provider stream on stall detection by breaking the for-await loop and calling gen.return() when the keepalive timer detects MAX_STREAM_STALL_MS has elapsed without data. BUG-06: Fix cooldown guard to use the actual routable fallback chain (fallback_config JOIN models) instead of all enabled models, ensuring transient cooldowns only skip models that would actually be routed to. BUG-10: Remove double semicolon in proxy.ts. Also adds SSE keep-alive comments during idle periods, transient model cooldown injection before retry loops, and LongCat sticky session cooldown support in balanced routing mode.

vi70x3 added 4 commits June 2, 2026 15:12

feat(providers): intercept wrapped error payloads on HTTP 200 responses

2fd34ba

feat(proxy): replace hardcoded LongCat/Owl Alpha cooldowns with gener…

09bffb0

…alized thread protection scanner

chore: temporary commit before switching branch

d4ea579

feat(thread-protection): implement rules engine and replace hardcoded…

24c1c80

… longcat branches

gemini-code-assist Bot reviewed Jun 2, 2026

View reviewed changes

sourcery-ai Bot reviewed Jun 2, 2026

View reviewed changes

qodo-code-review Bot reviewed Jun 2, 2026

View reviewed changes

coderabbitai Bot suggested changes Jun 2, 2026

View reviewed changes

vi70x3 closed this Jun 4, 2026

This was referenced Jun 5, 2026

Feat/realtime sticky #16

Closed

Fix/realtime sticky #18

Closed

vi70x4 mentioned this pull request Jun 5, 2026

fix realtime sticky sessions protection x2 #19

Merged

vi70x4 deleted the spec/generalized-thread-protection branch June 7, 2026 00:30

	res.write('data: [DONE]\n\n');;
	res.write('data: [DONE]\n\n');

		SUM(MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0)))) as total,
		SUM(CASE WHEN status = 'success'


		The SQL query aggregating historical requests in [`refreshStatsCache()`](server/src/services/router.ts:174) must calculate a weight for each logged request based on its age. Newer requests must be assigned a weight closer to `1.0`, while requests approaching the limit of the analytics window (7 days) must decay toward `0.0`.

		Formula: `MIN(1.0, MAX(0.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0))`

Conversation

vi70x3 commented Jun 2, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

Task 4a: Rules engine in threadProtection.ts

Task 4b: Integration into proxy.ts

Summary by Sourcery

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Uh oh!

sourcery-ai Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewer's Guide

File-Level Changes

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

qodo-code-review Bot commented Jun 2, 2026

Review Summary by Qodo

Walkthroughs

File Changes

Uh oh!

qodo-code-review Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review by Qodo

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

qodo-code-review Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

qodo-code-review Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

vi70x3 commented Jun 2, 2026 •

edited by coderabbitai Bot

Loading

Task 4a: Rules engine in `threadProtection.ts`

Task 4b: Integration into `proxy.ts`

coderabbitai Bot commented Jun 2, 2026 •

edited

Loading

sourcery-ai Bot commented Jun 2, 2026 •

edited

Loading

qodo-code-review Bot commented Jun 2, 2026 •

edited

Loading