Fix/realtime sticky#18
Conversation
…alized thread protection scanner
… longcat branches
…tracking for LongCat and Owl Alpha
- Change activeRequests from Map to Set to allow concurrent requests from same session - Add stale active request cleanup with 10-minute TTL - Cache owl-alpha model ID to avoid repeated DB lookups - Fix active request iteration to use Set-compatible syntax
- Remove package-lock.json (npm lockfile) - Add packageManager field to package.json - Create .npmrc with pnpm configuration
BUG-05: Abort upstream provider stream on stall detection by breaking the for-await loop and calling gen.return() when the keepalive timer detects MAX_STREAM_STALL_MS has elapsed without data. BUG-06: Fix cooldown guard to use the actual routable fallback chain (fallback_config JOIN models) instead of all enabled models, ensuring transient cooldowns only skip models that would actually be routed to. BUG-10: Remove double semicolon in proxy.ts. Also adds SSE keep-alive comments during idle periods, transient model cooldown injection before retry loops, and LongCat sticky session cooldown support in balanced routing mode.
…, TTL refresh, collapsible pools, doc paths, cleanup
… pre-stream, cooldown gating, timer cleanup, a11y, log clarity
… pre-stream, cooldown gating, timer cleanup, a11y, log clarity, test fixes
|
MergeGuard — Free plan allows 1 active repository. Upgrade to protect more repositories. |
Reviewer's GuideIntroduces generalized provider/thread protection for streaming and non-streaming chat completions, adds transient model cooldowns and SSE heartbeat/stall protection on the server, and restructures fallback routing analytics and UI by adding model pools with grouping, while also improving wrapped-error handling and fixing several design/spec/test issues. Sequence diagram for SSE streaming with stall protection and thread protectionsequenceDiagram
actor Client
participant Proxy as handleChatCompletion
participant Provider as route.provider
participant ThreadProtection as evaluateThreadProtection
Client->>Proxy: handleChatCompletion
Proxy->>Provider: streamChatCompletion(apiKey, messages, modelId, options)
Proxy->>Proxy: activeRequests.add
Proxy->>Proxy: setInterval(keepaliveTimer)
loop for each chunk
Provider-->>Proxy: ChatCompletionChunk
Proxy->>Proxy: writeResponseStreamChunk / res.write
end
alt [stream stalls]
Proxy->>Proxy: cleanup
alt [streamStarted]
Proxy-->>Client: writeResponseStreamEvent / res.write timeout
Proxy-->>Client: res.end
else [pre-stream stall]
Proxy->>Proxy: throw Error(status=504)
end
else [mid-stream 5xx]
Proxy->>ThreadProtection: evaluateThreadProtection({ platform, kind: '5xx', midStream: true })
ThreadProtection-->>Proxy: ThreadProtectionAction
alt [action.banProvider]
Proxy->>Proxy: banPlatformFromSession
Proxy->>Proxy: addProviderModelsToSkipModels
end
alt [action.skipModel]
Proxy->>Proxy: skipModels.add(route.modelDbId)
end
Proxy->>Proxy: transientModelCooldowns.set(route.modelDbId, expiry)
end
Proxy->>Proxy: activeRequests.delete(sessionKey, platform, modelId)
Proxy-->>Client: stream completes / response
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Review Summary by QodoRealtime Sticky Session Improvements: Thread Protection, Stream Stall Detection, Transient Cooldowns, and Wrapped Error Handling
WalkthroughsDescription**Core Features:** • Generalized thread protection system replacing hardcoded platform checks with configurable getProtectionLevel() and evaluateThreadProtection() functions supporting provider-ban, model-skip, and off modes • Stream keepalive heartbeat (15s interval) and stall detection (45s threshold) to prevent hanging SSE streams with graceful termination • Transient model cooldowns (15s global window) for 5xx errors shared across concurrent requests with sticky session override logic • Wrapped error detection for HTTP 200 responses with root-level error field across all provider adapters (OpenAI-compat, Cohere, Cloudflare, Google) • Recency-weighted analytics using 7-day decay function to prioritize recent performance data in routing decisions **Implementation Details:** • New threadProtection.ts service module with environment-driven configuration via THREAD_PROTECTION_PLATFORMS • Active request tracking to prevent concurrent sessions from overwhelming provider-ban platforms • Fallback chain for model skipping instead of all enabled models • Model pool classification (Fast, Balanced, Smart) in fallback API and UI with collapsible pool sections • Updated balanced mode routing to allow preferred sticky models through exclusion filters **Testing & Documentation:** • Comprehensive test suites for transient cooldowns (30+ cases), stream heartbeat/stall detection (5 cases), and pool classification • Design specifications for all major features with architecture diagrams and edge case analysis • Requirements and task documentation for implementation tracking • Bug fix documentation addressing 10 verified issues from code review **Configuration:** • Added pnpm package manager specification (v11.1.3) with npm configuration file • Protected error message extraction visibility for provider reuse Diagramflowchart LR
A["Incoming Request"] --> B["Thread Protection Scanner"]
B --> C["getProtectionLevel()"]
C --> D["evaluateThreadProtection()"]
D --> E["Action: Ban/Skip/Clear"]
A --> F["Transient Cooldown Check"]
F --> G["Skip Cooled Models"]
G --> H["Route to Provider"]
H --> I["Stream Keepalive"]
I --> J["Heartbeat 15s"]
J --> K["Stall Detection 45s"]
K --> L["Graceful Termination"]
H --> M["Wrapped Error Detection"]
M --> N["HTTP 200 + error field"]
N --> O["Register Cooldown"]
O --> P["Retry with Next Model"]
File Changes1. server/src/routes/proxy.ts
|
Code Review by Qodo
1. Timer throw crashes process
|
|
MergeGuard — Free plan allows 1 active repository. Upgrade to protect more repositories. |
There was a problem hiding this comment.
Hey - I've found 4 issues, and left some high level feedback:
- In the new stream stall handler, the pre-stream timeout path throws an error directly from inside the
setIntervalcallback, which won’t be caught by the outertry/catchand can surface as an unhandled exception; consider signaling the outer flow (e.g., via a flag or abort controller) instead of throwing inside the timer so the retry logic can handle the timeout deterministically. - The transient model cooldown handling in the non-stream error path has overlapping branches (ban-eligible 5xx vs
isTransientCooldownEligible) that both register cooldowns on non-retryable errors; it would be clearer and less error-prone to consolidate this into a single decision path so it’s obvious exactly when a model enters cooldown and you avoid duplicatedset/log behavior.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In the new stream stall handler, the pre-stream timeout path throws an error directly from inside the `setInterval` callback, which won’t be caught by the outer `try/catch` and can surface as an unhandled exception; consider signaling the outer flow (e.g., via a flag or abort controller) instead of throwing inside the timer so the retry logic can handle the timeout deterministically.
- The transient model cooldown handling in the non-stream error path has overlapping branches (ban-eligible 5xx vs `isTransientCooldownEligible`) that both register cooldowns on non-retryable errors; it would be clearer and less error-prone to consolidate this into a single decision path so it’s obvious exactly when a model enters cooldown and you avoid duplicated `set`/log behavior.
## Individual Comments
### Comment 1
<location path="server/src/routes/proxy.ts" line_range="1338-1343" />
<code_context>
+ let lastChunkTime = Date.now();
+ let stalled = false;
+
+ const cleanup = () => {
+ clearInterval(keepaliveTimer);
+ try { gen.return(undefined); } catch { /* already closed */ }
+ };
+
+ const keepaliveTimer = setInterval(() => {
+ if (stalled) {
+ clearInterval(keepaliveTimer);
</code_context>
<issue_to_address>
**issue (bug_risk):** Avoid referencing keepaliveTimer before initialization and throwing inside the interval callback
Two issues to address:
1) `cleanup` closes over `keepaliveTimer` before it’s initialized with `const`. If `cleanup` runs early (e.g. `req.on('close')` fires immediately), `clearInterval(keepaliveTimer)` will hit the temporal dead zone and throw a `ReferenceError`. Declare `let keepaliveTimer: NodeJS.Timeout | undefined` before `cleanup`, assign it after, and guard `clearInterval` with `if (keepaliveTimer)`.
2) Throwing from inside the `setInterval` callback won’t be caught by the outer `try/catch` around the streaming loop and can crash the process. Instead, set a flag and have the main loop handle the error, or call a rejection/abort handler from the timer callback rather than throwing directly.
</issue_to_address>
### Comment 2
<location path="server/src/providers/openai-compat.ts" line_range="115-116" />
<code_context>
const decoder = new TextDecoder();
let buffer = '';
+ let hasYielded = false;
while (true) {
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Wrapped-error detection is inconsistent between providers and depends on hasYielded flag
For OpenAI-compatible streaming you only call `isWrappedError(parsed)` before the first yielded chunk (`!hasYielded`), while Cloudflare/Cohere/Google check every chunk. This means wrapped error payloads later in the stream won’t be detected. Consider either checking `isWrappedError` on every chunk for consistency, or explicitly documenting that only first-chunk errors are treated as wrapped and confirming upstream behavior matches that assumption.
Suggested implementation:
```typescript
let parsed: ChatCompletionChunk;
try {
parsed = JSON.parse(data) as ChatCompletionChunk;
} catch {
// Skip malformed chunks
}
// Detect wrapped errors consistently on every chunk
if (this.isWrappedError(parsed)) {
this.throwWrappedError(parsed);
}
```
I assumed that the existing code only called `this.isWrappedError(parsed)` conditionally, something like `if (!hasYielded && this.isWrappedError(parsed))`. If that condition still exists elsewhere in the file, you should remove the `!hasYielded &&` part so that `isWrappedError` is checked unconditionally:
- Replace `if (!hasYielded && this.isWrappedError(parsed)) {` with `if (this.isWrappedError(parsed)) {`.
If `hasYielded` is no longer used for anything else after this change, you should also remove the `let hasYielded = false;` declaration to avoid an unused variable.
</issue_to_address>
### Comment 3
<location path="server/src/__tests__/routes/stream-heartbeat-stall.test.ts" line_range="34-43" />
<code_context>
+describe('SSE stream heartbeat and stall protection', () => {
</code_context>
<issue_to_address>
**suggestion (testing):** Time-based stream heartbeat tests are at risk of flakiness; consider using fake timers.
These tests cover key behavior but depend on real `setTimeout` delays and tweaked global config, which can be flaky in CI and slow.
Consider:
- Using `vi.useFakeTimers()` / `vi.setSystemTime()` and advancing timers instead of real waits.
- Driving heartbeat and stall detection via `vi.advanceTimersByTime()` so you can assert exact timeout/error points.
- Keeping at most one end-to-end test with real timing, and moving other timing-sensitive checks to a fully fake-timer setup.
That should keep coverage of the heartbeat/stall logic while making the suite faster and more reliable.
</issue_to_address>
### Comment 4
<location path="server/src/__tests__/routes/transient-cooldown.test.ts" line_range="316-325" />
<code_context>
+ describe('Cooldown registration: only 5xx and connection failures trigger cooldown', () => {
</code_context>
<issue_to_address>
**suggestion (testing):** Cooldown registration tests re-encode implementation conditions rather than exercising the actual routing paths.
These tests largely restate the implementation’s boolean conditions instead of exercising the real code that mutates `transientModelCooldowns`, making them fragile and tightly coupled to current logic.
Prefer tests that:
- Invoke the actual error-handling path (or a thin wrapper) with simulated statuses/errors.
- Assert on `transientModelCooldowns` contents for cases like 5xx, 429, 401, 404, and undefined status.
That way the tests validate observable behavior and remain stable even if the internal conditions or eligible statuses change.
Suggested implementation:
```typescript
// ---------- Test Suite 5: Cooldown registration Error Classification ----------
describe('Cooldown registration: only 5xx and connection failures trigger cooldown', () => {
beforeEach(() => {
// Ensure we start from a clean cooldown state for each test
transientModelCooldowns.clear();
});
function invokeCooldownErrorPath(options: {
status?: number;
error?: Error;
modelId?: string;
}) {
/**
* Thin wrapper around the real error-handling / routing code that is
* responsible for registering cooldowns for transient models.
*
* This MUST call into the same path the router uses when a transient
* model request fails (e.g. something like `handleTransientModelError`),
* so that these tests exercise observable behavior rather than
* re-encoding implementation details.
*/
return handleTransientModelErrorForTest(options);
}
it('registers a cooldown for 5xx upstream errors (500-504)', () => {
const eligibleStatuses = [500, 502, 503, 504];
for (const status of eligibleStatuses) {
transientModelCooldowns.clear();
invokeCooldownErrorPath({
status,
error: new Error(`upstream ${status}`),
modelId: 'test-model',
});
// Assert based on observable cooldown state rather than status checks
expect(transientModelCooldowns.has('test-model')).toBe(true);
}
});
it('does not register a cooldown for 429 rate limit errors', () => {
invokeCooldownErrorPath({
status: 429,
error: new Error('rate limited'),
modelId: 'test-model',
});
expect(transientModelCooldowns.has('test-model')).toBe(false);
});
it('does not register a cooldown for 401 unauthorized errors', () => {
invokeCooldownErrorPath({
status: 401,
error: new Error('unauthorized'),
modelId: 'test-model',
});
expect(transientModelCooldowns.has('test-model')).toBe(false);
});
it('does not register a cooldown for 404 not found errors', () => {
invokeCooldownErrorPath({
status: 404,
error: new Error('not found'),
modelId: 'test-model',
});
expect(transientModelCooldowns.has('test-model')).toBe(false);
});
it('registers a cooldown when there is a connection failure (no status)', () => {
invokeCooldownErrorPath({
status: undefined,
error: new Error('ECONNRESET'),
modelId: 'test-model',
});
expect(transientModelCooldowns.has('test-model')).toBe(true);
});
```
1. Implement `handleTransientModelErrorForTest` in this test file (or import it) so that it calls the **same** error-handling path the router uses to register cooldowns for transient models. For example, it might delegate to something like `handleTransientModelError({ status, error, modelId })` exported from the route module.
2. Ensure `transientModelCooldowns` is imported/accessible in this test file and supports `.clear()` and `.has(modelId)` (e.g. a `Map` or similar). If the underlying structure differs (e.g. a `Map` keyed by provider+model, or a plain object), adjust the assertions to check the appropriate key and API.
3. Remove or update any remaining tests inside this `describe` block that still restate implementation conditions (e.g. any leftover `it('429 rate limit is NOT eligible...` that only checks booleans) so that all tests in this suite go through `invokeCooldownErrorPath`.
4. If your production code uses a different identifier than `'test-model'` (e.g. includes provider or route info), update the `modelId` and corresponding `has(...)` checks to match the real key shape used in `transientModelCooldowns`.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| const cleanup = () => { | ||
| clearInterval(keepaliveTimer); | ||
| try { gen.return(undefined); } catch { /* already closed */ } | ||
| }; | ||
|
|
||
| const keepaliveTimer = setInterval(() => { |
There was a problem hiding this comment.
issue (bug_risk): Avoid referencing keepaliveTimer before initialization and throwing inside the interval callback
Two issues to address:
-
cleanupcloses overkeepaliveTimerbefore it’s initialized withconst. Ifcleanupruns early (e.g.req.on('close')fires immediately),clearInterval(keepaliveTimer)will hit the temporal dead zone and throw aReferenceError. Declarelet keepaliveTimer: NodeJS.Timeout | undefinedbeforecleanup, assign it after, and guardclearIntervalwithif (keepaliveTimer). -
Throwing from inside the
setIntervalcallback won’t be caught by the outertry/catcharound the streaming loop and can crash the process. Instead, set a flag and have the main loop handle the error, or call a rejection/abort handler from the timer callback rather than throwing directly.
| let buffer = ''; | ||
| let hasYielded = false; |
There was a problem hiding this comment.
suggestion (bug_risk): Wrapped-error detection is inconsistent between providers and depends on hasYielded flag
For OpenAI-compatible streaming you only call isWrappedError(parsed) before the first yielded chunk (!hasYielded), while Cloudflare/Cohere/Google check every chunk. This means wrapped error payloads later in the stream won’t be detected. Consider either checking isWrappedError on every chunk for consistency, or explicitly documenting that only first-chunk errors are treated as wrapped and confirming upstream behavior matches that assumption.
Suggested implementation:
let parsed: ChatCompletionChunk;
try {
parsed = JSON.parse(data) as ChatCompletionChunk;
} catch {
// Skip malformed chunks
}
// Detect wrapped errors consistently on every chunk
if (this.isWrappedError(parsed)) {
this.throwWrappedError(parsed);
}I assumed that the existing code only called this.isWrappedError(parsed) conditionally, something like if (!hasYielded && this.isWrappedError(parsed)). If that condition still exists elsewhere in the file, you should remove the !hasYielded && part so that isWrappedError is checked unconditionally:
- Replace
if (!hasYielded && this.isWrappedError(parsed)) {withif (this.isWrappedError(parsed)) {.
If hasYielded is no longer used for anything else after this change, you should also remove the let hasYielded = false; declaration to avoid an unused variable.
| describe('SSE stream heartbeat and stall protection', () => { | ||
| let app: Express; | ||
| let origKeepaliveInterval: number; | ||
| let origMaxStall: number; | ||
|
|
||
| beforeAll(() => { | ||
| process.env.ENCRYPTION_KEY = '0'.repeat(64); | ||
| process.env.ADMIN_DASHBOARD_KEY = 'test-admin-key-that-is-long-enough'; | ||
| process.env.NODE_ENV = 'test'; | ||
| initDb(':memory:'); |
There was a problem hiding this comment.
suggestion (testing): Time-based stream heartbeat tests are at risk of flakiness; consider using fake timers.
These tests cover key behavior but depend on real setTimeout delays and tweaked global config, which can be flaky in CI and slow.
Consider:
- Using
vi.useFakeTimers()/vi.setSystemTime()and advancing timers instead of real waits. - Driving heartbeat and stall detection via
vi.advanceTimersByTime()so you can assert exact timeout/error points. - Keeping at most one end-to-end test with real timing, and moving other timing-sensitive checks to a fully fake-timer setup.
That should keep coverage of the heartbeat/stall logic while making the suite faster and more reliable.
| describe('Cooldown registration: only 5xx and connection failures trigger cooldown', () => { | ||
| it('5xx status codes (500-504) are eligible for cooldown registration', () => { | ||
| // Simulate the condition: (errStatus >= 500 && errStatus < 600) | ||
| const eligibleStatuses = [500, 502, 503, 504]; | ||
| for (const status of eligibleStatuses) { | ||
| const condition = status !== undefined && status >= 500 && status < 600; | ||
| expect(condition).toBe(true); | ||
| } | ||
| }); | ||
|
|
There was a problem hiding this comment.
suggestion (testing): Cooldown registration tests re-encode implementation conditions rather than exercising the actual routing paths.
These tests largely restate the implementation’s boolean conditions instead of exercising the real code that mutates transientModelCooldowns, making them fragile and tightly coupled to current logic.
Prefer tests that:
- Invoke the actual error-handling path (or a thin wrapper) with simulated statuses/errors.
- Assert on
transientModelCooldownscontents for cases like 5xx, 429, 401, 404, and undefined status.
That way the tests validate observable behavior and remain stable even if the internal conditions or eligible statuses change.
Suggested implementation:
// ---------- Test Suite 5: Cooldown registration Error Classification ----------
describe('Cooldown registration: only 5xx and connection failures trigger cooldown', () => {
beforeEach(() => {
// Ensure we start from a clean cooldown state for each test
transientModelCooldowns.clear();
});
function invokeCooldownErrorPath(options: {
status?: number;
error?: Error;
modelId?: string;
}) {
/**
* Thin wrapper around the real error-handling / routing code that is
* responsible for registering cooldowns for transient models.
*
* This MUST call into the same path the router uses when a transient
* model request fails (e.g. something like `handleTransientModelError`),
* so that these tests exercise observable behavior rather than
* re-encoding implementation details.
*/
return handleTransientModelErrorForTest(options);
}
it('registers a cooldown for 5xx upstream errors (500-504)', () => {
const eligibleStatuses = [500, 502, 503, 504];
for (const status of eligibleStatuses) {
transientModelCooldowns.clear();
invokeCooldownErrorPath({
status,
error: new Error(`upstream ${status}`),
modelId: 'test-model',
});
// Assert based on observable cooldown state rather than status checks
expect(transientModelCooldowns.has('test-model')).toBe(true);
}
});
it('does not register a cooldown for 429 rate limit errors', () => {
invokeCooldownErrorPath({
status: 429,
error: new Error('rate limited'),
modelId: 'test-model',
});
expect(transientModelCooldowns.has('test-model')).toBe(false);
});
it('does not register a cooldown for 401 unauthorized errors', () => {
invokeCooldownErrorPath({
status: 401,
error: new Error('unauthorized'),
modelId: 'test-model',
});
expect(transientModelCooldowns.has('test-model')).toBe(false);
});
it('does not register a cooldown for 404 not found errors', () => {
invokeCooldownErrorPath({
status: 404,
error: new Error('not found'),
modelId: 'test-model',
});
expect(transientModelCooldowns.has('test-model')).toBe(false);
});
it('registers a cooldown when there is a connection failure (no status)', () => {
invokeCooldownErrorPath({
status: undefined,
error: new Error('ECONNRESET'),
modelId: 'test-model',
});
expect(transientModelCooldowns.has('test-model')).toBe(true);
});- Implement
handleTransientModelErrorForTestin this test file (or import it) so that it calls the same error-handling path the router uses to register cooldowns for transient models. For example, it might delegate to something likehandleTransientModelError({ status, error, modelId })exported from the route module. - Ensure
transientModelCooldownsis imported/accessible in this test file and supports.clear()and.has(modelId)(e.g. aMapor similar). If the underlying structure differs (e.g. aMapkeyed by provider+model, or a plain object), adjust the assertions to check the appropriate key and API. - Remove or update any remaining tests inside this
describeblock that still restate implementation conditions (e.g. any leftoverit('429 rate limit is NOT eligible...that only checks booleans) so that all tests in this suite go throughinvokeCooldownErrorPath. - If your production code uses a different identifier than
'test-model'(e.g. includes provider or route info), update themodelIdand correspondinghas(...)checks to match the real key shape used intransientModelCooldowns.
| const keepaliveTimer = setInterval(() => { | ||
| if (stalled) { | ||
| clearInterval(keepaliveTimer); | ||
| return; | ||
| } | ||
| const elapsed = Date.now() - lastChunkTime; | ||
| if (elapsed >= streamKeepaliveConfig.MAX_STREAM_STALL_MS) { | ||
| stalled = true; | ||
| cleanup(); | ||
| if (streamStarted) { | ||
| const payload = { error: { message: 'Stream stalled: no data received within timeout', type: 'stream_timeout' } }; | ||
| try { | ||
| if (responseStreamContext) { | ||
| writeResponseStreamEvent(res, { | ||
| type: 'response.failed', | ||
| response: { | ||
| id: responseStreamContext.responseId, | ||
| status: 'failed', | ||
| error: payload.error, | ||
| }, | ||
| }); | ||
| } else { | ||
| res.write(`data: ${JSON.stringify(payload)}\n\n`); | ||
| res.write('data: [DONE]\n\n'); | ||
| } | ||
| res.end(); | ||
| } catch { /* socket gone */ } | ||
| } else { | ||
| // Pre-stream stall: throw so the outer catch can retry fallback models | ||
| throw Object.assign( | ||
| new Error(`Stream timed out: no data received from provider ${route.displayName}`), | ||
| { status: 504 } | ||
| ); |
There was a problem hiding this comment.
1. Timer throw crashes process 🐞 Bug ☼ Reliability
In server/src/routes/proxy.ts, the pre-stream stall path throws from inside the setInterval keepalive callback, which is outside the request handler’s try/catch and can surface as an uncaught exception (crashing the server) instead of returning a 504 or retrying fallback.
Agent Prompt
### Issue description
`handleChatCompletion()` throws an error from inside the `setInterval()` keepalive callback when a stream stalls before the first chunk. That throw is not catchable by the surrounding request/stream try/catch and can crash the Node process.
### Issue Context
The intention (also asserted by tests) is to return HTTP 504 on pre-stream stall (no SSE headers sent yet) or to allow the outer retry loop to fall back.
### Fix Focus Areas
- server/src/routes/proxy.ts[1343-1416]
### Implementation notes
- Do **not** `throw` inside the timer callback.
- Instead, set a `stallError` variable (or resolve/reject a Promise) and stop the generator (`cleanup()`), then after the `for await` loop ends, `throw stallError` from the main async function flow so the existing outer `catch`/retry logic can handle it.
- Ensure the pre-stream stall path results in `status=504` and `error.type='stream_timeout'` as the test expects.
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| } finally { | ||
| // Ensure the session is deregistered immediately on end/abort/fail | ||
| if (sessionKey) { | ||
| for (const active of activeRequests) { | ||
| if (active.sessionKey === sessionKey && active.platform === route.platform && active.modelId === route.modelId) { | ||
| activeRequests.delete(active); | ||
| break; | ||
| } |
There was a problem hiding this comment.
2. Activerequests stale entries 🐞 Bug ☼ Reliability
activeRequests stores per-request objects, but cleanup deletes only the first matching entry and breaks; concurrent requests for the same session/platform/model can leave stale entries that keep provider-ban platforms excluded until the 10-minute TTL cleanup runs.
Agent Prompt
### Issue description
`activeRequests` is a `Set` of newly-created objects. On cleanup, the code searches the set and deletes only the first matching entry, which can leak additional entries if multiple concurrent requests exist with the same `sessionKey/platform/modelId`.
### Issue Context
The code comment says the Set is used to “allow concurrent requests from the same session”. If concurrency is allowed, cleanup must remove the specific entry added by that request (or decrement a reference count).
### Fix Focus Areas
- server/src/routes/proxy.ts[1320-1328]
- server/src/routes/proxy.ts[1629-1638]
- server/src/routes/proxy.ts[1643-1651]
### Implementation notes
Prefer one of:
1) Store the created object in a local `const active = { ... }` and remove it via `activeRequests.delete(active)` in `finally` (no iteration, no ambiguity).
2) Replace the Set with a `Map<string, number>` keyed by `sessionKey|platform|modelId` and increment/decrement counts.
3) If keeping the current structure, delete *all* matches (remove the `break`).
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
…ts concurrency, wrapped-error consistency, heartbeat fake timers, cooldown test accuracy
Summary by Sourcery
Refine routing resilience and admin UX by introducing generalized thread protection, transient model cooldowns, and SSE stream heartbeat/stall handling, while surfacing model pools in the fallback dashboard and tightening analytics and error handling across providers.
New Features:
Bug Fixes:
Enhancements:
Documentation:
Tests: