feat: generalized thread protection scanner integration#13
Conversation
…alized thread protection scanner
… longcat branches
📝 WalkthroughWalkthroughThis PR implements seven coordinated proxy routing and resilience features: wrapped-error interception in all provider implementations, a generalized thread-protection decision engine, recency-biased Thompson sampling analytics, transient global cooldowns for failed models, SSE heartbeat/stall protection, model-level routing for Owl Alpha and LongCat with mutual exclusions, and disabling sticky sessions in balanced mode. Includes comprehensive specification documents and new test coverage. ChangesMulti-Feature Proxy Routing Enhancement
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
⚔️ Resolve merge conflicts
|
Reviewer's GuideImplements the generalized thread protection scanner and related routing resiliency features: centralizes provider/model ban decisions, adds transient model cooldowns and SSE heartbeat/stall protection, biases routing analytics toward recent performance, and adjusts router and UI behavior for LongCat/Owl Alpha and balanced vs smart pools. File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Review Summary by QodoGeneralized thread protection, stream heartbeat, transient cooldowns, and model-level routing with wrapped error detection
WalkthroughsDescription• **Generalized thread protection scanner**: Replaced 6 hardcoded longcat platform checks with configurable evaluateThreadProtection() function that maps error contexts (5xx, truncation, retryable) to actions (ban provider, skip model, clear sticky) based on per-platform protection levels • **Stream heartbeat and stall protection**: Added SSE keep-alive comments (15s interval) to prevent proxy timeouts and stall detection (45s threshold) to abort unresponsive streams with graceful error handling • **Transient model cooldowns**: Implemented shared in-memory cooldown collection (15s window) for 5xx and connection failures across concurrent requests to prevent thundering herd on degraded models, with sticky session override and exhaustion protection • **Recency-weighted analytics**: Applied 7-day exponential decay formula to success/total counts in Thompson sampling to prioritize recent performance over historical data • **Model-level routing for Owl Alpha and LongCat**: Balanced mode excludes both platforms entirely; smart mode prefers them when valid keys exist; replaced provider-level banning with model-level skipModels • **Wrapped error detection**: Added detection of root-level error field in HTTP 200 responses across all providers (OpenAI, Cohere, Cloudflare, Google) with proper status code extraction • **Fallback API enhancements**: Added pool property to fallback response with ModelPool enum validation and pool-based grouping in UI • **Test coverage**: Added comprehensive test suites for transient cooldowns (40+ cases), stream heartbeat/stall (5 scenarios), balanced mode routing, and fallback pool validation; updated existing tests for balanced/smart mode distinction • **Bug fix**: Fixed SQL INSERT statement syntax error in router tests (duplicate VALUES clause) Diagramflowchart LR
A["Error Context<br/>5xx/truncation/retryable"] -->|evaluateThreadProtection| B["Protection Scanner<br/>threadProtection.ts"]
B -->|per-platform rules| C["Action Decision<br/>ban/skip/clear"]
C -->|applied to| D["Proxy Request<br/>proxy.ts"]
E["Stream Active"] -->|15s idle| F["Heartbeat Comment<br/>SSE keep-alive"]
E -->|45s stall| G["Stream Abort<br/>stream_timeout"]
H["Model Failure<br/>5xx/timeout"] -->|register| I["Transient Cooldown<br/>15s window"]
I -->|exclude from| J["Model Selection<br/>prevent herd"]
K["Analytics Stats"] -->|7-day decay| L["Recency Weight<br/>prioritize recent"]
L -->|Thompson sampling| M["Model Ranking"]
N["HTTP 200 Response"] -->|check| O["Wrapped Error<br/>root error field"]
O -->|throw| P["ProviderApiError<br/>proper status"]
File Changes1. server/src/routes/proxy.ts
|
Code Review by Qodo
Context used✅ Compliance rules (platform):
6 rules 1. Stall doesn't stop upstream
|
There was a problem hiding this comment.
Code Review
This pull request introduces several major features to improve routing stability and performance, including disabling sticky sessions in balanced mode, implementing a generalized thread protection scanner, adding Owl Alpha model-level routing, applying recency-biased Thompson sampling, adding SSE stream heartbeat and stall protection, and introducing transient model cooldowns and wrapped error interception. The frontend fallback page is also updated to group models by pool. Feedback on the changes highlights critical syntax errors in the SQL query for recency-biased sampling, a malformed SQL query and truncation issue in the router test file, and a minor double semicolon typo in the proxy route handler.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| COUNT(*) as raw_total, | ||
| SUM(MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0)))) as total, | ||
| SUM(CASE WHEN status = 'success' | ||
| THEN MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0))) | ||
| ELSE 0 END) as successes, |
There was a problem hiding this comment.
The SQL query contains syntax errors due to extra closing parentheses at the end of the total and successes weighted sum expressions. Specifically, 7.0)))) has an extra ) and 7.0))) in the THEN clause also has an extra ). This will cause SQLite syntax errors and crash the stats cache refresh.
| COUNT(*) as raw_total, | |
| SUM(MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0)))) as total, | |
| SUM(CASE WHEN status = 'success' | |
| THEN MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0))) | |
| ELSE 0 END) as successes, | |
| COUNT(*) as raw_total, | |
| SUM(MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0))) as total, | |
| SUM(CASE WHEN status = 'success' | |
| THEN MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0)) | |
| ELSE 0 END) as successes, |
| INSERT INTO api_keys (platform, label, encrypted_key, iv, auth_tag, status, enabled) | ||
| VALUES (?, | ||
| VALUES (?, ?, ?, ?, ?, ?, ?) | ||
| `).run('groq', 'test', encrypted, iv, authTag, 'healthy', 1); | ||
|
|
There was a problem hiding this comment.
The SQL query inside this test has duplicate/malformed VALUES clauses, which will cause a syntax error when running the tests.
db.prepare(`
INSERT INTO api_keys (platform, label, encrypted_key, iv, auth_tag, status, enabled)
VALUES (?, ?, ?, ?, ?, ?, ?)
`).run('groq', 'test', encrypted, iv, authTag, 'healthy', 1);| it('should skip disabled keys', () => { | ||
| const db = getDb(); | ||
|
|
||
| const googleKey = encrypt('test-google-key'); | ||
| db.prepare(` | ||
| INSERT INTO api_keys (platform, label, encrypted_key, iv, auth_tag, status, enabled) | ||
| VALUES (?, ?, ?, ?, ?, ?, ?) | ||
| `).run('google', 'disabled', googleKey.encrypted, googleKey.iv, googleKey.authTag, 'healthy', 0); | ||
|
|
||
| const groqKey = encrypt('test-groq-key'); | ||
| db.prepare(` | ||
| INSERT INTO api_keys (platform, label, encrypted_key, iv, auth_tag, status, enabled) | ||
| VALUES (?, ?, ?, ?, ?, ?, ?) | ||
| `).run('groq', 'test', groqKey.encrypted, groqKey.iv, groqKey.authTag, 'healthy', 1); | ||
|
|
||
| const result = routeRequest(); | ||
| expect(result.platform).toBe('groq'); | ||
| }); | ||
|
|
||
| it('should skip invalid keys', () => { | ||
| const db = getDb(); | ||
|
|
||
| const invalidKey = encrypt('invalid-key'); | ||
| db.prepare(` | ||
| INSERT INTO api_keys (platform, label, encrypted_key, iv, auth_tag, status, enabled) | ||
| VALUES (?, ?, ?, ?, ?, ?, ?) | ||
| `).run('google', 'invalid', invalidKey.encrypted, invalidKey.iv, invalidKey.authTag, 'invalid', 1); | ||
|
|
||
| const groqKey = encrypt('test-groq-key'); | ||
| db.prepare(` | ||
| INSERT INTO api_keys (platform, label, encrypted_key, iv, auth_tag, status, enabled) | ||
| VALUES (?, ?, ?, ?, ?, ?, ?) | ||
| `).run('groq', 'test', groqKey.encrypted, groqKey.iv, groqKey.authTag, 'healthy', 1); | ||
|
|
||
| const result = routeRequest(); | ||
| expect(result.platform).toBe('groq'); | ||
| }); | ||
| }); | ||
| const groqKey = encrypt |
There was a problem hiding this comment.
The test file is truncated at the end, leaving the should skip disabled keys test incomplete and missing the should skip invalid keys test, which prevents the test suite from compiling or running.
it('should skip disabled keys', () => {
const db = getDb();
const googleKey = encrypt('test-google-key');
db.prepare(`
INSERT INTO api_keys (platform, label, encrypted_key, iv, auth_tag, status, enabled)
VALUES (?, ?, ?, ?, ?, ?, ?)
`).run('google', 'disabled', googleKey.encrypted, googleKey.iv, googleKey.authTag, 'healthy', 0);
const groqKey = encrypt('test-groq-key');
db.prepare(`
INSERT INTO api_keys (platform, label, encrypted_key, iv, auth_tag, status, enabled)
VALUES (?, ?, ?, ?, ?, ?, ?)
`).run('groq', 'test', groqKey.encrypted, groqKey.iv, groqKey.authTag, 'healthy', 1);
const result = routeRequest();
expect(result.platform).toBe('groq');
});
it('should skip invalid keys', () => {
const db = getDb();
const invalidKey = encrypt('invalid-key');
db.prepare(`
INSERT INTO api_keys (platform, label, encrypted_key, iv, auth_tag, status, enabled)
VALUES (?, ?, ?, ?, ?, ?, ?)
`).run('google', 'invalid', invalidKey.encrypted, invalidKey.iv, invalidKey.authTag, 'invalid', 1);
const groqKey = encrypt('test-groq-key');
db.prepare(`
INSERT INTO api_keys (platform, label, encrypted_key, iv, auth_tag, status, enabled)
VALUES (?, ?, ?, ?, ?, ?, ?)
`).run('groq', 'test', groqKey.encrypted, groqKey.iv, groqKey.authTag, 'healthy', 1);
const result = routeRequest();
expect(result.platform).toBe('groq');
});
});| }); | ||
| } else { | ||
| res.write(`data: ${JSON.stringify(payload)}\n\n`); | ||
| res.write('data: [DONE]\n\n');; |
There was a problem hiding this comment.
Hey - I've found 7 issues, and left some high level feedback:
- There are several temporary helper scripts and files (e.g.
write_test.py,fix_streaming.py,fix.py,do_fix.py,new_streaming_block.txt,server/write_tests.py) checked into the repo; these look like local tooling and should either be removed from the PR or moved under a proper tooling directory and wired into the build/test process if they are meant to be permanent. - The changes around
server/src/__tests__/services/router.test.tsand the relatedwrite_test.pyhelper appear to leave the test file in an incomplete/broken state (e.g. a truncatedINSERT INTO api_keysstatement and an unfinishedexpect(() => route...), which will prevent the test suite from running; please fix the test file content directly instead of relying on the generator script, or remove the script if it’s not meant to be committed.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- There are several temporary helper scripts and files (e.g. `write_test.py`, `fix_streaming.py`, `fix.py`, `do_fix.py`, `new_streaming_block.txt`, `server/write_tests.py`) checked into the repo; these look like local tooling and should either be removed from the PR or moved under a proper tooling directory and wired into the build/test process if they are meant to be permanent.
- The changes around `server/src/__tests__/services/router.test.ts` and the related `write_test.py` helper appear to leave the test file in an incomplete/broken state (e.g. a truncated `INSERT INTO api_keys` statement and an unfinished `expect(() => route...`), which will prevent the test suite from running; please fix the test file content directly instead of relying on the generator script, or remove the script if it’s not meant to be committed.
## Individual Comments
### Comment 1
<location path="server/src/services/router.ts" line_range="184-185" />
<code_context>
- COUNT(*) as total,
- SUM(CASE WHEN status = 'success' THEN 1 ELSE 0 END) as successes,
+ COUNT(*) as raw_total,
+ SUM(MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0)))) as total,
+ SUM(CASE WHEN status = 'success'
+ THEN MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0)))
+ ELSE 0 END) as successes,
</code_context>
<issue_to_address>
**issue (bug_risk):** Fix mismatched parentheses in recency-weighted stats SQL expression.
The `SUM(MAX(0, MIN(...))))` call has an extra closing parenthesis, which will cause a SQLite syntax error. It should be `SUM(MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0))) as total` (one closing paren each for `MIN`, `MAX`, and `SUM`). Please also re-check the parentheses in the `successes` expression to ensure they match your intended nesting.
</issue_to_address>
### Comment 2
<location path="server/src/routes/proxy.ts" line_range="19" />
<code_context>
const stickySessionMap = new Map<string, { modelDbId: number; keyId?: number; bannedPlatforms?: Set<string>; lastUsed: number }>();
const STICKY_TTL_MS = 30 * 60 * 1000; // 30 min session TTL
-const LONGCAT_STICKY_COOLDOWN_MS = 3 * 60 * 1000; // 3 min — bypass sticky preference for LongCat if session was used within this window
+const THREAD_COOLDOWN_MS = 3 * 60 * 1000; // 3 min — bypass sticky preference for LongCat if session was used within this window
+// Stream heartbeat & stall protection config — exported for test overrides
+export const streamKeepaliveConfig = {
</code_context>
<issue_to_address>
**nitpick:** Update the THREAD_COOLDOWN_MS comment to match the generalized implementation.
The `THREAD_COOLDOWN_MS` comment still calls out LongCat specifically, but the new logic applies a cooldown across platforms based on recent sticky usage. Please update the wording to describe a generic cross-session thread cooldown instead of LongCat-only behavior.
</issue_to_address>
### Comment 3
<location path="fix_streaming.py" line_range="2-11" />
<code_context>
+#!/usr/bin/env python3
+"""Replace the streaming block in proxy.ts with Promise.race-based stall detection."""
+
+with open('server/src/routes/proxy.ts', 'r') as f:
</code_context>
<issue_to_address>
**issue:** Remove or relocate helper scripts and partial patches that don’t belong in the runtime tree.
Files like `fix_streaming.py`, `fix.py`, `do_fix.py`, `server/write_test.py`, `server/write_tests.py`, and `new_streaming_block.txt` appear to be ad-hoc tooling or partial patches (and `do_fix.py` is syntactically invalid). Leaving them under `server/` can confuse readers and interfere with tooling that scans or executes everything there. Please either remove them or move them into a dedicated tooling/examples directory and ensure only complete, valid scripts remain in the runtime tree.
</issue_to_address>
### Comment 4
<location path="server/src/__tests__/services/router.test.ts" line_range="78-52" />
<code_context>
- expect(result.platform).toBe('groq');
- });
-
- it('should skip invalid keys', () => {
- const db = getDb();
-
- const invalidKey = encrypt('invalid-key');
- db.prepare(`
- INSERT INTO api_keys (platform, label, encrypted_key, iv, auth_tag, status, enabled)
- VALUES (?, ?, ?, ?, ?, ?, ?)
- `).run('google', 'invalid', invalidKey.encrypted, invalidKey.iv, invalidKey.authTag, 'invalid', 1);
-
- const groqKey = encrypt('test-groq-key');
- db.prepare(`
- INSERT INTO api_keys (platform, label, encrypted_key, iv, auth_tag, status, enabled)
- VALUES (?, ?, ?, ?, ?, ?, ?)
- `).run('groq', 'test', groqKey.encrypted, groqKey.iv, groqKey.authTag, 'healthy', 1);
-
- const result = routeRequest();
- expect(result.platform).toBe('groq');
- });
</code_context>
<issue_to_address>
**issue (testing):** Reintroduce or replace the removed "skip invalid keys" routing test
The previous `"should skip invalid keys"` test was removed, but we still need coverage that `status = 'invalid'` keys are excluded from routing. Please either:
- Reintroduce a test that inserts both an `invalid` and a `healthy` key and asserts the invalid key/platform is never selected, or
- Add a test that verifies `hasValidKeys` returns `false` when only invalid keys exist for a platform/model, causing `routeRequest` to fall back to another platform.
This ensures regressions in key-status filtering (e.g. mistakenly including `invalid` keys) are caught by tests.
</issue_to_address>
### Comment 5
<location path="server/src/__tests__/services/router.test.ts" line_range="12-15" />
<code_context>
app = createApp();
});
- beforeEach(() => {
+ beforeEach(async () => {
(stickySessionMap as Map<any, any>).clear();
</code_context>
<issue_to_address>
**suggestion (testing):** Add tests for new balanced/smart routing behavior (LongCat/Owl Alpha and recency bias)
The router now includes several untested behaviors (recency-weighted stats, `rawTotal`, `EXCLUDED_FROM_BALANCED` / `EXCLUDED_MODELS_FROM_BALANCED`, LongCat/Owl Alpha handling via `hasValidKeys`). To exercise this logic, add tests that:
- Verify that in `routingMode = 'balanced'`, LongCat and Owl Alpha are excluded when other eligible models exist, but remain selectable in `routingMode = 'smart'`.
- Seed Owl Alpha plus other smart-eligible models, and assert that with `routingMode = 'smart'` and `hasValidKeys(owl-alpha) === true`, `routeRequest` prefers Owl Alpha.
- Insert synthetic requests with varied `created_at` timestamps, refresh the stats cache, and assert that `getAnalyticsScores` exposes `total = rawTotal` while the internal `ModelStats.total` is recency-weighted.
This will ensure the new routing behavior is properly covered by tests.
</issue_to_address>
### Comment 6
<location path="server/src/__tests__/routes/transient-cooldown.test.ts" line_range="20-27" />
<code_context>
+ (stickySessionMap as Map<any, any>).clear();
+}
+
+describe('Transient model cooldown functionality', () => {
+ let app: Express;
+
+ beforeAll(() => {
+ process.env.ENCRYPTION_KEY = '0'.repeat(64);
+ initDb(':memory:');
+ app = createApp();
+ });
+
+ beforeEach(() => {
</code_context>
<issue_to_address>
**suggestion (testing):** Complement low-level transient cooldown tests with at least one proxy integration test
The current tests validate the cooldown map behavior in isolation, but there’s no end-to-end coverage that a real `/v1/chat/completions` request actually triggers and respects transient cooldowns. Please add an integration test (e.g., alongside `proxy-tools.test.ts`) that:
1. Mocks a provider to return a retryable/5xx error for a known `modelDbId`,
2. Asserts that `transientModelCooldowns` is populated for that ID, and
3. Sends a second request and verifies (via DB or logs) that this model appears in `skipModels` and is not chosen by the router.
This will confirm the cooldown logic is exercised in the real routing path, not just in unit-level helpers.
Suggested implementation:
```typescript
} from '../../routes/proxy.js';
import request from 'supertest';
function clearCooldownMap() {
(transientModelCooldowns as Map<any, any>).clear();
}
function clearStickyMap() {
(stickySessionMap as Map<any, any>).clear();
}
describe('Transient model cooldown functionality', () => {
let app: Express;
beforeAll(() => {
process.env.ENCRYPTION_KEY = '0'.repeat(64);
initDb(':memory:');
app = createApp();
});
beforeEach(() => {
clearCooldownMap();
clearStickyMap();
const db = getDb();
db.prepare('DELETE FROM api_keys').run();
db.prepare('DELETE FROM requests').run();
});
afterEach(() => {
clearCooldownMap();
clearStickyMap();
});
it('applies transient cooldowns end-to-end for chat completions', async () => {
const db = getDb();
// Seed an API key and a transient model; align this with your actual schema.
const apiKey = 'test-api-key';
db.prepare(
`
INSERT INTO api_keys (api_key, label, is_active)
VALUES (?, 'test-key', 1)
`
).run(apiKey);
// This assumes you have some notion of provider + model tables; adjust to match your schema.
// The important part is obtaining a concrete modelDbId used by the router and cooldown map.
const providerStmt = db.prepare(
`
INSERT INTO providers (name, type, base_url, is_active)
VALUES (?, 'test', 'http://localhost:9999', 1)
`
);
const providerResult = providerStmt.run('test-provider');
const providerId = providerResult.lastInsertRowid as number;
const modelStmt = db.prepare(
`
INSERT INTO models (provider_id, name, is_active, is_transient)
VALUES (?, ?, 1, 1)
`
);
const modelResult = modelStmt.run(providerId, 'test-transient-model');
const modelDbId = modelResult.lastInsertRowid as number;
// Mock the underlying provider call so that the first attempt returns a retryable error.
// Replace this with the same mocking strategy used in proxy-tools.test.ts (e.g., nock, jest.spyOn, etc.).
const originalFetch = (global as any).fetch;
const fetchMock = jest.fn()
// First call for this model: retryable server error.
.mockResolvedValueOnce(
new Response(JSON.stringify({ error: 'temporary failure' }), {
status: 502,
headers: { 'Content-Type': 'application/json' },
}) as any,
)
// Second call (if the router mistakenly uses this model again) – would succeed.
.mockResolvedValue(
new Response(JSON.stringify({ choices: [{ message: { content: 'ok' } }] }), {
status: 200,
headers: { 'Content-Type': 'application/json' },
}) as any,
);
(global as any).fetch = fetchMock;
try {
// First request: should hit the model, get a retryable error, and populate cooldowns.
const firstRes = await request(app)
.post('/v1/chat/completions')
.set('Authorization', `Bearer ${apiKey}`)
.send({
model: 'test-transient-model',
messages: [{ role: 'user', content: 'hello' }],
});
expect(firstRes.status).toBeGreaterThanOrEqual(500);
expect(firstRes.status).toBeLessThan(600);
expect(transientModelCooldowns.has(modelDbId)).toBe(true);
// Capture logs or DB state around routing to assert skipModels includes this model.
const logSpy = jest.spyOn(console, 'debug').mockImplementation(() => {});
// Second request: router should now treat this model as cooled down and skip it.
const secondRes = await request(app)
.post('/v1/chat/completions')
.set('Authorization', `Bearer ${apiKey}`)
.send({
model: 'test-transient-model',
messages: [{ role: 'user', content: 'hello again' }],
});
// Depending on the presence of fallback models, this may succeed with a different model
// or fail with "no available models". The key is that the cooled-down model is skipped.
expect(secondRes.status).not.toBe(502);
const logsCombined = logSpy.mock.calls.map(args => args.join(' ')).join('\n');
// Adjust this assertion to match the actual log/message structure where skipModels is recorded.
expect(logsCombined).toContain('skipModels');
expect(logsCombined).toContain(String(modelDbId));
logSpy.mockRestore();
} finally {
(global as any).fetch = originalFetch;
}
```
`).
```xml
<file_operations>
<file_operation operation="edit" file_path="server/src/__tests__/routes/transient-cooldown.test.ts">
<<<<<<< SEARCH
} from '../../routes/proxy.js';
function clearCooldownMap() {
(transientModelCooldowns as Map<any, any>).clear();
}
function clearStickyMap() {
(stickySessionMap as Map<any, any>).clear();
}
describe('Transient model cooldown functionality', () => {
let app: Express;
beforeAll(() => {
process.env.ENCRYPTION_KEY = '0'.repeat(64);
initDb(':memory:');
app = createApp();
});
beforeEach(() => {
clearCooldownMap();
clearStickyMap();
const db = getDb();
db.prepare('DELETE FROM api_keys').run();
db.prepare('DELETE FROM requests').run();
});
afterEach(() => {
clearCooldownMap();
clearStickyMap();
=======
} from '../../routes/proxy.js';
import request from 'supertest';
function clearCooldownMap() {
(transientModelCooldowns as Map<any, any>).clear();
}
function clearStickyMap() {
(stickySessionMap as Map<any, any>).clear();
}
describe('Transient model cooldown functionality', () => {
let app: Express;
beforeAll(() => {
process.env.ENCRYPTION_KEY = '0'.repeat(64);
initDb(':memory:');
app = createApp();
});
beforeEach(() => {
clearCooldownMap();
clearStickyMap();
const db = getDb();
db.prepare('DELETE FROM api_keys').run();
db.prepare('DELETE FROM requests').run();
});
afterEach(() => {
clearCooldownMap();
clearStickyMap();
});
it('applies transient cooldowns end-to-end for chat completions', async () => {
const db = getDb();
// Seed an API key and a transient model; align this with your actual schema.
const apiKey = 'test-api-key';
db.prepare(
`
INSERT INTO api_keys (api_key, label, is_active)
VALUES (?, 'test-key', 1)
`
).run(apiKey);
// This assumes you have some notion of provider + model tables; adjust to match your schema.
// The important part is obtaining a concrete modelDbId used by the router and cooldown map.
const providerStmt = db.prepare(
`
INSERT INTO providers (name, type, base_url, is_active)
VALUES (?, 'test', 'http://localhost:9999', 1)
`
);
const providerResult = providerStmt.run('test-provider');
const providerId = providerResult.lastInsertRowid as number;
const modelStmt = db.prepare(
`
INSERT INTO models (provider_id, name, is_active, is_transient)
VALUES (?, ?, 1, 1)
`
);
const modelResult = modelStmt.run(providerId, 'test-transient-model');
const modelDbId = modelResult.lastInsertRowid as number;
// Mock the underlying provider call so that the first attempt returns a retryable error.
// Replace this with the same mocking strategy used in proxy-tools.test.ts (e.g., nock, jest.spyOn, etc.).
const originalFetch = (global as any).fetch;
const fetchMock = jest.fn()
// First call for this model: retryable server error.
.mockResolvedValueOnce(
new Response(JSON.stringify({ error: 'temporary failure' }), {
status: 502,
headers: { 'Content-Type': 'application/json' },
}) as any,
)
// Second call (if the router mistakenly uses this model again) – would succeed.
.mockResolvedValue(
new Response(JSON.stringify({ choices: [{ message: { content: 'ok' } }] }), {
status: 200,
headers: { 'Content-Type': 'application/json' },
}) as any,
);
(global as any).fetch = fetchMock;
try {
// First request: should hit the model, get a retryable error, and populate cooldowns.
const firstRes = await request(app)
.post('/v1/chat/completions')
.set('Authorization', `Bearer ${apiKey}`)
.send({
model: 'test-transient-model',
messages: [{ role: 'user', content: 'hello' }],
});
expect(firstRes.status).toBeGreaterThanOrEqual(500);
expect(firstRes.status).toBeLessThan(600);
expect(transientModelCooldowns.has(modelDbId)).toBe(true);
// Capture logs or DB state around routing to assert skipModels includes this model.
const logSpy = jest.spyOn(console, 'debug').mockImplementation(() => {});
// Second request: router should now treat this model as cooled down and skip it.
const secondRes = await request(app)
.post('/v1/chat/completions')
.set('Authorization', `Bearer ${apiKey}`)
.send({
model: 'test-transient-model',
messages: [{ role: 'user', content: 'hello again' }],
});
// Depending on the presence of fallback models, this may succeed with a different model
// or fail with "no available models". The key is that the cooled-down model is skipped.
expect(secondRes.status).not.toBe(502);
const logsCombined = logSpy.mock.calls.map(args => args.join(' ')).join('\n');
// Adjust this assertion to match the actual log/message structure where skipModels is recorded.
expect(logsCombined).toContain('skipModels');
expect(logsCombined).toContain(String(modelDbId));
logSpy.mockRestore();
} finally {
(global as any).fetch = originalFetch;
}
>>>>>>> REPLACE
</file_operation>
</file_operations>
<additional_changes>
1. **Align DB seeding with your schema/helpers**:
- Replace the `INSERT INTO providers` and `INSERT INTO models` statements with the actual table/column names used by your router.
- If you already have helpers (e.g., in `proxy-tools.test.ts`) to create providers/models and return their DB IDs, use those instead of raw SQL here and remove the explicit inserts in this test.
2. **Use the same HTTP client mocking strategy as the rest of your tests**:
- If the proxy uses `axios`, a custom HTTP client, or an SDK instead of `fetch`, adjust the mocking accordingly (e.g., `jest.spyOn(httpClient, 'post')`, `nock`, etc.).
- Mirror the exact way a provider failure is simulated in your existing proxy tests so that the router records a retryable error and triggers cooldown logic.
3. **Match how `skipModels` is surfaced (logs or DB)**:
- If `skipModels` is persisted in the `requests` table, assert against the last row instead of logs, for example:
- `const lastRequest = db.prepare('SELECT * FROM requests ORDER BY id DESC LIMIT 1').get();`
- Inspect a `skip_models` or similar column to confirm it contains `modelDbId`.
- If you use a structured logger instead of `console.debug`, spy on that logger instance and adjust the expectation from `logsCombined` to the appropriate logged fields.
4. **Ensure `modelDbId` matches the key used by `transientModelCooldowns`**:
- If cooldowns are keyed by something other than the models table PK (e.g., a join table ID or a provider-model mapping ID), update the insert/select logic in the test to obtain the exact ID used by the router and cooldown map.
</issue_to_address>
### Comment 7
<location path="server/src/__tests__/routes/stream-heartbeat-stall.test.ts" line_range="70-79" />
<code_context>
+ it('emits SSE keep-alive comments during idle periods', async () => {
</code_context>
<issue_to_address>
**suggestion (testing):** Make heartbeat/stall tests deterministic using fake timers to avoid flakiness
These tests depend on real `setTimeout` delays and the global `streamKeepaliveConfig`, which can be flaky under slow CI or load. Within this suite, consider `vi.useFakeTimers()` and driving behavior via `vi.advanceTimersByTime(...)` instead of real-time waits, so the heartbeat/stall scenarios run quickly and deterministically.
Suggested implementation:
```typescript
afterEach(() => {
// Restore original config values
streamKeepaliveConfig.KEEPALIVE_INTERVAL_MS = origKeepaliveInterval;
streamKeepaliveConfig.MAX_STREAM_STALL_MS = origMaxStall;
// Ensure we always reset timers and mocks between tests so heartbeat/stall tests
// don't depend on real time and stay deterministic.
vi.useRealTimers();
vi.restoreAllMocks();
});
it('emits SSE keep-alive comments during idle periods', async () => {
// Use fake timers so we can deterministically drive heartbeat/idle behavior
// via vi.advanceTimersByTime(...) instead of relying on real setTimeout delays.
vi.useFakeTimers();
const origFetch = global.fetch;
const encoder = new TextEncoder();
```
To fully implement the suggestion and make the test deterministic, you should also:
1. Ensure the mocked provider implementation in this test (and related heartbeat/stall tests) uses `setTimeout`-based delays rather than `await new Promise(...)` with real-time sleeps.
2. Replace any `await new Promise(resolve => setTimeout(resolve, X))` or similar real-time waits in this test suite with `vi.advanceTimersByTime(X)` (or `vi.runAllTimers()`), so the test progresses via virtual time instead of wall-clock time.
3. If there are other heartbeat/stall tests in this file, consider applying the same `vi.useFakeTimers()` pattern to them (either at the individual `it(...)` level or via a `beforeEach` in the surrounding `describe`), and drive their timing assertions using `vi.advanceTimersByTime(...)`.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| SUM(MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0)))) as total, | ||
| SUM(CASE WHEN status = 'success' |
There was a problem hiding this comment.
issue (bug_risk): Fix mismatched parentheses in recency-weighted stats SQL expression.
The SUM(MAX(0, MIN(...)))) call has an extra closing parenthesis, which will cause a SQLite syntax error. It should be SUM(MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0))) as total (one closing paren each for MIN, MAX, and SUM). Please also re-check the parentheses in the successes expression to ensure they match your intended nesting.
| const stickySessionMap = new Map<string, { modelDbId: number; keyId?: number; bannedPlatforms?: Set<string>; lastUsed: number }>(); | ||
| const STICKY_TTL_MS = 30 * 60 * 1000; // 30 min session TTL | ||
| const LONGCAT_STICKY_COOLDOWN_MS = 3 * 60 * 1000; // 3 min — bypass sticky preference for LongCat if session was used within this window | ||
| const THREAD_COOLDOWN_MS = 3 * 60 * 1000; // 3 min — bypass sticky preference for LongCat if session was used within this window |
There was a problem hiding this comment.
nitpick: Update the THREAD_COOLDOWN_MS comment to match the generalized implementation.
The THREAD_COOLDOWN_MS comment still calls out LongCat specifically, but the new logic applies a cooldown across platforms based on recent sticky usage. Please update the wording to describe a generic cross-session thread cooldown instead of LongCat-only behavior.
| """Replace the streaming block in proxy.ts with Promise.race-based stall detection.""" | ||
|
|
||
| # Read the before and after parts | ||
| with open('/tmp/before.ts', 'r') as f: | ||
| before = f.read() | ||
|
|
||
| with open('/tmp/after.ts', 'r') as f: | ||
| after = f.read() # starts with "} else {" | ||
|
|
||
| # The new streaming block |
There was a problem hiding this comment.
issue: Remove or relocate helper scripts and partial patches that don’t belong in the runtime tree.
Files like fix_streaming.py, fix.py, do_fix.py, server/write_test.py, server/write_tests.py, and new_streaming_block.txt appear to be ad-hoc tooling or partial patches (and do_fix.py is syntactically invalid). Leaving them under server/ can confuse readers and interfere with tooling that scans or executes everything there. Please either remove them or move them into a dedicated tooling/examples directory and ensure only complete, valid scripts remain in the runtime tree.
| const groqKey = encrypt('test-groq-key'); | ||
| db.prepare(` | ||
| INSERT INTO api_keys (platform, label, encrypted_key, iv, auth_tag, status, enabled) | ||
| VALUES (?, ?, ?, ?, ?, ?, ?) |
There was a problem hiding this comment.
issue (testing): Reintroduce or replace the removed "skip invalid keys" routing test
The previous "should skip invalid keys" test was removed, but we still need coverage that status = 'invalid' keys are excluded from routing. Please either:
- Reintroduce a test that inserts both an
invalidand ahealthykey and asserts the invalid key/platform is never selected, or - Add a test that verifies
hasValidKeysreturnsfalsewhen only invalid keys exist for a platform/model, causingrouteRequestto fall back to another platform.
This ensures regressions in key-status filtering (e.g. mistakenly including invalid keys) are caught by tests.
| beforeEach(() => { | ||
| const db = getDb(); | ||
| db.prepare('DELETE FROM api_keys').run(); | ||
| // Reset fallback order to intelligence ranking |
There was a problem hiding this comment.
suggestion (testing): Add tests for new balanced/smart routing behavior (LongCat/Owl Alpha and recency bias)
The router now includes several untested behaviors (recency-weighted stats, rawTotal, EXCLUDED_FROM_BALANCED / EXCLUDED_MODELS_FROM_BALANCED, LongCat/Owl Alpha handling via hasValidKeys). To exercise this logic, add tests that:
- Verify that in
routingMode = 'balanced', LongCat and Owl Alpha are excluded when other eligible models exist, but remain selectable inroutingMode = 'smart'. - Seed Owl Alpha plus other smart-eligible models, and assert that with
routingMode = 'smart'andhasValidKeys(owl-alpha) === true,routeRequestprefers Owl Alpha. - Insert synthetic requests with varied
created_attimestamps, refresh the stats cache, and assert thatgetAnalyticsScoresexposestotal = rawTotalwhile the internalModelStats.totalis recency-weighted.
This will ensure the new routing behavior is properly covered by tests.
| describe('Transient model cooldown functionality', () => { | ||
| let app: Express; | ||
|
|
||
| beforeAll(() => { | ||
| process.env.ENCRYPTION_KEY = '0'.repeat(64); | ||
| initDb(':memory:'); | ||
| app = createApp(); | ||
| }); |
There was a problem hiding this comment.
suggestion (testing): Complement low-level transient cooldown tests with at least one proxy integration test
The current tests validate the cooldown map behavior in isolation, but there’s no end-to-end coverage that a real /v1/chat/completions request actually triggers and respects transient cooldowns. Please add an integration test (e.g., alongside proxy-tools.test.ts) that:
- Mocks a provider to return a retryable/5xx error for a known
modelDbId, - Asserts that
transientModelCooldownsis populated for that ID, and - Sends a second request and verifies (via DB or logs) that this model appears in
skipModelsand is not chosen by the router.
This will confirm the cooldown logic is exercised in the real routing path, not just in unit-level helpers.
Suggested implementation:
} from '../../routes/proxy.js';
import request from 'supertest';
function clearCooldownMap() {
(transientModelCooldowns as Map<any, any>).clear();
}
function clearStickyMap() {
(stickySessionMap as Map<any, any>).clear();
}
describe('Transient model cooldown functionality', () => {
let app: Express;
beforeAll(() => {
process.env.ENCRYPTION_KEY = '0'.repeat(64);
initDb(':memory:');
app = createApp();
});
beforeEach(() => {
clearCooldownMap();
clearStickyMap();
const db = getDb();
db.prepare('DELETE FROM api_keys').run();
db.prepare('DELETE FROM requests').run();
});
afterEach(() => {
clearCooldownMap();
clearStickyMap();
});
it('applies transient cooldowns end-to-end for chat completions', async () => {
const db = getDb();
// Seed an API key and a transient model; align this with your actual schema.
const apiKey = 'test-api-key';
db.prepare(
`
INSERT INTO api_keys (api_key, label, is_active)
VALUES (?, 'test-key', 1)
`
).run(apiKey);
// This assumes you have some notion of provider + model tables; adjust to match your schema.
// The important part is obtaining a concrete modelDbId used by the router and cooldown map.
const providerStmt = db.prepare(
`
INSERT INTO providers (name, type, base_url, is_active)
VALUES (?, 'test', 'http://localhost:9999', 1)
`
);
const providerResult = providerStmt.run('test-provider');
const providerId = providerResult.lastInsertRowid as number;
const modelStmt = db.prepare(
`
INSERT INTO models (provider_id, name, is_active, is_transient)
VALUES (?, ?, 1, 1)
`
);
const modelResult = modelStmt.run(providerId, 'test-transient-model');
const modelDbId = modelResult.lastInsertRowid as number;
// Mock the underlying provider call so that the first attempt returns a retryable error.
// Replace this with the same mocking strategy used in proxy-tools.test.ts (e.g., nock, jest.spyOn, etc.).
const originalFetch = (global as any).fetch;
const fetchMock = jest.fn()
// First call for this model: retryable server error.
.mockResolvedValueOnce(
new Response(JSON.stringify({ error: 'temporary failure' }), {
status: 502,
headers: { 'Content-Type': 'application/json' },
}) as any,
)
// Second call (if the router mistakenly uses this model again) – would succeed.
.mockResolvedValue(
new Response(JSON.stringify({ choices: [{ message: { content: 'ok' } }] }), {
status: 200,
headers: { 'Content-Type': 'application/json' },
}) as any,
);
(global as any).fetch = fetchMock;
try {
// First request: should hit the model, get a retryable error, and populate cooldowns.
const firstRes = await request(app)
.post('/v1/chat/completions')
.set('Authorization', `Bearer ${apiKey}`)
.send({
model: 'test-transient-model',
messages: [{ role: 'user', content: 'hello' }],
});
expect(firstRes.status).toBeGreaterThanOrEqual(500);
expect(firstRes.status).toBeLessThan(600);
expect(transientModelCooldowns.has(modelDbId)).toBe(true);
// Capture logs or DB state around routing to assert skipModels includes this model.
const logSpy = jest.spyOn(console, 'debug').mockImplementation(() => {});
// Second request: router should now treat this model as cooled down and skip it.
const secondRes = await request(app)
.post('/v1/chat/completions')
.set('Authorization', `Bearer ${apiKey}`)
.send({
model: 'test-transient-model',
messages: [{ role: 'user', content: 'hello again' }],
});
// Depending on the presence of fallback models, this may succeed with a different model
// or fail with "no available models". The key is that the cooled-down model is skipped.
expect(secondRes.status).not.toBe(502);
const logsCombined = logSpy.mock.calls.map(args => args.join(' ')).join('\n');
// Adjust this assertion to match the actual log/message structure where skipModels is recorded.
expect(logsCombined).toContain('skipModels');
expect(logsCombined).toContain(String(modelDbId));
logSpy.mockRestore();
} finally {
(global as any).fetch = originalFetch;
}`).
<file_operations>
<file_operation operation="edit" file_path="server/src/__tests__/routes/transient-cooldown.test.ts">
<<<<<<< SEARCH
} from '../../routes/proxy.js';
function clearCooldownMap() {
(transientModelCooldowns as Map<any, any>).clear();
}
function clearStickyMap() {
(stickySessionMap as Map<any, any>).clear();
}
describe('Transient model cooldown functionality', () => {
let app: Express;
beforeAll(() => {
process.env.ENCRYPTION_KEY = '0'.repeat(64);
initDb(':memory:');
app = createApp();
});
beforeEach(() => {
clearCooldownMap();
clearStickyMap();
const db = getDb();
db.prepare('DELETE FROM api_keys').run();
db.prepare('DELETE FROM requests').run();
});
afterEach(() => {
clearCooldownMap();
clearStickyMap();
=======
} from '../../routes/proxy.js';
import request from 'supertest';
function clearCooldownMap() {
(transientModelCooldowns as Map<any, any>).clear();
}
function clearStickyMap() {
(stickySessionMap as Map<any, any>).clear();
}
describe('Transient model cooldown functionality', () => {
let app: Express;
beforeAll(() => {
process.env.ENCRYPTION_KEY = '0'.repeat(64);
initDb(':memory:');
app = createApp();
});
beforeEach(() => {
clearCooldownMap();
clearStickyMap();
const db = getDb();
db.prepare('DELETE FROM api_keys').run();
db.prepare('DELETE FROM requests').run();
});
afterEach(() => {
clearCooldownMap();
clearStickyMap();
});
it('applies transient cooldowns end-to-end for chat completions', async () => {
const db = getDb();
// Seed an API key and a transient model; align this with your actual schema.
const apiKey = 'test-api-key';
db.prepare(
`
INSERT INTO api_keys (api_key, label, is_active)
VALUES (?, 'test-key', 1)
`
).run(apiKey);
// This assumes you have some notion of provider + model tables; adjust to match your schema.
// The important part is obtaining a concrete modelDbId used by the router and cooldown map.
const providerStmt = db.prepare(
`
INSERT INTO providers (name, type, base_url, is_active)
VALUES (?, 'test', 'http://localhost:9999', 1)
`
);
const providerResult = providerStmt.run('test-provider');
const providerId = providerResult.lastInsertRowid as number;
const modelStmt = db.prepare(
`
INSERT INTO models (provider_id, name, is_active, is_transient)
VALUES (?, ?, 1, 1)
`
);
const modelResult = modelStmt.run(providerId, 'test-transient-model');
const modelDbId = modelResult.lastInsertRowid as number;
// Mock the underlying provider call so that the first attempt returns a retryable error.
// Replace this with the same mocking strategy used in proxy-tools.test.ts (e.g., nock, jest.spyOn, etc.).
const originalFetch = (global as any).fetch;
const fetchMock = jest.fn()
// First call for this model: retryable server error.
.mockResolvedValueOnce(
new Response(JSON.stringify({ error: 'temporary failure' }), {
status: 502,
headers: { 'Content-Type': 'application/json' },
}) as any,
)
// Second call (if the router mistakenly uses this model again) – would succeed.
.mockResolvedValue(
new Response(JSON.stringify({ choices: [{ message: { content: 'ok' } }] }), {
status: 200,
headers: { 'Content-Type': 'application/json' },
}) as any,
);
(global as any).fetch = fetchMock;
try {
// First request: should hit the model, get a retryable error, and populate cooldowns.
const firstRes = await request(app)
.post('/v1/chat/completions')
.set('Authorization', `Bearer ${apiKey}`)
.send({
model: 'test-transient-model',
messages: [{ role: 'user', content: 'hello' }],
});
expect(firstRes.status).toBeGreaterThanOrEqual(500);
expect(firstRes.status).toBeLessThan(600);
expect(transientModelCooldowns.has(modelDbId)).toBe(true);
// Capture logs or DB state around routing to assert skipModels includes this model.
const logSpy = jest.spyOn(console, 'debug').mockImplementation(() => {});
// Second request: router should now treat this model as cooled down and skip it.
const secondRes = await request(app)
.post('/v1/chat/completions')
.set('Authorization', `Bearer ${apiKey}`)
.send({
model: 'test-transient-model',
messages: [{ role: 'user', content: 'hello again' }],
});
// Depending on the presence of fallback models, this may succeed with a different model
// or fail with "no available models". The key is that the cooled-down model is skipped.
expect(secondRes.status).not.toBe(502);
const logsCombined = logSpy.mock.calls.map(args => args.join(' ')).join('\n');
// Adjust this assertion to match the actual log/message structure where skipModels is recorded.
expect(logsCombined).toContain('skipModels');
expect(logsCombined).toContain(String(modelDbId));
logSpy.mockRestore();
} finally {
(global as any).fetch = originalFetch;
}
>>>>>>> REPLACE
</file_operation>
</file_operations>
<additional_changes>
1. **Align DB seeding with your schema/helpers**:
- Replace the `INSERT INTO providers` and `INSERT INTO models` statements with the actual table/column names used by your router.
- If you already have helpers (e.g., in `proxy-tools.test.ts`) to create providers/models and return their DB IDs, use those instead of raw SQL here and remove the explicit inserts in this test.
2. **Use the same HTTP client mocking strategy as the rest of your tests**:
- If the proxy uses `axios`, a custom HTTP client, or an SDK instead of `fetch`, adjust the mocking accordingly (e.g., `jest.spyOn(httpClient, 'post')`, `nock`, etc.).
- Mirror the exact way a provider failure is simulated in your existing proxy tests so that the router records a retryable error and triggers cooldown logic.
3. **Match how `skipModels` is surfaced (logs or DB)**:
- If `skipModels` is persisted in the `requests` table, assert against the last row instead of logs, for example:
- `const lastRequest = db.prepare('SELECT * FROM requests ORDER BY id DESC LIMIT 1').get();`
- Inspect a `skip_models` or similar column to confirm it contains `modelDbId`.
- If you use a structured logger instead of `console.debug`, spy on that logger instance and adjust the expectation from `logsCombined` to the appropriate logged fields.
4. **Ensure `modelDbId` matches the key used by `transientModelCooldowns`**:
- If cooldowns are keyed by something other than the models table PK (e.g., a join table ID or a provider-model mapping ID), update the insert/select logic in the test to obtain the exact ID used by the router and cooldown map.| it('emits SSE keep-alive comments during idle periods', async () => { | ||
| const origFetch = global.fetch; | ||
| const encoder = new TextEncoder(); | ||
|
|
||
| // Mock provider that delays first chunk by 300ms (longer than KEEPALIVE_INTERVAL_MS=100) | ||
| vi.spyOn(global, 'fetch').mockImplementation(async (url, init) => { | ||
| const urlStr = typeof url === 'string' ? url : url.toString(); | ||
| if (urlStr.startsWith('http://127.0.0.1') || urlStr.startsWith('http://localhost')) { | ||
| return origFetch(url, init); | ||
| } |
There was a problem hiding this comment.
suggestion (testing): Make heartbeat/stall tests deterministic using fake timers to avoid flakiness
These tests depend on real setTimeout delays and the global streamKeepaliveConfig, which can be flaky under slow CI or load. Within this suite, consider vi.useFakeTimers() and driving behavior via vi.advanceTimersByTime(...) instead of real-time waits, so the heartbeat/stall scenarios run quickly and deterministically.
Suggested implementation:
afterEach(() => {
// Restore original config values
streamKeepaliveConfig.KEEPALIVE_INTERVAL_MS = origKeepaliveInterval;
streamKeepaliveConfig.MAX_STREAM_STALL_MS = origMaxStall;
// Ensure we always reset timers and mocks between tests so heartbeat/stall tests
// don't depend on real time and stay deterministic.
vi.useRealTimers();
vi.restoreAllMocks();
});
it('emits SSE keep-alive comments during idle periods', async () => {
// Use fake timers so we can deterministically drive heartbeat/idle behavior
// via vi.advanceTimersByTime(...) instead of relying on real setTimeout delays.
vi.useFakeTimers();
const origFetch = global.fetch;
const encoder = new TextEncoder();To fully implement the suggestion and make the test deterministic, you should also:
- Ensure the mocked provider implementation in this test (and related heartbeat/stall tests) uses
setTimeout-based delays rather thanawait new Promise(...)with real-time sleeps. - Replace any
await new Promise(resolve => setTimeout(resolve, X))or similar real-time waits in this test suite withvi.advanceTimersByTime(X)(orvi.runAllTimers()), so the test progresses via virtual time instead of wall-clock time. - If there are other heartbeat/stall tests in this file, consider applying the same
vi.useFakeTimers()pattern to them (either at the individualit(...)level or via abeforeEachin the surroundingdescribe), and drive their timing assertions usingvi.advanceTimersByTime(...).
| // Set up heartbeat and stall monitor | ||
| heartbeatInterval = setInterval(() => { | ||
| const now = Date.now(); | ||
|
|
||
| if (now - lastChunkTimestamp > streamKeepaliveConfig.MAX_STREAM_STALL_MS) { | ||
| // Stall detected — terminate the stream | ||
| console.warn(`[Proxy] Stream stalled for ${now - lastChunkTimestamp}ms — aborting socket`); | ||
| streamAborted = true; | ||
| cleanupStream(); | ||
|
|
||
| if (streamStarted) { | ||
| // Mid-stream stall — write error frame and close | ||
| const payload = { error: { message: 'Upstream stream stalled', type: 'stream_timeout' } }; | ||
| try { | ||
| if (responseStreamContext) { | ||
| writeResponseStreamEvent(res, { | ||
| type: 'response.failed', | ||
| response: { | ||
| id: responseStreamContext.responseId, | ||
| status: 'failed', | ||
| error: payload.error, | ||
| }, | ||
| }); | ||
| } else { | ||
| res.write(`data: ${JSON.stringify(payload)}\n\n`); | ||
| res.write('data: [DONE]\n\n');; | ||
| } | ||
| res.end(); | ||
| } catch { /* Socket already gone */ } | ||
| } else { | ||
| // Pre-stream stall — no headers sent yet, response is still retryable | ||
| // Send a 504 so the client gets a proper error signal | ||
| try { | ||
| res.status(504).json({ | ||
| error: { message: `Upstream provider stalled before yielding any data from ${route.displayName}`, type: 'stream_timeout' }, | ||
| }); | ||
| } catch { /* Socket already gone */ } | ||
| } | ||
| } else if (streamStarted) { | ||
| // Write an SSE comment to keep the socket alive across intermediate proxies | ||
| // Only write after SSE headers have been sent (streamStarted === true) | ||
| try { | ||
| res.write(': keep-alive\n\n'); | ||
| } catch { | ||
| // Client disconnected — clean up | ||
| cleanupStream(); | ||
| } | ||
| } | ||
| }, streamKeepaliveConfig.KEEPALIVE_INTERVAL_MS); | ||
|
|
||
| // Attach client-disconnect listener | ||
| req.on('close', () => { | ||
| cleanupStream(); | ||
| }); | ||
|
|
||
| try { | ||
| const gen = route.provider.streamChatCompletion( | ||
| route.apiKey, normalizedMessages, route.modelId, | ||
| { temperature, max_tokens, top_p, tools, tool_choice, parallel_tool_calls }, | ||
| ); | ||
|
|
||
| for await (const chunk of gen) { | ||
| // Update chunk timestamp to reset stall timer | ||
| lastChunkTimestamp = Date.now(); | ||
|
|
||
| // If stall handler already terminated the stream, break out | ||
| if (streamAborted) break; |
There was a problem hiding this comment.
1. Stall doesn't stop upstream 🐞 Bug ☼ Reliability
In the streaming path of handleChatCompletion, the stall timer ends the HTTP response but does not abort/cancel the provider stream async iterator, so the for await can remain blocked indefinitely if the upstream stops yielding chunks. This can leak request handlers and reduce server capacity during upstream stalls.
Agent Prompt
## Issue description
The stream stall handler sets `streamAborted` and closes the client response, but it does not interrupt the blocked `for await` loop waiting on the upstream async iterator. If the upstream never yields again, the handler remains stuck and never returns.
## Issue Context
This occurs in the streaming branch where a `setInterval` detects stalls and closes the response. The `streamAborted` check is only evaluated inside the loop body, which is unreachable when the iterator is stalled.
## Fix Focus Areas
- server/src/routes/proxy.ts[1351-1418]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| // Exhaustion protection: if cooldown would block ALL available models, | ||
| // clear the set and let the request through rather than failing outright. | ||
| if (activeCooldownModels.size > 0) { | ||
| const db = getDb(); | ||
| const allEnabled = db.prepare('SELECT id FROM models WHERE enabled = 1').all() as Array<{ id: number }>; | ||
| const allEnabledIds = new Set(allEnabled.map(m => m.id)); | ||
|
|
||
| // Remove the current preferred model from cooldown consideration | ||
| if (preferredModel !== undefined) { | ||
| activeCooldownModels.delete(preferredModel); | ||
| } | ||
|
|
||
| // Check if cooldown would exhaust all models | ||
| let wouldExhaustAll = true; | ||
| for (const id of allEnabledIds) { | ||
| if (!activeCooldownModels.has(id)) { | ||
| wouldExhaustAll = false; | ||
| break; | ||
| } |
There was a problem hiding this comment.
2. Cooldown guard checks wrong set 🐞 Bug ≡ Correctness
The thread-cooldown exhaustion protection compares cooldown exclusions against `models WHERE enabled=1 rather than the actual routable fallback chain used by routeRequest`. If enabled models exist outside fallback_config, the guard may fail to trigger even when cooldown exclusions exhaust all routable options, leading to avoidable routing failures.
Agent Prompt
## Issue description
The cooldown “exhaustion protection” determines whether to clear cooldown exclusions by querying all enabled model IDs from `models`. Routing, however, only considers models in `fallback_config` (and additional mode-based filtering), so the exhaustion check can be miscalibrated.
## Issue Context
If `models.enabled=1` includes entries not present in `fallback_config` (or otherwise unroutable), `wouldExhaustAll` can be computed as false while the real routing chain is fully excluded, and `routeRequest()` will still throw.
## Fix Focus Areas
- server/src/routes/proxy.ts[1258-1287]
- server/src/services/router.ts[506-524]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| #!/usr/bin/env python3 | ||
| """Replace the streaming block in proxy.ts with Promise.race-based stall detection.""" | ||
|
|
||
| with open('server/src/routes/proxy.ts', 'r') as f: | ||
| content = f.read() | ||
|
|
||
| { No newline at end of file |
There was a problem hiding this comment.
3. Stray debug scripts committed 🐞 Bug ⚙ Maintainability
The PR adds multiple incomplete local fix/debug artifacts (e.g., do_fix.py, fix.py, fix_streaming.py, new_streaming_block.txt) including hardcoded local paths and unterminated code blocks. These files add maintenance/security risk and should not ship in the repository.
Agent Prompt
## Issue description
Several newly added files appear to be local, partially-written helper scripts / scratch buffers (broken syntax, hardcoded absolute paths, unfinished blocks). They should be removed from the PR (and optionally added to `.gitignore` if they’re generated during development).
## Issue Context
Examples include root-level Python scripts and a stray text file containing partial TypeScript snippets.
## Fix Focus Areas
- do_fix.py[1-7]
- fix.py[1-8]
- fix_streaming.py[1-21]
- new_streaming_block.txt[1-28]
- server/write_test.py[1-29]
- server/write_tests.py[1-45]
- fix{[1-1]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
There was a problem hiding this comment.
Actionable comments posted: 19
🧹 Nitpick comments (18)
.roo/specs/disable-sticky-on-auto/design.md (1)
11-17: 💤 Low valueAdd language specifier to code block.
The fenced code block should specify a language identifier for proper syntax highlighting.
📝 Proposed fix
-``` +```typescript function stickyOp(messages, routingMode, ...) { const key = getSessionKey(messages, routingMode); if (!key) return <no-op value>; // undefined, false, or early return ...operate on stickySessionMap using key... }</details> <details> <summary>🤖 Prompt for AI Agents</summary>Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.In @.roo/specs/disable-sticky-on-auto/design.md around lines 11 - 17, The fenced
code block containing the stickyOp example lacks a language identifier; update
the block fence to include a language (e.g., add "typescript" after the opening...)" and referencing getSessionKey and stickySessionMap is written as a language-specified fenced block (for example ```typescript) to enable proper syntax highlighting..roo/specs/generalized-thread-protection/design.md (1)
43-46: 💤 Low valueAdd language specifier to code block.
The fenced code block should specify a language identifier for proper syntax highlighting (e.g.,
bashorshell).📝 Proposed fix
-``` +```bash THREAD_PROTECTION_PLATFORMS="longcat:provider-ban,groq:model-skip"</details> <details> <summary>🤖 Prompt for AI Agents</summary>Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.In @.roo/specs/generalized-thread-protection/design.md around lines 43 - 46, The
fenced code block containing
THREAD_PROTECTION_PLATFORMS="longcat:provider-ban,groq:model-skip" needs a
language identifier for proper syntax highlighting; update the triple-backtick
fence to include a language such as bash or shell (e.g., changetobash)
around the existing THREAD_PROTECTION_PLATFORMS line so the block is rendered
with shell/bash highlighting.</details> </blockquote></details> <details> <summary>.roo/specs/owl-alpha-longcat-model-routing/design.md (1)</summary><blockquote> `5-46`: _💤 Low value_ **Add language specifiers to code blocks.** Multiple fenced code blocks are missing language identifiers, which impacts syntax highlighting and readability. The blocks at lines 5-46, 52-78, 82-93, and 96-105 should specify their content type (e.g., `text` for ASCII diagrams or `typescript` for code pseudocode). <details> <summary>📝 Suggested languages for each block</summary> - Lines 5-46: Use `text` for the ASCII architecture diagram - Lines 52-78: Use `text` for the smart preference flow diagram - Lines 82-93: Use `text` for the sticky cooldown flow diagram - Lines 96-105: Use `text` for the error handling flow diagram Example fix for the first block: ```diff -``` +```text ┌─────────────────────────────────────────────────────────────┐ │ Client Request │ ``` </details> Also applies to: 52-78, 82-93, 96-105 <details> <summary>🤖 Prompt for AI Agents</summary>Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.In @.roo/specs/owl-alpha-longcat-model-routing/design.md around lines 5 - 46,
The fenced code blocks in the design document lack language specifiers; update
each ASCII diagram/code block to include appropriate languages (e.g., use
text for ASCII diagrams andtypescript for pseudocode) so syntax
highlighting and readability improve—specifically addtext to the top architecture diagram (the Client Request / Balanced Router / Smart Router block referenced alongside routeRequest() in router.ts), addtext to the smart
preference and sticky cooldown flow diagrams, and addtext ortypescript
for the error handling flow/pseudocode near handleChatCompletion() in proxy.ts;
ensure each existing triple-backtick fence is changed to include the chosen
language.</details> </blockquote></details> <details> <summary>.roo/specs/sse-stream-heartbeat-stall-protection/design.md (1)</summary><blockquote> `112-121`: _💤 Low value_ **Heartbeat write failure cleanup may cause race condition.** Lines 115-119 call `cleanupStream()` inside the `catch` block when the heartbeat write fails. However, this runs inside the `setInterval` callback. If the client disconnect also fires `req.on('close')` simultaneously, both paths will call `cleanupStream()`, which is idempotent. This is safe, but consider adding a note that the idempotent design explicitly handles this race. <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/sse-stream-heartbeat-stall-protection/design.md around lines 112 - 121, The heartbeat write failure catch inside the setInterval that calls cleanupStream() can race with the req.on('close') handler; make this explicit by documenting that cleanupStream() is idempotent and designed to be safe when invoked concurrently from both the setInterval catch and the request 'close' event. Add a short comment near streamStarted/res.write and the setInterval callback referencing req.on('close') and stating cleanupStream() is idempotent and safe to call multiple times to prevent confusion for future readers. ``` </details> </blockquote></details> <details> <summary>.roo/specs/recency-biased-thompson-sampling/design.md (1)</summary><blockquote> `82-82`: _💤 Low value_ **Clarify whether speed/TTFB weighting is deferred by design.** The table states that `tok_per_sec` and `avg_ttfb_ms` remain unweighted, with the rationale that "Speed/TTFB are quality metrics that don't typically change suddenly." However, speed degradation can be a leading indicator of provider issues. Consider documenting whether this is a deliberate simplification for the initial implementation or a permanent design choice. <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/recency-biased-thompson-sampling/design.md at line 82, Clarify that leaving tok_per_sec and avg_ttfb_ms unweighted is a deliberate simplification for the initial implementation: update the design text in recency-biased-thompson-sampling to state that tok_per_sec and avg_ttfb_ms are intentionally kept unweighted for rollout simplicity, include a brief rationale (they're relatively stable but can be early warning signals), and add a short "Future work" note that weighting or anomaly-triggered adjustments for tok_per_sec and avg_ttfb_ms may be introduced later with monitoring/thresholds. ``` </details> </blockquote></details> <details> <summary>.roo/specs/transient-model-cooldown/design.md (1)</summary><blockquote> `33-36`: _💤 Low value_ **Consider adding a maximum Map size limit.** While the lazy pruning strategy is efficient, the design mentions "Worst case: N entries where N = number of models." In a system with hundreds of models experiencing simultaneous transient failures, this could grow unexpectedly. Consider adding a defensive maximum size check (e.g., prune the oldest entries if size exceeds 100) or document the acceptable upper bound based on expected model count. <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/transient-model-cooldown/design.md around lines 33 - 36, The transientModelCooldowns Map can grow unbounded; add a defensive max-size limit (e.g., MAX_TRANSIENT_COOLDOWNS = 100) and enforce it wherever entries are added (reference transientModelCooldowns and TRANSIENT_COOLDOWN_MS), pruning the oldest entries by expiryTimestamp when size exceeds the limit; update the design notes to document the chosen upper bound and the pruning behavior (remove lowest expiryTimestamp entries until size <= MAX_TRANSIENT_COOLDOWNS). ``` </details> </blockquote></details> <details> <summary>.roo/specs/wrapped-error-interception/tasks.md (1)</summary><blockquote> `63-65`: _⚡ Quick win_ **Missing explicit test task for wrapped error scenarios.** Task 13 says "run all tests" to "verify no regressions," but there's no task to *create* tests specifically for wrapped error detection (e.g., test that HTTP 200 with `{"error": {...}}` throws ProviderApiError, test that wrapped 429 triggers cooldown). Consider adding a task to create unit tests for the new `isWrappedError()` and `throwWrappedError()` methods. <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/wrapped-error-interception/tasks.md around lines 63 - 65, Add an explicit test creation task that verifies wrapped-error behavior: write unit tests covering isWrappedError() and throwWrappedError() to assert that an HTTP 200 response with body {"error": {...}} causes throwWrappedError() to throw a ProviderApiError and that a wrapped 429 response triggers the cooldown path; locate tests near existing provider/proxy tests and add cases for isWrappedError() true/false, throwWrappedError() raising correct error type and metadata (including retry-after/cooldown), and ensure they run in the existing test suite invoked by "npm test". ``` </details> </blockquote></details> <details> <summary>.roo/specs/wrapped-error-interception/design.md (2)</summary><blockquote> `46-56`: _💤 Low value_ **Array false-positive acknowledged but not mitigated.** The `isWrappedError()` check allows arrays (`{"error": ["something"]}`) to pass because `typeof [] === 'object'`. Line 315 (EC-3) acknowledges this results in "Unknown wrapped error" message and states it's "acceptable because array-format errors are extremely rare." Consider whether adding `&& !Array.isArray((body as Record<string, unknown>).error)` would be worth the extra guard, or explicitly document why array errors shouldn't be detected. <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/wrapped-error-interception/design.md around lines 46 - 56, The isWrappedError(body: unknown) function currently treats body.error arrays as valid because typeof [] === 'object'; update isWrappedError to reject array-form errors by adding a check like && !Array.isArray((body as Record<string, unknown>).error) so that {"error": ["..."]} does not pass, or alternatively add a short comment in the isWrappedError implementation explaining why array-form errors are intentionally allowed; reference the isWrappedError method and the (body as Record<string, unknown>).error property when making the change. ``` </details> --- `318-322`: _⚡ Quick win_ **Mid-stream error behavior may confuse clients.** EC-5 notes that if an error chunk arrives mid-stream (after valid chunks), the client receives a partial response followed by termination. While the doc states "this is the best possible behavior," it doesn't mention whether the stream should emit a specific error event/frame to signal the truncation, or whether clients might interpret this as a network failure rather than a provider error. <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/wrapped-error-interception/design.md around lines 318 - 322, The documentation misses guidance on signaling mid-stream wrapped errors to clients; update the EC-5 section to specify that when isWrappedError() causes the stream generator to throw a ProviderApiError, the implementation should emit a terminal error frame/event (or standardized SSE "event: error" chunk) before closing so clients can distinguish provider errors from network failures. Mention the involved symbols: the stream parser that uses isWrappedError(), the thrown ProviderApiError from the async generator, and the consumer/retry loop in proxy.ts; state that the generator should attempt to push a final error chunk/event describing the wrapped error (including code/message) immediately prior to terminating the stream so clients can detect provider-originated truncation rather than a transport error. ``` </details> </blockquote></details> <details> <summary>.roo/specs/transient-model-cooldown/requirements.md (3)</summary><blockquote> `25-25`: _⚡ Quick win_ **Consider guidance on observability for "all models exhausted" scenarios.** R-6 accepts the existing behavior when all models are on cooldown, but doesn't mention whether such scenarios should be logged, metered, or alerted. Since this represents a complete service degradation, consider adding a note recommending instrumentation to track when the proxy enters this state. <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/transient-model-cooldown/requirements.md at line 25, Add a short observability note to R-6 recommending that the system emits logs, metrics, and alerts when routeRequest() returns the "All models exhausted" outcome: specify logging a structured error event with context (timestamp, caller, model pool state), incrementing a meter/counter (e.g., all_models_exhausted_total) and recording a tagged gauge or histogram for cooldown durations, and suggest creating an alerting rule on sustained or high-rate occurrences so operators are notified of complete service degradation. ``` </details> --- `29-29`: _💤 Low value_ **Memory bounds and concurrent registration not addressed.** The "Out of scope" section doesn't mention race conditions when multiple concurrent requests register cooldowns for the same model, or memory bounds if the cooldown Map grows unbounded with many failing models. While these may be acceptable omissions for an initial implementation, consider noting them as future considerations. <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/transient-model-cooldown/requirements.md at line 29, Update the requirements.md "Out of scope" section to explicitly call out two future concerns: (1) race conditions when concurrent requests attempt to register a cooldown for the same model (referencing the module-level Map used in proxy.ts and the cooldown registration on 5xx/connection failures and sticky session override logic that writes to it), and (2) unbounded memory growth of the cooldown Map (the Map that tracks skipModels/cooldowns) and the need for retention/eviction or size limits; mention these as future considerations to be addressed if this feature is expanded. ``` </details> --- `10-10`: _⚡ Quick win_ **Clarify "non-auth" error exclusion criteria.** R-1 mentions excluding "non-auth, retryable errors" but doesn't explicitly define which status codes qualify as auth errors. While `5xx` is clearly specified as the trigger, consider adding a note clarifying that `401` and `403` (or similar auth-related errors) should *not* trigger cooldowns, to prevent ambiguity during implementation. <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/transient-model-cooldown/requirements.md at line 10, Clarify R-1 by explicitly stating which HTTP status codes are considered auth errors and must be excluded from cooldowns: treat 401 and 403 as authentication/authorization errors that must NOT trigger the transient model cooldown; continue to treat 5xx and connection timeouts as retryable failures that DO trigger cooldowns. Update the "transient-model-cooldown" requirement text to add a sentence like "Authentication/authorization errors (e.g., HTTP 401 and 403) are excluded and should not cause a model to be placed into the cooldown set," so implementers of the shared in-memory disabled-model collection and logic in the proxy (the component described in R-1) know to check and skip 401/403 before adding model IDs. ``` </details> </blockquote></details> <details> <summary>.roo/specs/wrapped-error-interception/requirements.md (1)</summary><blockquote> `49-49`: _💤 Low value_ **"errors" array format excluded but used by some providers.** The out-of-scope section (line 49) excludes detecting `errors` array format, but some providers (e.g., Google) use `{"error": {"errors": [...]}}` where the `errors` key is nested inside the root `error` object. The current design handles this via `extractErrorMessage()`, but it may be worth noting that nested `errors` arrays *are* handled, only top-level `errors` keys are excluded. <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/wrapped-error-interception/requirements.md at line 49, Clarify the out-of-scope note to state that top-level "errors" array formats are intentionally excluded, but nested "errors" arrays (e.g., {"error": {"errors": [...]}}) are handled by the existing extractErrorMessage() logic; update the sentence around the out-of-scope wording to reference extractErrorMessage() and explicitly note that nested errors arrays inside the root "error" object are supported while only top-level "errors" keys remain out-of-scope. ``` </details> </blockquote></details> <details> <summary>.roo/specs/transient-model-cooldown/tasks.md (1)</summary><blockquote> `5-10`: _💤 Low value_ **Line number references may become stale.** Tasks T-3 through T-6 reference specific line numbers in `proxy.ts` (e.g., line 1179, 1245, 1570, 1392). These are helpful for initial implementation but will become outdated if the code changes before these tasks are completed. Consider adding landmark comments or function names alongside line numbers to improve resilience. <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/transient-model-cooldown/tasks.md around lines 5 - 10, Declare and export a module-level Map named transientModelCooldowns and constant TRANSIENT_COOLDOWN_MS in proxy.ts, then in handleChatCompletion() (after skipModels init and before the retry loop) iterate transientModelCooldowns to prune expired entries and add active cooldown modelDbIds to skipModels; immediately after the existing session-ban platform check clear preferredModel and preferredKeyId if preferredModel is on cooldown; in the retry-loop catch block (where errStatus is checked) when errStatus is 5xx or undefined and isRetryableError(err) is true set transientModelCooldowns.set(route.modelDbId, Date.now() + TRANSIENT_COOLDOWN_MS) and add to local skipModels; likewise in the streaming error handler when streamErrStatus is ban-eligible set the same cooldown and add to skipModels; add short landmark comments (e.g., "// COOLDOWN: prune/inject", "// COOLDOWN: sticky override", "// COOLDOWN: register on retry error", "// COOLDOWN: register on stream error") next to these locations so future edits can find them even if line numbers shift. ``` </details> </blockquote></details> <details> <summary>new_streaming_block.txt (1)</summary><blockquote> `1-28`: _💤 Low value_ **Consider removing this file from the PR.** This appears to be a code snippet or development artifact rather than production code or documentation. The incomplete syntax at line 28 suggests it's a work-in-progress fragment. If it's meant as documentation, consider moving it to a proper markdown file with context, or removing it if it's no longer needed. <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@new_streaming_block.txt` around lines 1 - 28, This file appears to be a WIP fragment with incomplete syntax (unclosed block at the stallTimeout promise) and should be removed from the PR or moved into proper documentation; either delete new_streaming_block.txt from the commit, or convert it to a markdown doc with context and a complete, valid code example. If you intend to keep code, ensure the stream-related helper symbols (cleanupStream, stallTimeout, streamStarted, heartbeatInterval) are placed in a proper source file with all braces and promise handlers closed and tests/typing validated before committing. ``` </details> </blockquote></details> <details> <summary>server/src/routes/proxy.ts (2)</summary><blockquote> `1699-1720`: _⚡ Quick win_ **Consider registering transient cooldown for non-stream 5xx errors.** Mid-stream 5xx errors register a transient cooldown (line 1547), but non-stream 5xx errors do not. This inconsistency means a model that fails with a 5xx error before streaming won't be temporarily excluded from other concurrent requests. Since 5xx errors often indicate transient infrastructure issues, adding a cooldown here would improve resilience. <details> <summary>🔄 Proposed enhancement</summary> Add transient cooldown registration after line 1719: ```diff preferredKeyId = undefined; } } + // Register global transient cooldown for this 5xx error + transientModelCooldowns.set(route.modelDbId, Date.now() + TRANSIENT_COOLDOWN_MS); + console.log(`[TransientCooldown] registered global cooldown for modelDbId=${route.modelDbId} (${TRANSIENT_COOLDOWN_MS / 1000}s)`); } ``` </details> <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/routes/proxy.ts` around lines 1699 - 1720, The 5xx non-stream branch should also register the same transient cooldown used for mid-stream 5xxs; after the evaluateThreadProtection block (the one that may call banPlatformFromSession, addProviderModelsToSkipModels, or skipModels.add) add a call to the transient-cooldown registration used around the mid-stream handling (the same function invoked at the mid-stream 5xx handling near line ~1547) with the same parameters (normalizedMessages, routingMode, route.platform, route.modelDbId) so the failing provider/model is temporarily excluded from other concurrent requests. ``` </details> --- `1596-1634`: _⚡ Quick win_ **Consider registering transient cooldown for mid-stream retryable errors.** Non-stream retryable errors register a transient cooldown (lines 1724-1725), but mid-stream retryable errors do not. This inconsistency means a model that fails mid-stream with a retryable error won't be temporarily excluded from other concurrent requests, potentially causing cascading failures. <details> <summary>🔄 Proposed enhancement</summary> Add transient cooldown registration after line 1607: ```diff if (action.skipModel) { console.warn(`[Proxy] Mid-stream retryable error from ${route.platform} — skipping model ${route.modelId} (${action.reason})`); skipModels.add(route.modelDbId); } + // Register global transient cooldown to prevent concurrent requests from hitting the same failing model + transientModelCooldowns.set(route.modelDbId, Date.now() + TRANSIENT_COOLDOWN_MS); + console.log(`[TransientCooldown] registered global cooldown for modelDbId=${route.modelDbId} (${TRANSIENT_COOLDOWN_MS / 1000}s)`); if (action.clearStickyIfPinned && preferredModel) { ``` </details> <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/routes/proxy.ts` around lines 1596 - 1634, The mid-stream retryable error branch should also register the same transient cooldown as the non-stream retryable branch to temporarily exclude the failing model/provider from other concurrent requests; locate the transient-cooldown registration call used in the non-stream retryable error handling (the call that registers a cooldown for route.platform, route.modelDbId and routingMode) and add that exact call into this mid-stream branch (inside the isRetryableStreamError block, after evaluateThreadProtection and before writing/ending the response), so that banPlatformFromSession/addProviderModelsToSkipModels/skipModels/preferredModel logic is followed by the transient cooldown registration and then the existing response/cleanup and logRequest call. ``` </details> </blockquote></details> <details> <summary>server/src/__tests__/routes/transient-cooldown.test.ts (1)</summary><blockquote> `78-178`: _⚖️ Poor tradeoff_ **Tests re-implement production logic instead of exercising it.** The pruning/injection loop here (and similarly the sticky-override block at Lines 241-247 and the 5xx eligibility condition at Lines 321, 328, etc.) is a hand-copied duplicate of the logic in `proxy.ts`. These tests validate the copy, so a regression in the actual `proxy.ts` implementation would not be caught. Where feasible, drive the assertions through an exported helper (e.g. the cooldown-injection/skipModels routine) rather than reproducing its body inline. <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/__tests__/routes/transient-cooldown.test.ts` around lines 78 - 178, Tests duplicate the production pruning/injection loop instead of exercising the implementation; replace the inline loop over transientModelCooldowns with calls to the actual exported helper in proxy.ts (e.g. export and use a function like injectTransientCooldowns or buildSkipModels that accepts transientModelCooldowns and returns/updates skipModels), then assert on its return/state rather than re-implementing the logic; similarly refactor the other duplicated blocks (the sticky-override routine and the 5xx eligibility checks) to call their respective exported helpers from proxy.ts so tests validate real code paths (refer to transientModelCooldowns, skipModels, and proxy.ts helper names when making the changes). ``` </details> </blockquote></details> </blockquote></details> <details> <summary>🤖 Prompt for all review comments with AI agents</summary>Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.Inline comments:
In @.roo/specs/generalized-thread-protection/requirements.md:
- Around line 1-5: The requirements.md file is truncated mid-sentence; finish
the document by completing the Problem Statement (referencing the proxy route
handler in server/src/routes/proxy.ts and the existing special-casing of
"longcat"), then add a clear numbered List of Requirements with measurable
Acceptance Criteria (e.g., remove hardcoded branches, implement configurable
rules, pass unit/integration tests, performance and security checks), an Out of
Scope section stating what will not be changed (e.g., unrelated routes, legacy
clients), and a Dependencies section listing impacted components
(server/src/routes/proxy.ts, any config files, tests, and deployment steps);
ensure each requirement includes an acceptance test or success metric and that
terminology matches identifiers like "longcat" and "proxy route handler" so
reviewers can map requirements to code.In @.roo/specs/recency-biased-thompson-sampling/requirements.md:
- Line 15: Replace the hardcoded divisor 7.0 in the recency weight formula with
a SQL parameter placeholder (?) so the formula reads MIN(1.0, MAX(0.0, 1.0 -
(julianday('now') - julianday(created_at)) / ?)); ensure you add a short note
stating that ? is ANALYTICS_WINDOW_DAYS (derived from ANALYTICS_WINDOW_MS) to
match the design and parameterization described in the document.In @.roo/specs/sse-stream-heartbeat-stall-protection/design.md:
- Around line 243-278: The stall handler currently throws an error inside the
setInterval callback which won’t propagate to the outer try/catch; instead, in
the interval callback set streamAborted = true and call cleanupStream(), and if
streamStarted write the error frame and res.end() but do NOT throw; after the
for-await loop (where the stream is consumed) run cleanupStream() and then if
(streamAborted && !streamStarted) throw the retryable error (Object.assign(new
Error(...), { status: 504 })) so the outer retry/catch sees it; use the existing
symbols streamAborted, streamStarted, cleanupStream(), writeResponseStreamEvent,
res.end and the post-loop logic to implement this control flow change.In @.roo/specs/transient-model-cooldown/tasks.md:
- Line 9: The retry-loop catch currently sets transientModelCooldowns for any
isRetryableError(err) (which includes 429); update the catch in
server/src/routes/proxy.ts so the global
transientModelCooldowns.set(route.modelDbId, Date.now() + TRANSIENT_COOLDOWN_MS)
and adding to local skipModels only happens when isRetryableError(err) is true
AND errStatus is either undefined (connection failure) or a 5xx status (e.g.
errStatus >=500 && errStatus <600); ensure 429 (status 429) does not trigger the
cooldown by guarding with that errStatus check and keep using
TRANSIENT_COOLDOWN_MS and route.modelDbId as the identifiers.In @.roo/specs/wrapped-error-interception/design.md:
- Around line 72-75: The current assignment to error.status converts
errPayload.code with Number(...) without validation, which can produce NaN for
non-numeric codes; update the logic where error.status is set (the block
referencing errPayload and error.status) to parse the code into a numeric value,
check isNaN on that parsed result, and use a safe fallback (e.g., 200) when
parsing fails or the value is not a finite number, ensuring error.status is
always a valid number.In @.roo/specs/wrapped-error-interception/requirements.md:
- Line 32: Update FR-5 to explicitly handle non-numeric error.code values by
stating that only numeric codes will be used as the ProviderApiError.status and
all other values (including string codes like "INVALID_ARGUMENT", null,
undefined, or Number-like strings that parse to NaN) must be ignored and result
in the default status of 200; reference theerror.codekey, the
ProviderApiErrorstatusfield, and clarify that implementations should
validate via a numeric check (e.g., finite integer) before assigningstatus.- Line 33: Update the requirement FR-6 to match the design in EC-5 by changing
its scope from "first SSE chunk" to "any SSE chunk": state that if any SSE chunk
contains a root-levelerrorfield (whether first or mid-stream) the stream
must be aborted and aProviderApiErrorthrown, and add a short note
referencing EC-5 behavior for consistency with the mid-stream error handling
described there.In @.roo/specs/wrapped-error-interception/tasks.md:
- Line 15: When setting error.status from errPayload.code (the code path that
currently uses Number(errPayload.code)), validate the numeric conversion before
assigning: convert the code to a Number, check whether that conversion is a
valid number (not NaN), and only then use it; otherwise fall back to 200. Update
the assignment that sets status (the logic referencing errPayload and its code
property) to perform this conversion-and-check so non-numeric provider codes
like "RATE_LIMIT" do not produce NaN.In
@do_fix.py:
- Around line 1-7: Delete the accidental incomplete one-off script file
do_fix.py (and also remove sibling artifacts fix.py and fix_streaming.py) from
the PR—these are non-parsable codegen artifacts (unexpected EOF in do_fix.py)
and unrelated to the feature; ensure you remove them from the commit or move
them outside the repo so only relevant source files (e.g.,
server/src/routes/proxy.ts) remain in the changeset.In
@server/src/__tests__/services/router.test.ts:
- Around line 29-33: The INSERT into api_keys in the db.prepare(...) call has a
duplicate leftover "VALUES (?" line making the SQL invalid; replace the two
VALUES clauses with a single VALUES clause containing seven placeholders
matching the seven listed columns and ensure the .run(...) call on that same
statement supplies exactly seven parameters (encrypted, iv, authTag, 'healthy',
1 plus the platform and label) so db.prepare(...).run(...) is consistent with
the api_keys columns.In
@server/src/providers/base.ts:
- Around line 142-145: The assignment to error.status can become NaN when
errPayload.code is a non-numeric string; update the logic around the
error.status assignment in base.ts (where errPayload and error.status are set)
to parse the code only if it yields a finite number (e.g., use Number(...) then
check Number.isFinite or isFinite) and otherwise set error.status to 200; ensure
you still detect the presence of 'code' in errPayload before attempting
conversion and fall back to 200 for non-finite or missing values so downstream
status checks remain reliable.In
@server/src/providers/cloudflare.ts:
- Around line 123-131: The current try/catch around JSON.parse in the streaming
path swallows errors thrown by throwWrappedError(parsed); move the wrapped-error
detection outside the parse-specific catch so only JSON parse failures are
ignored: attempt JSON.parse(data) inside a narrow try/catch that on failure
skips the chunk, but after a successful parse call this.isWrappedError(parsed)
and, if true, call this.throwWrappedError(parsed) (so the thrown error bubbles
up), otherwise yield parsed; adjust the logic in the same generator where
ChatCompletionChunk is parsed to mirror the pattern used in google.ts.In
@server/src/providers/cohere.ts:
- Around line 114-122: The current try/catch swallows errors thrown by
throwWrappedError(parsed); refactor the block so JSON.parse is wrapped in its
own try/catch that only skips malformed chunks, then after a successful parse
call this.isWrappedError(parsed) and if true call
this.throwWrappedError(parsed); otherwise yield parsed — locate and update the
code around isWrappedError, throwWrappedError and the parsed handling to mirror
the parse-only try/catch pattern used in google.ts.In
@server/src/providers/openai-compat.ts:
- Around line 130-138: The current try/catch around JSON.parse in
openai-compat.ts swallows exceptions thrown by throwWrappedError(parsed); change
the flow so parsing is done inside a small try/catch that only handles
JSON.parse errors (skip malformed chunks), then after a successful parse call
this.isWrappedError(parsed) and, if true, call this.throwWrappedError(parsed)
outside the parse-only try block so the wrapped-error is not caught and
discarded; update the block containing JSON.parse, ChatCompletionChunk,
isWrappedError, throwWrappedError, and the yield parsed accordingly.In
@server/src/routes/proxy.ts:
- Line 1376: The double semicolon after the res.write call is extraneous; in the
function handling the streaming response where res.write('data: [DONE]\n\n');;
appears, remove the extra semicolon so the line is a single statement
(res.write('data: [DONE]\n\n');). This will clean up the syntax without changing
behavior—locate the res.write invocation in the proxy route handler and delete
the trailing duplicate semicolon.- Around line 1546-1548: The comment above the transient cooldown registration
is misleading because the block only runs for ban-eligible 5xx statuses; update
the comment near transientModelCooldowns.set(route.modelDbId, ...) to accurately
state that this registers a global transient cooldown only for ban-eligible 5xx
mid-stream errors (i.e., when streamErrStatus passes isBanEligibleStatus),
referencing TRANSIENT_COOLDOWN_MS and route.modelDbId to keep context; do not
change logic, only correct the comment text to reflect the actual condition.In
@server/src/services/router.ts:
- Around line 183-187: The SQL has mismatched parentheses in the
recency-weighted aggregates: in the SELECT expressions for "total" and
"successes" remove the extra closing parenthesis so MAX(0, MIN(1.0, 1.0 -
(julianday('now') - julianday(created_at)) / 7.0)) is balanced; update the
SUM(...) and SUM(CASE ... THEN ... ELSE 0 END) calls accordingly (symbols to
locate: raw_total, total, successes, created_at, julianday, MAX, MIN, SUM, CASE)
so db.prepare() in refreshStatsCache() will no longer throw.In
@server/write_test.py:
- Line 29: The file contains an incomplete/truncated string literal "expect(()
=> route{ in write_test.py which causes a syntax error; fix by either completing
the test generation logic (close the string and finish the test expression) or
remove the development artifact line; locate the occurrence of the truncated
literal in write_test.py (search for expect(() => route{) and either complete
the test statement in the test function that builds route assertions or delete
the line if the test is not needed.In
@server/write_tests.py:
- Around line 1-45: The file write_tests.py is a throwaway generator with a
hardcoded absolute path (path variable) and an unterminated triple-quoted string
(part1) causing syntax errors; remove this file (and related scratch scripts
like write_test.py/do_fix.py/fix*.py) from the repository, add them to
.gitignore, and instead commit the intended test file directly (the test content
currently in the part1 string should be placed into
server/src/tests/services/router.test.ts as a regular file); ensure no
references to the removed generator remain in the repo or CI.
Nitpick comments:
In @.roo/specs/disable-sticky-on-auto/design.md:
- Around line 11-17: The fenced code block containing the stickyOp example lacks
a language identifier; update the block fence to include a language (e.g., add
"typescript" after the opening ```), so the snippet beginning with "function
stickyOp(messages, routingMode, ...)" and referencing getSessionKey and
stickySessionMap is written as a language-specified fenced block (for exampleIn @.roo/specs/generalized-thread-protection/design.md: - Around line 43-46: The fenced code block containing THREAD_PROTECTION_PLATFORMS="longcat:provider-ban,groq:model-skip" needs a language identifier for proper syntax highlighting; update the triple-backtick fence to include a language such as bash or shell (e.g., change ``` to ```bash) around the existing THREAD_PROTECTION_PLATFORMS line so the block is rendered with shell/bash highlighting. In @.roo/specs/owl-alpha-longcat-model-routing/design.md: - Around line 5-46: The fenced code blocks in the design document lack language specifiers; update each ASCII diagram/code block to include appropriate languages (e.g., use ```text for ASCII diagrams and ```typescript for pseudocode) so syntax highlighting and readability improve—specifically add ```text to the top architecture diagram (the Client Request / Balanced Router / Smart Router block referenced alongside routeRequest() in router.ts), add ```text to the smart preference and sticky cooldown flow diagrams, and add ```text or ```typescript for the error handling flow/pseudocode near handleChatCompletion() in proxy.ts; ensure each existing triple-backtick fence is changed to include the chosen language. In @.roo/specs/recency-biased-thompson-sampling/design.md: - Line 82: Clarify that leaving tok_per_sec and avg_ttfb_ms unweighted is a deliberate simplification for the initial implementation: update the design text in recency-biased-thompson-sampling to state that tok_per_sec and avg_ttfb_ms are intentionally kept unweighted for rollout simplicity, include a brief rationale (they're relatively stable but can be early warning signals), and add a short "Future work" note that weighting or anomaly-triggered adjustments for tok_per_sec and avg_ttfb_ms may be introduced later with monitoring/thresholds. In @.roo/specs/sse-stream-heartbeat-stall-protection/design.md: - Around line 112-121: The heartbeat write failure catch inside the setInterval that calls cleanupStream() can race with the req.on('close') handler; make this explicit by documenting that cleanupStream() is idempotent and designed to be safe when invoked concurrently from both the setInterval catch and the request 'close' event. Add a short comment near streamStarted/res.write and the setInterval callback referencing req.on('close') and stating cleanupStream() is idempotent and safe to call multiple times to prevent confusion for future readers. In @.roo/specs/transient-model-cooldown/design.md: - Around line 33-36: The transientModelCooldowns Map can grow unbounded; add a defensive max-size limit (e.g., MAX_TRANSIENT_COOLDOWNS = 100) and enforce it wherever entries are added (reference transientModelCooldowns and TRANSIENT_COOLDOWN_MS), pruning the oldest entries by expiryTimestamp when size exceeds the limit; update the design notes to document the chosen upper bound and the pruning behavior (remove lowest expiryTimestamp entries until size <= MAX_TRANSIENT_COOLDOWNS). In @.roo/specs/transient-model-cooldown/requirements.md: - Line 25: Add a short observability note to R-6 recommending that the system emits logs, metrics, and alerts when routeRequest() returns the "All models exhausted" outcome: specify logging a structured error event with context (timestamp, caller, model pool state), incrementing a meter/counter (e.g., all_models_exhausted_total) and recording a tagged gauge or histogram for cooldown durations, and suggest creating an alerting rule on sustained or high-rate occurrences so operators are notified of complete service degradation. - Line 29: Update the requirements.md "Out of scope" section to explicitly call out two future concerns: (1) race conditions when concurrent requests attempt to register a cooldown for the same model (referencing the module-level Map used in proxy.ts and the cooldown registration on 5xx/connection failures and sticky session override logic that writes to it), and (2) unbounded memory growth of the cooldown Map (the Map that tracks skipModels/cooldowns) and the need for retention/eviction or size limits; mention these as future considerations to be addressed if this feature is expanded. - Line 10: Clarify R-1 by explicitly stating which HTTP status codes are considered auth errors and must be excluded from cooldowns: treat 401 and 403 as authentication/authorization errors that must NOT trigger the transient model cooldown; continue to treat 5xx and connection timeouts as retryable failures that DO trigger cooldowns. Update the "transient-model-cooldown" requirement text to add a sentence like "Authentication/authorization errors (e.g., HTTP 401 and 403) are excluded and should not cause a model to be placed into the cooldown set," so implementers of the shared in-memory disabled-model collection and logic in the proxy (the component described in R-1) know to check and skip 401/403 before adding model IDs. In @.roo/specs/transient-model-cooldown/tasks.md: - Around line 5-10: Declare and export a module-level Map named transientModelCooldowns and constant TRANSIENT_COOLDOWN_MS in proxy.ts, then in handleChatCompletion() (after skipModels init and before the retry loop) iterate transientModelCooldowns to prune expired entries and add active cooldown modelDbIds to skipModels; immediately after the existing session-ban platform check clear preferredModel and preferredKeyId if preferredModel is on cooldown; in the retry-loop catch block (where errStatus is checked) when errStatus is 5xx or undefined and isRetryableError(err) is true set transientModelCooldowns.set(route.modelDbId, Date.now() + TRANSIENT_COOLDOWN_MS) and add to local skipModels; likewise in the streaming error handler when streamErrStatus is ban-eligible set the same cooldown and add to skipModels; add short landmark comments (e.g., "// COOLDOWN: prune/inject", "// COOLDOWN: sticky override", "// COOLDOWN: register on retry error", "// COOLDOWN: register on stream error") next to these locations so future edits can find them even if line numbers shift. In @.roo/specs/wrapped-error-interception/design.md: - Around line 46-56: The isWrappedError(body: unknown) function currently treats body.error arrays as valid because typeof [] === 'object'; update isWrappedError to reject array-form errors by adding a check like && !Array.isArray((body as Record<string, unknown>).error) so that {"error": ["..."]} does not pass, or alternatively add a short comment in the isWrappedError implementation explaining why array-form errors are intentionally allowed; reference the isWrappedError method and the (body as Record<string, unknown>).error property when making the change. - Around line 318-322: The documentation misses guidance on signaling mid-stream wrapped errors to clients; update the EC-5 section to specify that when isWrappedError() causes the stream generator to throw a ProviderApiError, the implementation should emit a terminal error frame/event (or standardized SSE "event: error" chunk) before closing so clients can distinguish provider errors from network failures. Mention the involved symbols: the stream parser that uses isWrappedError(), the thrown ProviderApiError from the async generator, and the consumer/retry loop in proxy.ts; state that the generator should attempt to push a final error chunk/event describing the wrapped error (including code/message) immediately prior to terminating the stream so clients can detect provider-originated truncation rather than a transport error. In @.roo/specs/wrapped-error-interception/requirements.md: - Line 49: Clarify the out-of-scope note to state that top-level "errors" array formats are intentionally excluded, but nested "errors" arrays (e.g., {"error": {"errors": [...]}}) are handled by the existing extractErrorMessage() logic; update the sentence around the out-of-scope wording to reference extractErrorMessage() and explicitly note that nested errors arrays inside the root "error" object are supported while only top-level "errors" keys remain out-of-scope. In @.roo/specs/wrapped-error-interception/tasks.md: - Around line 63-65: Add an explicit test creation task that verifies wrapped-error behavior: write unit tests covering isWrappedError() and throwWrappedError() to assert that an HTTP 200 response with body {"error": {...}} causes throwWrappedError() to throw a ProviderApiError and that a wrapped 429 response triggers the cooldown path; locate tests near existing provider/proxy tests and add cases for isWrappedError() true/false, throwWrappedError() raising correct error type and metadata (including retry-after/cooldown), and ensure they run in the existing test suite invoked by "npm test". In `@new_streaming_block.txt`: - Around line 1-28: This file appears to be a WIP fragment with incomplete syntax (unclosed block at the stallTimeout promise) and should be removed from the PR or moved into proper documentation; either delete new_streaming_block.txt from the commit, or convert it to a markdown doc with context and a complete, valid code example. If you intend to keep code, ensure the stream-related helper symbols (cleanupStream, stallTimeout, streamStarted, heartbeatInterval) are placed in a proper source file with all braces and promise handlers closed and tests/typing validated before committing. In `@server/src/__tests__/routes/transient-cooldown.test.ts`: - Around line 78-178: Tests duplicate the production pruning/injection loop instead of exercising the implementation; replace the inline loop over transientModelCooldowns with calls to the actual exported helper in proxy.ts (e.g. export and use a function like injectTransientCooldowns or buildSkipModels that accepts transientModelCooldowns and returns/updates skipModels), then assert on its return/state rather than re-implementing the logic; similarly refactor the other duplicated blocks (the sticky-override routine and the 5xx eligibility checks) to call their respective exported helpers from proxy.ts so tests validate real code paths (refer to transientModelCooldowns, skipModels, and proxy.ts helper names when making the changes). In `@server/src/routes/proxy.ts`: - Around line 1699-1720: The 5xx non-stream branch should also register the same transient cooldown used for mid-stream 5xxs; after the evaluateThreadProtection block (the one that may call banPlatformFromSession, addProviderModelsToSkipModels, or skipModels.add) add a call to the transient-cooldown registration used around the mid-stream handling (the same function invoked at the mid-stream 5xx handling near line ~1547) with the same parameters (normalizedMessages, routingMode, route.platform, route.modelDbId) so the failing provider/model is temporarily excluded from other concurrent requests. - Around line 1596-1634: The mid-stream retryable error branch should also register the same transient cooldown as the non-stream retryable branch to temporarily exclude the failing model/provider from other concurrent requests; locate the transient-cooldown registration call used in the non-stream retryable error handling (the call that registers a cooldown for route.platform, route.modelDbId and routingMode) and add that exact call into this mid-stream branch (inside the isRetryableStreamError block, after evaluateThreadProtection and before writing/ending the response), so that banPlatformFromSession/addProviderModelsToSkipModels/skipModels/preferredModel logic is followed by the transient cooldown registration and then the existing response/cleanup and logRequest call.🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID:
64ee865b-4c52-4b11-b495-98a8ad28943f📒 Files selected for processing (43)
.roo/specs/disable-sticky-on-auto/design.md.roo/specs/disable-sticky-on-auto/requirements.md.roo/specs/disable-sticky-on-auto/tasks.md.roo/specs/generalized-thread-protection/design.md.roo/specs/generalized-thread-protection/requirements.md.roo/specs/generalized-thread-protection/tasks.md.roo/specs/owl-alpha-longcat-model-routing/design.md.roo/specs/owl-alpha-longcat-model-routing/requirements.md.roo/specs/owl-alpha-longcat-model-routing/tasks.md.roo/specs/recency-biased-thompson-sampling/design.md.roo/specs/recency-biased-thompson-sampling/requirements.md.roo/specs/recency-biased-thompson-sampling/tasks.md.roo/specs/sse-stream-heartbeat-stall-protection/design.md.roo/specs/sse-stream-heartbeat-stall-protection/requirements.md.roo/specs/sse-stream-heartbeat-stall-protection/tasks.md.roo/specs/transient-model-cooldown/design.md.roo/specs/transient-model-cooldown/requirements.md.roo/specs/transient-model-cooldown/tasks.md.roo/specs/wrapped-error-interception/design.md.roo/specs/wrapped-error-interception/requirements.md.roo/specs/wrapped-error-interception/tasks.mdclient/src/pages/FallbackPage.tsxdo_fix.pyfix.pyfix_streaming.pyfix{new_streaming_block.txtserver/src/__tests__/routes/fallback.test.tsserver/src/__tests__/routes/provider-session-ban.test.tsserver/src/__tests__/routes/proxy-tools.test.tsserver/src/__tests__/routes/stream-heartbeat-stall.test.tsserver/src/__tests__/routes/transient-cooldown.test.tsserver/src/__tests__/services/router.test.tsserver/src/providers/base.tsserver/src/providers/cloudflare.tsserver/src/providers/cohere.tsserver/src/providers/google.tsserver/src/providers/openai-compat.tsserver/src/routes/proxy.tsserver/src/services/router.tsserver/src/services/threadProtection.tsserver/write_test.pyserver/write_tests.py
| # Requirements: Generalized Thread Protection Scanner | ||
|
|
||
| ## Problem Statement | ||
|
|
||
| The proxy route handler (`server/src/routes/proxy.ts`) contains 6+ hardcoded branches that special-case the `longcat`{ No newline at end of file |
There was a problem hiding this comment.
Requirements document is incomplete.
The file appears truncated at line 5, cutting off mid-sentence. The requirements section is missing, which is critical for understanding the feature scope and acceptance criteria.
The file ends with:
The proxy route handler (`server/src/routes/proxy.ts`) contains 6+ hardcoded branches that special-case the `longcat`{
This looks like an incomplete save or file corruption. Please complete the requirements document with:
- Problem statement (appears to be started)
- List of requirements with acceptance criteria
- Out of scope items
- Dependencies (if any)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.roo/specs/generalized-thread-protection/requirements.md around lines 1 - 5,
The requirements.md file is truncated mid-sentence; finish the document by
completing the Problem Statement (referencing the proxy route handler in
server/src/routes/proxy.ts and the existing special-casing of "longcat"), then
add a clear numbered List of Requirements with measurable Acceptance Criteria
(e.g., remove hardcoded branches, implement configurable rules, pass
unit/integration tests, performance and security checks), an Out of Scope
section stating what will not be changed (e.g., unrelated routes, legacy
clients), and a Dependencies section listing impacted components
(server/src/routes/proxy.ts, any config files, tests, and deployment steps);
ensure each requirement includes an acceptance test or success metric and that
terminology matches identifiers like "longcat" and "proxy route handler" so
reviewers can map requirements to code.
|
|
||
| The SQL query aggregating historical requests in [`refreshStatsCache()`](server/src/services/router.ts:174) must calculate a weight for each logged request based on its age. Newer requests must be assigned a weight closer to `1.0`, while requests approaching the limit of the analytics window (7 days) must decay toward `0.0`. | ||
|
|
||
| **Formula**: `MIN(1.0, MAX(0.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0))` |
There was a problem hiding this comment.
Inconsistency: Hardcoded 7.0 conflicts with design parameterization.
This requirement shows the formula with a hardcoded 7.0 divisor, but the design document (line 81) explicitly states that the window days should be "passed as SQL parameter ?" to avoid hardcoding and keep it coupled with ANALYTICS_WINDOW_MS. Update this line to use ? instead of 7.0 to match the design.
📝 Proposed fix
-**Formula**: `MIN(1.0, MAX(0.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0))`
+**Formula**: `MIN(1.0, MAX(0.0, 1.0 - (julianday('now') - julianday(created_at)) / ?))`And add a note: "where ? is ANALYTICS_WINDOW_DAYS derived from ANALYTICS_WINDOW_MS."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.roo/specs/recency-biased-thompson-sampling/requirements.md at line 15,
Replace the hardcoded divisor 7.0 in the recency weight formula with a SQL
parameter placeholder (?) so the formula reads MIN(1.0, MAX(0.0, 1.0 -
(julianday('now') - julianday(created_at)) / ?)); ensure you add a short note
stating that ? is ANALYTICS_WINDOW_DAYS (derived from ANALYTICS_WINDOW_MS) to
match the design and parameterization described in the document.
| **Final approach**: When `streamStarted === false` on stall detection, throw a retryable error so the outer retry loop can attempt a fallback provider: | ||
|
|
||
| ```typescript | ||
| if (now - lastChunkTimestamp > MAX_STREAM_STALL_MS) { | ||
| console.warn(`[Proxy] Stream stalled for ${now - lastChunkTimestamp}ms — aborting socket`); | ||
| streamAborted = true; | ||
| cleanupStream(); | ||
|
|
||
| if (streamStarted) { | ||
| // Mid-stream stall — cannot retry, write error frame and end | ||
| const payload = { error: { message: 'Upstream stream stalled', type: 'stream_timeout' } }; | ||
| try { | ||
| if (responseStreamContext) { | ||
| writeResponseStreamEvent(res, { | ||
| type: 'response.failed', | ||
| response: { | ||
| id: responseStreamContext.responseId, | ||
| status: 'failed', | ||
| error: payload.error, | ||
| }, | ||
| }); | ||
| } else { | ||
| res.write(`data: ${JSON.stringify(payload)}\n\n`); | ||
| res.write('data: [DONE]\n\n'); | ||
| } | ||
| res.end(); | ||
| } catch { /* Socket already gone */ } | ||
| } else { | ||
| // Pre-stream stall — no headers sent yet, response is still retryable | ||
| // Throw an error to fall through to the outer retry/502 handler | ||
| throw Object.assign( | ||
| new Error(`Upstream provider stalled before yielding any data from ${route.displayName}`), | ||
| { status: 504 }, | ||
| ); | ||
| } | ||
| } |
There was a problem hiding this comment.
Pre-stream stall error propagation may not work as designed.
The design proposes throwing an error from within the setInterval callback to trigger the outer retry loop (lines 273-277). However, errors thrown inside setInterval callbacks do NOT propagate to the outer try/catch block—they trigger uncaughtException or are silently swallowed depending on the environment.
The current design shows the stall handler throwing an error, but this won't reach the catch (streamErr) block at line 1392. Instead, the streamAborted flag approach (already in place) should be used: set the flag, call cleanupStream(), but don't throw. The for-await loop will break on streamAborted, and the post-loop code should detect streamAborted && !streamStarted and throw the retryable error there.
🔧 Recommended control flow fix
Update the design to clarify that the throw happens in the post-loop check, not in the setInterval callback:
-
In the stall handler (inside
setInterval):- Set
streamAborted = true - Call
cleanupStream() - If
streamStarted, write error frame andres.end() - If
!streamStarted, do NOT throw—just return from the callback
- Set
-
After the
for-awaitloop (lines 171-178):cleanupStream(); if (streamAborted) { if (!streamStarted) { // Pre-stream stall - throw to retry throw Object.assign( new Error('Upstream provider stalled before yielding data'), { status: 504 } ); } // Mid-stream stall - already handled in interval callback logRequest(..., 'stream_stalled'); return; }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.roo/specs/sse-stream-heartbeat-stall-protection/design.md around lines 243
- 278, The stall handler currently throws an error inside the setInterval
callback which won’t propagate to the outer try/catch; instead, in the interval
callback set streamAborted = true and call cleanupStream(), and if streamStarted
write the error frame and res.end() but do NOT throw; after the for-await loop
(where the stream is consumed) run cleanupStream() and then if (streamAborted &&
!streamStarted) throw the retryable error (Object.assign(new Error(...), {
status: 504 })) so the outer retry/catch sees it; use the existing symbols
streamAborted, streamStarted, cleanupStream(), writeResponseStreamEvent, res.end
and the post-loop logic to implement this control flow change.
| - [ ] **T-2**: Export `transientModelCooldowns` and `TRANSIENT_COOLDOWN_MS` from [`proxy.ts`](server/src/routes/proxy.ts:170) in the existing export block for test access | ||
| - [ ] **T-3**: Add pre-routing cooldown injection logic inside [`handleChatCompletion()`](server/src/routes/proxy.ts:1061) — after `skipModels` initialization at [line 1179](server/src/routes/proxy.ts:1179) and before the retry loop at [line 1245](server/src/routes/proxy.ts:1245). Iterate `transientModelCooldowns`, prune expired entries, and add active cooldowns to `skipModels` | ||
| - [ ] **T-4**: Add sticky session override logic — after cooldown injection, check if `preferredModel` is on global cooldown and clear `preferredModel`/`preferredKeyId` if so. Place this after the existing session-ban platform check at [line 1195](server/src/routes/proxy.ts:1195) | ||
| - [ ] **T-5**: Register global cooldown in the retry loop catch block at [line 1570](server/src/routes/proxy.ts:1570) — when `errStatus` is `5xx` or `undefined` (connection failure) and `isRetryableError(err)` is true, set `transientModelCooldowns.set(route.modelDbId, Date.now() + TRANSIENT_COOLDOWN_MS)` and add to local `skipModels` |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Verify if isRetryableError treats 429 as retryable
rg -n -C5 'function isRetryableError|const isRetryableError' --type=ts --glob '**/proxy.ts'Repository: vi70x3/freellmapi
Length of output: 922
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Inspect the cooldown registration site referenced by the spec (around line 1570)
sed -n '1550,1605p' server/src/routes/proxy.ts
# 2) Locate where transientModelCooldowns.set is used and inspect surrounding logic
rg -n "transientModelCooldowns\.set" server/src/routes/proxy.ts -n
rg -n "skipModels" server/src/routes/proxy.ts -n
# Print a wider window around the first occurrence of transientModelCooldowns.set
first=$(rg -n "transientModelCooldowns\.set" server/src/routes/proxy.ts | head -n1 | cut -d: -f1)
start=$((first-40))
end=$((first+80))
sed -n "${start},${end}p" server/src/routes/proxy.ts
# 3) Read the spec file to see what T-7 asserts (line 13 mentioned in the comment)
sed -n '1,60p' .roo/specs/transient-model-cooldown/tasks.mdRepository: vi70x3/freellmapi
Length of output: 14681
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Retry-loop catch block around the spec-referenced line (~1570)
sed -n '1500,1650p' server/src/routes/proxy.ts
# Also inspect the retry loop region a bit earlier for how errStatus is computed
sed -n '1400,1505p' server/src/routes/proxy.ts
# Show full isRetryableError + getErrorStatus (for context of what "429" maps to)
rg -n "function getErrorStatus|const getErrorStatus|function isRetryableError|const isRetryableError" server/src/routes/proxy.ts
sed -n '480,580p' server/src/routes/proxy.tsRepository: vi70x3/freellmapi
Length of output: 18130
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show the second transient cooldown set site (often the retry-loop one)
# Use line numbers from the earlier grep output: one at 1547 (mid-stream), one at 1724 (likely retry loop).
sed -n '1680,1760p' server/src/routes/proxy.ts
# Find the retry-loop catch block by searching for "errStatus" and "isRetryableError(err)".
rg -n "errStatus\s*=|isRetryableError\(err\)|transientModelCooldowns\.set" server/src/routes/proxy.ts
# Show the retry loop section around where errStatus is computed.
# (Pick a window around the first "errStatus" occurrence.)
first_err_status=$(rg -n "errStatus\s*=" server/src/routes/proxy.ts | head -n1 | cut -d: -f1)
start=$((first_err_status-80))
end=$((first_err_status+120))
sed -n "${start},${end}p" server/src/routes/proxy.tsRepository: vi70x3/freellmapi
Length of output: 13713
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show tasks.md with line numbers to locate T-5 and T-7 expectations precisely
nl -ba .roo/specs/transient-model-cooldown/tasks.md | sed -n '1,120p'Repository: vi70x3/freellmapi
Length of output: 104
🏁 Script executed:
#!/bin/bash
set -euo pipefail
cat -n .roo/specs/transient-model-cooldown/tasks.md | sed -n '1,140p'Repository: vi70x3/freellmapi
Length of output: 2609
Fix spec/code mismatch: retry-loop transient cooldown currently applies to 429
isRetryableError(err)treatsstatus === 429as retryable.- In
server/src/routes/proxy.ts, the retry-loopcatchregisterstransientModelCooldowns.set(...)underif (isRetryableError(err))only (noerrStatus5xx/undefined guard), so 429 will register a transient cooldown. - This contradicts T-5’s stated “only 5xx/undefined” condition and T-7’s “exclude 429” expectation.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.roo/specs/transient-model-cooldown/tasks.md at line 9, The retry-loop catch
currently sets transientModelCooldowns for any isRetryableError(err) (which
includes 429); update the catch in server/src/routes/proxy.ts so the global
transientModelCooldowns.set(route.modelDbId, Date.now() + TRANSIENT_COOLDOWN_MS)
and adding to local skipModels only happens when isRetryableError(err) is true
AND errStatus is either undefined (connection failure) or a 5xx status (e.g.
errStatus >=500 && errStatus <600); ensure 429 (status 429) does not trigger the
cooldown by guarding with that errStatus check and keep using
TRANSIENT_COOLDOWN_MS and route.modelDbId as the identifiers.
| error.status = | ||
| typeof errPayload === 'object' && errPayload !== null && 'code' in (errPayload as Record<string, unknown>) | ||
| ? Number((errPayload as Record<string, unknown>).code) | ||
| : 200; |
There was a problem hiding this comment.
Potential NaN from non-numeric error codes.
Line 73-74 extracts error.code and calls Number(code) without validating that code is numeric. If a provider returns {"error": {"code": "RATE_LIMIT"}}, this would result in status: NaN. Consider adding a validation or fallback: const codeNum = Number(code); status = isNaN(codeNum) ? 200 : codeNum.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.roo/specs/wrapped-error-interception/design.md around lines 72 - 75, The
current assignment to error.status converts errPayload.code with Number(...)
without validation, which can produce NaN for non-numeric codes; update the
logic where error.status is set (the block referencing errPayload and
error.status) to parse the code into a numeric value, check isNaN on that parsed
result, and use a safe fallback (e.g., 200) when parsing fails or the value is
not a finite number, ensuring error.status is always a valid number.
| }); | ||
| } else { | ||
| res.write(`data: ${JSON.stringify(payload)}\n\n`); | ||
| res.write('data: [DONE]\n\n');; |
There was a problem hiding this comment.
Remove extra semicolon.
Line 1376 has a double semicolon which is unnecessary.
🔧 Proposed fix
- res.write('data: [DONE]\n\n');;
+ res.write('data: [DONE]\n\n');📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| res.write('data: [DONE]\n\n');; | |
| res.write('data: [DONE]\n\n'); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@server/src/routes/proxy.ts` at line 1376, The double semicolon after the
res.write call is extraneous; in the function handling the streaming response
where res.write('data: [DONE]\n\n');; appears, remove the extra semicolon so the
line is a single statement (res.write('data: [DONE]\n\n');). This will clean up
the syntax without changing behavior—locate the res.write invocation in the
proxy route handler and delete the trailing duplicate semicolon.
| // Register global transient cooldown for any 5xx mid-stream error | ||
| transientModelCooldowns.set(route.modelDbId, Date.now() + TRANSIENT_COOLDOWN_MS); | ||
| console.log(`[TransientCooldown] registered global cooldown for modelDbId=${route.modelDbId} (${TRANSIENT_COOLDOWN_MS / 1000}s)`); |
There was a problem hiding this comment.
Comment is misleading.
The comment states "Register global transient cooldown for any 5xx mid-stream error" but this code only executes for ban-eligible 5xx errors (500, 502, 503, 504) because it's inside the if (streamErrStatus && isBanEligibleStatus(streamErrStatus)) block (line 1527).
📝 Proposed fix
- // Register global transient cooldown for any 5xx mid-stream error
+ // Register global transient cooldown for this ban-eligible 5xx mid-stream error
transientModelCooldowns.set(route.modelDbId, Date.now() + TRANSIENT_COOLDOWN_MS);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@server/src/routes/proxy.ts` around lines 1546 - 1548, The comment above the
transient cooldown registration is misleading because the block only runs for
ban-eligible 5xx statuses; update the comment near
transientModelCooldowns.set(route.modelDbId, ...) to accurately state that this
registers a global transient cooldown only for ban-eligible 5xx mid-stream
errors (i.e., when streamErrStatus passes isBanEligibleStatus), referencing
TRANSIENT_COOLDOWN_MS and route.modelDbId to keep context; do not change logic,
only correct the comment text to reflect the actual condition.
| COUNT(*) as raw_total, | ||
| SUM(MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0)))) as total, | ||
| SUM(CASE WHEN status = 'success' | ||
| THEN MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0))) | ||
| ELSE 0 END) as successes, |
There was a problem hiding this comment.
SQL syntax error: extra closing parenthesis in both weighted aggregates.
Both new recency-weighted expressions have one extra ):
- Line 184
total:SUM(MAX(0, MIN(1.0, … / 7.0))))has 6 opening vs 7 closing parens. - Lines 185‑187
successes:SUM(CASE … THEN MAX(0, MIN(1.0, … / 7.0))) ELSE 0 END)also has one extra).
db.prepare() will throw at runtime, breaking refreshStatsCache() and therefore all routing.
🐛 Proposed fix
- SUM(MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0)))) as total,
+ SUM(MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0))) as total,
SUM(CASE WHEN status = 'success'
- THEN MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0)))
+ THEN MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0))
ELSE 0 END) as successes,📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| COUNT(*) as raw_total, | |
| SUM(MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0)))) as total, | |
| SUM(CASE WHEN status = 'success' | |
| THEN MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0))) | |
| ELSE 0 END) as successes, | |
| COUNT(*) as raw_total, | |
| SUM(MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0))) as total, | |
| SUM(CASE WHEN status = 'success' | |
| THEN MAX(0, MIN(1.0, 1.0 - (julianday('now') - julianday(created_at)) / 7.0)) | |
| ELSE 0 END) as successes, |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@server/src/services/router.ts` around lines 183 - 187, The SQL has mismatched
parentheses in the recency-weighted aggregates: in the SELECT expressions for
"total" and "successes" remove the extra closing parenthesis so MAX(0, MIN(1.0,
1.0 - (julianday('now') - julianday(created_at)) / 7.0)) is balanced; update the
SUM(...) and SUM(CASE ... THEN ... ELSE 0 END) calls accordingly (symbols to
locate: raw_total, total, successes, created_at, julianday, MAX, MIN, SUM, CASE)
so db.prepare() in refreshStatsCache() will no longer throw.
| " });", | ||
| "", | ||
| " it('should throw when no keys are configured', () => {", | ||
| " expect(() => route{ No newline at end of file |
There was a problem hiding this comment.
Fix incomplete string literal.
Line 29 has an incomplete string that will cause a Python syntax error when the script is executed. The string "expect(() => route{" is missing a closing quote and appears to be truncated.
If this script is still needed, complete the test generation logic. If it's a development artifact, consider removing it from the PR.
🧰 Tools
🪛 Ruff (0.15.15)
[warning] 29-29: missing closing quote in string literal
(invalid-syntax)
[warning] 29-29: unexpected EOF while parsing
(invalid-syntax)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@server/write_test.py` at line 29, The file contains an incomplete/truncated
string literal "expect(() => route{ in write_test.py which causes a syntax
error; fix by either completing the test generation logic (close the string and
finish the test expression) or remove the development artifact line; locate the
occurrence of the truncated literal in write_test.py (search for expect(() =>
route{) and either complete the test statement in the test function that builds
route assertions or delete the line if the test is not needed.
| import os | ||
|
|
||
| # Write the test file in parts to avoid truncation | ||
| path = '/home/vi/freellmapi/server/src/__tests__/services/router.test.ts' | ||
|
|
||
| # Part 1: existing tests | ||
| part1 = """import { describe, it, expect, beforeAll, beforeEach } from 'vitest'; | ||
| import { initDb, getDb } from '../../db/index.js'; | ||
| import { encrypt } from '../../lib/crypto.js'; | ||
| import { routeRequest, refreshStatsCache, getAnalyticsScores } from '../../services/router.js'; | ||
|
|
||
| describe('Router', () => { | ||
| beforeAll(() => { | ||
| process.env.ENCRYPTION_KEY = '0'.repeat(64); | ||
| initDb(':memory:'); | ||
| }); | ||
|
|
||
| beforeEach(() => { | ||
| const db = getDb(); | ||
| db.prepare('DELETE FROM api_keys').run(); | ||
| const models = db.prepare('SELECT id, intelligence_rank FROM models ORDER BY intelligence_rank ASC').all() as any[]; | ||
| const update = db.prepare('UPDATE fallback_config SET priority = ? WHERE model_db_id = ?'); | ||
| for (let i = 0; i < models.length; i++) { | ||
| update.run(i + 1, models[i].id); | ||
| } | ||
| }); | ||
|
|
||
| it('should throw when no keys are configured', () => { | ||
| expect(() => routeRequest()).toThrow(/exhausted/i); | ||
| }); | ||
|
|
||
| it('should route to highest priority model with available key', () => { | ||
| const db = getDb(); | ||
| const { encrypted, iv, authTag } = encrypt('test-groq-key'); | ||
| db.prepare('INSERT INTO api_keys (platform, label, encrypted_key, iv, auth_tag, status, enabled) VALUES (?, ?, ?, ?, ?, ?, ?)').run('groq', 'test', encrypted, iv, authTag, 'healthy', 1); | ||
| const result = routeRequest(); | ||
| expect(result.platform).toBe('groq'); | ||
| expect(result.apiKey).toBe('test-groq-key'); | ||
| }); | ||
|
|
||
| it('should route to an available model when keys exist for multiple platforms', () => { | ||
| const db = getDb(); | ||
| const googleKey = encrypt('test-google-key'); | ||
| db.prepare('INSERT INTO api_keys (platform, label, encrypted_key, iv, auth_tag, status, enabled) VALUES (?, ?, ?, ?, ?, ?, ?)').run('google', 'test', googleKey.encrypted, googleKey.iv, googleKey.authTag, 'healthy', 1); | ||
| const groqKey{ No newline at end of file |
There was a problem hiding this comment.
Remove this code-generation artifact from the repo.
write_tests.py (along with the related write_test.py/do_fix.py/fix*.py scratch scripts) is a throwaway generator, not part of the application. It hardcodes a machine-specific absolute path (/home/vi/freellmapi/…) and the part1 triple-quoted string is left unterminated, so the file is not even valid Python (confirmed by the static-analysis invalid-syntax hint). These artifacts shouldn't be tracked — author the test file directly instead.
Want me to open an issue to remove these generator scripts and add them to .gitignore?
🧰 Tools
🪛 Ruff (0.15.15)
[warning] 7-45: missing closing quote in string literal
(invalid-syntax)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@server/write_tests.py` around lines 1 - 45, The file write_tests.py is a
throwaway generator with a hardcoded absolute path (path variable) and an
unterminated triple-quoted string (part1) causing syntax errors; remove this
file (and related scratch scripts like write_test.py/do_fix.py/fix*.py) from the
repository, add them to .gitignore, and instead commit the intended test file
directly (the test content currently in the part1 string should be placed into
server/src/__tests__/services/router.test.ts as a regular file); ensure no
references to the removed generator remain in the repo or CI.
BUG-05: Abort upstream provider stream on stall detection by breaking the for-await loop and calling gen.return() when the keepalive timer detects MAX_STREAM_STALL_MS has elapsed without data. BUG-06: Fix cooldown guard to use the actual routable fallback chain (fallback_config JOIN models) instead of all enabled models, ensuring transient cooldowns only skip models that would actually be routed to. BUG-10: Remove double semicolon in proxy.ts. Also adds SSE keep-alive comments during idle periods, transient model cooldown injection before retry loops, and LongCat sticky session cooldown support in balanced routing mode.
Implements the
generalized-thread-protectionspec.Changes
Task 4a: Rules engine in
threadProtection.tsevaluateThreadProtectionwith the actual protection scannerTHREAD_PROTECTION_PLATFORMSenv var (comma-separatedplatform:levelpairs)longcat → provider-ban, all others →model-skipgetProtectionLevel()for sticky cooldown generalization5xx,truncation,retryableerror contexts toThreadProtectionActionrulesTask 4b: Integration into
proxy.tsroute.platform === 'longcat'branches with explicitevaluateThreadProtection()calls:banProvider,skipModel, andclearStickyIfPinnedaction flagslongcatreferences in proxy.tsSummary by Sourcery
Integrate a generalized thread protection and transient cooldown system into the proxy, enhance routing and analytics behavior, and add SSE stream heartbeat/stall protection and wrapped-error handling across providers.
Enhancements:
Documentation:
Tests:
Summary by CodeRabbit
Release Notes
New Features
Improvements