Skip to content

fix: http2 updates shared pool test#792

Closed
tode-rl wants to merge 6 commits into
feat/http2-undicifrom
tode/http2-updates-shared-pool-test
Closed

fix: http2 updates shared pool test#792
tode-rl wants to merge 6 commits into
feat/http2-undicifrom
tode/http2-updates-shared-pool-test

Conversation

@tode-rl
Copy link
Copy Markdown
Collaborator

@tode-rl tode-rl commented May 29, 2026

User description

THIS BRANCH
HTTP/1.1 vs HTTP/2 comparison
┌─────────┬─────────────────┬───────┬───────┬──────────────────────┐
│ (index) │ metric │ http1 │ http2 │ deltaHttp2MinusHttp1 │
├─────────┼─────────────────┼───────┼───────┼──────────────────────┤
│ 0 │ 'health p50 ms' │ 8293 │ 1143 │ -7150 │
│ 1 │ 'health p90 ms' │ 11959 │ 1692 │ -10267 │
│ 2 │ 'health p95 ms' │ 11986 │ 1760 │ -10226 │
│ 3 │ 'health p99 ms' │ 12008 │ 1814 │ -10194 │
│ 4 │ 'wall time ms' │ 31408 │ 1828 │ -29580 │
└─────────┴─────────────────┴───────┴───────┴──────────────────────┘

vs

ORIG BRANCH
HTTP/1.1 vs HTTP/2 comparison
┌─────────┬─────────────────┬───────┬───────┬──────────────────────┐
│ (index) │ metric │ http1 │ http2 │ deltaHttp2MinusHttp1 │
├─────────┼─────────────────┼───────┼───────┼──────────────────────┤
│ 0 │ 'health p50 ms' │ 8804 │ 15276 │ 6471 │
│ 1 │ 'health p90 ms' │ 11578 │ 15551 │ 3973 │
│ 2 │ 'health p95 ms' │ 12021 │ 15906 │ 3884 │
│ 3 │ 'health p99 ms' │ 12047 │ 15935 │ 3887 │
│ 4 │ 'wall time ms' │ 12242 │ 15973 │ 3731 │
└─────────┴─────────────────┴───────┴───────┴──────────────────────┘


CodeAnt-AI Description

Use a bounded HTTP/2 transport for SDK requests and keep HTTP/1.1 as fallback

What Changed

  • SDK requests now use a shared HTTP/2 connection pool so many concurrent calls can reuse a small number of TLS sessions
  • If HTTP/2 is not available, requests fall back to HTTP/1.1 instead of failing
  • Request bodies, streamed responses, and error responses are handled in the new transport path so existing SDK behavior stays intact
  • Added benchmark scripts for health checks and devbox startup to compare HTTP/1.1 and HTTP/2 under load, including connection usage stats

Impact

✅ Fewer crashes during high-concurrency API calls
✅ Faster health checks under load
✅ Lower connection overhead for bulk SDK requests

💡 Usage Guide

Checking Your Pull Request

Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.

Talking to CodeAnt AI

Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:

@codeant-ai ask: Your question here

This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.

Example

@codeant-ai ask: Can you suggest a safer alternative to storing this secret?

Preserve Org Learnings with CodeAnt

You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:

@codeant-ai: Your feedback here

This helps CodeAnt AI learn and adapt to your team's coding style and standards.

Example

@codeant-ai: Do not flag unused imports.

Retrigger review

Ask CodeAnt AI to review the PR again, by typing:

@codeant-ai: review

Check Your Repository Health

To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.

@tode-rl tode-rl requested a review from dines-rl May 29, 2026 20:30
@codeant-ai
Copy link
Copy Markdown
Contributor

codeant-ai Bot commented May 29, 2026

CodeAnt AI is reviewing your PR.


Thanks for using CodeAnt! 🎉

We're free for open-source projects. if you're enjoying it, help us grow by sharing.

Share on X ·
Reddit ·
LinkedIn

@codeant-ai codeant-ai Bot added the size:XL This PR changes 500-999 lines, ignoring generated files label May 29, 2026
Comment thread src/lib/undici-fetch.ts
Comment on lines +169 to +192
private acquire(): Promise<H2SessionEntry> {
const existing = this.sessions.find(
(entry) => !entry.closed && entry.activeStreams < MAX_H2_STREAMS_PER_SESSION,
);
if (existing) {
if (existing.idleTimer) {
clearTimeout(existing.idleTimer);
existing.idleTimer = undefined;
}
existing.session.ref?.();
existing.activeStreams++;
return Promise.resolve(existing);
}

if (this.sessions.filter((entry) => !entry.closed).length < MAX_H2_SESSIONS) {
const created = this.createSession();
created.session.ref?.();
created.activeStreams++;
return Promise.resolve(created);
}

return new Promise((resolve) => {
this.waiters.push(resolve);
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Architect Review — HIGH

Pool acquisition in H2Pool does not observe AbortSignal while requests are queued, so requests whose signals are aborted (e.g., due to client timeouts) remain stuck in the waiters queue until a stream is released instead of failing promptly.

Suggestion: Thread the request's AbortSignal into acquire() so queued waiters register an abort listener, remove themselves from this.waiters when aborted, and reject immediately with an AbortError to preserve timeout/cancel semantics under pool saturation.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is an **Architect / Logical Review** comment left during a code review. These reviews are first-class, important findings — not optional suggestions. Do NOT dismiss this as a 'big architectural change' just because the title says architect review; most of these can be resolved with a small, localized fix once the intent is understood.

**Path:** src/lib/undici-fetch.ts
**Line:** 169:192
**Comment:**
	*HIGH: Pool acquisition in H2Pool does not observe AbortSignal while requests are queued, so requests whose signals are aborted (e.g., due to client timeouts) remain stuck in the waiters queue until a stream is released instead of failing promptly.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
If a suggested approach is provided above, use it as the authoritative instruction. If no explicit code suggestion is given, you MUST still draft and apply your own minimal, localized fix — do not punt back with 'no suggestion provided, review manually'. Keep the change as small as possible: add a guard clause, gate on a loading state, reorder an await, wrap in a conditional, etc. Do not refactor surrounding code or expand scope beyond the finding.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix

Comment on lines +33 to 34
const baseURL = "https://api.runloop.ai";
const devboxCount = parsePositiveInteger(process.env.RUNLOOP_E2E_DEVBOX_COUNT, DEFAULT_DEVBOX_COUNT);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Architect Review — HIGH

The devbox startup benchmark now hardcodes baseURL to "https://api.runloop.ai" and ignores RUNLOOP_BASE_URL, contradicting the script header and removing the ability to run against staging/custom environments that was previously supported.

Suggestion: Restore env-driven baseURL selection, e.g. baseURL = process.env.RUNLOOP_BASE_URL ?? 'https://api.runloop.ai', so the benchmark can be directed to different environments as documented.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is an **Architect / Logical Review** comment left during a code review. These reviews are first-class, important findings — not optional suggestions. Do NOT dismiss this as a 'big architectural change' just because the title says architect review; most of these can be resolved with a small, localized fix once the intent is understood.

**Path:** tests/smoketests/scripts/devbox-startup-benchmark.mjs
**Line:** 33:34
**Comment:**
	*HIGH: The devbox startup benchmark now hardcodes baseURL to "https://api.runloop.ai" and ignores RUNLOOP_BASE_URL, contradicting the script header and removing the ability to run against staging/custom environments that was previously supported.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
If a suggested approach is provided above, use it as the authoritative instruction. If no explicit code suggestion is given, you MUST still draft and apply your own minimal, localized fix — do not punt back with 'no suggestion provided, review manually'. Keep the change as small as possible: add a guard clause, gate on a loading state, reorder an await, wrap in a conditional, etc. Do not refactor surrounding code or expand scope beyond the finding.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix

Comment on lines +28 to +38
const apiKey = process.env.RUNLOOP_API_KEY_DEV;
const baseURL = process.env.RUNLOOP_BASE_URL ?? 'https://api.runloop.pro';
const requestCount = parsePositiveInteger(
process.env.RUNLOOP_E2E_HEALTH_REQUEST_COUNT ?? process.env.RUNLOOP_E2E_DEVBOX_COUNT,
DEFAULT_REQUEST_COUNT,
);
const resultsPath = process.env.RUNLOOP_E2E_RESULTS_PATH ?? defaultResultsPath();

if (!apiKey) {
console.error('RUNLOOP_API_KEY is required');
process.exit(2);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Architect Review — CRITICAL

The health benchmark reads RUNLOOP_API_KEY_DEV but its error message and header documentation refer to RUNLOOP_API_KEY, so running it as documented (with RUNLOOP_API_KEY set) causes the script to fail claiming the key is missing.

Suggestion: Align the environment variable name across the code, error message, and header comment—either switch to RUNLOOP_API_KEY everywhere or consistently use RUNLOOP_API_KEY_DEV—so users can run the benchmark exactly as instructed.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is an **Architect / Logical Review** comment left during a code review. These reviews are first-class, important findings — not optional suggestions. Do NOT dismiss this as a 'big architectural change' just because the title says architect review; most of these can be resolved with a small, localized fix once the intent is understood.

**Path:** tests/smoketests/scripts/health-endpoint-benchmark.mjs
**Line:** 28:38
**Comment:**
	*CRITICAL: The health benchmark reads RUNLOOP_API_KEY_DEV but its error message and header documentation refer to RUNLOOP_API_KEY, so running it as documented (with RUNLOOP_API_KEY set) causes the script to fail claiming the key is missing.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
If a suggested approach is provided above, use it as the authoritative instruction. If no explicit code suggestion is given, you MUST still draft and apply your own minimal, localized fix — do not punt back with 'no suggestion provided, review manually'. Keep the change as small as possible: add a guard clause, gate on a loading state, reorder an await, wrap in a conditional, etc. Do not refactor surrounding code or expand scope beyond the finding.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix

Comment thread src/lib/undici-fetch.ts
Comment on lines +169 to +193
private acquire(): Promise<H2SessionEntry> {
const existing = this.sessions.find(
(entry) => !entry.closed && entry.activeStreams < MAX_H2_STREAMS_PER_SESSION,
);
if (existing) {
if (existing.idleTimer) {
clearTimeout(existing.idleTimer);
existing.idleTimer = undefined;
}
existing.session.ref?.();
existing.activeStreams++;
return Promise.resolve(existing);
}

if (this.sessions.filter((entry) => !entry.closed).length < MAX_H2_SESSIONS) {
const created = this.createSession();
created.session.ref?.();
created.activeStreams++;
return Promise.resolve(created);
}

return new Promise((resolve) => {
this.waiters.push(resolve);
});
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Queued HTTP/2 requests do not observe abort signals while waiting for a free stream slot. If the pool is saturated and a caller aborts, the promise remains pending until another request releases a slot, which breaks expected abort semantics and can cause hung requests under load. Thread the signal into acquire/waiter logic and remove/reject queued waiters immediately on abort. [logic error]

Severity Level: Critical 🚨
- ❌ HTTP/2 requests ignore abort while waiting for capacity.
- ❌ Timeouts in APIClient.fetchWithTimeout fire late under load.
- ⚠️ Health benchmark may hang under high-concurrency HTTP/2 runs.
Steps of Reproduction ✅
1. In `package.json:22-23`, run the HTTP/2 health benchmark via `yarn test:e2e:health`,
which executes `tests/smoketests/scripts/health-endpoint-benchmark.mjs`.

2. In `tests/smoketests/scripts/health-endpoint-benchmark.mjs:46-49,81-89`, the HTTP/2
pass constructs a `Runloop` client with `{ http2: true }`, so in `src/index.ts:362-380`
the client uses `http2Fetch`, which is wired to `undiciFetch` via
`src/_shims/node-runtime.ts:61-71`.

3. For each request, `Runloop` calls `APIClient.makeRequest()` in `src/core.ts:219-257`,
which uses `fetchWithTimeout()` at `src/core.ts:55-82` with a new `AbortController`; this
passes a `signal` into `undiciFetch` at `src/lib/undici-fetch.ts:362-382`.

4. Under high concurrency (more than `MAX_H2_SESSIONS * MAX_H2_STREAMS_PER_SESSION` = 4 *
64 requests in flight to the same origin), `H2Pool.acquire()` at
`src/lib/undici-fetch.ts:169-193` enqueues additional requests in `this.waiters` and
returns a pending promise without attaching any abort listener, so if the timeout
controller or caller aborts the signal while the request is queued, the promise returned
by `H2Pool.request()` (and therefore by `undiciFetch` and `fetchWithTimeout`) remains
pending until some other request releases a stream and `drainWaiters()` assigns an entry,
delaying or effectively hanging the abort/timeout instead of failing promptly.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** src/lib/undici-fetch.ts
**Line:** 169:193
**Comment:**
	*Logic Error: Queued HTTP/2 requests do not observe abort signals while waiting for a free stream slot. If the pool is saturated and a caller aborts, the promise remains pending until another request releases a slot, which breaks expected abort semantics and can cause hung requests under load. Thread the signal into acquire/waiter logic and remove/reject queued waiters immediately on abort.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎

Comment thread src/lib/undici-fetch.ts
Comment on lines +325 to +341
private drainWaiters() {
while (this.waiters.length > 0) {
const entry = this.sessions.find(
(candidate) => !candidate.closed && candidate.activeStreams < MAX_H2_STREAMS_PER_SESSION,
);
if (!entry) return;

if (entry.idleTimer) {
clearTimeout(entry.idleTimer);
entry.idleTimer = undefined;
}
entry.session.ref?.();
entry.activeStreams++;
const resolve = this.waiters.shift();
resolve?.(entry);
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Waiters can deadlock when all sessions are closed and no currently open session has capacity. drainWaiters returns without creating a replacement session even when the pool is below MAX_H2_SESSIONS, leaving queued requests unresolved indefinitely. Ensure waiter draining can spawn a new session when capacity exists but no eligible entry is found. [race condition]

Severity Level: Critical 🚨
- ❌ Queued HTTP/2 requests can remain pending indefinitely after closes.
- ⚠️ Health benchmark run may never complete when connections reset.
- ⚠️ Devbox startup benchmark may hang under extreme connection churn.
Steps of Reproduction ✅
1. Run the HTTP/2 health benchmark as in `package.json:23` (`yarn test:e2e:health`), which
drives many concurrent HTTP/2 requests through `Runloop` (see
`tests/smoketests/scripts/health-endpoint-benchmark.mjs:46-49,81-99`).

2. The HTTP/2 client path uses `undiciFetch` as `http2Fetch` via `src/index.ts:362-380`
and `src/_shims/node-runtime.ts:61-71`, so all HTTPS calls eventually go through
`H2Pool.request()` in `src/lib/undici-fetch.ts:151-167`, which calls `this.acquire()` at
`169-193` and maintains a bounded set of sessions (`MAX_H2_SESSIONS`) and streams per
session (`MAX_H2_STREAMS_PER_SESSION`).

3. When load is high enough that all sessions are created (4 sessions) and every session
has `activeStreams >= MAX_H2_STREAMS_PER_SESSION`, further `acquire()` calls fall into the
waiter path at `src/lib/undici-fetch.ts:190-193`, pushing resolver callbacks into
`this.waiters` and returning promises that will only resolve when `drainWaiters()` assigns
a session entry.

4. If all existing HTTP/2 sessions subsequently close (e.g., due to server-side GOAWAY or
network failure), each session's `'close'` handler in `createSession()`
(`src/lib/undici-fetch.ts:218-223`) calls `this.drainWaiters()`, but `drainWaiters()` at
`325-341` finds no non-closed session (`entry` is `undefined`) and returns early without
creating new sessions or resolving `this.waiters`, leaving all queued `acquire()` promises
unresolved indefinitely even though `this.sessions.filter(e => !e.closed).length <
MAX_H2_SESSIONS` and new sessions could be created.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** src/lib/undici-fetch.ts
**Line:** 325:341
**Comment:**
	*Race Condition: Waiters can deadlock when all sessions are closed and no currently open session has capacity. `drainWaiters` returns without creating a replacement session even when the pool is below `MAX_H2_SESSIONS`, leaving queued requests unresolved indefinitely. Ensure waiter draining can spawn a new session when capacity exists but no eligible entry is found.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎


const apiKey = process.env.RUNLOOP_API_KEY;
const baseURL = process.env.RUNLOOP_BASE_URL;
const baseURL = "https://api.runloop.ai";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: The benchmark now hardcodes the API base URL, so RUNLOOP_BASE_URL can no longer override the endpoint as documented. This causes runs to hit the wrong environment in staging/dev setups; restore env-based configuration with a default fallback. [logic error]

Severity Level: Major ⚠️
- ❌ Devbox benchmark always targets api.runloop.ai regardless environment.
- ⚠️ RUNLOOP_BASE_URL documentation becomes misleading for benchmark users.
- ⚠️ Risk of accidental load tests against production backend.
Steps of Reproduction ✅
1. From `package.json:22`, run the devbox benchmark as documented: `RUNLOOP_API_KEY=...
RUNLOOP_BASE_URL=https://api.runloop.pro yarn test:e2e:devbox-startup`.

2. The script `tests/smoketests/scripts/devbox-startup-benchmark.mjs` documents
`[RUNLOOP_BASE_URL=...]` usage in its header (`lines 1-8`), but at `line 33` sets `const
baseURL = "https://api.runloop.ai";`, ignoring the `RUNLOOP_BASE_URL` environment
variable.

3. In `runTransportPass()` at
`tests/smoketests/scripts/devbox-startup-benchmark.mjs:84-92`, the `Runloop` client is
created with this hardcoded `baseURL`, so despite `RUNLOOP_BASE_URL` being set to
`https://api.runloop.pro`, all devbox create/await/execute/shutdown calls go to
`https://api.runloop.ai` through the SDK.

4. As a result, the benchmark always targets the production API host, violating the
documented environment-based configuration and causing unintended load and resource
creation in the wrong backend when developers expect to test against staging or a custom
environment.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** tests/smoketests/scripts/devbox-startup-benchmark.mjs
**Line:** 33:33
**Comment:**
	*Logic Error: The benchmark now hardcodes the API base URL, so `RUNLOOP_BASE_URL` can no longer override the endpoint as documented. This causes runs to hit the wrong environment in staging/dev setups; restore env-based configuration with a default fallback.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎

Comment on lines +28 to +38
const apiKey = process.env.RUNLOOP_API_KEY_DEV;
const baseURL = process.env.RUNLOOP_BASE_URL ?? 'https://api.runloop.pro';
const requestCount = parsePositiveInteger(
process.env.RUNLOOP_E2E_HEALTH_REQUEST_COUNT ?? process.env.RUNLOOP_E2E_DEVBOX_COUNT,
DEFAULT_REQUEST_COUNT,
);
const resultsPath = process.env.RUNLOOP_E2E_RESULTS_PATH ?? defaultResultsPath();

if (!apiKey) {
console.error('RUNLOOP_API_KEY is required');
process.exit(2);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: The script reads RUNLOOP_API_KEY_DEV but validates as if RUNLOOP_API_KEY should be set, which breaks the documented invocation and causes false missing-key failures in normal environments. Read the same variable name you require in docs/error output (or support both consistently). [incorrect variable usage]

Severity Level: Critical 🚨
- ❌ Health benchmark exits immediately despite RUNLOOP_API_KEY being set.
- ⚠️ E2E health performance tests cannot run without workaround.
- ⚠️ Error messaging misleads about correct environment variable name.
Steps of Reproduction ✅
1. Follow the script header in
`tests/smoketests/scripts/health-endpoint-benchmark.mjs:1-7` and run `RUNLOOP_API_KEY=...
yarn test:e2e:health` (script wired in `package.json:23`).

2. At the top of the script, the API key is read from `const apiKey =
process.env.RUNLOOP_API_KEY_DEV;` (`line 28`), but since only `RUNLOOP_API_KEY` was set
(per docs), `apiKey` will be `undefined`.

3. Immediately afterward, the validation block at `lines 36-39` checks `if (!apiKey) {
console.error('RUNLOOP_API_KEY is required'); process.exit(2); }`, emitting an error about
`RUNLOOP_API_KEY` even though that environment variable is set and exiting with status
code 2.

4. This mismatch between the referenced variable (`RUNLOOP_API_KEY_DEV`) and the
documented/validated name (`RUNLOOP_API_KEY`) means the health benchmark cannot be run as
documented without either setting an unexpected `RUNLOOP_API_KEY_DEV` variable or
modifying the script.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** tests/smoketests/scripts/health-endpoint-benchmark.mjs
**Line:** 28:38
**Comment:**
	*Incorrect Variable Usage: The script reads `RUNLOOP_API_KEY_DEV` but validates as if `RUNLOOP_API_KEY` should be set, which breaks the documented invocation and causes false missing-key failures in normal environments. Read the same variable name you require in docs/error output (or support both consistently).

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎

@codeant-ai
Copy link
Copy Markdown
Contributor

codeant-ai Bot commented May 29, 2026

CodeAnt AI finished reviewing your PR.

Copy link
Copy Markdown

@charliecreates charliecreates Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Focused this pass on transport correctness under saturation/failure paths and benchmark-script runnability.

Blocking feedback

  1. Queued requests don't observe abort/timeouts while waiting for an H2 pool slot — src/lib/undici-fetch.ts#L190
Non-blocking feedback (2)
  1. Health benchmark reads a different API key env var than it documents — tests/smoketests/scripts/health-endpoint-benchmark.mjs#L28
    Running yarn test:e2e:health with the documented RUNLOOP_API_KEY currently fails before any requests are made because the script reads RUNLOOP_API_KEY_DEV.
    Using one env var name consistently in the usage comment, code, and error text will make the benchmark runnable as documented.

  2. Devbox benchmark docs still advertise RUNLOOP_BASE_URL, but the script now hardcodes prod — tests/smoketests/scripts/devbox-startup-benchmark.mjs#L33
    This prevents running the benchmark against staging/custom endpoints despite the header comment saying it’s supported.
    If the hardcoded URL is intentional, update the usage docs; otherwise restore the env override (process.env.RUNLOOP_BASE_URL ?? 'https://api.runloop.ai').

If you want Charlie to apply any of these, reply with the item numbers (for example: please fix 1 or please fix 1,3).

Comment thread src/lib/undici-fetch.ts
return Promise.resolve(created);
}

return new Promise((resolve) => {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the pool is saturated, acquire() parks requests in this.waiters but does not observe AbortSignal while they’re queued.

That means requests aborted by fetchWithTimeout() can stay pending until a stream frees up (or indefinitely if saturation persists), so timeout/cancellation semantics drift under load.

Suggested fix: thread init.signal into acquire(), register an abort listener for queued waiters, remove the waiter from this.waiters on abort, and reject immediately with AbortError.

Copy link
Copy Markdown
Contributor

@sid-rl sid-rl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — bounded HTTP/2 transport (H2Pool)

This rewrites the undici-fetch.ts adapter from #791: the HTTPS path is now a hand-rolled node:http2 connection pool, with undici kept only for the WHATWG Response type and the HTTP/1.1 fallback.

Is the perf motivation real? Yes — confirmed. I ran a local multiplexing benchmark (self-signed h2 server, 1000 concurrent requests): #791's undici Agent({allowH2}) opens one connection per request (1000 conns / 1000 sessions — no multiplexing), matching this PR description's own numbers where #791's h2 is slower than h1. This pool uses 4 sessions and was ~30× faster. A bounded undici Pool (the "stay in one stack" alternative) multiplexes connections but is as slow as h1 — empirically a dead end. So "the 2nd PR is the perf bump #791 lacks" holds, and the hand-rolled pool is justified.

The catch: the pool that delivers the win is bespoke concurrency with two real bugs, and it has zero test coverage. Net: directionally right, not yet safe to own. Highlights (details inline):

  • 🔴 Queued requests can't be aborted or time out — undici-fetch.ts:191
  • 🔴 Waiters can hang forever if sessions die — undici-fetch.ts:330
  • 🟠 No test exercises H2Pool — every case hits the h1 fallback — tests/lib/undici-fetch.test.ts
  • 🟠 h1-only HTTPS origins re-attempt h2 on every request (no negative cache) — undici-fetch.ts:160
  • 🟡 H2 path ignores redirects (:379); fake undici diagnostics event (:212); 64-stream cap ignores server SETTINGS (:27); inaccurate header docs (:16)

On the questions asked:

  • Cleanest approach? For real multiplexing, yes — but it's a second hand-maintained HTTP stack, so the tests/robustness bar is higher than this currently clears.
  • Maintainable? ~250 lines of acquire/release/waiter-queue/idle-timer with the failure modes above; fix the two 🔴s + add coverage first.
  • Conventions? Good — lives in src/lib/ so it survives regeneration, and touches no generated files in this PR's own diff.
  • Docs? Header comment is now self-contradictory (inline).
  • Tests? The single highest-value addition is one https/ALPN-h2 it-test (the bench harness already stands one up) asserting connection reuse ≤ 4, abort-while-saturated rejects, and SSE/multipart/binary round-trip over h2 — that covers the multiplexing claim and both 🔴 bugs at once.

One unverified claim: the "undici 6.26.0 can assert-crash under H2" rationale (:5-6) is plausible but I could not reproduce it in a local harness — worth linking the upstream undici issue/stack in the PR so the core justification is auditable.

(Bench caveats: node 25 locally vs node 20 in CI, undici 6.26.0, trivial local server — relative ordering robust, absolute ms illustrative.)

Comment thread src/lib/undici-fetch.ts
}

return new Promise((resolve) => {
this.waiters.push(resolve);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Concurrency (blocking): queued requests ignore abort and timeout.

When all MAX_H2_SESSIONS (4) sessions are full — 4 × MAX_H2_STREAMS_PER_SESSION (64) = 256 in-flight streams — the caller is parked here with only a resolve: no reject, and no init.signal wiring. The abort listener is attached later, inside dispatch() (line 272), which only runs after acquire() resolves. So a request that the SDK times out (core.ts aborts the signal) or the caller cancels stays stuck in waiters until a slot frees — the abort/timeout is silently ignored while queued, and request() (line 152) awaits it with no escape hatch.

Suggest parking { resolve, reject, signal } and attaching an abort listener at enqueue time that removes the waiter and rejects with abortError().

Comment thread src/lib/undici-fetch.ts
const entry = this.sessions.find(
(candidate) => !candidate.closed && candidate.activeStreams < MAX_H2_STREAMS_PER_SESSION,
);
if (!entry) return;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Concurrency (blocking): waiters can hang forever when sessions die.

drainWaiters() only ever reuses a live session; if none has capacity it returns, leaving waiters parked. The session close handler (line 218) and error handler (line 225) remove/mark sessions and then call drainWaiters(). If every session closes (server GOAWAY, network drop, idle-timer close racing a queued request) while requests are queued, there is no live entry, this returns, and nothing creates a replacement session — those waiters never resolve or reject. Combined with the missing abort wiring above, the request hangs until the process exits.

Suggest: if waiters.length > 0 and the live-session count is < MAX_H2_SESSIONS, create a new session here; otherwise reject the waiters that can no longer be served.

Comment thread src/lib/undici-fetch.ts
throw error;
}

if (entry.alpnProtocol !== 'h2') {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Robustness/perf: h1-only HTTPS origins re-attempt (and fail) h2 on every request.

When an HTTPS origin doesn't negotiate h2, this closes the session and throws; undiciFetch catches it (line 389) and retries over the h1 fallback — correct for one request. But nothing records that the origin is h1-only: getPool() (line 126) keeps the pool, so the next request opens a fresh http2.connect, completes TLS, fails ALPN here, closes, throws, and falls back again. Every request to such an origin pays two connections (failed h2 + h1) indefinitely. Runloop's own API presumably speaks h2 so this never triggers in practice, but any h1-only base URL degrades badly.

Suggest caching a per-origin "h2 unsupported" flag after the first ALPN miss (and routing straight to the fallback thereafter).

Comment thread src/lib/undici-fetch.ts
...rest,
body,
body: rawBody,
maxRedirections: redirect === 'manual' || redirect === 'error' ? 0 : 20,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Correctness asymmetry: the H2 path ignores redirects.

maxRedirections is computed here and passed through, but only undiciFallbackFetch (the h1 path, via undiciRequest) honors it. dispatch() never reads maxRedirections/redirect — it returns the raw 3xx Response. So an HTTPS request that redirects behaves differently depending on whether it went over h2 (returns the redirect) or fell back to h1 (follows up to 20). Worth confirming core.ts doesn't rely on transport-level redirect following; if it does, the h2 path needs to replay on 3xx.

Comment thread src/lib/undici-fetch.ts
entry.ready = new Promise<void>((resolve, reject) => {
session.once('connect', () => {
entry.alpnProtocol = (session.socket as any).alpnProtocol;
connectedChannel.publish({ socket: session.socket });
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Cleanliness: production transport publishes a fake undici diagnostics event.

This publishes on the undici:client:connected channel (declared line 40) from the node:http2 path purely so the benchmark's diagnostics_channel subscriber (in devbox-startup-benchmark.mjs) counts these as "undici connections." It couples production transport code to a benchmark's measurement method, and the channel name is now inaccurate (no undici client connected). If connection accounting is wanted long-term, expose a counter/method on H2Pool directly rather than impersonating undici's channel; otherwise drop it.

Comment thread src/lib/undici-fetch.ts
keepAliveTimeout: 10 * 60 * 1000,
keepAliveMaxTimeout: 10 * 60 * 1000,
const MAX_H2_SESSIONS = 4;
const MAX_H2_STREAMS_PER_SESSION = 64;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Robustness: the client-side 64-stream cap ignores the server's advertised limit.

acquire()/drainWaiters() gate on activeStreams < MAX_H2_STREAMS_PER_SESSION (64), but the real ceiling is the server's SETTINGS_MAX_CONCURRENT_STREAMS. If the server advertises fewer than 64, the pool thinks a session has capacity and dispatches anyway; node:http2 then refuses the excess and NGHTTP2_REFUSED_STREAM surfaces as a hard request error here (no retry path). Consider reading session.remoteSettings.maxConcurrentStreams once connect fires and using min(64, serverLimit).

Comment thread src/lib/undici-fetch.ts
@@ -17,65 +16,381 @@
* regeneration; the only generated file touched is the one-line wiring change
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Docs: this header block is now inaccurate.

Two stale claims inherited from #791:

  1. Line 13 — "no second HTTP stack to keep in sync." This PR adds exactly that: a hand-rolled node:http2 stack alongside undici, which directly contradicts lines 4–6.
  2. Lines 16–17 — "the only generated file touched is the one-line wiring change in src/_shims/node-runtime.ts." feat: opt-in HTTP/2 multiplexing transport via undici 7 (Node >= 20.18.1) #791 actually touched 6 generated files; this framing undersells the surface area.

Please update the header to describe the dual-stack reality and drop the "only one generated file" claim.

bearerToken: 'test-token',
baseURL,
maxRetries: 0,
http2: true,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Tests (important): none of these exercise H2Pool — the core of the PR.

The test server is http.createServerbaseURL = http://127.0.0.1:… (line 50). undiciFetch routes every non-https: URL straight to undiciFallbackFetch (undici-fetch.ts:384), so even these http2: true cases (and the direct undiciFetch calls at lines 94/114) go through the h1 fallback — never the node:http2 pool. The ~250-line pool (acquire/release/waiter-queue/drain/idle-timer, i.e. exactly the code with the concurrency bugs flagged on the implementation) has zero coverage.

Per our "high-value e2e/it" philosophy, the highest-value addition is one https server with a self-signed cert advertising ALPN h2 (the h2-transport-bench.mjs harness already sets this up) asserting: (a) N concurrent requests reuse ≤ MAX_H2_SESSIONS sessions; (b) an aborted/timed-out request while the pool is saturated actually rejects; (c) SSE + multipart + binary bodies round-trip over h2. That single it-test would cover the multiplexing claim and both 🔴 bugs at once.


const apiKey = process.env.RUNLOOP_API_KEY;
const baseURL = process.env.RUNLOOP_BASE_URL;
const baseURL = "https://api.runloop.ai";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Nit: hardcoding the prod base URL removes the env override.

This replaces process.env.RUNLOOP_BASE_URL with a literal https://api.runloop.ai, so the benchmark can no longer be pointed at dev/staging — and the sibling health-endpoint-benchmark.mjs still reads RUNLOOP_BASE_URL. Looks like leftover debugging; suggest process.env.RUNLOOP_BASE_URL ?? 'https://api.runloop.ai'.

@sid-rl
Copy link
Copy Markdown
Contributor

sid-rl commented May 29, 2026

Follow-up / correction to my review above — the undici H2 crash claim, and a cleaner alternative.

In my review I said the "undici 6.26.0 can assert-crash under H2" rationale was plausible but I couldn't reproduce it. I've now reproduced it — it's real — and digging into why turned up an alternative worth weighing against the hand-rolled pool.

The mechanism (and why a naive test misses it). undici only multiplexes H2 streams when pipelining is raised above its default of 1 (getPipelining()client[kPipelining] ?? 1; the dispatch loop stops feeding a connection at kRunning >= pipelining). So:

  • Agent({allowH2:true}) as in feat: opt-in HTTP/2 multiplexing transport via undici 7 (Node >= 20.18.1) #791pipelining:1, unbounded connectionsone TLS connection per concurrent request (no multiplexing; a connection storm under load).
  • Pool({allowH2:true, connections:4}) — pools connections but, still at pipelining:1, runs one stream per session at a time → as slow as HTTP/1.1.
  • Pool/Agent({allowH2:true, connections:4, pipelining:64})actually multiplexes (few sessions, fast)… and crashes on undici 6.26.0.

The crash, reproduced (local self-signed h2 server, 100–500 concurrent GETs):

AssertionError: assert(!this.completed)
  at Request.onHeaders (undici/lib/core/request.js:241)
  at ClientHttp2Stream.<anonymous> (undici/lib/dispatcher/client-h2.js:462)

~1660 of these per 100 requests once multiplexing is on. The crash only appears under the multiplexed config that makes H2 worthwhile — which is exactly why a single-stream test never trips it.

It's fixed upstream. undici PR #4845 ("fix(h2): ignore late data frames after request completion") shipped in v7.23.0. On undici 8.3.0 the same connections:4, pipelining:64 config that crashed on 6.26.0 runs 0 crashes, a few sessions for 500 concurrent requests, wall-time on par with this PR's node:http2 pool. (We currently pin undici ^6.26.0; latest is 8.3.0.)

config (100 concurrent, 20ms server delay, node 25) conns sessions crashes wall
Agent #791 (6.26, pipe=1, unbounded) 100 100 0 145ms
Pool conns=4 pipe=1 (6.26) 4 4 0 580ms
Pool conns=4 pipe=64 (6.26) 4 4 1661 75ms
Pool conns=4 pipe=64 (8.3) 4 4 0 44ms
this PR's node:http2 pool (any undici) 2 2 0 42ms

What this means for the approach. The hand-rolled node:http2 pool here is a legitimate way to get multiplexing on the undici version we ship today. But if we're open to bumping undici to ≥7.23.0, the same result comes from ~5 lines of dispatcher config — new Agent({ allowH2:true, connections:N, pipelining:M }) (plus a separate pipelining:1 dispatcher for the HTTP/1.1 fallback) — and undici owns the stream lifecycle, so the waiter-queue / abort-while-queued / drain-on-session-loss issues I flagged inline simply don't exist. The trade is a two-major dependency bump (6→8) that needs a compat pass + re-validation on node 20.

I'll put up a separate branch with that alternative so we can compare head-to-head rather than in the abstract. Not arguing this PR should be dropped — just that the crash is real and now has a much smaller fix.

(Bench caveats: node 25 locally + a trivial TLS server, so absolute ms are illustrative; the version-level facts — crash on 6.26, clean on 8.3 — are the robust part. Harness lives at tests/smoketests/scripts/h2-transport-bench.mjs.)

@tode-rl
Copy link
Copy Markdown
Collaborator Author

tode-rl commented May 29, 2026

Covered by #793

@tode-rl tode-rl closed this May 29, 2026
@tode-rl tode-rl deleted the tode/http2-updates-shared-pool-test branch May 29, 2026 23:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:XL This PR changes 500-999 lines, ignoring generated files

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants