Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
62 changes: 45 additions & 17 deletions services/sandbox/docs/kubernetes.md
Original file line number Diff line number Diff line change
Expand Up @@ -83,18 +83,18 @@ stream. Keep it out so a stray exec call fails closed.

## Environment

| Env | Required | Notes |
| ----------------------------------------------------------------- | ---------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `SANDBOX_BACKEND=kubernetes` | yes | Selects this backend. |
| `SANDBOX_SPAWNER_IMAGE` | yes | The spawner's **own** image ref, used for the `stage`/`harvest` containers. Must match the deployed spawner version. |
| `SANDBOX_RUNTIME_IMAGE` | yes | The `runner` image (`tale-sandbox-runtime:<tag>`). |
| `SANDBOX_K8S_NAMESPACE` | yes | Namespace the per-exec Pods/Secrets are created in (default `tale-sandbox`). |
| `NODE_EXTRA_CA_CERTS` | in-cluster | Point at the SA `ca.crt` (`/var/run/secrets/kubernetes.io/serviceaccount/ca.crt`). **Bun's fetch ignores the kubeconfig CA**, so without this the apiserver TLS isn't trusted. |
| `SANDBOX_RUNTIME=runsc` | optional | Sets the Pod `runtimeClassName` (gVisor) via `SANDBOX_RUNTIME_CLASS` (default `gvisor`). |
| `SANDBOX_CACHE=pvc` | optional | Mounts per-org RWX cache PVCs on the runner; needs the PVC RBAC above + an RWX StorageClass. Default `none` (installs fresh each run via the egress proxy). |
| `SANDBOX_K8S_WORKSPACE_SIZE_LIMIT` | optional | `sizeLimit` on the per-exec `/user` emptyDir (default `4Gi`). Bounds deps + temp + outputs; exceeding it evicts the Pod. |
| `SANDBOX_EGRESS_PROXY` | optional | The runner's `HTTPS_PROXY`/`HTTP_PROXY` for `pip`/`npm` (default `http://sandbox-egress:3128`). |
| `SANDBOX_K8S_SERVER` / `SANDBOX_K8S_TOKEN` / `SANDBOX_K8S_CAFILE` | dev only | Explicit bearer-token kubeconfig for local Bun dev (kind's client-cert kubeconfig auths as `system:anonymous` under Bun). In-cluster uses the projected SA token automatically. |
| Env | Required | Notes |
| ----------------------------------------------------------------- | ---------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `SANDBOX_BACKEND=kubernetes` | yes | Selects this backend. |
| `SANDBOX_SPAWNER_IMAGE` | yes | The spawner's **own** image ref, used for the `stage`/`harvest` containers. Must match the deployed spawner version. |
| `SANDBOX_RUNTIME_IMAGE` | yes | The `runner` image (`tale-sandbox-runtime:<tag>`). |
| `SANDBOX_K8S_NAMESPACE` | yes | Namespace the per-exec Pods/Secrets are created in (default `tale-sandbox`). |
| `NODE_EXTRA_CA_CERTS` | in-cluster | Point at the SA `ca.crt` (`/var/run/secrets/kubernetes.io/serviceaccount/ca.crt`). Kubernetes API calls are trusted via `caFile` → Agent (see TLS note below), but native `fetch()` calls (e.g. harvest presigned-URL uploads) bypass the Agent and require this variable to trust the apiserver CA. |
| `SANDBOX_RUNTIME=runsc` | optional | Sets the Pod `runtimeClassName` (gVisor) via `SANDBOX_RUNTIME_CLASS` (default `gvisor`). |
| `SANDBOX_CACHE=pvc` | optional | Mounts per-org RWX cache PVCs on the runner; needs the PVC RBAC above + an RWX StorageClass. Default `none` (installs fresh each run via the egress proxy). |
| `SANDBOX_K8S_WORKSPACE_SIZE_LIMIT` | optional | `sizeLimit` on the per-exec `/user` emptyDir (default `4Gi`). Bounds deps + temp + outputs; exceeding it evicts the Pod. |
| `SANDBOX_EGRESS_PROXY` | optional | The runner's `HTTPS_PROXY`/`HTTP_PROXY` for `pip`/`npm` (default `http://sandbox-egress:3128`). |
| `SANDBOX_K8S_SERVER` / `SANDBOX_K8S_TOKEN` / `SANDBOX_K8S_CAFILE` | dev only | Explicit bearer-token kubeconfig for local Bun dev (kind's client-cert kubeconfig auths as `system:anonymous` under Bun — see TLS note below). In-cluster uses the projected SA token automatically. |

The Pod sets `automountServiceAccountToken: false` — the runtime never gets an
SA token.
Expand All @@ -121,11 +121,39 @@ process is free to ignore. The proxy itself is open at the hostname layer by
default (`SANDBOX_EGRESS_ALLOWLIST` opt-in), so this NetworkPolicy is the
_only_ egress fence on k8s; do not run untrusted workloads without it.

## TLS contract under Bun

`@kubernetes/client-node@1.4.0` routes every API call through **node-fetch v2**
(`gen/http/isomorphic-fetch.js`), which calls `https.request()` with an
`https.Agent` — **not** Bun's native `fetch()`. Through `node:https`, Bun
**does** honour TLS options set on the Agent:

| Kubeconfig knob | Effect | Honoured by Bun? |
| ------------------------------------ | --------------------------------------------- | ------------------------------------------------------------------------------------------ |
| `skipTLSVerify: true` | sets `rejectUnauthorized: false` on the Agent | **Yes** — real security relaxation, not dead code |
| `caFile` / `caData` | reads CA bytes into `agent.options.ca` | **Yes** — CA is used for server-cert verification |
| `certFile` / `keyFile` (client cert) | sets `cert` / `key` on the Agent | **No** — Bun's TLS stack does not support mTLS client certs; use bearer-token auth instead |

**Practical consequences:**

- `SANDBOX_K8S_SERVER` + `SANDBOX_K8S_TOKEN` without `SANDBOX_K8S_CAFILE` falls
back to `skipTLSVerify: true`. This **disables TLS certificate verification**
for the apiserver — intentional for local dev against self-signed clusters, but
must never be used in production.
- `SANDBOX_K8S_CAFILE` enables proper CA verification through the Agent. Set it
to the cluster's CA bundle when the apiserver's cert is issued by a private CA.
- `NODE_EXTRA_CA_CERTS` should **also** be set in-cluster to the SA `ca.crt`
path (`/var/run/secrets/kubernetes.io/serviceaccount/ca.crt`) as a
belt-and-suspenders measure: any native `fetch()` calls (e.g. harvest
presigned-URL uploads) go through Bun's native TLS stack, which reads this env
var but ignores `agent.options.ca`.

## Verification status

Unit-tested (no cluster): the Pod shape + the security invariant (Secret never
on the runner), the ExecSpec Secret payload + round-trip, and the
`__TALE_RESULT__` result-line protocol. The on-cluster reliability bar (50+
consecutive real executions ~100% pass, 2-replica scale, cancel-across-replicas)
is **pending a healthy cluster** and must be run before enabling this backend in
production.
on the runner), the ExecSpec Secret payload + round-trip, the
`__TALE_RESULT__` result-line protocol, and the `skipTLSVerify`/`caFile` TLS
knob propagation through the Agent (see `k8s-client.test.ts`). The on-cluster
reliability bar (50+ consecutive real executions ~100% pass, 2-replica scale,
cancel-across-replicas) is **pending a healthy cluster** and must be run before
enabling this backend in production.
109 changes: 108 additions & 1 deletion services/sandbox/src/backend/kubernetes/k8s-client.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -3,8 +3,14 @@
// forever-hung execute()) and withRetry's retryability gate.

import { describe, expect, test } from 'bun:test';
import { writeFileSync, rmSync } from 'node:fs';
import type { AgentOptions } from 'node:https';

import { HttpMethod, RequestContext } from '@kubernetes/client-node';
import {
HttpMethod,
KubeConfig,
RequestContext,
} from '@kubernetes/client-node';

import { apiTimeout, httpStatusCode, withRetry } from './k8s-client.ts';

Expand Down Expand Up @@ -110,3 +116,104 @@ describe('withRetry', () => {
expect(Date.now() - start).toBeLessThan(1_000);
});
});

// Verify that @kubernetes/client-node@1.4.0 TLS knobs are NOT inert under Bun.
//
// The library routes every request through node-fetch v2
// (gen/http/isomorphic-fetch.js: `import fetch from "node-fetch"`), which
// calls https.request() with the configured https.Agent — NOT Bun's native
// fetch(). Bun's node:https honours rejectUnauthorized and ca on an
// https.Agent, so skipTLSVerify and caFile ARE effective and carry real
// security semantics.
//
// The limitation documented in the AUTH NOTE of k8s-client.ts is specific to
// client-cert auth (cert/key options) — Bun's TLS stack does not support
// mutual TLS client certificates. skipTLSVerify and caFile are distinct knobs
// that work as intended.
describe('kubeconfig TLS knobs under Bun', () => {
function makeKc(clusterExtra: Record<string, unknown>) {
const kc = new KubeConfig();
kc.loadFromOptions({
clusters: [
{ name: 'k', server: 'https://k8s.local:6443', ...clusterExtra },
],
users: [{ name: 'sa', token: 'tok' }],
contexts: [{ name: 'c', cluster: 'k', user: 'sa' }],
currentContext: 'c',
});
return kc;
}

// applySecurityAuthentication constructs an https.Agent whose constructor
// options are stored in agent.options (node:https parity with Node.js).
// We extract them via a typed cast to AgentOptions to avoid unsafe any.
function getAgentOpts(ctx: RequestContext): AgentOptions | undefined {
const agent = ctx.getAgent();
if (!agent) return undefined;
// https.Agent stores its constructor options in agent.options; the property
// is present at runtime but narrower than the union type suggests.
// oxlint-disable-next-line typescript-eslint/no-unsafe-type-assertion
return (agent as unknown as { options: AgentOptions }).options;
}
Comment on lines +150 to +157

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, find and read the file mentioned in the review
find . -name "k8s-client.test.ts" -type f

Repository: tale-project/tale

Length of output: 122


🏁 Script executed:

# Also search for the main implementation file (non-test version)
find . -name "k8s-client.ts" -type f | grep -v test

Repository: tale-project/tale

Length of output: 117


🏁 Script executed:

# Search for AgentOptions definition
rg "type AgentOptions|interface AgentOptions" --type ts --type tsx

Repository: tale-project/tale

Length of output: 89


🏁 Script executed:

# Search for RequestContext definition
rg "class RequestContext|interface RequestContext|type RequestContext" --type ts --type tsx -A 5

Repository: tale-project/tale

Length of output: 89


🏁 Script executed:

# Read the test file to see the actual code
cat -n ./services/sandbox/src/backend/kubernetes/k8s-client.test.ts | sed -n '145,165p'

Repository: tale-project/tale

Length of output: 1187


🏁 Script executed:

# Read the implementation file to see AgentOptions usage
cat -n ./services/sandbox/src/backend/kubernetes/k8s-client.ts | head -100

Repository: tale-project/tale

Length of output: 4864


🏁 Script executed:

# Search for AgentOptions definition
rg "type AgentOptions|interface AgentOptions" --type ts

Repository: tale-project/tale

Length of output: 43


🏁 Script executed:

# Search for RequestContext definition more broadly
rg "RequestContext" ./services/sandbox/src/backend/kubernetes/ -A 3 -B 1

Repository: tale-project/tale

Length of output: 3587


🏁 Script executed:

# Search for AgentOptions definition
rg "AgentOptions|interface.*Agent|type.*Agent" ./services/sandbox/src/backend/kubernetes/ --type ts -B 2 -A 5

Repository: tale-project/tale

Length of output: 3374


🏁 Script executed:

# Check how getAgentOpts is being used in the tests
cat -n ./services/sandbox/src/backend/kubernetes/k8s-client.test.ts | sed -n '145,220p'

Repository: tale-project/tale

Length of output: 3043


🏁 Script executed:

# Check imports to understand AgentOptions source
head -40 ./services/sandbox/src/backend/kubernetes/k8s-client.test.ts

Repository: tale-project/tale

Length of output: 1456


🏁 Script executed:

# Search across the repo for AgentOptions definition
rg "type AgentOptions|interface AgentOptions" --type ts

Repository: tale-project/tale

Length of output: 43


🏁 Script executed:

# Check if getAgentOpts appears anywhere else in the codebase
rg "getAgentOpts" --type ts

Repository: tale-project/tale

Length of output: 517


🏁 Script executed:

# Look for similar patterns with unsafe assertions in the main implementation
rg "as unknown as" ./services/sandbox/src/backend/kubernetes/ --type ts

Repository: tale-project/tale

Length of output: 479


🏁 Script executed:

# Check if there are any other unsafe type assertions in the file
rg "no-unsafe-type-assertion|no-explicit-any" ./services/sandbox/src/backend/kubernetes/k8s-client.test.ts -A 2 -B 2

Repository: tale-project/tale

Length of output: 361


🏁 Script executed:

# Verify the context of where getAgentOpts is defined - is it inside a test describe block?
cat -n ./services/sandbox/src/backend/kubernetes/k8s-client.test.ts | sed -n '130,160p'

Repository: tale-project/tale

Length of output: 1703


🏁 Script executed:

# Check all test assertions using getAgentOpts to verify proposed Pick type
cat -n ./services/sandbox/src/backend/kubernetes/k8s-client.test.ts | sed -n '159,203p'

Repository: tale-project/tale

Length of output: 2260


Replace the as unknown as cast chain with runtime property checks using Reflect.get().

This test helper currently violates the coding guideline against unsafe type assertions. The function accesses the options property on https.Agent, which exists at runtime but is narrower than the union type. The current documentation spans multiple lines where the exception comment should be one line per guidelines.

Replace the unsafe cast with runtime narrowing that safely extracts only the rejectUnauthorized and ca properties that the tests actually depend on:

Proposed rewrite without `as`/`unknown`
-function getAgentOpts(ctx: RequestContext): AgentOptions | undefined {
+function getAgentOpts(
+  ctx: RequestContext,
+): Pick<AgentOptions, 'rejectUnauthorized' | 'ca'> | undefined {
   const agent = ctx.getAgent();
   if (!agent) return undefined;
-  // https.Agent stores its constructor options in agent.options; the property
-  // is present at runtime but narrower than the union type suggests.
-  // oxlint-disable-next-line typescript-eslint/no-unsafe-type-assertion
-  return (agent as unknown as { options: AgentOptions }).options;
+  const options = Reflect.get(agent, 'options');
+  if (!options || typeof options !== 'object') return undefined;
+
+  const rejectUnauthorized = Reflect.get(options, 'rejectUnauthorized');
+  const ca = Reflect.get(options, 'ca');
+  return {
+    ...(typeof rejectUnauthorized === 'boolean' ? { rejectUnauthorized } : {}),
+    ...(ca !== undefined ? { ca } : {}),
+  };
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/sandbox/src/backend/kubernetes/k8s-client.test.ts` around lines 150
- 157, The getAgentOpts function currently uses an unsafe type assertion chain
(as unknown as) to access the options property, which violates coding
guidelines. Replace this unsafe cast with runtime property checks using
Reflect.get() to safely extract only the rejectUnauthorized and ca properties
that the tests depend on, and condense the multi-line documentation comment to a
single line per guidelines.

Source: Coding guidelines


test('skipTLSVerify:true → rejectUnauthorized:false on the per-request https.Agent', async () => {
const kc = makeKc({ skipTLSVerify: true });
const ctx = new RequestContext(
'https://k8s.local:6443/api/v1/pods',
HttpMethod.GET,
);
await kc.applySecurityAuthentication(ctx);
// applySecurityAuthentication calls createAgent(cluster, { rejectUnauthorized: false, … })
// and sets the resulting https.Agent on the context. The agent stores
// constructor options in agent.options (Node.js / Bun node:https parity).
expect(getAgentOpts(ctx)?.rejectUnauthorized).toBe(false);
});

test('caFile → ca buffer present on the per-request https.Agent', async () => {
const caPath = '/tmp/k8s-client-test-ca.pem';
// Content is a placeholder; bufferFromFileOrString reads the bytes as-is.
writeFileSync(caPath, 'placeholder-pem-content');
try {
const kc = makeKc({ caFile: caPath });
const ctx = new RequestContext(
'https://k8s.local:6443/api/v1/pods',
HttpMethod.GET,
);
await kc.applySecurityAuthentication(ctx);
// applyHTTPSOptions calls fs.readFileSync(caFile) and copies the result
// to agentOptions.ca before constructing the https.Agent.
const agentOpts = getAgentOpts(ctx);
expect(agentOpts?.ca).toBeDefined();
expect(Buffer.isBuffer(agentOpts?.ca)).toBe(true);
expect(agentOpts?.ca).toEqual(Buffer.from('placeholder-pem-content'));
} finally {
rmSync(caPath, { force: true });
}
Comment on lines +172 to +191

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use a unique temp file for the caFile test.

The fixed /tmp/k8s-client-test-ca.pem path can collide across concurrent workers/retries and make this test flaky.

Suggested temp-path hardening
-import { writeFileSync, unlinkSync } from 'node:fs';
+import { mkdtempSync, rmSync, writeFileSync } from 'node:fs';
+import { tmpdir } from 'node:os';
+import { join } from 'node:path';
@@
-  test('caFile → ca buffer present on the per-request https.Agent', async () => {
-    const caPath = '/tmp/k8s-client-test-ca.pem';
+  test('caFile → ca buffer present on the per-request https.Agent', async () => {
+    const dir = mkdtempSync(join(tmpdir(), 'k8s-client-ca-'));
+    const caPath = join(dir, 'ca.pem');
@@
-    } finally {
-      unlinkSync(caPath);
-    }
+    } finally {
+      rmSync(dir, { recursive: true, force: true });
+    }
   });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/sandbox/src/backend/kubernetes/k8s-client.test.ts` around lines 172
- 190, The test uses a hardcoded file path `/tmp/k8s-client-test-ca.pem` for the
caFile variable, which can cause race conditions and flakiness when tests run
concurrently across multiple workers or retries due to file path collisions.
Replace the hardcoded caPath string with a dynamically generated unique
temporary file path using Node.js built-in utilities such as tmpNameSync or
mkdtempSync, or an equivalent method that ensures each test invocation gets a
unique file path. Update the caPath variable assignment in the test to use this
unique temporary path generation approach.

});

test('neither skipTLSVerify nor caFile → default TLS (no overrides on the agent)', async () => {
const kc = makeKc({});
const ctx = new RequestContext(
'https://k8s.local:6443/api/v1/pods',
HttpMethod.GET,
);
await kc.applySecurityAuthentication(ctx);
const agentOpts = getAgentOpts(ctx);
expect(agentOpts).toBeDefined();
expect(agentOpts?.rejectUnauthorized).toBeUndefined();
expect(agentOpts?.ca).toBeUndefined();
});

test('caData → ca buffer present on the per-request https.Agent', async () => {
const caData = Buffer.from('placeholder-pem-content').toString('base64');
const kc = makeKc({ caData });
const ctx = new RequestContext(
'https://k8s.local:6443/api/v1/pods',
HttpMethod.GET,
);
await kc.applySecurityAuthentication(ctx);
const agentOpts = getAgentOpts(ctx);
expect(agentOpts?.ca).toBeDefined();
expect(Buffer.isBuffer(agentOpts?.ca)).toBe(true);
});
});
Comment on lines +159 to +219

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add one explicit error-path TLS test to complete the contract.

The new block has happy + edge coverage, but no error condition. Add a failing caFile case (missing/unreadable path) and assert applySecurityAuthentication rejects.

As per coding guidelines: “Every feature and fix must carry its test: happy path, one edge case, one error condition minimum.”

Minimal error-case addition
 describe('kubeconfig TLS knobs under Bun', () => {
@@
   test('neither skipTLSVerify nor caFile → default TLS (no overrides on the agent)', async () => {
@@
     expect(agentOpts?.ca).toBeUndefined();
   });
+
+  test('missing caFile → applySecurityAuthentication rejects', async () => {
+    const kc = makeKc({ caFile: '/tmp/does-not-exist.pem' });
+    const ctx = new RequestContext(
+      'https://k8s.local:6443/api/v1/pods',
+      HttpMethod.GET,
+    );
+    await expect(kc.applySecurityAuthentication(ctx)).rejects.toThrow();
+  });
 });
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
test('skipTLSVerify:true → rejectUnauthorized:false on the per-request https.Agent', async () => {
const kc = makeKc({ skipTLSVerify: true });
const ctx = new RequestContext(
'https://k8s.local:6443/api/v1/pods',
HttpMethod.GET,
);
await kc.applySecurityAuthentication(ctx);
// applySecurityAuthentication calls createAgent(cluster, { rejectUnauthorized: false, … })
// and sets the resulting https.Agent on the context. The agent stores
// constructor options in agent.options (Node.js / Bun node:https parity).
expect(getAgentOpts(ctx)?.rejectUnauthorized).toBe(false);
});
test('caFile → ca buffer present on the per-request https.Agent', async () => {
const caPath = '/tmp/k8s-client-test-ca.pem';
// Content is a placeholder; bufferFromFileOrString reads the bytes as-is.
writeFileSync(caPath, 'placeholder-pem-content');
try {
const kc = makeKc({ caFile: caPath });
const ctx = new RequestContext(
'https://k8s.local:6443/api/v1/pods',
HttpMethod.GET,
);
await kc.applySecurityAuthentication(ctx);
// applyHTTPSOptions calls fs.readFileSync(caFile) and copies the result
// to agentOptions.ca before constructing the https.Agent.
const agentOpts = getAgentOpts(ctx);
expect(agentOpts?.ca).toBeDefined();
expect(Buffer.isBuffer(agentOpts?.ca)).toBe(true);
} finally {
unlinkSync(caPath);
}
});
test('neither skipTLSVerify nor caFile → default TLS (no overrides on the agent)', async () => {
const kc = makeKc({});
const ctx = new RequestContext(
'https://k8s.local:6443/api/v1/pods',
HttpMethod.GET,
);
await kc.applySecurityAuthentication(ctx);
const agentOpts = getAgentOpts(ctx);
expect(agentOpts?.rejectUnauthorized).toBeUndefined();
expect(agentOpts?.ca).toBeUndefined();
});
});
test('skipTLSVerify:true → rejectUnauthorized:false on the per-request https.Agent', async () => {
const kc = makeKc({ skipTLSVerify: true });
const ctx = new RequestContext(
'https://k8s.local:6443/api/v1/pods',
HttpMethod.GET,
);
await kc.applySecurityAuthentication(ctx);
// applySecurityAuthentication calls createAgent(cluster, { rejectUnauthorized: false, … })
// and sets the resulting https.Agent on the context. The agent stores
// constructor options in agent.options (Node.js / Bun node:https parity).
expect(getAgentOpts(ctx)?.rejectUnauthorized).toBe(false);
});
test('caFile → ca buffer present on the per-request https.Agent', async () => {
const caPath = '/tmp/k8s-client-test-ca.pem';
// Content is a placeholder; bufferFromFileOrString reads the bytes as-is.
writeFileSync(caPath, 'placeholder-pem-content');
try {
const kc = makeKc({ caFile: caPath });
const ctx = new RequestContext(
'https://k8s.local:6443/api/v1/pods',
HttpMethod.GET,
);
await kc.applySecurityAuthentication(ctx);
// applyHTTPSOptions calls fs.readFileSync(caFile) and copies the result
// to agentOptions.ca before constructing the https.Agent.
const agentOpts = getAgentOpts(ctx);
expect(agentOpts?.ca).toBeDefined();
expect(Buffer.isBuffer(agentOpts?.ca)).toBe(true);
} finally {
unlinkSync(caPath);
}
});
test('neither skipTLSVerify nor caFile → default TLS (no overrides on the agent)', async () => {
const kc = makeKc({});
const ctx = new RequestContext(
'https://k8s.local:6443/api/v1/pods',
HttpMethod.GET,
);
await kc.applySecurityAuthentication(ctx);
const agentOpts = getAgentOpts(ctx);
expect(agentOpts?.rejectUnauthorized).toBeUndefined();
expect(agentOpts?.ca).toBeUndefined();
});
test('missing caFile → applySecurityAuthentication rejects', async () => {
const kc = makeKc({ caFile: '/tmp/does-not-exist.pem' });
const ctx = new RequestContext(
'https://k8s.local:6443/api/v1/pods',
HttpMethod.GET,
);
await expect(kc.applySecurityAuthentication(ctx)).rejects.toThrow();
});
});
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/sandbox/src/backend/kubernetes/k8s-client.test.ts` around lines 159
- 204, Add a new test case after the existing three tests that verifies the
error condition when caFile references a missing or unreadable file path. Create
a test that calls makeKc with a caFile pointing to a nonexistent path, then
invoke applySecurityAuthentication on the RequestContext and assert that the
promise rejects with an appropriate error. This completes the test coverage by
adding the required error-path condition alongside the existing happy-path and
edge-case tests.

Source: Coding guidelines

26 changes: 19 additions & 7 deletions services/sandbox/src/backend/kubernetes/k8s-client.ts
Original file line number Diff line number Diff line change
Expand Up @@ -6,13 +6,20 @@
// createNamespacedPod / readNamespacedPodLog / deleteNamespacedPod + presigned-
// URL I/O done inside the Pod — every primitive below is plain HTTP.
//
// AUTH NOTE (Bun): Bun's fetch does NOT apply a kubeconfig's client cert or
// custom CA, so a client-cert cluster (e.g. kind's default kubeconfig) auths
// as system:anonymous. The real in-cluster path uses a ServiceAccount BEARER
// TOKEN (an Authorization header Bun sends fine) + the cluster CA — for Bun to
// trust that CA, the container must set NODE_EXTRA_CA_CERTS to the SA ca.crt
// (/var/run/secrets/kubernetes.io/serviceaccount/ca.crt). Local dev needs a
// token-based kubeconfig, not kind's client-cert one.
// AUTH NOTE (Bun): @kubernetes/client-node@1.4.0 routes requests through
// node-fetch v2, which calls https.request() with an https.Agent — NOT Bun's
// native fetch(). Through node:https, skipTLSVerify (→ rejectUnauthorized:
// false on the Agent) and caFile (→ ca Buffer on the Agent) ARE honoured by
// Bun's TLS stack. What is NOT honoured is client-cert auth (cert/key options
// on the Agent): Bun's TLS layer does not support mutual TLS client
// certificates, so a client-cert kubeconfig (e.g. kind's default) auths as
// system:anonymous. Use a ServiceAccount bearer-token kubeconfig instead.
//
// In-cluster CA trust: loadFromCluster() sets caFile to the projected SA
// ca.crt, which is read into the Agent's ca option and works as above. As a
// belt-and-suspenders measure the container should also set NODE_EXTRA_CA_CERTS
// to the same path so that any native fetch() calls (e.g. in harvest) trust
// the apiserver CA without requiring an explicit Agent.

import {
type ConfigurationOptions,
Expand Down Expand Up @@ -77,6 +84,11 @@ export function makeK8sClient(namespace: string): K8sClient {
// Explicit bearer-token config (dev / Bun-friendly). Bun can't use a
// client-cert kubeconfig, so point at the apiserver with an SA token; CA
// trust via SANDBOX_K8S_CAFILE + NODE_EXTRA_CA_CERTS, or skipTLSVerify.
if (!process.env.SANDBOX_K8S_CAFILE) {
console.warn(
'[sandbox.k8s] SANDBOX_K8S_CAFILE not set — TLS certificate verification DISABLED (skipTLSVerify: true). Do not use in production.',
);
}
kc.loadFromOptions({
clusters: [
{
Expand Down