Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 12 additions & 12 deletions services/sandbox/docs/kubernetes.md
Original file line number Diff line number Diff line change
Expand Up @@ -83,18 +83,18 @@ stream. Keep it out so a stray exec call fails closed.

## Environment

| Env | Required | Notes |
| ----------------------------------------------------------------- | ---------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `SANDBOX_BACKEND=kubernetes` | yes | Selects this backend. |
| `SANDBOX_SPAWNER_IMAGE` | yes | The spawner's **own** image ref, used for the `stage`/`harvest` containers. Must match the deployed spawner version. |
| `SANDBOX_RUNTIME_IMAGE` | yes | The `runner` image (`tale-sandbox-runtime:<tag>`). |
| `SANDBOX_K8S_NAMESPACE` | yes | Namespace the per-exec Pods/Secrets are created in (default `tale-sandbox`). |
| `NODE_EXTRA_CA_CERTS` | in-cluster | Point at the SA `ca.crt` (`/var/run/secrets/kubernetes.io/serviceaccount/ca.crt`). **Bun's fetch ignores the kubeconfig CA**, so without this the apiserver TLS isn't trusted. |
| `SANDBOX_RUNTIME=runsc` | optional | Sets the Pod `runtimeClassName` (gVisor) via `SANDBOX_RUNTIME_CLASS` (default `gvisor`). |
| `SANDBOX_CACHE=pvc` | optional | Mounts per-org RWX cache PVCs on the runner; needs the PVC RBAC above + an RWX StorageClass. Default `none` (installs fresh each run via the egress proxy). |
| `SANDBOX_K8S_WORKSPACE_SIZE_LIMIT` | optional | `sizeLimit` on the per-exec `/user` emptyDir (default `4Gi`). Bounds deps + temp + outputs; exceeding it evicts the Pod. |
| `SANDBOX_EGRESS_PROXY` | optional | The runner's `HTTPS_PROXY`/`HTTP_PROXY` for `pip`/`npm` (default `http://sandbox-egress:3128`). |
| `SANDBOX_K8S_SERVER` / `SANDBOX_K8S_TOKEN` / `SANDBOX_K8S_CAFILE` | dev only | Explicit bearer-token kubeconfig for local Bun dev (kind's client-cert kubeconfig auths as `system:anonymous` under Bun). In-cluster uses the projected SA token automatically. |
| Env | Required | Notes |
| ----------------------------------------------------------------- | -------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `SANDBOX_BACKEND=kubernetes` | yes | Selects this backend. |
| `SANDBOX_SPAWNER_IMAGE` | yes | The spawner's **own** image ref, used for the `stage`/`harvest` containers. Must match the deployed spawner version. |
| `SANDBOX_RUNTIME_IMAGE` | yes | The `runner` image (`tale-sandbox-runtime:<tag>`). |
| `SANDBOX_K8S_NAMESPACE` | yes | Namespace the per-exec Pods/Secrets are created in (default `tale-sandbox`). |
| `NODE_EXTRA_CA_CERTS` | optional | Defence-in-depth: point at the SA `ca.crt` (`/var/run/secrets/kubernetes.io/serviceaccount/ca.crt`). `@kubernetes/client-node` routes API calls through `node-fetch` → `https.request()`, which Bun honours — the kubeconfig `caFile` from `loadFromCluster` is trusted without this env var. `NODE_EXTRA_CA_CERTS` additionally covers global-`fetch` paths (e.g. presigned-URL I/O) and is recommended belt-and-suspenders. |
| `SANDBOX_RUNTIME=runsc` | optional | Sets the Pod `runtimeClassName` (gVisor) via `SANDBOX_RUNTIME_CLASS` (default `gvisor`). |
| `SANDBOX_CACHE=pvc` | optional | Mounts per-org RWX cache PVCs on the runner; needs the PVC RBAC above + an RWX StorageClass. Default `none` (installs fresh each run via the egress proxy). |
| `SANDBOX_K8S_WORKSPACE_SIZE_LIMIT` | optional | `sizeLimit` on the per-exec `/user` emptyDir (default `4Gi`). Bounds deps + temp + outputs; exceeding it evicts the Pod. |
| `SANDBOX_EGRESS_PROXY` | optional | The runner's `HTTPS_PROXY`/`HTTP_PROXY` for `pip`/`npm` (default `http://sandbox-egress:3128`). |
| `SANDBOX_K8S_SERVER` / `SANDBOX_K8S_TOKEN` / `SANDBOX_K8S_CAFILE` | dev only | Explicit bearer-token kubeconfig for local Bun dev. `SANDBOX_K8S_CAFILE` sets the cluster CA; when absent, `skipTLSVerify: true` disables TLS verification — both are honoured under Bun (verified: issue #1849). In-cluster uses the projected SA token automatically. |

The Pod sets `automountServiceAccountToken: false` — the runtime never gets an
SA token.
Expand Down
70 changes: 68 additions & 2 deletions services/sandbox/src/backend/kubernetes/k8s-client.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -2,11 +2,16 @@
// middleware (the only thing standing between a wedged TCP connection and a
// forever-hung execute()) and withRetry's retryability gate.

import { describe, expect, test } from 'bun:test';
import { afterEach, beforeEach, describe, expect, test } from 'bun:test';

import { HttpMethod, RequestContext } from '@kubernetes/client-node';

import { apiTimeout, httpStatusCode, withRetry } from './k8s-client.ts';
import {
apiTimeout,
httpStatusCode,
makeK8sConfig,
withRetry,
} from './k8s-client.ts';

/** Await a rejection and hand back the error (typed alternative to .rejects). */
async function rejectionOf(p: Promise<unknown>): Promise<Error> {
Expand Down Expand Up @@ -110,3 +115,64 @@ describe('withRetry', () => {
expect(Date.now() - start).toBeLessThan(1_000);
});
});

// Snapshot the env vars touched by makeK8sConfig so tests don't pollute each
// other or the runner's environment.
const K8S_ENV_KEYS = [
'SANDBOX_K8S_SERVER',
'SANDBOX_K8S_TOKEN',
'SANDBOX_K8S_CAFILE',
] as const;

let savedEnv: Record<string, string | undefined>;

beforeEach(() => {
savedEnv = {};
for (const k of K8S_ENV_KEYS) {
savedEnv[k] = process.env[k];
delete process.env[k];
}
});

afterEach(() => {
for (const k of K8S_ENV_KEYS) {
if (savedEnv[k] === undefined) delete process.env[k];
else process.env[k] = savedEnv[k];
}
});

describe('makeK8sConfig — explicit bearer-token path (SANDBOX_K8S_SERVER + TOKEN)', () => {
// Empirically verified under Bun 1.3.14 (issue #1849): @kubernetes/client-node
// uses node-fetch → https.request(), which Bun honours. skipTLSVerify and
// caFile both take effect; neither is dead code.

test('sets caFile when SANDBOX_K8S_CAFILE is provided', () => {
process.env.SANDBOX_K8S_SERVER = 'https://k8s.example.com:6443';
process.env.SANDBOX_K8S_TOKEN = 'test-token';
process.env.SANDBOX_K8S_CAFILE = '/etc/k8s/ca.crt';

const kc = makeK8sConfig();
const cluster = kc.clusters[0]!;
expect(cluster.caFile).toBe('/etc/k8s/ca.crt');
expect(cluster.skipTLSVerify).toBeFalsy();
});

test('sets skipTLSVerify when SANDBOX_K8S_CAFILE is absent', () => {
process.env.SANDBOX_K8S_SERVER = 'https://k8s.example.com:6443';
process.env.SANDBOX_K8S_TOKEN = 'test-token';

const kc = makeK8sConfig();
const cluster = kc.clusters[0]!;
expect(cluster.skipTLSVerify).toBe(true);
expect(cluster.caFile).toBeUndefined();
});

test('sets the server URL and SA token', () => {
process.env.SANDBOX_K8S_SERVER = 'https://apiserver.internal:6443';
process.env.SANDBOX_K8S_TOKEN = 'my-sa-token';

const kc = makeK8sConfig();
expect(kc.clusters[0]!.server).toBe('https://apiserver.internal:6443');
expect(kc.users[0]!.token).toBe('my-sa-token');
});
});
40 changes: 28 additions & 12 deletions services/sandbox/src/backend/kubernetes/k8s-client.ts
Original file line number Diff line number Diff line change
Expand Up @@ -6,13 +6,20 @@
// createNamespacedPod / readNamespacedPodLog / deleteNamespacedPod + presigned-
// URL I/O done inside the Pod — every primitive below is plain HTTP.
//
// AUTH NOTE (Bun): Bun's fetch does NOT apply a kubeconfig's client cert or
// custom CA, so a client-cert cluster (e.g. kind's default kubeconfig) auths
// as system:anonymous. The real in-cluster path uses a ServiceAccount BEARER
// TOKEN (an Authorization header Bun sends fine) + the cluster CA — for Bun to
// trust that CA, the container must set NODE_EXTRA_CA_CERTS to the SA ca.crt
// (/var/run/secrets/kubernetes.io/serviceaccount/ca.crt). Local dev needs a
// token-based kubeconfig, not kind's client-cert one.
// TLS NOTE (Bun): Bun's *native* fetch() ignores the `agent` option, so TLS
// settings passed via an https.Agent are dropped for global-fetch calls.
// @kubernetes/client-node@1.4.0 uses *node-fetch* internally, which routes
// through https.request() — Bun does honour agent options there. Empirically
// verified under Bun 1.3.14 (issue #1849):
// • skipTLSVerify: true → rejectUnauthorized: false takes effect (not dead code)
// • caFile → agent.ca is trusted; no NODE_EXTRA_CA_CERTS required for
// k8s API calls made through this library
// • client certs → sent correctly via agent.cert/key
// NODE_EXTRA_CA_CERTS is therefore defence-in-depth for k8s API calls, not a
// hard requirement; it is still valuable for global-fetch paths (presigned-URL
// I/O) and as belt-and-suspenders. Local dev still needs a token-based
// kubeconfig (SANDBOX_K8S_SERVER + SANDBOX_K8S_TOKEN) rather than kind's
// client-cert kubeconfig; kind client-cert auth was not re-verified post-#1849.

import {
type ConfigurationOptions,
Expand Down Expand Up @@ -69,14 +76,17 @@ export function httpStatusCode(err: unknown): number | undefined {
/** Definitive 4xx responses that a retry can never fix. */
const NON_RETRYABLE_STATUS = new Set([400, 401, 403, 404, 409, 422]);

export function makeK8sClient(namespace: string): K8sClient {
/**
* Builds the KubeConfig for the current environment. Exported for unit tests;
* callers should use makeK8sClient() instead.
*/
export function makeK8sConfig(): KubeConfig {
const kc = new KubeConfig();
const server = process.env.SANDBOX_K8S_SERVER;
const token = process.env.SANDBOX_K8S_TOKEN;
if (server && token) {
// Explicit bearer-token config (dev / Bun-friendly). Bun can't use a
// client-cert kubeconfig, so point at the apiserver with an SA token; CA
// trust via SANDBOX_K8S_CAFILE + NODE_EXTRA_CA_CERTS, or skipTLSVerify.
// Explicit bearer-token config (dev). skipTLSVerify and caFile both take
// effect under Bun via node-fetch → https.request (see TLS NOTE above).
kc.loadFromOptions({
clusters: [
{
Expand All @@ -93,7 +103,8 @@ export function makeK8sClient(namespace: string): K8sClient {
});
} else {
try {
// In-cluster: SA token + ca.crt from the projected volume.
// In-cluster: SA token + ca.crt from the projected volume. The caFile
// set by loadFromCluster() is trusted via the https.Agent — see TLS NOTE.
kc.loadFromCluster();
} catch (err) {
console.warn(
Expand All @@ -103,6 +114,11 @@ export function makeK8sClient(namespace: string): K8sClient {
kc.loadFromDefault();
}
}
return kc;
}

export function makeK8sClient(namespace: string): K8sClient {
const kc = makeK8sConfig();
return {
core: kc.makeApiClient(CoreV1Api),
namespace,
Expand Down