🦀 📦 Crabbox

Warm a box, sync the diff, run the suite.

Crabbox is a remote software testing and execution control plane for maintainers and AI agents. Lease fast managed cloud capacity, point at an existing SSH host, or use an agent sandbox provider — then sync your dirty checkout, run commands remotely, stream output, collect evidence, and release. Local edit-save-run loop, cloud-grade compute, agent-ready observability.

crabbox run -- pnpm test

Behind that one command: a Go CLI on your laptop, a Cloudflare Worker broker that owns provider credentials and lease state, and a managed or delegated runner.

How it works

your laptop                Cloudflare Worker            cloud provider
-------------              ------------------           --------------
crabbox CLI    -- HTTPS --> Fleet Durable Object  -->   Hetzner / AWS / Azure / GCP
   |                         lease + cost state              |
   |                                                         |
   +------------ SSH + rsync to leased runner <--------------+
  • CLI — Go binary. Loads config, mints a per-lease SSH key, asks the broker for a lease, waits for SSH, seeds remote Git, rsyncs the dirty checkout (with a fingerprint skip when nothing changed), runs the command, streams output, releases.
  • Broker — Cloudflare Worker plus a single Fleet Durable Object. Owns provider credentials, serializes lease state, enforces active-lease and monthly spend caps, and expires stale leases by alarm. Auth is GitHub browser login or a shared bearer token.
  • Runner — a throwaway machine reachable over SSH on the primary port (default 2222) plus configured fallback ports, prepared with Crabbox's sync/run prerequisites. Linux uses Ubuntu with cloud-init and /work/crabbox; native Windows uses OpenSSH, Git for Windows, and C:\crabbox. No broker credentials live on the box. Project runtimes (Go, Node, Docker, services, secrets) come from your repo's GitHub Actions hydration, devcontainer, Nix, mise/asdf, or setup scripts — not from Crabbox.

The data plane — SSH, rsync, command execution — always runs directly from the CLI to the runner. The broker only manages leases, cost, and observability.
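The broker's cap enforcement described above can be pictured as an admission check that runs before a lease is granted. This is a minimal sketch under stated assumptions, not the Worker's actual code: the `Caps` fields, the `admit_lease` function, and the idea of an up-front cost estimate in cents are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Caps:
    # Hypothetical field names; the real broker settings may differ.
    max_active_leases: int
    monthly_spend_cap_cents: int


def admit_lease(active_leases: int, month_spend_cents: int,
                estimated_cost_cents: int, caps: Caps) -> Tuple[bool, str]:
    """Decide whether a new lease fits under both caps."""
    if active_leases >= caps.max_active_leases:
        return False, "active-lease cap reached"
    if month_spend_cents + estimated_cost_cents > caps.monthly_spend_cap_cents:
        return False, "monthly spend cap would be exceeded"
    return True, "ok"
```

Because a single Durable Object serializes lease state, a check like this can run without two concurrent requests both slipping under the cap.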

Only aws, azure, gcp, and hetzner can be brokered through the Worker, and even those run direct from the CLI when no broker URL is configured. Every other provider always runs direct. A direct-provider mode (--provider hetzner|aws|azure|gcp|proxmox with local credentials) exists for debugging the broker itself or using private infrastructure.
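The brokered-versus-direct rule in the paragraph above reduces to a small decision. The sketch below is illustrative only: `execution_path` is a hypothetical helper, it takes canonical provider names (no alias handling), and it does not model any flag that might force direct mode while a broker URL is set.

```python
from typing import Optional

# Providers the README lists as brokerable through the Worker.
BROKERED_PROVIDERS = frozenset({"aws", "azure", "gcp", "hetzner"})


def execution_path(provider: str, broker_url: Optional[str]) -> str:
    """Return "brokered" or "direct" for a canonical provider name."""
    if provider in BROKERED_PROVIDERS and broker_url:
        return "brokered"
    # Non-brokerable providers, or no broker URL configured.
    return "direct"
```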

For the full mental model, see How Crabbox Works. For the doc-to-code map, see Source Map.

Install

brew install openclaw/tap/crabbox
crabbox --version

No Homebrew? Grab a GoReleaser archive for macOS, Linux, or Windows.

Laptop prerequisites: git, ssh, ssh-keygen, rsync, curl.

Quick start

Broker access is deployment-specific. Use a coordinator URL from your team, use direct-provider mode for a personal cloud account, or self-host the Worker broker with your own provider credentials and spend caps. See Getting started and Infrastructure for the setup paths.

# log in once per machine (stores a broker token in user config)
crabbox login --url https://broker.example.com

# verify local prerequisites and broker reachability
crabbox doctor

# one-shot: lease, sync, run, release
crabbox run -- pnpm test

# named repo workflow from .crabbox.yaml
crabbox job run full-ci

# or warm a box once, then reuse it
crabbox warmup                                       # prints cbx_... + a slug
crabbox run --id blue-lobster -- pnpm test:changed
crabbox ssh --id blue-lobster
crabbox stop blue-lobster

Every lease has a stable cbx_... ID and a friendly crustacean slug (blue-lobster, swift-hermit, …). Either works wherever an --id is accepted. Use --slug <name> on fresh leases when a specific reusable slug helps, and --label <text> on run when the history entry needs a human-readable name.
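Accepting either identifier for --id amounts to a lookup that matches on both fields. A minimal sketch, assuming leases are available as an in-memory list; the `Lease` type, `resolve_lease`, and the example ID value are hypothetical and not taken from the CLI.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Lease:
    lease_id: str  # stable ID with the cbx_ prefix
    slug: str      # friendly crustacean slug


def resolve_lease(ref: str, leases: Iterable[Lease]) -> Lease:
    """Find a lease by its ID or by its slug."""
    for lease in leases:
        if ref in (lease.lease_id, lease.slug):
            return lease
    raise KeyError(f"no lease matches {ref!r}")


# Made-up ID for illustration only.
leases = [Lease("cbx_example", "blue-lobster")]
```

With that list, `resolve_lease("blue-lobster", leases)` and `resolve_lease("cbx_example", leases)` return the same lease.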

Providers

Coordinator: brokered providers can run through the Worker (or direct when no broker is configured); every other provider always runs direct from the CLI. Targets: L = Linux, M = macOS, W = Windows.

SSH-lease providers (provision or connect a box, full lifecycle)

Provider          provider: (aliases)                                Targets    Coordinator  Notes
AWS EC2           aws                                                L / M / W  brokered     EC2 instances and EC2 Mac; native AMI/EBS checkpoints.
Azure             azure                                              L / W      brokered     VMs with Tailscale support; native Windows and WSL2.
Google Cloud      gcp (google, google-cloud)                         L          brokered     Linux Compute Engine VMs with Tailscale support.
Hetzner Cloud     hetzner                                            L          brokered     Linux VMs with desktop/browser/code and Tailscale.
Parallels         parallels                                          L / M / W  direct       Local or remote macOS host; checkpoint/fork/restore/snapshot.
Proxmox           proxmox                                            L          direct       Clone Linux QEMU templates on a private Proxmox VE cluster.
Static SSH        ssh (static, static-ssh)                           L / M / W  direct       Existing machines; no provisioning.
Local Container   local-container (docker, container, local-docker)  L          direct       Local Docker-compatible runtime (Docker Desktop, OrbStack, Colima).
exe.dev           exe-dev (exe, exedev)                              L          direct       exe.dev VMs exposed as public SSH leases.
Namespace Devbox  namespace-devbox (namespace, namespace-devboxes)   L          direct       Namespace.so Devboxes over SSH.
Semaphore         semaphore (sem)                                    L          direct       A Semaphore CI job leased as a testbox.
Sprites           sprites                                            L          direct       Sprites microVMs through sprite proxy.
Daytona           daytona                                            L          direct       Daytona-managed dev sandbox over SSH.
RunPod            runpod (run-pod, runpodio)                         L          direct       RunPod GPU pods with public SSH.
ASCII Box         ascii-box (ascii, asciibox)                        L          direct       ASCII Box Ubuntu sandboxes exposed as SSH leases.

Delegated-run providers (sandbox/proof runners, no SSH lease)

Provider                provider: (aliases)                     Targets  Notes
Cloudflare              cloudflare (cf)                         L        Cloudflare Containers via the Worker runtime.
E2B                     e2b                                     L        E2B Firecracker sandbox.
Islo                    islo                                    L        Islo sandbox.
Modal                   modal                                   L        Modal Sandbox through the local Python client.
Railway                 railway (rail, railwayapp)              L        Redeploy and stream an existing Railway service.
Tensorlake              tensorlake (tl, tensorlake-sbx)         L        Tensorlake Firecracker sandbox via the Tensorlake CLI.
Upstash Box             upstash-box (upstash, box, upstashbox)  L        Upstash Box through the Box REST API.
Azure Dynamic Sessions  azure-dynamic-sessions                  L        Azure Container Apps dynamic sessions.
Blacksmith Testbox      blacksmith-testbox (blacksmith)         L        Delegated Blacksmith CI Testbox lifecycle and execution.
W&B Sandboxes           wandb (weights-and-biases)              L        Weights & Biases Sandboxes; reuses wandb login credentials.

See Providers for the full reference, capabilities, and authoring guide.

Highlights

  • One-shot or warm workspaces. crabbox run for fire-and-forget; crabbox warmup + --id for repeated runs against the same box. See warmup and run.
  • Named repo jobs. crabbox job run <name> lets repos define warmup, optional Actions hydration, run command, and cleanup policy in .crabbox.yaml. See Jobs.
  • Local-first workspace sync. No clean-checkout requirement. Tracked and nonignored files only, fingerprint skip on no-op runs, sanity checks against suspicious mass deletions, optional shallow base-ref hydration for changed-test workflows. See Sync.
  • Run observability. Every coordinator-backed run gets an early run_... handle. Use crabbox attach <run-id> while it is active, crabbox events <run-id> for durable lifecycle/output events, and crabbox logs <run-id> for retained output after completion. See History and logs and Observability.
  • GitHub Actions hydration. crabbox actions hydrate runs supported setup steps from the repo's workflow locally over SSH, so leased boxes get the same runtimes and tooling without GitHub write access. Use --github-runner only when setup needs full Actions semantics such as repository secrets, OIDC, service containers, or unsupported uses: steps. See Actions hydration.
  • Failure capsules. crabbox capsule from-actions <run-url> captures a failing CI run into a portable, replayable bundle; capsule replay reruns it. See Capsules.
  • Checkpoints. Save VM-or-workspace state and restore/fork from it, via workspace archives or provider-native snapshots/images. See Checkpoints.
  • Pond peer groups. Leases that share a --pond <name> label form an emergent peer group with discovery (pond peers), an SSH-mesh of ssh -L forwards to members' --expose ports (pond connect), and bulk pond release. See Pond.
  • Brokered cloud with cost guardrails. Maintainers and agents share infra without sharing provider tokens. Hetzner, AWS, Azure, and Google Cloud are the managed providers; per-lease and monthly spend caps reject over-budget leases. Providers fall back across compatible instance families when capacity or quota rejects a request. crabbox usage summarizes spend by user, org, provider, and type. See Coordinator, Capacity fallback, and Cost and usage.
  • Interactive desktop, browser, and code leases. --browser provisions Chrome/Chromium for headless automation, --desktop provisions a visible UI with tunnel-only VNC takeover, and --code provisions code-server on managed Linux. crabbox desktop click/paste/type/key provide first-class input helpers; desktop proof captures metadata, screenshot, diagnostics, MP4, and a contact-sheet PNG in one publishable bundle. See Interactive desktop and VNC.
  • Authenticated web portal. Browser login opens owner-scoped and shared lease/run views with run logs/events, WebVNC, code-server, and telemetry charts. crabbox webvnc/crabbox code bridge a lease into the portal; crabbox share grants a lease to a user or the owning org. See Portal.
  • Agent workspace evidence. History, logs, events, telemetry, JUnit summaries, screenshots, recordings, artifacts, and PR publishing make autonomous work reviewable instead of only ephemeral terminal output. See Artifacts and Telemetry.
  • Stable timing records. --timing-json on run, warmup, and actions hydrate gives scripts one machine-readable sync/command/total timing schema across providers.
  • Hardened coordinator auth. GitHub browser login, owner-scoped leases, admin-only routes, optional GitHub team allowlists, Cloudflare Access JWT verification, and service-token support keep normal use and operator automation separate. See Auth and admin and Security.
  • OpenClaw plugin. The repo root is a native OpenClaw plugin for box lifecycle operations. See the OpenClaw plugin section below and the OpenClaw plugin docs.
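The fingerprint skip mentioned under local-first workspace sync can be illustrated with a content hash that is compared against the value recorded after the previous sync. This is a simplified sketch, not the CLI's algorithm: it hashes every file under a directory, whereas Crabbox syncs tracked and nonignored files only, and the function names and the choice of SHA-256 are assumptions.

```python
import hashlib
from pathlib import Path
from typing import Optional


def workspace_fingerprint(root: Path) -> str:
    """Hash relative paths and file contents in a stable, sorted order."""
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode())
        digest.update(b"\0")  # separator so path and content cannot blur
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def needs_sync(root: Path, last_synced: Optional[str]) -> bool:
    """Skip the sync only when the fingerprint matches the last one."""
    return workspace_fingerprint(root) != last_synced
```

A no-op run then costs one local hash pass instead of an rsync round trip.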

Machine classes

beast is the default for providers that expose class-based managed capacity. The providers below fall back across ordered instance-type lists unless --type pins a specific provider-native size.

Hetzner    standard  ccx33, cpx62, cx53
           fast      ccx43, cpx62, cx53
           large     ccx53, ccx43, cpx62, cx53
           beast     ccx63, ccx53, ccx43, cpx62, cx53

AWS Linux  standard  c7a/c7i/m7a/m7i.8xlarge family
           fast      …16xlarge family
           large     …24xlarge family
           beast     …48xlarge family, falling back to 32x/24x/16x
           arm64     c7g/m7g/r7g families with --arch arm64

AWS Win    standard  m7i.large, m7a.large, t3.large
           fast      m7i.xlarge, m7a.xlarge, t3.xlarge
           large     m7i.2xlarge, m7a.2xlarge, t3.2xlarge
           beast     m7i.4xlarge, m7a.4xlarge, m7i.2xlarge

AWS WSL2   standard  m8i.large, m8i-flex.large, c8i.large, r8i.large
           fast      m8i.xlarge, m8i-flex.xlarge, c8i.xlarge, r8i.xlarge
           large     m8i.2xlarge, m8i-flex.2xlarge, c8i.2xlarge, r8i.2xlarge
           beast     m8i.4xlarge, m8i-flex.4xlarge, c8i.4xlarge, r8i.4xlarge, m8i.2xlarge

AWS macOS  all       mac2.metal, then mac1.metal unless --type is set

Azure      standard  Standard_D32ads_v6, Standard_D32ds_v6, Standard_F32s_v2, then 16-vCPU fallbacks
           fast      Standard_D64ads_v6, Standard_D64ds_v6, Standard_F64s_v2, then 48/32-vCPU fallbacks
           large     Standard_D96ads_v6, Standard_D96ds_v6, then 64/48-vCPU fallbacks
           beast     Standard_D192ds_v6, Standard_D128ds_v6, then 96/64-vCPU fallbacks
           arm64     Standard_D*ps_v6 / D*pds_v6 Cobalt families with --arch arm64

Azure Win/
WSL2       standard  Standard_D2ads_v6, Standard_D2ds_v6, Standard_D2ads_v5, Standard_D2ds_v5, Standard_D2as_v6
           fast      Standard_D4ads_v6, Standard_D4ds_v6, Standard_D4ads_v5, Standard_D4ds_v5, Standard_D4as_v6
           large     Standard_D8ads_v6, Standard_D8ds_v6, Standard_D8ads_v5, Standard_D8ds_v5, Standard_D8as_v6
           beast     Standard_D16ads_v6, Standard_D16ds_v6, Standard_D16ads_v5, Standard_D16ds_v5, Standard_D8ads_v6

Namespace  standard  S
           fast      M
           large     L
           beast     XL

Cloudflare standard  standard-4
           fast      standard-4
           large     standard-4
           beast     standard-4

Override with --type or CRABBOX_SERVER_TYPE for a specific instance. Use --arch arm64 / architecture: arm64 for Linux ARM capacity on Azure or AWS; explicit ARM provider types also select ARM images when no custom image is set. Cloudflare also accepts lite, basic, standard-1, standard-2, and standard-3 as smaller explicit --type values; standard-4 is the default. Providers without a row either use provider-native capacity settings or reject class/type selection.
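The fallback behavior can be read as "walk the ordered list until the provider accepts one". The sketch below uses the Hetzner lists from the table above; `candidate_types`, `provision`, and the `try_create` callback are hypothetical names, and real capacity or quota errors are reduced to a boolean.

```python
from typing import Callable, List, Optional

# Ordered instance-type lists copied from the Hetzner rows above.
HETZNER_CLASSES = {
    "standard": ["ccx33", "cpx62", "cx53"],
    "fast": ["ccx43", "cpx62", "cx53"],
    "large": ["ccx53", "ccx43", "cpx62", "cx53"],
    "beast": ["ccx63", "ccx53", "ccx43", "cpx62", "cx53"],
}


def candidate_types(machine_class: str = "beast",
                    pinned_type: Optional[str] = None) -> List[str]:
    """A pinned type disables fallback; otherwise use the class list."""
    if pinned_type:
        return [pinned_type]
    return list(HETZNER_CLASSES[machine_class])


def provision(candidates: List[str],
              try_create: Callable[[str], bool]) -> Optional[str]:
    """Return the first type the provider accepts, or None if all fail."""
    for instance_type in candidates:
        if try_create(instance_type):
            return instance_type
    return None
```

For example, if the provider rejects ccx63 and ccx53, a beast request in this sketch lands on ccx43.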

Configuration

Config resolves in order: flags → env → repo .crabbox.yaml → user ~/.config/crabbox/config.yaml → defaults.
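That precedence order behaves like a first-match lookup across layers. A minimal sketch, assuming flat string keys and treating None as "unset"; how Crabbox merges nested sections is not described here, and `resolve_config` is a hypothetical helper.

```python
from collections import ChainMap
from typing import Any, Dict, Mapping


def resolve_config(flags: Mapping[str, Any], env: Mapping[str, Any],
                   repo: Mapping[str, Any], user: Mapping[str, Any],
                   defaults: Mapping[str, Any]) -> Dict[str, Any]:
    """Earlier layers win: flags, env, repo config, user config, defaults."""
    layers = [flags, env, repo, user, defaults]
    # Drop unset values so they do not shadow lower-priority layers.
    cleaned = [{k: v for k, v in layer.items() if v is not None}
               for layer in layers]
    return dict(ChainMap(*cleaned))
```

A flag therefore overrides the same key from the environment, which in turn overrides the repo's .crabbox.yaml.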

broker:
  url: https://broker.example.com
  provider: aws
  token: ...
class: beast
capacity:
  market: spot
  strategy: most-available
  fallback: on-demand-after-120s
  hints: true
aws:
  region: eu-west-1
  rootGB: 400
lease:
  idleTimeout: 30m
  ttl: 90m
ssh:
  key: ~/.ssh/id_ed25519
  user: crabbox
  port: "2222"
  # Ordered fallback ports tried after ssh.port; use [] to disable fallback.
  fallbackPorts:
    - "22"
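The ssh.port and ssh.fallbackPorts settings describe an ordered list of ports to try. The sketch below shows one way to read that ordering; the helper names are hypothetical, dropping duplicate ports is an assumption, and a plain TCP connect stands in for a real SSH handshake.

```python
import socket
from typing import Callable, Iterable, List, Optional


def ssh_port_candidates(port: str = "2222",
                        fallback_ports: Iterable[str] = ("22",)) -> List[str]:
    """Primary port first, then fallbacks in order, without duplicates."""
    ordered = [port]
    for candidate in fallback_ports:
        if candidate not in ordered:
            ordered.append(candidate)
    return ordered


def tcp_probe(host: str, port: str, timeout: float = 3.0) -> bool:
    """True if a TCP connection opens; this does not verify SSH itself."""
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        return False


def first_reachable(host: str, ports: Iterable[str],
                    probe: Callable[[str, str], bool] = tcp_probe
                    ) -> Optional[str]:
    """Return the first port the probe accepts, or None."""
    for port in ports:
        if probe(host, port):
            return port
    return None
```

Passing an empty fallback list leaves only the primary port, matching the `[]` setting noted in the config comment.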

Forwarded environment is intentionally narrow: NODE_OPTIONS and CI. Do not pass secrets as command-line arguments. For live-secret smoke tests, use crabbox run --env-from-profile <file> --allow-env NAME so Crabbox forwards only selected names and prints redacted presence/length metadata. For stale warm boxes, --full-resync (alias --fresh-sync) resets the remote workdir before syncing. For larger commands, use --script <file> or --script-stdin so the remote runner executes an uploaded file instead of a giant quoted shell string.
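The allowlist idea behind environment forwarding can be sketched as a filter plus a redacted summary. This is illustrative only: the profile file format is not described here, so the sketch takes an already-parsed mapping, and `select_env` and `redacted_summary` are hypothetical helpers rather than CLI internals.

```python
from typing import Dict, Iterable, Mapping

# Names the README says are forwarded by default.
DEFAULT_FORWARDED = ("NODE_OPTIONS", "CI")


def select_env(source: Mapping[str, str],
               allow: Iterable[str] = DEFAULT_FORWARDED) -> Dict[str, str]:
    """Keep only allowlisted names that are actually set."""
    return {name: source[name] for name in allow if name in source}


def redacted_summary(env: Mapping[str, str]) -> Dict[str, Dict[str, object]]:
    """Report presence and length, never the value itself."""
    return {name: {"present": True, "length": len(value)}
            for name, value in env.items()}
```

Everything outside the allowlist stays on the laptop, and logs show only that a name was set and how long its value is.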

For binary or terminal-hostile output, use crabbox run --capture-stdout <path> or --capture-stderr <path>. Add --preflight for a remote capability snapshot, --keep-on-failure to SSH into the exact failed one-shot lease, or --download remote=local to copy a successful-run artifact back. Failed SSH-backed and Blacksmith delegated runs save local .crabbox/captures/*.tar.gz bundles by default. Captured files are not redacted by Crabbox.

Optional Tailscale reachability for managed Linux leases:

tailscale:
  enabled: true
  network: auto
  tags:
    - tag:crabbox
  hostnameTemplate: crabbox-{slug}
  authKeyEnv: CRABBOX_TAILSCALE_AUTH_KEY
  exitNode: mac-studio.example.ts.net
  exitNodeAllowLanAccess: true

Tailscale is a network plane, not a provider. --tailscale joins new managed Linux leases to the tailnet; --network auto|tailscale|public chooses how SSH and VNC tunnel commands resolve the host. Brokered mode uses Worker OAuth secrets to mint one-off keys; direct-provider mode reads the auth key from the configured env var. See Tailscale.

A few provider-specific config snippets:

# Static macOS or Windows target (existing machine, no provisioning)
provider: ssh
target: windows
windows:
  mode: normal # or wsl2
static:
  host: win-dev.local
  user: alice
  port: "22"
  workRoot: C:\crabbox

# Local container (alias: docker; works with OrbStack as the active context)
provider: local-container
localContainer:
  runtime: docker
  image: debian:bookworm
  workRoot: /work/crabbox

# Delegated Blacksmith CI Testbox
provider: blacksmith-testbox
blacksmith:
  org: example-org
  workflow: .github/workflows/ci-check-testbox.yml
  job: test
  ref: main
  idleTimeout: 90m

Keep provider tokens in environment variables, not repo config (for example CRABBOX_SEMAPHORE_TOKEN, CRABBOX_SPRITES_TOKEN, RUNPOD_API_KEY, ASCII_BOX_API_KEY, E2B_API_KEY, DAYTONA_API_KEY). The full env-var reference, per-provider sections, and per-command flags are in docs/cli.md, Configuration, and the provider docs.

OpenClaw plugin

The repo root is a native OpenClaw plugin package. Once installed, it exposes Crabbox as agent tools:

  • crabbox_run, crabbox_warmup, crabbox_status, crabbox_list, crabbox_stop

The plugin shells out to the configured crabbox binary with argv arrays, so local config, broker login, repo claims, and sync behavior stay owned by the CLI. Set plugins.entries.crabbox.config.binary if crabbox is not on PATH.

Durable run inspection is intentionally CLI/skill-led instead of additional plugin tools: use crabbox history, crabbox events --after --limit, crabbox attach, crabbox logs, crabbox results, and crabbox usage from a shell-capable agent. See OpenClaw plugin.

Development

# Go CLI
go build -trimpath -o bin/crabbox ./cmd/crabbox
go vet ./...
go test -race ./...

# Cloudflare Worker (Node 22+ locally; CI runs Node 24)
npm ci --prefix worker
npm test --prefix worker
npm run build --prefix worker

# Docs
npm run docs:check

# Optional live smoke, when broker/provider credentials are available
CRABBOX_LIVE=1 CRABBOX_LIVE_REPO=/path/to/my-app scripts/live-smoke.sh

CI runs the full gate (gofmt, vet, race tests, all Go modules, coverage threshold, docs link/build check, GoReleaser snapshot, and Worker lint/typecheck/tests/build) on every push and PR. Tagged pushes matching v* publish Go archives via GoReleaser and bump the Homebrew formula at openclaw/homebrew-tap.

Worker deployment, required secrets, and DNS routing live in docs/infrastructure.md.

Docs

The GitHub Pages site at https://openclaw.github.io/crabbox/ is generated from the docs/ Markdown:

npm run docs:check
open dist/docs-site/index.html

License

MIT — see LICENSE.