Research: Docker Sandboxes (sbx) as opt-in process isolation #13

@samkeen

Why

Today Tilth's only worker containment is cwd=workspace for the bash tool (tilth/tools/bash.py) plus a narrow denylist in pre_tool (force-push, sudo, curl|sh). _resolve keeps the file tools well-behaved, but bash runs with the full credentials of the user running uv run tilth: cat /etc/passwd, cat ../../.env, and cat ../../sessions/<id>/events.jsonl all succeed. See discussion in #10.
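
To make the gap concrete, here is a minimal sketch showing that cwd only sets a starting directory and is not a boundary. It is not Tilth's actual bash tool: run_in_workspace and demo are hypothetical names, and secret.env stands in for a file such as ../../.env. It assumes a POSIX host with sh and cat available.

```python
import subprocess
import tempfile
from pathlib import Path


def run_in_workspace(command: str, workspace: Path) -> subprocess.CompletedProcess:
    """Run a shell command with cwd set to the workspace.

    Hypothetical stand-in for a cwd-contained bash tool.
    """
    return subprocess.run(
        command, shell=True, cwd=workspace, capture_output=True, text=True
    )


def demo() -> str:
    """Read a file outside the workspace using a relative path."""
    with tempfile.TemporaryDirectory() as root:
        workspace = Path(root) / "workspace"
        workspace.mkdir()
        # A file that lives outside the workspace (assumption: stands in for ../../.env).
        (Path(root) / "secret.env").write_text("API_KEY=not-a-real-key\n")
        # The command starts in the workspace but is free to walk out of it.
        result = run_in_workspace("cat ../secret.env", workspace)
        return result.stdout


if __name__ == "__main__":
    print(demo(), end="")
```

The read succeeds because the child process inherits the user's permissions; nothing in cwd restricts which paths it may open.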

This issue is the research artifact for an opt-in isolated mode: a secondary mode that wraps the harness in real process isolation. Default mode stays best-effort. No code changes proposed here.

Design preference (settled)

When isolated mode is requested for a session, pick the most-isolated option that's actually available on the host, in this order:

  1. Apple Containerization (container CLI) — if Apple Silicon + macOS 26+ + container installed and container system start'd. Apache-2.0, sub-second microVM, native fit. Preferred when available.
  2. Docker Sandboxes (sbx) — if sbx is installed and authenticated. Cross-platform fallback covering Intel Macs, Linux (KVM required), Windows.
  3. Default best-effort — what Tilth does today. Always available.
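
The preference order above can be sketched as a pure function over a few host facts. This is a sketch under stated assumptions: HostFacts, gather_host_facts and pick_backend are hypothetical names, and detection only checks that the CLIs are on PATH. It does not verify that container system start has been run or that sbx is authenticated, since how to check either is not established in this issue.

```python
import platform
import shutil
from dataclasses import dataclass


@dataclass(frozen=True)
class HostFacts:
    system: str          # platform.system(), e.g. "Darwin" or "Linux"
    machine: str         # platform.machine(), e.g. "arm64" or "x86_64"
    macos_major: int     # major macOS version, 0 when not on macOS
    has_container: bool  # `container` CLI found on PATH
    has_sbx: bool        # `sbx` CLI found on PATH


def gather_host_facts() -> HostFacts:
    """Collect the facts from the running host (PATH checks only)."""
    release = platform.mac_ver()[0]  # empty string on non-macOS hosts
    major = int(release.split(".")[0]) if release else 0
    return HostFacts(
        system=platform.system(),
        machine=platform.machine(),
        macos_major=major,
        has_container=shutil.which("container") is not None,
        has_sbx=shutil.which("sbx") is not None,
    )


def pick_backend(facts: HostFacts) -> str:
    """Return the most-isolated backend that looks available."""
    apple_ok = (
        facts.system == "Darwin"
        and facts.machine == "arm64"
        and facts.macos_major >= 26
        and facts.has_container
    )
    if apple_ok:
        return "apple"
    if facts.has_sbx:
        return "sbx"
    return "best-effort"


if __name__ == "__main__":
    print(pick_backend(gather_host_facts()))
```

Keeping the decision separate from the fact gathering makes the ordering easy to test without the real CLIs installed.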

Isolated mode is opt-in: behaviour is unchanged for users who don't ask for it.

CLI shape deferred to a future implementation issue: probably --sandbox with auto-detect-and-announce (verbose log line saying which backend got picked), with explicit --sandbox=apple|sbx available as an override.
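
One possible shape for that flag, sketched with argparse. The CLI is explicitly deferred, so everything here is an assumption: build_parser and announce are hypothetical names, "auto" is an invented internal value for the bare --sandbox form, and the log wording is illustrative only.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="tilth")
    parser.add_argument("workspace")
    parser.add_argument(
        "--sandbox",
        nargs="?",        # the value is optional
        const="auto",     # bare --sandbox means auto-detect
        default=None,     # flag absent means today's best-effort mode
        choices=["auto", "apple", "sbx"],
    )
    return parser


def announce(requested: Optional[str], resolved: str) -> str:
    """Build the verbose log line saying which backend got picked."""
    if requested is None:
        return "sandbox: not requested, running best-effort"
    how = "auto-detected" if requested == "auto" else "explicitly requested"
    return f"sandbox: using {resolved} ({how})"


if __name__ == "__main__":
    args = build_parser().parse_args()
    # Assumption: a real implementation would resolve "auto" against the host.
    print(announce(args.sandbox, args.sandbox or "best-effort"))
```

One argparse wrinkle worth knowing: with nargs="?", writing --sandbox directly before the workspace path makes argparse treat the path as the flag's value and reject it. The bare flag has to come after the positional (tilth ~/projects/foo --sandbox), or the value has to be given with = (--sandbox=sbx).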

Integration shape

Two integration levels, with strong preference for the first regardless of backend:

  1. Wrap-the-harness (recommended). The whole uv run tilth ~/projects/foo process runs inside the sandbox. Workspace + sessions/ are bind/passthrough mounted so --resume and --visualize from the host still see the same session dir. Likely zero-code, possibly one thin CLI affordance per backend.
  2. Wrap-the-tool-calls (not recommended). Harness on host, each bash tool call routed through sandbox exec. Invasive — every tool needs sandbox awareness, file tools too. Loses the simplicity of "the worker is one process in a known cwd."

(1) is the natural fit given Tilth's process model.

Backend: Apple Containerization (container)

  • License: Apache-2.0. Source at github.com/apple/container.
  • Platform: Apple Silicon only. Requires macOS 26 (Tahoe) for full functionality; less capable on macOS 15.
  • Tech: one lightweight Linux microVM per container (not a shared-kernel Docker model). Swift-based vminitd per VM. Sub-second cold start.
  • Image format: fully OCI-compatible. Any standard image from a standard registry works. Run a tilth-sandbox image (preinstalled uv + Python) the same way you'd run any OCI image.
  • Maturity: pre-1.0, active development, breaking changes possible between minor versions.
  • Tilth fit: when available, this is the right choice — open source, native, fast, no Docker Inc. dependency.

Backend: Docker Sandboxes (sbx)

What it is

  • microVM-per-sandbox (not a container). Linux uses KVM (sudo usermod -aG kvm $USER); macOS uses the platform hypervisor (presumably Apple's Virtualization framework; the docs don't say so explicitly); Windows uses Hyper-V.
  • Released GA, v0.30.0 (May 2026) — sub-1.0, expect churn.
  • Proprietary (Docker Inc.). github.com/docker/sbx-releases ships binaries only. Free for core functionality; team admin controls (network restrictions, FS policies) require sales engagement.
  • Docker Desktop not required.
  • Pitched specifically at coding-agent workloads. Recognised agents: claude, codex, copilot, cursor, docker-agent, droid, gemini, kiro, opencode, shell.

Sources: product page, docs, usage, get started, architecture.

How isolation works

  • Filesystem: workspace mounted via passthrough; absolute paths preserved between host and sandbox. Multiple workspaces via sbx run/create AGENT PATH [PATH...], with :ro suffix for read-only.
  • Network: all egress routed through an HTTP/HTTPS proxy on the host. Default-deny with three policy presets at first sbx login; docs recommend Balanced. Allowlist additions via sbx policy allow network -g <host:port>. sbx policy also has named profiles for team governance.
  • Secrets: sbx secret set -g <service> stores in the OS keychain (service names like github, anthropic, openai); the host proxy injects into outbound API requests so the secret never appears inside the sandbox. Alternative: sbx exec -e KEY=VAL passes env vars directly, identical to docker exec.
  • Lifecycle: sbx create → sbx run (attach) → sbx stop (env persists) → sbx rm (destroy). sbx exec -it <name> bash to drop into an existing sandbox.
  • Resource limits: --cpus, -m/--memory (defaults to 50% host, max 32 GiB).
  • Custom base image: -t/--template <oci image> — agent-specific image by default, but a custom OCI image is supported. We could ship tilth-sandbox preinstalled with uv + Python.
  • Other utilities: sbx cp (file copy host↔sandbox), sbx ports (publish sandbox ports), sbx diagnose (debug install).
  • Surprise overlap: sbx run --branch creates a git worktree as part of the sandbox. Overlaps directly with Tilth's worktree machinery — if Tilth runs inside sbx, we'd ignore the flag and let Tilth manage its own worktrees as today.
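
The options above could be collected into a single sbx create argument list along these lines. This is a sketch only: SandboxSpec and create_argv are hypothetical names, the flag spellings (--name, --cpus, --memory, --template, the :ro suffix) are taken from this issue and have not been verified against the installed sbx CLI, and the tilth-sandbox image does not exist yet.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SandboxSpec:
    name: str
    workspaces: List[str]                               # mounted read-write
    read_only: List[str] = field(default_factory=list)  # mounted with :ro
    cpus: Optional[int] = None
    memory: Optional[str] = None                        # e.g. "8g"
    template: Optional[str] = None                      # custom OCI image


def create_argv(spec: SandboxSpec) -> List[str]:
    """Build `sbx create shell PATH [PATH...]` plus optional flags."""
    argv = ["sbx", "create", "shell", *spec.workspaces]
    # Read-only mounts use the :ro suffix described above.
    argv += [f"{path}:ro" for path in spec.read_only]
    argv += ["--name", spec.name]
    if spec.cpus is not None:
        argv += ["--cpus", str(spec.cpus)]
    if spec.memory is not None:
        argv += ["--memory", spec.memory]
    if spec.template is not None:
        argv += ["--template", spec.template]
    return argv


if __name__ == "__main__":
    spec = SandboxSpec(
        name="tilth-session-demo",
        workspaces=["/home/u/projects/foo"],
        read_only=["/home/u/reference"],
        cpus=4,
        memory="8g",
    )
    print(" ".join(create_argv(spec)))
```

The --branch flag is deliberately left out, matching the note that Tilth would keep managing its own worktrees.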

Automation-friendly invocation

For a --sandbox mode driven from a script:

sbx create shell . --name tilth-session-<id> [--memory 8g --cpus 4]
sbx exec -e TILTH_API_KEY="$TILTH_API_KEY" \
         -e TILTH_MODEL="$TILTH_MODEL" \
         -w /workspace tilth-session-<id> \
         uv run tilth ~/projects/tilth-demo
sbx rm tilth-session-<id>   # or keep for resume
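
The same create, exec, rm sequence could be driven from Python roughly as follows. This is a sketch under assumptions: exec_argv and run_session are hypothetical names, the sbx arguments mirror the commands above without being checked against the real CLI, and the runner is injectable so the sequence can be exercised without sbx installed.

```python
import subprocess
from typing import Callable, List, Mapping, Sequence

Runner = Callable[[Sequence[str]], int]


def subprocess_runner(argv: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(argv)).returncode


def exec_argv(name: str, command: Sequence[str], env: Mapping[str, str],
              workdir: str = "/workspace") -> List[str]:
    """Build `sbx exec -e KEY=VAL ... -w WORKDIR NAME COMMAND...`."""
    argv = ["sbx", "exec"]
    for key in sorted(env):  # sorted for a stable, testable order
        argv += ["-e", f"{key}={env[key]}"]
    argv += ["-w", workdir, name, *command]
    return argv


def run_session(session_id: str, workspace: str, command: Sequence[str],
                env: Mapping[str, str], run: Runner = subprocess_runner,
                keep: bool = False) -> int:
    """Create the sandbox, run the harness inside it, then clean up."""
    name = f"tilth-session-{session_id}"
    created = run(["sbx", "create", "shell", workspace, "--name", name])
    if created != 0:
        return created
    try:
        return run(exec_argv(name, command, env))
    finally:
        if not keep:  # keep=True leaves the sandbox in place for resume
            run(["sbx", "rm", name])
```

Passing secrets with -e places them on the host command line, where other local processes can typically see them; the keychain-backed sbx secret set route described earlier avoids that, at the cost of relying on the proxy recognising the service.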

Frictions specific to sbx

  • Proprietary — strategic dependency on a closed Docker Inc. product. Bounded since this is an opt-in second backend.
  • Linux requires KVM — rules out most cloud VMs without nested virt and most CI runners. Dev-machine use on macOS / Windows / Linux desktop is the realistic target.
  • Sub-1.0 maturity — CLI surface may churn.

Other options considered

  • sandbox-exec — built into macOS (no install), kernel-level sandbox profiles (Scheme/LISP syntax), zero VM overhead. Apple has technically deprecated it for app distribution but it remains the lowest-friction macOS option for sandboxing arbitrary CLI tools. There's an open issue against apple/container asking Apple to clarify the deprecation timeline. Profile authoring is painful. Could become a fourth-tier fallback for Intel Macs if we want to avoid the sbx dependency there; not pursuing in the first cut.
  • Plain Docker / rootless podman — weaker isolation (shared kernel), but ubiquitous on Linux. Could be a Linux-only fallback ahead of sbx in the preference order if we want to keep all backends open source. Not pursuing in the first cut.

Comparison matrix

| Backend | OS support | License | Maturity | Isolation | Overhead |
| --- | --- | --- | --- | --- | --- |
| Apple container | Apple Silicon + macOS 26+ only | Apache-2.0 | pre-1.0 | microVM per container | sub-second |
| Docker sbx | Mac / Win / Linux | Proprietary | v0.30 | microVM | VM boot |
| sandbox-exec | macOS only | Built-in | Stable but discouraged | Kernel profile | negligible |
| Plain Docker / rootless podman | All / Linux | OSS | Stable | Shared kernel / namespaces | Low |

Open questions (verify by running)

  • Default network policy on Balanced: does it include OpenRouter (openrouter.ai)? OpenAI, Anthropic, Google almost certainly yes.
  • Does localhost:11434 from inside a sbx microVM reach the host's Ollama, or does it need a bridge (host.docker.internal-style)?
  • Same network + filesystem questions for Apple container, plus its default network policy.
  • Cold microVM boot time on Mac (sbx, Apple container) and Linux (sbx + KVM).
  • License terms for sbx — anything restricting commercial use of the runtime itself.
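
The Ollama reachability and boot-time questions lend themselves to a small probe run inside each sandbox. A minimal sketch, with assumptions: can_connect and timed are hypothetical helper names, and host.docker.internal is only a guess at a bridge hostname that may not resolve in either backend. A plain TCP connect says nothing about the proxy's HTTP allowlist, so this does not answer the OpenRouter question.

```python
import socket
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused, timed out, and name resolution errors
        return False


def timed(fn: Callable[[], T]) -> Tuple[T, float]:
    """Run fn and return its result with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Candidate addresses for the host's Ollama; the second is a guess.
    for host in ("localhost", "host.docker.internal"):
        ok, seconds = timed(lambda: can_connect(host, 11434))
        print(f"{host}:11434 reachable={ok} ({seconds:.2f}s)")
```

The timed helper can also wrap a subprocess call to the sandbox start command from the host to get a rough cold-boot figure.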

Proposed next step

Install both sbx (already done) and Apple container on a dev machine, run the demo inside each (sbx run shell → uv run tilth ~/projects/tilth-demo, equivalent for container), and document what breaks. These are cheap probes that should answer most of the open questions and confirm whether wrap-the-harness is genuinely zero-code per backend or wants a thin --sandbox affordance.
