Add pi coding agent to Ralph #44

Description

@rjernst

branch: ralph-pi-agent

Spec: Add pi coding agent to Ralph

Overview

Add pi as a new agent type in Ralph's agent loop. Pi is a third-party AI coding agent (@mariozechner/pi-coding-agent) that supports multiple model providers. The initial implementation uses the claude-sdk provider (subscription-backed via the Claude Agent SDK), with the architecture designed so future providers (OpenAI, Cursor API, etc.) can be added without restructuring.

The pi provider is configured per-project in .agent-loop/config.json. The provider selection determines the Docker base image, sandbox type, auth mechanism, and network policy — all driven by a providers dict in the agent config.

For the claude-sdk provider: pi runs in a Docker sandbox based on docker/sandbox-templates:claude-code (which provides Claude Code + Node.js). The provider spawns a Claude Code subprocess via the Agent SDK; env vars (CLAUDE_CODE_OAUTH_TOKEN, ANTHROPIC_BASE_URL) propagate through the process tree so the existing credential proxy handles auth unchanged.

Future providers (e.g., OpenAI) would use a lightweight node:22-slim base, "shell" sandbox type, and direct API key injection — no proxy needed.

Architecture

ralph --agent pi --issue 42
  │
  ├─ cli.py: ensure_token() + ensure_proxy() (based on resolved provider config)
  │
  ├─ loop.py: process_issue()
  │   ├─ load_runtime_config()  →  reads .agent-loop/config.json
  │   │                             including pi.provider (default: "claude-sdk")
  │   ├─ resolve_agent_config() →  merges provider-specific config into agent config
  │   ├─ ensure_sandbox()       →  builds pi Docker image (base image varies by provider)
  │   └─ run_iteration()        →  docker sandbox exec ... pi -p "<prompt>" ...
  │
  └─ Docker sandbox:
      pi -p "<ITERATION_PROMPT>" --model claude-sonnet-4-6 --no-skills --no-session
        └─ claude-sdk-provider extension
            └─ sdk.query() → Claude Code subprocess
                └─ API calls → proxy (host.docker.internal:18080) → api.anthropic.com
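The command at the bottom of the diagram can be sketched as an argv builder. This is a minimal illustration, not ralph's actual code: build_pi_argv is a hypothetical helper name, and whether the real _pi_cli_flags uses its model argument is an assumption (here it is ignored). The docker sandbox exec prefix is elided in the source, so it is left out.

```python
def pi_cli_flags(model):
    # Flags shown in the diagram; the model argument is unused in this sketch.
    return ["--no-skills", "--no-session"]


def build_pi_argv(prompt, model="claude-sonnet-4-6"):
    """Return the pi argv for one iteration (hypothetical helper)."""
    return ["pi", "-p", prompt, "--model", model, *pi_cli_flags(model)]
```

Passing the prompt as a single list element avoids shell quoting problems when the argv is handed to subprocess without shell=True.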

Docker build context assembly

The pi Docker image needs source from two locations: the Dockerfile/settings from docker/agent-loop/pi/ and extensions from pi/packages/. A prepare_build_context(agent) method on DockerImageMixin assembles a temp directory containing both before running docker build. All directories in pi/packages/ are auto-discovered — adding a new extension requires no wiring.

temp-context/
  Dockerfile              ← from docker/agent-loop/pi/
  settings.json           ← from docker/agent-loop/pi/
  packages/
    claude-sdk-provider/  ← from pi/packages/claude-sdk-provider/ (sans node_modules, dist)
    future-provider/      ← auto-discovered from pi/packages/

Provider-aware agent config

The AGENTS["pi"] entry uses a providers dict keyed by provider name. Each provider defines its own uses_proxy, env_var_name, allowed_hosts, sandbox_agent, base_image, and auth config. A helper resolves the effective config by merging the selected provider's settings into the base agent config.

AGENTS["pi"] = {
    "cli_command": "pi",
    "cli_flags": _pi_cli_flags,
    "default_model": "claude-sonnet-4-6",
    "default_provider": "claude-sdk",
    "providers": {
        "claude-sdk": {
            "sandbox_agent": "claude",
            "base_image": "docker/sandbox-templates:claude-code",
            "uses_proxy": True,
            "allowed_hosts": ["api.anthropic.com", "statsig.anthropic.com", "sentry.io"],
            "env_var_name": "CLAUDE_CODE_OAUTH_TOKEN",
            "default_auth_mode": "oauth",
            "auth_modes": { ... },  # same as claude agent
        },
        # Future example:
        # "openai": {
        #     "sandbox_agent": "shell",
        #     "base_image": "node:22-slim",
        #     "uses_proxy": False,
        #     "allowed_hosts": ["api.openai.com"],
        #     "env_var_name": "OPENAI_API_KEY",
        # },
    },
}

Dockerfile with parameterized base image

The Dockerfile uses ARG BASE_IMAGE so the build process can pass the provider-appropriate base:

ARG BASE_IMAGE
FROM ${BASE_IMAGE}
# ... install pi, extensions, settings

The build_image method passes --build-arg BASE_IMAGE=<provider.base_image> when building.
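One way the build-arg pass-through could look, as a rough sketch. docker_build_argv is a hypothetical helper; the real build_image is a method on DockerImageMixin and its exact argument handling is not shown in this spec. Only the standard docker build flags -t and --build-arg are used.

```python
def docker_build_argv(tag, context_dir, base_image=None):
    """Assemble a docker build command line (hypothetical helper)."""
    argv = ["docker", "build", "-t", tag]
    if base_image:
        # Only templated Dockerfiles (such as pi's) need the override.
        argv += ["--build-arg", f"BASE_IMAGE={base_image}"]
    argv.append(context_dir)
    return argv
```

Call sites that pass no base_image produce the same command as before, which is the backwards-compatibility property the spec asks for.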

Per-project config

.agent-loop/config.json gains an optional pi section:

{
  "type": "docker-sandbox",
  "pi": {
    "provider": "claude-sdk",
    "model": "claude-sonnet-4-6"
  }
}

Both fields are optional and fall back to the AGENTS dict defaults.


1. Build Context Assembly

DockerImageMixin gains a prepare_build_context(agent) method:

  • For agents without extensions: returns the existing build_context(agent) path unchanged.
  • For pi: creates a temp directory, copies docker/agent-loop/pi/* into it, then copies each subdirectory of pi/packages/ (excluding node_modules/, dist/, .git/) into a packages/ subdirectory.
  • Returns the assembled path.

The content_hash computation must also incorporate the extension source so cache invalidation works when extensions change. Hash the filtered contents of pi/packages/ alongside the Dockerfile and base digest.

build_image calls prepare_build_context and builds from the returned path. If a temp directory was created, it is cleaned up after the build.
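The assembly described above might be sketched as follows. This is a standalone approximation under stated assumptions: the real prepare_build_context is a mixin method that takes only the agent name, so the docker_dir and packages_dir parameters here are hypothetical, and the (path, cleanup) return shape follows the Step 1 implementation notes.

```python
import shutil
import tempfile
from pathlib import Path

EXTENSION_EXCLUDES = ("node_modules", "dist", ".git")


def prepare_build_context(agent, docker_dir, packages_dir):
    """Return (path, cleanup); cleanup is None when no temp dir was created."""
    docker_dir = Path(docker_dir)
    if agent != "pi":
        return docker_dir, None
    tmp = tempfile.TemporaryDirectory(prefix="ralph-pi-context-")
    root = Path(tmp.name)
    # Copy the Dockerfile, settings.json, and anything else shipped alongside.
    shutil.copytree(docker_dir, root, dirs_exist_ok=True)
    ignore = shutil.ignore_patterns(*EXTENSION_EXCLUDES)
    for pkg in sorted(Path(packages_dir).iterdir()):
        # Every package directory is picked up; no per-extension wiring.
        if pkg.is_dir() and pkg.name not in EXTENSION_EXCLUDES:
            shutil.copytree(pkg, root / "packages" / pkg.name, ignore=ignore)
    return root, tmp
```

The caller is expected to invoke cleanup.cleanup() once the build finishes, when cleanup is not None.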

2. Pi Docker Image

The Dockerfile uses ARG BASE_IMAGE so the build process controls the base per provider. For claude-sdk, the base is docker/sandbox-templates:claude-code (Claude Code + Node.js). Future providers would pass node:22-slim.

The image installs pi globally, builds all extensions from packages/, symlinks them into pi's extension discovery path, and copies the settings file.

3. Provider-Aware Agent Config

The pi agent introduces a provider abstraction. Each provider in the providers dict defines sandbox_agent, base_image, uses_proxy, allowed_hosts, env_var_name, and optionally auth_modes. A resolve_agent_config() function merges the selected provider's config into a flat dict compatible with the rest of ralph's agent dispatch.

4. Per-Project Config

load_runtime_config() parses the optional pi section from .agent-loop/config.json and validates the provider against known providers.

5. Constraints

  • No new dependencies: ralph remains stdlib-only Python.
  • No changes to existing agents: claude and cursor behavior must be unchanged.
  • Proxy always starts for pi in this iteration: since only claude-sdk is implemented, the proxy is always needed. Future providers that skip the proxy will require refactoring token/proxy bootstrap order in cli.py — out of scope.
  • Extension auto-discovery: adding a new pi extension package to pi/packages/ must not require any code changes in ralph — only a Docker image rebuild.

Implementation Plan

Step 1: Build context assembly in DockerImageMixin [done]

Notes:

  • prepare_build_context(agent) returns a tuple (path, cleanup) where cleanup is either a tempfile.TemporaryDirectory instance (whose .cleanup() should be invoked when the build is done) or None for the non-pi case. This keeps the path-returning contract while letting build_image clean up the temp dir afterward.
  • Added _EXTENSION_EXCLUDES = ("node_modules", "dist", ".git") and extensions_dir(agent) helper. extensions_dir returns the path to pi/packages/ for the pi agent and None for all other agents (backwards compatible — content_hash for non-pi agents is identical to before).
  • _hash_extensions_dir walks the dir streaming file contents into a sha256 (with relative paths and NUL separators, sorted entries) for stable, deterministic hashing.
  • content_hash gained an optional extensions_hash="" argument; empty string yields the legacy two-argument result so existing tests still pass.
  • build_image gained an optional base_image=None argument; when set, --build-arg BASE_IMAGE=<base_image> is passed through. Other existing call sites pass nothing and behave unchanged.
  • pytest is unavailable in the iteration sandbox (PyPI downloads are blocked). The added pytest tests were exercised via stdlib unittest equivalents (13/13 passed) that drive the same code paths.
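The hashing approach in the notes above (sorted entries, relative paths, NUL separators, streamed contents) can be approximated like this. It is a sketch rather than the shipped _hash_extensions_dir; the chunk size and the exact byte layout fed to sha256 are assumptions.

```python
import hashlib
import os

EXTENSION_EXCLUDES = ("node_modules", "dist", ".git")


def hash_extensions_dir(packages_dir):
    """Return a deterministic sha256 hex digest of the filtered tree."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(packages_dir):
        # Prune excluded dirs in place and sort so the walk order is stable.
        dirnames[:] = sorted(d for d in dirnames if d not in EXTENSION_EXCLUDES)
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, packages_dir)
            digest.update(rel.encode("utf-8") + b"\0")
            with open(full, "rb") as fh:
                # Stream in chunks so large files are never fully in memory.
                for chunk in iter(lambda: fh.read(65536), b""):
                    digest.update(chunk)
            digest.update(b"\0")
    return digest.hexdigest()
```

Including the relative path in the digest means a rename changes the hash even when file contents stay the same.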

Files:

  • tools/ralph/src/ralph/runtime/__init__.py — add prepare_build_context(), update hashing, update build_image

Implement:

  1. Add prepare_build_context(agent) to DockerImageMixin. For agent == "pi", create a tempfile.TemporaryDirectory, copy docker/agent-loop/pi/* into it, then copy each subdir of pi/packages/ (excluding node_modules, dist, .git) into packages/<name>/. Return the temp dir path. For other agents, return self.build_context(agent).
  2. Add _hash_extensions_dir(packages_dir) that walks pi/packages/ (respecting the same exclusions) and produces a stable content hash of all file contents.
  3. Update compute_tag to incorporate the extensions hash for the pi agent, so image tags change when extension source changes.
  4. Update build_image to accept an optional base_image arg, call prepare_build_context, pass --build-arg BASE_IMAGE=<base_image> when provided, build from the returned path, and clean up the temp dir afterward.

Acceptance:

  • prepare_build_context("claude") returns the existing path unchanged.
  • prepare_build_context("pi") returns a temp dir containing Dockerfile, settings.json, and packages/claude-sdk-provider/ (without node_modules or dist).
  • compute_tag("pi") produces different tags when extension source changes.
  • Existing claude/cursor image builds are unaffected.
  • pytest tools/ralph/tests/ -v -k build_context passes.

Step 2: Pi Docker image and settings [done]

Notes:

  • The build step uses plain npm install (not npm install --production) because the claude-sdk-provider extension's build runs tsc, and typescript lives in devDependencies. --production would skip dev deps and break the build. The extra dev deps remain in the image (no prune step) — this is consistent with how the host installs the extension via roles/pi/install.
  • Extension build loop runs in a single RUN with set -eu so any failure aborts the build cleanly.
  • Symlink uses ${pkg%/} to strip the trailing slash from the for-loop expansion before passing to ln -sfn so the link target is a directory path (not a path ending in /).
  • Single chown -R agent:agent /home/agent/.pi at the end fixes ownership of everything created as root: the extensions/ dir, the symlinks, and the COPY'd settings.json.
  • Acceptance test for "actual docker build succeeds" cannot be run in the iteration sandbox (image registry pulls are network-blocked). The Dockerfile passes docker buildx build --check for syntax — only warning is InvalidDefaultArgInFrom, which matches the existing generate_project_dockerfile() pattern in the codebase.
  • Added TestPiDockerfileShipped test class covering: Dockerfile uses ARG BASE_IMAGE/FROM ${BASE_IMAGE}, installs pi + extensions, installs all required apt deps, and settings.json parses with the four expected default fields.

Files:

  • docker/agent-loop/pi/Dockerfile — new
  • docker/agent-loop/pi/settings.json — new
  • tools/ralph/tests/test_runtime_docker_sandbox.py — new TestPiDockerfileShipped class

Implement:

  1. Create docker/agent-loop/pi/Dockerfile:
    • ARG BASE_IMAGE / FROM ${BASE_IMAGE}
    • Install system deps as root: build-essential jq openssh-client fd-find
    • Install pi: npm install -g @mariozechner/pi-coding-agent
    • COPY packages/ /opt/pi-extensions/
    • Build each extension: loop over /opt/pi-extensions/*/, run npm install && npm run build (deviation from spec — see Notes)
    • Symlink extensions into ~/.pi/agent/extensions/
    • COPY settings.json /home/agent/.pi/agent/settings.json
    • chown -R agent:agent /home/agent/.pi
    • Switch to USER agent
  2. Create docker/agent-loop/pi/settings.json with defaultProvider: "claude-sdk", defaultModel: "claude-sonnet-4-6", defaultThinkingLevel: "high", hideThinkingBlock: true.
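A stdlib check for the four settings fields listed above could look roughly like this. check_settings is a hypothetical helper for illustration; the actual TestPiDockerfileShipped assertions are not reproduced in this spec.

```python
import json

# The four defaults named in the Implement list above.
EXPECTED_SETTINGS = {
    "defaultProvider": "claude-sdk",
    "defaultModel": "claude-sonnet-4-6",
    "defaultThinkingLevel": "high",
    "hideThinkingBlock": True,
}


def check_settings(text):
    """Parse settings.json text and return the keys that do not match."""
    data = json.loads(text)
    return [k for k, v in EXPECTED_SETTINGS.items() if data.get(k) != v]
```

An empty list means the file parses and carries the expected defaults; json.loads raises json.JSONDecodeError on malformed input.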

Acceptance:

  • docker build --build-arg BASE_IMAGE=docker/sandbox-templates:claude-code succeeds with a valid assembled build context.
  • The image contains pi on PATH, the claude-sdk-provider extension built and symlinked, and settings.json at /home/agent/.pi/agent/settings.json.

Step 3: Pi agent config with provider abstraction [done]

Notes:

  • resolve_agent_config(name, provider=None) returns the base config dict unchanged (same identity) for agents without a providers key — keeping claude/cursor behavior 100% backwards-compatible.
  • For pi, a shallow copy of the base config is built (with providers stripped out) and the selected provider's keys are merged in. The selected provider name is also stored as provider on the result so downstream code can branch on it.
  • get_auth_mode was updated to delegate to resolve_agent_config instead of get_agent. This is a no-op for claude/cursor (resolve returns base) but lets pi reach into its claude-sdk provider's auth_modes. No call sites needed to change.
  • pytest is unavailable in the iteration sandbox (PyPI is blocked), so the acceptance assertions were exercised via stdlib equivalents driving the same code paths. The pytest tests added to test_agents.py follow the file's existing class/style conventions.

Files:

  • tools/ralph/src/ralph/agents.py — add pi agent entry, resolve_agent_config()

Implement:

  1. Add _pi_cli_flags(model) returning ["--no-skills", "--no-session"].
  2. Add AGENTS["pi"] with cli_command: "pi", default_model: "claude-sonnet-4-6", default_provider: "claude-sdk", and a providers dict containing the claude-sdk provider config (sandbox_agent, base_image, uses_proxy, allowed_hosts, env_var_name, default_auth_mode, auth_modes).
  3. Add resolve_agent_config(name, provider=None):
    • Gets base config via get_agent(name).
    • If providers dict exists, selects provider (fallback to default_provider), validates it exists, returns a shallow copy with provider-specific keys merged in.
    • If no providers dict, returns base config as-is (backwards-compatible for claude/cursor).
    • Raises ValueError for unknown providers.
  4. Update VALID_AGENTS to include "pi".
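The resolution steps above can be sketched as below. The AGENTS table is trimmed to a few illustrative keys (the real entries carry more fields), and the error message wording is an assumption beyond the documented ralph: prefix.

```python
# Trimmed, illustrative agent table; not the full configuration.
AGENTS = {
    "claude": {"cli_command": "claude", "uses_proxy": True},
    "pi": {
        "cli_command": "pi",
        "default_provider": "claude-sdk",
        "providers": {
            "claude-sdk": {"sandbox_agent": "claude", "uses_proxy": True},
        },
    },
}


def get_agent(name):
    return AGENTS[name]


def resolve_agent_config(name, provider=None):
    base = get_agent(name)
    providers = base.get("providers")
    if not providers:
        # Agents without providers get the very same dict back.
        return base
    selected = provider or base["default_provider"]
    if selected not in providers:
        raise ValueError(f"ralph: unknown provider {selected!r} for agent {name!r}")
    # Shallow copy without the providers table, then overlay provider keys.
    resolved = {k: v for k, v in base.items() if k != "providers"}
    resolved.update(providers[selected])
    resolved["provider"] = selected
    return resolved
```

Because the copy is shallow, nested values such as allowed_hosts would still be shared with the AGENTS table, so callers should treat the result as read-only.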

Acceptance:

  • get_agent("pi") returns the pi config dict with providers key.
  • resolve_agent_config("pi") returns a flat dict with uses_proxy: True, sandbox_agent: "claude", allowed_hosts from claude-sdk provider, etc.
  • resolve_agent_config("claude") returns the existing claude config unchanged.
  • resolve_agent_config("pi", provider="unknown") raises ValueError.
  • pytest tools/ralph/tests/ -v -k agent passes.

Step 4: Per-project config extension [done]

Notes:

  • AGENTS is imported lazily inside load_runtime_config() (rather than at module top) to keep the runtime module's import graph free of any agents-module dependency. Functionally identical, but avoids forcing every consumer of runtime to load agents.
  • Added a defensive type check: a non-dict pi value (e.g. {"pi": "claude-sdk"}) raises ValueError rather than crashing later when .get() is called on a string. The error message prefix matches the project convention (ralph: 'pi' section in <path> must be an object).
  • Empty pi: {} is treated as "no pi config" (no pi_provider/pi_model keys surface). This keeps the keys absent rather than None, so downstream callers can use a single cfg.get(...) lookup with their own default.
  • pi_model is not validated — pi/Claude has many model aliases and the agent itself rejects unknowns. Validating provider only is consistent with the spec.
  • pytest is unavailable in the iteration sandbox (PyPI is blocked); the new pytest tests follow the existing class style in test_runtime_docker_sandbox.py and were exercised end-to-end via stdlib equivalents that drive the same code paths (9/9 passed).

Files:

  • tools/ralph/src/ralph/runtime/__init__.py — extend load_runtime_config()
  • tools/ralph/tests/test_runtime_docker_sandbox.py — added pi-section coverage to TestLoadRuntimeConfig

Implement:

  1. In load_runtime_config(), parse the optional pi section from the config dict. Extract pi.provider and pi.model if present.
  2. Validate that pi.provider (if present) is a known provider by importing and checking against AGENTS["pi"]["providers"].
  3. Store pi_provider and pi_model in the returned config dict.
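A sketch of the pi-section parsing, including the defensive type check and the keys-stay-absent behavior from the Notes. parse_pi_section and KNOWN_PI_PROVIDERS are hypothetical names (the real code reads the provider list from AGENTS["pi"]["providers"] inside load_runtime_config), and the unknown-provider message text is an assumption.

```python
KNOWN_PI_PROVIDERS = ("claude-sdk",)  # stand-in for AGENTS["pi"]["providers"]


def parse_pi_section(config, path="<config>"):
    """Return a dict with pi_provider / pi_model only when they are set."""
    pi = config.get("pi")
    if pi is None:
        return {}
    if not isinstance(pi, dict):
        raise ValueError(f"ralph: 'pi' section in {path} must be an object")
    result = {}
    provider = pi.get("provider")
    if provider is not None:
        if provider not in KNOWN_PI_PROVIDERS:
            raise ValueError(f"ralph: unknown pi provider {provider!r} in {path}")
        result["pi_provider"] = provider
    # The model is deliberately not validated; pi rejects unknown names itself.
    if pi.get("model") is not None:
        result["pi_model"] = pi["model"]
    return result
```

An empty pi object yields an empty dict, so downstream code can use a single cfg.get("pi_provider") lookup with its own default.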

Acceptance:

  • Config without pi section works unchanged.
  • Config with {"pi": {"provider": "claude-sdk"}} parses correctly, returns pi_provider: "claude-sdk".
  • Config with {"pi": {"provider": "unknown"}} raises ValueError.
  • pytest tools/ralph/tests/ -v -k runtime_config passes.

Step 5: Loop integration [done]

Notes:

  • cli.py now uses resolve_agent_config(agent) for the uses_proxy check. For pi this resolves the default provider (claude-sdk), giving uses_proxy=True. The proxy reuses port 18080 via proxy_port_for_agent("pi") (DEFAULT_PROXY_PORT fallback) and writes its PID to /tmp/ralph-proxy-pi.pid. ensure_proxy("pi", ...) will reuse a healthy claude proxy on the same port (mode/version match), so concurrent ralph invocations on the same auth mode share one proxy.
  • --model precedence is tracked via a new explicit_model boolean: cli.py sets it to True only when the user passed --model. It's forwarded to process_issue / poll_loop, where the order becomes: explicit --model > pi.model from .agent-loop/config.json > agent default. This avoids interpreting "user passed the default value explicitly" as "use pi.model instead", because explicit-ness is tracked separately from the resolved value.
  • pi_provider / pi_model are popped from the runtime config dict in process_issue before the remaining kwargs are forwarded to create_runtime. Otherwise unknown keys would leak into runtime constructors (the existing kwargs are forwarded with **config).
  • runtime.ensure_sandbox gained a base_image=None kwarg that is threaded into ensure_image / pull_base_image / compute_tag / build_image. The loop forwards base_image=... only when the resolved agent config has a non-empty base_image field, so the existing claude/cursor assert_called_once_with(...) assertions still match exactly.
  • DockerImageMixin._effective_base_image(agent, base_image) is the new resolution helper. It returns the override when supplied; otherwise it parses the Dockerfile's FROM directive and rejects ARG-style placeholders (e.g. ${BASE_IMAGE}). This way, agents that template their base must supply an override at the call site instead of silently producing a broken docker pull.
  • compute_tag was kept tolerant of ARG-style FROM with no override — it falls back to an empty base_digest so existing TestComputeTagPiExtensions (which doesn't pass base_image) continues to pass and exercise the extensions-hash logic.
  • token.py::_resolve_mode_string switched from get_agent to resolve_agent_config, so pi's auth_modes (which live on the claude-sdk provider) surface correctly when the user runs ralph store-token --agent pi.
  • pytest is unavailable in the iteration sandbox; the new pytest tests in test_loop.py::TestProcessIssuePi were exercised end-to-end via stdlib unittest.mock equivalents (7/7 passed) plus targeted runtime checks for _effective_base_image and ensure_image with/without base_image.
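The model precedence and the key-popping described in the notes above can be illustrated with two small helpers. Both function names are hypothetical; the actual logic lives inline in cli.py and process_issue.

```python
def split_pi_keys(config):
    """Separate pi-only keys from the kwargs forwarded to the runtime."""
    runtime_kwargs = dict(config)  # copy so the caller's dict is untouched
    pi_provider = runtime_kwargs.pop("pi_provider", None)
    pi_model = runtime_kwargs.pop("pi_model", None)
    return pi_provider, pi_model, runtime_kwargs


def choose_model(cli_model, explicit_model, pi_model, default_model):
    """Explicit --model > pi.model from config > agent default."""
    if explicit_model:
        # The user typed --model, even if the value equals the default.
        return cli_model
    return pi_model or default_model
```

Tracking explicit_model as a separate boolean is what lets "--model claude-sonnet-4-6" win over a different pi.model even though the value matches the default.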

Files:

  • tools/ralph/src/ralph/loop.py — pop pi keys, re-resolve agent config, model precedence, base_image kwarg
  • tools/ralph/src/ralph/cli.py — resolve_agent_config, track explicit_model, forward to loop
  • tools/ralph/src/ralph/runtime/__init__.py — _effective_base_image helper, thread base_image through ensure_image / compute_tag / pull_base_image / needs_rebuild
  • tools/ralph/src/ralph/runtime/docker_sandbox.py — resolve_agent_config; ensure_sandbox accepts base_image
  • tools/ralph/src/ralph/runtime/container.py — resolve_agent_config; ensure_sandbox accepts base_image
  • tools/ralph/src/ralph/token.py — _resolve_mode_string uses resolve_agent_config
  • tools/ralph/tests/test_loop.py — new TestProcessIssuePi class

Implement:

  1. In process_issue(), after load_runtime_config(), extract pi_provider and pi_model from the config.
  2. Call resolve_agent_config(agent, provider=pi_provider) to get the effective config. Use the resolved config for all downstream calls: proxy env building, ensure_sandbox (pass base_image from resolved config), run_iteration, network policy.
  3. If pi_model is set in config, use it as the model (CLI --model takes precedence if explicitly provided).
  4. In ensure_sandbox, pass the resolved base_image to build_image so the correct base is used per provider.
  5. In cli.py, ensure ensure_token and ensure_proxy use the resolved agent config. Since pi reuses claude's auth_modes, this should work without changes, but verify the code path.
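The base-image resolution that backs steps 2 and 4 (described in the notes as _effective_base_image) might look like the following. This is a simplified sketch: it takes the Dockerfile text directly rather than an agent name, it reads only the first FROM line, and the error messages are assumptions.

```python
def effective_base_image(dockerfile_text, base_image=None):
    """Return the override if given, else the literal image from FROM."""
    if base_image:
        return base_image
    for line in dockerfile_text.splitlines():
        parts = line.split()
        if parts and parts[0].upper() == "FROM":
            # Skip flags such as --platform=... that may precede the image.
            images = [p for p in parts[1:] if not p.startswith("--")]
            if not images:
                break
            if "$" in images[0]:
                # A templated base cannot be pulled without an override.
                raise ValueError(
                    "ralph: Dockerfile FROM uses a build arg; pass base_image explicitly"
                )
            return images[0]
    raise ValueError("ralph: no FROM directive found in Dockerfile")
```

Failing loudly on ${BASE_IMAGE} is the point of the helper: it prevents a literal placeholder from reaching docker pull.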

Acceptance:

  • ralph --agent pi --issue <N> runs pi in a Docker sandbox with the claude-sdk provider.
  • Pi receives CLAUDE_CODE_OAUTH_TOKEN=phantom and ANTHROPIC_BASE_URL=http://host.docker.internal:18080 as env vars.
  • The iteration runs: pi -p "<prompt>" --model claude-sonnet-4-6 --no-skills --no-session.
  • Spec file is written to /tmp/spec.md and read back after iteration.
  • Claude and cursor agents are unaffected.

Step 6: Run all checks [done]

Notes:

  • Full pytest collection surfaced four regressions that the per-step pytest filters in steps 1–5 (-k build_context, -k agent, -k runtime_config) had skipped past. All four were fixed in this step:
    1. TestSecretFileLifecycle's parametrize fixture filtered VALID_AGENTS by get_agent(a)["uses_proxy"], which KeyErrors on pi (its uses_proxy lives under the claude-sdk provider). Switched to resolve_agent_config.
    2. TestSandboxEnsureSandbox / TestContainerEnsureSandbox test_force_rebuild_passed_through assertions used assert_called_once_with("claude", force_rebuild=True). The Step 5 implementation note promised the existing claude/cursor call signature would be preserved; in practice both ensure_sandbox methods unconditionally forwarded base_image=None to ensure_image. Updated ensure_sandbox to only forward base_image when it's set, matching the spec's intent and the existing loop.py pattern.
    3. test_rejects_invalid_env_var_name patched ralph.runtime.container.get_agent but Step 5 switched the call site to resolve_agent_config. Updated the patch target.
    4. test_proxy_recovery_passes_auth_mode was failing as of commit a90c82e ("proxy resilience for idle shutdown") — that commit added a per-issue ensure_proxy(...) call before the recovery path, so ensure_proxy is now called twice (once at startup health check, once on recovery) but the test still used assert_called_once_with. Rewrote the assertion to verify both calls forward auth_mode="api_key".
  • After fixes: 880 passed, 14 skipped (integration/docker-gated). Pre-existing collection errors in tests/test_sandbox_prune.py and tests/test_tart_sandbox.py (top-level dotfiles tests, unrelated to ralph/pi) reference a ralph.sandbox.tart module that no longer exists; verified they failed identically on 0bd0801~1 so they are out of scope for this spec.
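The rewritten assertion from regression 4 can be illustrated with unittest.mock. This is a hedged sketch, not the test from the repository: ensure_proxy here is a bare Mock, its positional argument is a stand-in, and assert_all_calls_forward is a hypothetical helper.

```python
from unittest import mock


def assert_all_calls_forward(mock_fn, **expected_kwargs):
    """Fail unless every recorded call passed the expected keyword values."""
    if mock_fn.call_count < 1:
        raise AssertionError("expected at least one call")
    for call in mock_fn.call_args_list:
        for key, value in expected_kwargs.items():
            if call.kwargs.get(key) != value:
                raise AssertionError(f"{key}={value!r} not forwarded in {call}")


# Stand-in for the patched function: one startup call, one recovery call.
ensure_proxy = mock.Mock()
ensure_proxy("claude", auth_mode="api_key")
ensure_proxy("claude", auth_mode="api_key")
assert_all_calls_forward(ensure_proxy, auth_mode="api_key")
```

Unlike assert_called_once_with, this style keeps passing if the number of calls changes again, while still catching a call that drops auth_mode.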

Acceptance:

  • pytest tools/ralph/tests/ -v — all tests pass ✓ (880 passed, 14 skipped)
  • python3 -m py_compile tools/ralph/src/ralph/agents.py — no syntax errors ✓
  • python3 -m py_compile tools/ralph/src/ralph/runtime/__init__.py — no syntax errors ✓
  • python3 -m py_compile tools/ralph/src/ralph/loop.py — no syntax errors ✓
  • python3 -m py_compile tools/ralph/src/ralph/cli.py — no syntax errors ✓

Conventions

  • Language: Python 3 (stdlib only, no third-party deps) for ralph; Dockerfile for image
  • Tests: pytest, files in tools/ralph/tests/
  • Error messages: Prefix with ralph:
  • Exit codes: 0=success, 1=runtime error, 2=usage error

Labels: spec (Ralph spec for automated execution), status:done (Completed)
