Cursor Agent Support for Ralph #33

Description

@rjernst

branch: ralph-cursor-agent

Spec: Cursor Agent Support for Ralph

Overview

Ralph's Python codebase already has the agent abstraction plumbed through (--agent flag, per-agent tokens, proxy ports, sandbox names, Dockerfile paths). However, the actual agent CLI invocation and several supporting functions are hardcoded to "claude". This spec adds cursor-agent as a second supported agent, enabling ralph --agent cursor --issue <N>.

Key differences from Claude:

  • Auth: cursor-agent uses CURSOR_API_KEY env var (simple API key, not OAuth). No proxy-based credential injection — cursor-agent doesn't support HTTP proxy configuration.
  • CLI flags: cursor-agent -p --force --trust --model <model> (no --effort, no --dangerously-skip-permissions)
  • Permissions config: cursor-agent needs ~/.cursor/cli-config.json with {"permissions": {"allow": ["*"], "deny": []}} baked into the Dockerfile
  • Sandbox: Uses docker sandbox create shell with a custom template (no built-in cursor agent in Docker sandbox)
  • Network: Needs *.cursor.sh hosts allowed (api2, api5, etc.) instead of Anthropic hosts
  • Secret delivery: API key written to a temp file in the sandbox, read into env var and deleted before cursor-agent starts

Architecture

ralph --agent cursor --issue 42
  │
  ├─ 1. Token: store_token/ensure_token
  │   ├─ Claude: run_claude_setup_token() → OAuth token → Keychain
  │   └─ Cursor: prompt for API key → Keychain (service: "cursor-token")
  │
  ├─ 2. Proxy (Claude only)
  │   ├─ Claude: start proxy on port 18080, inject OAuth token into API requests
  │   └─ Cursor: NO proxy — API key injected via secret file (see step 4)
  │
  ├─ 3. Sandbox
  │   ├─ Image: docker/agent-loop/cursor/Dockerfile (FROM sandbox-templates:shell + cursor-agent)
  │   ├─ Project layer: .agent-loop/Dockerfile.sandbox or .agent-loop/dependencies (same as claude)
  │   ├─ Create: docker sandbox create shell -t <tag> --name <name> <workspace>
  │   └─ Network: deny-by-default + allow api2.cursor.sh, api5.cursor.sh, sentry.io
  │
  ├─ 4. Secret file lifecycle (per iteration)
  │   ├─ Write API key to /tmp/.cursor-api-key in sandbox via docker sandbox exec
  │   ├─ Iteration command is a shell wrapper:
  │   │     sh -c 'export CURSOR_API_KEY=$(cat /tmp/.cursor-api-key) &&
  │   │            rm /tmp/.cursor-api-key &&
  │   │            exec cursor-agent -p --force --trust ...'
  │   └─ Key is deleted from disk BEFORE cursor-agent starts (exec replaces shell)
  │
  └─ 5. Iteration
      └─ cursor-agent -p --force --trust --model <model> --output-format text <prompt>

Scope

In scope:

  • Cursor agent support for Docker sandboxes
  • Agent-specific token setup, sandbox creation, network policy, and CLI invocation
  • Secret file lifecycle with pre-exec cleanup
  • Project-level Dockerfile/dependencies support for cursor (same .agent-loop/ config as claude)
  • Parameterize all hardcoded "claude" references that block cursor support

Out of scope:

  • Tart sandbox backend (Docker only for now)
  • Cursor-specific selftest (selftest remains claude-only; cursor selftest can be added later)
  • Proxy support for cursor (cursor-agent doesn't support HTTP proxies)
  • MCP configuration inside the sandbox

1. Agent Abstraction

Each agent needs different behavior for:

  • Token setup: Claude uses setup-token interactive TUI; Cursor prompts for an API key
  • Token validation: Claude calls claude -p --model haiku ok; Cursor calls cursor-agent -p --force --model auto ok (or skips validation — API keys are long-lived)
  • Proxy: Claude needs a proxy; Cursor does not
  • Sandbox creation: Claude uses docker sandbox create claude; Cursor uses docker sandbox create shell
  • Network policy: Different allowed hosts per agent
  • CLI invocation: Different commands and flags
  • Secret delivery: Claude uses proxy phantom token; Cursor uses secret file with read-and-delete-before-exec pattern

The implementation should use a per-agent configuration dict or similar structure to centralize these differences, rather than scattering if/else blocks.

2. CLI Flags

cursor-agent headless invocation:

cursor-agent -p --force --trust --model <model> --output-format text "<prompt>"
  • -p / --print — headless mode (like Claude's -p)
  • --force — allow file writes (like Claude's --dangerously-skip-permissions)
  • --trust — trust the workspace (skip workspace trust prompt)
  • --model <model> — model selection (default: auto)
  • --output-format text — human-readable output

Additionally, ~/.cursor/cli-config.json must exist with permissive tool permissions:

{
  "permissions": {
    "allow": ["*"],
    "deny": []
  }
}

This is baked into the Cursor Dockerfile.

No equivalent of Claude's --effort high.

3. Network Policy

Per-agent allowed hosts:

  • Claude: api.anthropic.com, statsig.anthropic.com, sentry.io
  • Cursor: api2.cursor.sh, api5.cursor.sh, sentry.io

Note: Cursor may use additional subdomains (api3, api4, gcpp). Verify during implementation by running cursor-agent with network logging and add any missing hosts.

4. Project-Level Image Layers

The existing project image system (.agent-loop/Dockerfile.sandbox and .agent-loop/dependencies) already uses ARG BASE_IMAGE / FROM ${BASE_IMAGE}, so it layers generically on top of whichever agent's base image is in use. This must work for cursor the same way it works for claude — the cursor base image tag is passed as BASE_IMAGE when building the project layer.

No code changes should be needed for this, but the parameterized sandbox_agent subcommand (Step 3) must be used consistently through ensure_sandbox → ensure_project_image → _docker_sandbox_create.


Implementation Plan

Step 1: Add agent configuration registry [done]

Files:

  • tools/ralph/src/ralph/agents.py — New file: agent configuration registry

Implement:

  1. Create a module with per-agent configuration dicts containing:
    • cli_command: the CLI binary name ("claude" or "cursor-agent")
    • sandbox_agent: the docker sandbox create subcommand ("claude" or "shell")
    • cli_flags: function that returns agent-specific flags for iteration (e.g., ["--dangerously-skip-permissions", "--effort", "high"] for claude, ["--force", "--trust", "--output-format", "text"] for cursor)
    • allowed_hosts: list of hosts for network policy
    • default_model: default model name ("sonnet" for claude, "auto" for cursor)
    • uses_proxy: boolean (True for claude, False for cursor)
    • env_var_name: the token env var name ("CLAUDE_CODE_OAUTH_TOKEN" for claude, "CURSOR_API_KEY" for cursor)
  2. Provide a get_agent(name) lookup function that raises a clear error for unknown agents
  3. Keep the valid agent names list in one place for CLI validation
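
A minimal sketch of what such a registry could look like, using only the field names and values listed above. The module layout, the AGENTS and VALID_AGENTS names, and the error wording are assumptions for illustration, not the actual contents of agents.py.

```python
# Hypothetical registry sketch; field names and values come from the spec,
# the AGENTS / VALID_AGENTS names and the error text are assumptions.
AGENTS = {
    "claude": {
        "cli_command": "claude",
        "sandbox_agent": "claude",
        "cli_flags": lambda: ["--dangerously-skip-permissions", "--effort", "high"],
        "allowed_hosts": ["api.anthropic.com", "statsig.anthropic.com", "sentry.io"],
        "default_model": "sonnet",
        "uses_proxy": True,
        "env_var_name": "CLAUDE_CODE_OAUTH_TOKEN",
    },
    "cursor": {
        "cli_command": "cursor-agent",
        "sandbox_agent": "shell",
        "cli_flags": lambda: ["--force", "--trust", "--output-format", "text"],
        "allowed_hosts": ["api2.cursor.sh", "api5.cursor.sh", "sentry.io"],
        "default_model": "auto",
        "uses_proxy": False,
        "env_var_name": "CURSOR_API_KEY",
    },
}

# Single source of truth for CLI validation of --agent.
VALID_AGENTS = tuple(AGENTS)


def get_agent(name):
    """Return the config dict for an agent, or raise ValueError if unknown."""
    try:
        return AGENTS[name]
    except KeyError:
        valid = ", ".join(VALID_AGENTS)
        raise ValueError(f"ralph: unknown agent {name!r} (valid: {valid})") from None
```

Keeping cli_flags as a callable, as the list above describes, leaves room for flags that depend on runtime state without changing the lookup interface.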

Acceptance:

  • get_agent("claude") returns claude config
  • get_agent("cursor") returns cursor config
  • get_agent("unknown") raises ValueError
  • pytest tests/test_agents.py -v passes

Step 2: Create Cursor Dockerfile and sandbox template [done]

Files:

  • docker/agent-loop/cursor/Dockerfile — New Cursor sandbox image

Implement:

  1. Create docker/agent-loop/cursor/Dockerfile:
    FROM docker/sandbox-templates:shell
    USER root
    RUN curl https://cursor.com/install -fsS | bash
    RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential jq openssh-client fd-find \
        && rm -rf /var/lib/apt/lists/*
    USER agent
    RUN mkdir -p ~/.cursor && \
        echo '{"permissions":{"allow":["*"],"deny":[]}}' > ~/.cursor/cli-config.json
  2. Verify the cursor installer works in the Docker build context and places cursor-agent on PATH
  3. If the installer requires a different approach in Docker (e.g., direct binary download), adapt accordingly
  4. The cli-config.json gives cursor-agent full tool permissions (equivalent to Claude's --dangerously-skip-permissions)

Acceptance:

  • docker build -t test-cursor docker/agent-loop/cursor/ succeeds
  • docker run --rm test-cursor which cursor-agent finds the binary
  • docker run --rm test-cursor cursor-agent --version returns a version
  • docker run --rm test-cursor sh -c 'cat ~/.cursor/cli-config.json' shows the permissions config (the sh -c wrapper makes ~ expand inside the container rather than on the host)

Step 3: Parameterize sandbox creation to support shell agent [done]

Files:

  • tools/ralph/src/ralph/sandbox/docker.py — Update _docker_sandbox_create and apply_network_policy

Implement:

  1. Update _docker_sandbox_create to accept an agent parameter and use the agent config's sandbox_agent value instead of hardcoded "claude"
  2. Update apply_network_policy to accept an agent parameter and use the agent config's allowed_hosts instead of hardcoded Anthropic hosts
  3. Thread the agent parameter through ensure_sandbox → _docker_sandbox_create and apply_network_policy
  4. Verify that ensure_project_image continues to work — it already passes the base image tag generically via ARG BASE_IMAGE, so no changes should be needed there, but confirm the full path works: cursor base image → project layer → docker sandbox create shell -t <project-tag>
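
The parameterization in items 1 to 3 can be sketched as a pure function that builds the docker sandbox create argument list. The command shape follows the Architecture section (docker sandbox create shell -t <tag> --name <name> <workspace>); the function name, its signature, and treating the image tag as optional are assumptions, not the real _docker_sandbox_create.

```python
# Hypothetical helper; the real _docker_sandbox_create may differ.
def sandbox_create_argv(sandbox_agent, name, workspace, image_tag=None):
    """Build the argv for `docker sandbox create <agent> ...`.

    sandbox_agent is the registry's sandbox_agent value
    ("claude" or "shell"), never a hardcoded string.
    """
    argv = ["docker", "sandbox", "create", sandbox_agent]
    if image_tag is not None:
        # Custom template, e.g. the cursor base image or a project layer on top of it.
        argv += ["-t", image_tag]
    argv += ["--name", name, workspace]
    return argv
```

Returning a list rather than a shell string keeps the call safe to pass to subprocess.run without quoting, and makes the command easy to assert on in tests.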

Acceptance:

  • Claude sandboxes still created with docker sandbox create claude
  • Cursor sandboxes created with docker sandbox create shell
  • Network policy uses agent-specific allowed hosts
  • Project-level image layers work for both agents (.agent-loop/Dockerfile.sandbox and .agent-loop/dependencies)
  • pytest tests/test_sandbox_docker.py -v passes

Step 4: Parameterize agent CLI invocation in run_iteration [done]

Files:

  • tools/ralph/src/ralph/sandbox/docker.py — Update run_iteration
  • tools/ralph/src/ralph/sandbox/__init__.py — Update SandboxBackend.run_iteration signature
  • tools/ralph/src/ralph/loop.py — Update env_vars and run_iteration call

Implement:

  1. Add agent parameter to run_iteration in the SandboxBackend base class and DockerSandbox
  2. For claude: use existing direct exec pattern — docker sandbox exec ... claude -p <prompt> --model <model> --dangerously-skip-permissions --effort high
  3. For cursor: implement the secret file lifecycle:
    a. Write API key to /tmp/.cursor-api-key inside sandbox via docker sandbox exec -i ... tee
    b. Build a shell wrapper command: sh -c 'export CURSOR_API_KEY=$(cat /tmp/.cursor-api-key) && rm /tmp/.cursor-api-key && exec cursor-agent -p --force --trust --model <model> --output-format text "<prompt>"'
    c. The exec replaces the shell process, so the key exists only in the env of the cursor-agent process (not as a file on disk)
  4. In loop.py, make env_vars agent-aware:
    • Claude: CLAUDE_CODE_OAUTH_TOKEN=phantom, ANTHROPIC_BASE_URL=..., ANTHROPIC_CUSTOM_MODEL_OPTION=... (existing)
    • Cursor: no env vars passed via -e flags — the API key is injected via the secret file + shell wrapper pattern above
  5. Pass the API key string (from Keychain) through to run_iteration so it can write the secret file
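
The shell wrapper from item 3 can be sketched as follows. This is an illustration under stated assumptions: the function name and signature are hypothetical, and the default secret path is the one named in this plan. Every interpolated value goes through shlex.quote so that a prompt containing quotes or shell metacharacters cannot break out of the command.

```python
import shlex

# Path as named in this plan; treat it as a parameter, not a fixed contract.
DEFAULT_SECRET_PATH = "/tmp/.cursor-api-key"


def build_cursor_wrapper(prompt, model, secret_path=DEFAULT_SECRET_PATH):
    """Return argv for a shell that loads the key, deletes the file, then execs the agent."""
    flags = ["-p", "--force", "--trust", "--model", model, "--output-format", "text"]
    # Quote each word separately so the prompt stays a single argument.
    agent_cmd = " ".join(shlex.quote(part) for part in ["cursor-agent", *flags, prompt])
    path = shlex.quote(secret_path)
    script = (
        f"export CURSOR_API_KEY=$(cat {path}) && "  # read the key into the environment
        f"rm {path} && "                            # remove the file before the agent starts
        f"exec {agent_cmd}"                         # replace the shell with cursor-agent
    )
    return ["sh", "-c", script]
```

Because the steps are chained with &&, a failed read or a failed rm stops the wrapper before cursor-agent is ever started.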

Acceptance:

  • Claude iterations still call claude -p ... --dangerously-skip-permissions --effort high
  • Cursor iterations: secret file written, read into env var, deleted, then cursor-agent execs
  • The API key file does not exist on disk while cursor-agent is running
  • pytest tests/test_sandbox_docker.py -v passes
  • pytest tests/test_loop.py -v passes

Implementation notes:

  • Secret file path is /tmp/.agent-api-key (generic, not cursor-specific), replacing the /tmp/.cursor-api-key path named in the plan above
  • Uses shlex.quote for shell quoting in the sh -c wrapper (consistent with tart.py)
  • All values interpolated into sh -c string (prompt, model, flags) are shell-quoted for defense in depth
  • TartSandbox.run_iteration signature updated to accept agent/api_key kwargs for compatibility
  • Renamed test_agent_codex_uses_correct_names → test_agent_cursor_uses_correct_names since "codex" is not a valid agent in the registry
  • read_token_from_keychain imported in loop.py for non-proxy agents to retrieve the raw API key

Step 5: Add cursor-specific token management [done]

Files:

  • tools/ralph/src/ralph/token.py — Add cursor token setup alongside claude

Implement:

  1. Add prompt_for_api_key() function that:
    • Prompts user: "Enter your Cursor API key (from cursor.com/dashboard → Integrations → User API Keys):"
    • Reads the key from stdin
    • Stores in Keychain under cursor-token service as JSON: {"accessToken": "<key>", "expiresAt": <far-future>}
  2. Make store_token dispatch based on agent:
    • claude → existing run_claude_setup_token() flow
    • cursor → prompt_for_api_key() flow
  3. Make ensure_token dispatch based on agent:
    • claude → existing flow (run setup-token if missing)
    • cursor → prompt for API key if missing
  4. Make _parse_and_store_token agent-aware:
    • Claude: validate via claude -p --model haiku ok with CLAUDE_CODE_OAUTH_TOKEN
    • Cursor: skip validation (API keys are long-lived and can't be validated without a full agent run) OR validate via cursor-agent -p --force --model auto "ok" with CURSOR_API_KEY if cursor-agent is installed on host
  5. Update error messages to be agent-specific instead of hardcoding "claude setup-token"
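
A minimal sketch of the key prompt and the Keychain payload from item 1. The JSON shape is taken from the spec; the helper names, the injectable read callable, and the far-future value (with epoch milliseconds as the assumed unit) are illustrative assumptions. Writing the payload to the Keychain is left out.

```python
import json

# Assumption: expiresAt is epoch milliseconds; 4102444800 s is 2100-01-01 UTC.
FAR_FUTURE_MS = 4102444800000

PROMPT = (
    "Enter your Cursor API key "
    "(from cursor.com/dashboard → Integrations → User API Keys): "
)


def build_keychain_payload(api_key):
    """Serialize an API key in the same JSON shape used for stored tokens."""
    key = api_key.strip()
    if not key:
        raise ValueError("ralph: empty API key")
    return json.dumps({"accessToken": key, "expiresAt": FAR_FUTURE_MS})


def prompt_for_api_key(read=input):
    """Prompt for the key and return the payload to store under "cursor-token".

    `read` defaults to input(); tests can inject a stub instead.
    """
    return build_keychain_payload(read(PROMPT))
```

Reusing the accessToken/expiresAt shape means the existing read path does not need a cursor-specific branch to parse the stored value.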

Acceptance:

  • ralph store-token --agent claude still runs claude setup-token
  • ralph store-token --agent cursor prompts for an API key
  • ralph check-token --agent cursor reports status from Keychain
  • pytest tests/test_token.py -v passes

Implementation notes:

  • Dispatch uses agent_config["uses_proxy"] rather than agent name comparison — claude (proxy) validates tokens, cursor (non-proxy) skips validation
  • prompt_for_api_key uses input() (not getpass) for consistency with how claude setup-token echoes output
  • Error messages changed from "running claude setup-token..." to "requesting new token..." (generic)
  • Existing test using "codex" agent updated to "cursor" since store_token now calls get_agent() which validates agent names
  • _parse_and_store_token validation uses agent_config["cli_command"] and agent_config["env_var_name"] for the validation subprocess call, not hardcoded "claude"

Step 6: Make proxy conditional (skip for cursor) [done]

Files:

  • tools/ralph/src/ralph/cli.py — Conditionally start proxy
  • tools/ralph/src/ralph/loop.py — Conditionally use proxy env vars

Implement:

  1. In cli.py main flow, check agent_config["uses_proxy"]:
    • If True (claude): ensure_proxy, start_proxy_keepalive as before
    • If False (cursor): skip proxy entirely, set proxy_port to None
  2. In loop.py process_issue, build env_vars based on agent:
    • Claude: phantom token + proxy base URL + custom model option (existing)
    • Cursor: no proxy env vars — API key delivery handled in run_iteration (Step 4)
  3. In loop.py, skip proxy health check / restart for non-proxy agents
  4. Pass the raw API key (from ensure_token) through to process_issue → run_iteration for cursor's secret file injection
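
The proxy branch in item 1 can be sketched as a small helper. The real signatures of ensure_proxy and start_proxy_keepalive are not shown in this spec, so they are passed in as callables here; the helper name and those call shapes are assumptions.

```python
# Hypothetical helper; ensure_proxy / start_keepalive stand in for the real
# functions, whose signatures are not given in this spec.
def setup_proxy(agent_config, ensure_proxy, start_keepalive):
    """Start the proxy only for agents that use one; return the port or None."""
    if not agent_config["uses_proxy"]:
        # Cursor: no proxy at all; the API key is delivered in run_iteration.
        return None
    port = ensure_proxy()
    start_keepalive(port)
    return port
```

Returning None for non-proxy agents matches the proxy_port=None convention, provided every downstream use stays gated on uses_proxy.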

Acceptance:

  • ralph --agent claude --issue N starts proxy as before
  • ralph --agent cursor --issue N skips proxy, delivers API key via secret file in run_iteration
  • pytest tests/test_cli.py -v passes
  • pytest tests/test_loop.py -v passes

Implementation notes:

  • Only cli.py needed changes — loop.py already had agent-conditional env_vars and proxy health check logic from Step 4
  • proxy_port=None for non-proxy agents is safe: all downstream uses in loop.py are gated behind agent_config["uses_proxy"]
  • Updated test_agent_flag_passed_through from "codex" to "cursor" since get_agent() now validates agent names
  • Added test_cursor_skips_proxy and test_claude_starts_proxy for explicit proxy conditional coverage

Step 7: Update default model handling [done]

Files:

  • tools/ralph/src/ralph/cli.py — Use agent config for default model

Implement:

  1. After agent is determined, set default model from agent_config["default_model"] instead of hardcoded "sonnet"
  2. Only override if user didn't specify --model explicitly
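
The two rules above amount to a one-line resolution step. A minimal sketch, assuming the parsed --model value is None when the flag was not given (the function name is hypothetical):

```python
def resolve_model(cli_model, agent_config):
    """Prefer an explicit --model value; otherwise use the agent's default."""
    if cli_model is not None:
        return cli_model
    return agent_config["default_model"]
```

Using None as the "not specified" sentinel, rather than a default string, is what lets an explicit --model sonnet be told apart from no flag at all.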

Acceptance:

  • ralph --agent claude defaults to model sonnet
  • ralph --agent cursor defaults to model auto
  • ralph --agent cursor --model gpt-5 uses gpt-5
  • pytest tests/test_cli.py -v passes

Implementation notes:

  • model initialized to None instead of "sonnet", then resolved from agent_config["default_model"] after arg parsing completes
  • Moved get_agent(agent) call earlier (before validation) so agent_config is available for model resolution — also catches unknown agent names sooner
  • Removed duplicate get_agent(agent) call that was further down in the function
  • Updated usage text from "Claude model (default: sonnet)" to "Model name (default: per-agent, e.g. sonnet for claude)"

Step 8: Run all checks [done]

Acceptance:

  • pytest tests/ -v — all tests pass
  • python3 -m py_compile tools/ralph/src/ralph/agents.py — no syntax errors
  • python3 -m py_compile tools/ralph/src/ralph/token.py — no syntax errors
  • python3 -m py_compile tools/ralph/src/ralph/sandbox/docker.py — no syntax errors
  • python3 -m py_compile tools/ralph/src/ralph/loop.py — no syntax errors
  • python3 -m py_compile tools/ralph/src/ralph/cli.py — no syntax errors
  • No regressions in claude agent behavior

Implementation notes:

  • All 5 py_compile checks pass clean
  • 415 tests pass, 6 skipped (integration tests requiring Docker)
  • Fixed 7 pre-existing test failures:
    • test_contains_step_structure: updated assertion to match current ITERATION_PROMPT wording ("For each task, follow this workflow")
    • 6 tart sandbox tests: removed "--" separator from expected tart exec commands to match tart 2.x syntax change (commit cd425ae), and adjusted command index offsets accordingly

Conventions

  • Language: Python 3 (stdlib only, no third-party dependencies)
  • Tests: pytest with monkeypatching for subprocess calls
  • Error messages: Prefix with ralph:
  • Exit codes: 0=success, 1=runtime error, 2=usage error
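
The error-message and exit-code conventions can be illustrated with a small helper. The constant names and the fail function are hypothetical; only the "ralph:" prefix and the 0/1/2 codes come from the list above.

```python
import sys

# Exit codes from the conventions list; the constant names are assumptions.
EXIT_OK = 0
EXIT_RUNTIME = 1
EXIT_USAGE = 2


def fail(message, code=EXIT_RUNTIME):
    """Print a prefixed error to stderr and exit with the given code."""
    print(f"ralph: {message}", file=sys.stderr)
    raise SystemExit(code)
```

For example, fail("unknown agent 'codex'", EXIT_USAGE) would print "ralph: unknown agent 'codex'" to stderr and exit with status 2.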

Metadata

Labels: spec (Ralph spec for automated execution), status:done (Completed)
