Pier is a Harbor-compatible framework for evaluating coding agents in sandboxed environments. It reads Harbor's task format and runs trials against it.
pier run -p path/to/task --agent claude-code --env modal
Pier is a fork. We wanted a smaller, more opinionated base to build on. On top of Harbor, Pier adds:
- Installed agents in air-gapped tasks (allow_internet = false). When the agent runs inside the sandbox (Claude Code, Codex, etc.), both the install step and the inference call need the network. Pier lets agents declare their install scripts and a network allowlist, which the docker and modal environments honor when setting up the sandbox.
- Augmented ATIF v1.7. Strict one step per API turn, strict reasoning vs agent message separation, no fabricated assistant text, peak_context_tokens, summarization_count, llm_call_count, real upstream timestamps.
- A chat-style trajectory viewer (pier view).
- pier critique run for inspecting completed trials with a fresh agent in a fresh sandbox.
- Task format: Harbor-compatible.
- Environments: docker, modal. Per-agent install specs and network allowlists are honored on both, so installed agents work under allow_internet = false.
- Agents: nop, oracle, claude-code, codex, cursor-cli, gemini-cli, opencode, mini-swe-agent. All emit augmented ATIF v1.7.
- Datasets: local Harbor-format task directories via -p/--path.
- CLI: pier run, pier job, pier view, pier critique run, pier check / pier analyze (vendored from Harbor).
Pier does not currently resolve or download Harbor registry datasets directly.
uv tool install datacurve-pier
# or
pip install datacurve-pier
export ANTHROPIC_API_KEY=...
pier run -p path/to/task --agent claude-code --env modal --env-file .env
Run a local dataset, optionally a deterministic random subset:
pier run -p path/to/dataset --agent claude-code --env modal
pier run -p path/to/dataset --n-tasks 10 --sample-seed 0
To use a Harbor registry dataset, download it with Harbor first, then point Pier at it:
uv run --directory ~/code/harbor harbor download swebenchpro -o ~/code/pier/datasets
uv run pier run -p datasets/swebenchpro --n-tasks 10 --sample-seed 0
Trials land under jobs/<timestamp_or_name>/<trial_id>/. See pier run --help, pier job --help, pier critique --help, and pier view --help for everything else.
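The value of --sample-seed is that the same seed yields the same subset on every run. The sketch below shows one way such a deterministic subset can be drawn. It is an illustration only: `sample_tasks` is a hypothetical helper, and Pier's actual selection logic may differ (for example in how it orders tasks before sampling).

```python
import random


def sample_tasks(task_names, n_tasks, seed):
    """Hypothetical helper: pick a reproducible subset of task names."""
    # Sort first so the result does not depend on directory listing order.
    ordered = sorted(task_names)
    if n_tasks >= len(ordered):
        return ordered
    # A private Random instance keeps the draw independent of global state.
    return random.Random(seed).sample(ordered, n_tasks)


tasks = [f"task-{i:03d}" for i in range(50)]
first = sample_tasks(tasks, 10, seed=0)
second = sample_tasks(list(reversed(tasks)), 10, seed=0)
print(first == second)  # same seed, same subset, regardless of input order
```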
Use agent.model_name for trial metadata, agent.env for runtime env vars, and agent-specific kwargs for tool config. Pier's network allowlist also reads URLs out of those configs (Codex config_toml, OpenCode opencode_config, mini-swe config_yaml), so any base URL you set is allowlisted without code changes.
A few things we've learned plumbing this through Respan and OpenRouter:
Claude Code routes through the Anthropic face from Respan. Plan mode is disabled by default (--disallowedTools EnterPlanMode).
- name: claude-code
  model_name: claude-opus-4-7
  env:
    ANTHROPIC_AUTH_TOKEN: ${RESPAN_API_KEY}
    ANTHROPIC_BASE_URL: https://endpoint.respan.ai/api/anthropic
    ANTHROPIC_CUSTOM_HEADERS: "X-Respan-Route-Provider: vertex_ai"
  kwargs:
    reasoning_effort: max
Codex needs a [model_providers.<name>] block with wire_api = "responses" (not WebSockets, which Codex defaults to and Respan doesn't speak).
- name: codex
  model_name: openai/gpt-5.5
  env: { RESPAN_API_KEY: ${RESPAN_API_KEY} }
  kwargs:
    config_toml: |
      model_provider = "respan"
      [model_providers.respan]
      name = "Respan Gateway"
      base_url = "https://endpoint.respan.ai/api/"
      wire_api = "responses"
      env_key = "RESPAN_API_KEY"
    reasoning_effort: xhigh
Gemini CLI:
- name: gemini-cli
  model_name: gemini/gemini-3.1-pro-preview
  env:
    GEMINI_API_KEY: ${RESPAN_API_KEY}
    GOOGLE_GENERATIVE_AI_API_KEY: ${RESPAN_API_KEY}
    GEMINI_API_BASE: https://endpoint.respan.ai/api/google/vertexai/v1beta
    GOOGLE_GEMINI_BASE_URL: https://endpoint.respan.ai/api/google/vertexai/
Cursor CLI uses the installed cursor-agent binary, so it fits the same
inside-the-sandbox path as Claude Code, Codex, Gemini CLI, and OpenCode. Use
cursor/composer-2.5 for Composer 2.5 trial metadata and pass CURSOR_API_KEY
through your env file.
- name: cursor-cli
  model_name: cursor/composer-2.5
  env:
    CURSOR_API_KEY: ${CURSOR_API_KEY}
OpenCode uses opencode_config to add unknown providers or override known ones. To redirect Google to Respan, override just options.baseURL; to add a fully custom provider, use opencode_config.provider.<name> with the npm package, options, and models.
mini-swe-agent picks a native adapter from the model-name prefix: openai/... → litellm_response (OpenAI Responses end-to-end), openrouter/... → openrouter (BYOK costs from cost_details.upstream_inference_cost), everything else → LiteLLM auto.
For Gemini 3 via mini-swe-agent/LiteLLM, omitting reasoning_effort uses the Gemini API default high/dynamic thinking level, but it does not request readable thought summaries. Set kwargs.reasoning_effort: high explicitly when you want LiteLLM to send includeThoughts and preserve returned summaries as reasoning content.
- name: mini-swe-agent
  model_name: openrouter/qwen/qwen3.6-plus
  env: { OPENROUTER_API_KEY: ${OPENROUTER_API_KEY} }
  kwargs:
    set_cache_control: default_end
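The prefix-based adapter choice for mini-swe-agent described above amounts to a small dispatch on the model name. The sketch below is an illustration of that rule, not Pier's code: `pick_adapter` is a hypothetical function, and the `"litellm_auto"` label is a placeholder for the "everything else" case, whose real identifier is not given here.

```python
def pick_adapter(model_name: str) -> str:
    """Hypothetical helper: map a model-name prefix to an adapter label."""
    if model_name.startswith("openai/"):
        # OpenAI Responses end-to-end.
        return "litellm_response"
    if model_name.startswith("openrouter/"):
        # BYOK costs come from cost_details.upstream_inference_cost.
        return "openrouter"
    # Placeholder label for the LiteLLM auto fallback.
    return "litellm_auto"


for name in (
    "openai/gpt-5.5",
    "openrouter/qwen/qwen3.6-plus",
    "gemini/gemini-3.1-pro-preview",
):
    print(name, "->", pick_adapter(name))
```

Matching on the full `prefix/` string rather than a bare `startswith("openai")` avoids catching unrelated names that merely begin with the same letters.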