Agent Shell is a light weight abstraction for executing a cli coding agent headlessly and returning the output that can be used programatically as a unified contract
- One unified contract — the same
execute/stream/health_checkAPI across every agent; swap the backend without changing a line of consuming code. - Five CLI agents — Claude Code, OpenCode, Copilot CLI, Codex, and Pi behind a common adapter protocol.
- Execute or stream — get one
AgentResponse, or async-iterate normalizedStreamEvents with optional thinking/reasoning. - Session resumption — continue any conversation by passing back its
session_id. - Normalized cost & tokens — consistent
costandoutput_tokens(reasoning included) regardless of how each CLI reports them. - Health checks — confirm an agent + model combination actually works before you rely on it, read from the event stream rather than unreliable exit codes.
- Portable tool control — one canonical allow/deny vocabulary
(
bash, edit, read, web_search, web_fetch) translated to each CLI's own tool names. - Unified MCP management — register, remove, and list MCP servers across agents through a single API.
- Async & dependency-free — pure
asyncio, zero runtime dependencies, Python 3.12+.
uv add agent-shell-pyor with pip:
pip install agent-shell-pyfrom agent_shell.shell import AgentShell
from agent_shell.models.agent import AgentType
shell = AgentShell(agent_type=AgentType.CLAUDE_CODE)
response = await shell.execute(
cwd="/path/to/project",
prompt="Can you tell me about this project?",
allowed_tools=["Read", "Glob", "Grep"],
model="sonnet",
)
print(response.response)
print(f"Cost: ${response.cost:.4f}")
print(f"Output tokens: {response.output_tokens}") # billed output, reasoning included
print(f"Session: {response.session_id}")
# Resume the conversation using the session_id
follow_up = await shell.execute(
cwd="/path/to/project",
prompt="Now refactor the auth module based on your findings",
allowed_tools=["Read", "Edit", "Bash"],
model="sonnet",
session_id=response.session_id,
)
output_tokensis a cost measure: the billed output-token count, which includes reasoning tokens (they are billed at the output rate). It is reported consistently across all adapters.
from agent_shell.shell import AgentShell
from agent_shell.models.agent import AgentType
shell = AgentShell(agent_type=AgentType.CLAUDE_CODE)
async for event in shell.stream(
cwd="/path/to/project",
prompt="Refactor the auth module",
allowed_tools=["Read", "Edit", "Bash"],
model="sonnet",
effort="high",
include_thinking=True,
):
if event.type == "system":
print(f"Session: {event.session_id}")
else:
print(f"[{event.type}] {event.content}")Verify an agent + model combination actually works before relying on it. It sends a trivial prompt and reports whether a real response came back — catching bad model names, missing credentials, and billing/quota failures. Exit codes alone are unreliable (some CLIs exit 0 on failure), so the verdict is read from the normalized event stream.
shell = AgentShell(agent_type=AgentType.CLAUDE_CODE)
result = await shell.health_check(cwd="/path/to/project", model="haiku")
if not result.healthy:
print(f"unavailable: {result.exception}")Pass a deny-list of tools that the agent must not use. Use the canonical vocabulary
{bash, edit, read, web_search, web_fetch} and Agent Shell translates it to each CLI's
own tool names — callers don't need to know the per-harness vocabulary:
shell = AgentShell(agent_type=AgentType.CLAUDE_CODE)
response = await shell.execute(
cwd="/path/to/project",
prompt="Audit this code but don't run anything or touch the network",
disallowed_tools=["bash", "web_search", "web_fetch"],
)editcovers write/edit/notebook-edit (it fans out on harnesses that split them).- Any name outside the canonical set passes through verbatim (e.g. an MCP tool
mcp__server__tool, or a harness-specific name likeWrite, or Copilot'sview). - Deny takes precedence over auto-approve on every backend that supports it.
- Where a backend cannot enforce a deny, the adapter emits a
UserWarninglisting the ignored tools rather than failing silently. Coverage varies: Claude and OpenCode enforce all five canonical names; Copilot enforces onlybash/editcanonically (use a verbatim name for its other tools); Codex can only denyweb_search. - Denying
editorreadis best-effort: a model can still modify or read files through the shell, so also denybashwhen you need a hard file boundary.
from agent_shell.shell import AgentShell
from agent_shell.models.agent import AgentType
shell = AgentShell(agent_type=AgentType.OPENCODE)
response = await shell.execute(
cwd="/path/to/project",
prompt="Can you tell me about this project?",
model="anthropic/claude-sonnet-4-5",
)
print(response.response)
print(f"Session: {response.session_id}")
# Resume the conversation using the session_id
follow_up = await shell.execute(
cwd="/path/to/project",
prompt="Now refactor the auth module based on your findings",
model="anthropic/claude-sonnet-4-5",
session_id=response.session_id,
)Note: OpenCode's
runmode auto-approves all tools. Theallowed_toolsandeffortparameters are configured viaopencode.json, not CLI flags.
Register MCP servers for any supported agent through a unified API. All adapters use user-scope configuration so registrations persist across the agent's execute/stream calls.
from agent_shell.shell import AgentShell
from agent_shell.models.agent import AgentType, MCPServerSpec, MCPServerType
shell = AgentShell(agent_type=AgentType.CLAUDE_CODE)
# Register a stdio MCP server (e.g. forgetful) before running an eval
await shell.add_mcp_server(MCPServerSpec(
name="forgetful",
type=MCPServerType.STDIO,
command="uvx",
args=["forgetful-ai"],
env={"FORGETFUL_API_KEY": "..."},
))
response = await shell.execute(
cwd="/path/to/project",
prompt="Recall any prior decisions about the auth module",
)
# Optional cleanup
await shell.remove_mcp_server("forgetful")For HTTP transport, pass url and headers instead of command/args/env:
await shell.add_mcp_server(MCPServerSpec(
name="remote",
type=MCPServerType.HTTP,
url="https://example.com/mcp",
headers={"Authorization": "Bearer ..."},
))add_mcp_server overwrites an existing server with the same name. remove_mcp_server warns rather than raises when the named server is not found. list_mcp_servers() works for OpenCode and Copilot CLI; for Claude Code it currently raises NotImplementedError.
Agent Shell uses Python's standard logging module. Configure the agent_shell logger to capture tool calls, session IDs, costs, and errors:
import logging
logging.getLogger("agent_shell").setLevel(logging.INFO)
logging.getLogger("agent_shell").addHandler(logging.StreamHandler())Set to DEBUG for raw JSON events and full command arguments.