Releases: ScottRBK/agent-shell

v0.1.16

@ScottRBK ScottRBK released this 05 Jul 13:58

Highlights

  • feat(claude-code): implement list_mcp_servers — reads configured MCP servers from ~/.claude.json and reconstructs MCPServerSpecs for both stdio and http/sse transports. Tolerant of a missing config file (returns []) and malformed entries (warns and skips rather than raising). Previously raised NotImplementedError.
  • fix(claude-code): correct add_mcp_server argument ordering — place --scope/--transport after the variadic --env/--header values so the server name can no longer be consumed as an extra option value.
  • docs: expand development notes and README coverage of MCP server support.
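
The ordering fix can be illustrated with a small argv builder. This is a sketch under stated assumptions, not the adapter's actual code: the function name build_add_args, the stdio-only scope, and the placement of the server name and the "--" separator are illustrative. Only the rule that --scope/--transport follow the variadic --env values comes from the notes above.

```python
def build_add_args(name, command, args=(), env=None, scope="user"):
    """Build an argv list for a stdio server (hypothetical helper)."""
    argv = ["claude", "mcp", "add"]
    # Variadic option first: each --env may greedily consume following values.
    for key, value in (env or {}).items():
        argv += ["--env", f"{key}={value}"]
    # Single-value options next: they end the variadic run, so the server
    # name that follows cannot be swallowed as another --env value.
    argv += ["--scope", scope, "--transport", "stdio"]
    # Assumed layout: name, then "--", then the server command and its args.
    argv += [name, "--", command, *args]
    return argv


argv = build_add_args("memory", "npx", ["-y", "some-server"], {"API_KEY": "x"})
```

The point of the layout is that the token immediately before the server name is always the value of a single-value option, never a variadic one.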

Full changelog: v0.1.15...v0.1.16

v0.1.15

@ScottRBK ScottRBK released this 28 Jun 21:05

Health checks for agent-type/model combinations

  • New: AgentShell.health_check(cwd, model=None, timeout=60.0) -> HealthCheckResult. Probes an agent+model combo with a trivial prompt and reports healthy: bool plus an exception: str | None reason. Catches bad model names, bad credentials, billing/quota, and transport failures.
  • Implemented once in a shared run_health_probe helper; added to the AgentAdapter Protocol with every adapter delegating to it. Health is determined from the normalized StreamEvent contract (a result event with content ok and no error), since exit codes are unreliable (opencode exits 0 on failure) and stderr placement is inconsistent (copilot/pi report bad models only on stderr).
  • Hardening: the Codex adapter now surfaces turn.failed reasons (bad model, usage limit) instead of the opaque Reading additional input from stdin... message.

Validated live across all five adapters (Claude Code, OpenCode, Copilot CLI, Codex, Pi) for both healthy and bad-model cases.
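
The event-based health rule might look roughly like the sketch below. It is illustrative only: Event and judge_health are hypothetical stand-ins for the real StreamEvent and run_health_probe, and "content ok" is interpreted here as "non-empty content", which is an assumption. The healthy and exception fields follow the HealthCheckResult description above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Event:
    """Hypothetical stand-in for a normalized stream event."""
    type: str
    content: str = ""
    error: Optional[str] = None


@dataclass
class HealthCheckResult:
    healthy: bool
    exception: Optional[str] = None


def judge_health(events: Iterable[Event]) -> HealthCheckResult:
    """Healthy only if a result event arrives with content and no error."""
    last_error = None
    for event in events:
        if event.error:
            last_error = event.error
        if event.type == "result":
            if event.error is None and event.content:
                return HealthCheckResult(healthy=True)
            return HealthCheckResult(False, event.error or "empty result")
    # No result event at all: exit codes are ignored, so report what we saw.
    return HealthCheckResult(False, last_error or "stream ended without a result event")
```

Judging from events rather than exit codes means a backend that exits 0 after a failed run is still reported as unhealthy.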

v0.1.14

@ScottRBK ScottRBK released this 28 Jun 14:05

Highlights

  • fix: harden adapter stream transport (UTF-8 split + concurrent stderr) (18d6113)
    Resolves a UnicodeDecodeError when a multibyte UTF-8 character is split across a
    64KB stream read boundary. All 5 adapters now use an incremental UTF-8 decoder
    (codecs.getincrementaldecoder("utf-8")("replace")) and drain stderr concurrently
    with stdout to avoid pipe-buffer deadlock. Covered by tests/unit/test_adapter_transport.py.
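
A minimal sketch of the decoding technique named above, using only the standard library. The helper name decode_chunks is hypothetical; the adapters' real read loop is not shown in these notes.

```python
import codecs


def decode_chunks(chunks):
    """Decode byte chunks to text, tolerating characters split across chunks."""
    decoder = codecs.getincrementaldecoder("utf-8")("replace")
    for chunk in chunks:
        # The decoder buffers an incomplete trailing sequence until the
        # next chunk supplies the remaining bytes.
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush: a sequence still incomplete at end of stream is replaced.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


data = "héllo €".encode("utf-8")
split = len(data) - 2                 # cut inside the 3-byte euro sign
parts = [data[:split], data[split:]]
text = "".join(decode_chunks(parts))  # calling parts[0].decode("utf-8") would raise
```

Decoding each chunk independently with bytes.decode is what fails at a read boundary; the incremental decoder carries the partial sequence over instead.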

What's Changed

  • feat: add Pi coding agent adapter (fe3b657)
  • docs: added pi agent to readme (e4ae276)

Full Changelog: v0.1.13...v0.1.14

v0.1.13 — output_tokens cost measure

@ScottRBK ScottRBK released this 26 Jun 19:48

output_tokens

Adds output_tokens to AgentResponse and StreamEvent: the billed
output-token count, reported consistently across all four adapters
(Claude Code, OpenCode, Copilot CLI, Codex).

Semantics

  • output_tokens is a cost measure — it includes reasoning tokens,
    which are billed at the output rate.
  • Claude / Codex / Copilot report reasoning-inclusive output natively.
  • OpenCode excludes reasoning tokens from tokens.output (reporting them in a
    sibling tokens.reasoning), so its adapter adds them back for consistency.
  • OpenCode / Copilot accumulate per-step / per-message tokens within a
    single stream() run; Claude / Codex read the single authoritative result.
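
The OpenCode normalization and per-step accumulation described above can be sketched as follows. The function names and the dict shape of each step are assumptions made for illustration; only the tokens.output and tokens.reasoning fields come from the notes.

```python
def opencode_output_tokens(tokens):
    """Billed output for one step: output plus the separately reported reasoning."""
    return int(tokens.get("output", 0)) + int(tokens.get("reasoning", 0))


def accumulate_output_tokens(steps):
    """Sum billed output tokens across the steps of a single stream() run."""
    return sum(opencode_output_tokens(step) for step in steps)


steps = [
    {"output": 10, "reasoning": 5},   # reasoning reported in a sibling field
    {"output": 3},                    # a step with no reasoning tokens
]
total = accumulate_output_tokens(steps)
```

Adapters whose backend already reports a reasoning-inclusive figure would skip the addition and read the single final value.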

Docs

  • New design doc: docs/development/total_token_count.md
  • Notes added to README.md and AGENTS.md

Full unit / integration / e2e coverage.

v0.1.12 — disallowed_tools deny-list

@ScottRBK ScottRBK released this 16 Jun 21:28

disallowed_tools — a deny-list across every adapter

New disallowed_tools parameter on execute() / stream(). Pass a canonical vocabulary — {bash, edit, read, web_search, web_fetch} — and Agent Shell translates it to each CLI's native tool names. Deny takes precedence over auto-approve wherever the backend supports it.

Highlights

  • Canonical → native translation, with verbatim passthrough for any non-canonical name (e.g. mcp__server__tool, Write, Copilot's view).
  • Fail-loud, never fail-open. Where a backend can't enforce a deny, the adapter emits a UserWarning listing the ignored tools instead of silently dropping it.

Per-backend coverage

  • Claude Code — all five canonical names (edit fans out to Edit,Write,NotebookEdit).
  • OpenCode — all five via process-scoped OPENCODE_PERMISSION, merged over any inherited policy (our deny wins); holds even under --dangerously-skip-permissions. An inherited bare-string "deny" is promoted to the {"*":"deny"} form OpenCode actually enforces, even with no deny-list passed. $PWD is pinned to cwd.
  • Copilot CLI: bash/edit canonically; other tools via verbatim names.
  • Codex — only web_search is deniable; every other deny warns. Also warns (fail-loud) when web_search is denied under model_reasoning_effort="minimal", where Codex ignores it (openai/codex#5002).
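
A rough sketch of canonical-to-native translation with the fail-loud rule. The mapping tables and the function name translate_denies are illustrative assumptions; only the edit fan-out to Edit, Write, NotebookEdit and the canonical vocabulary are taken from the notes above.

```python
import warnings

CANONICAL = {"bash", "edit", "read", "web_search", "web_fetch"}

# Illustrative table for a Claude-Code-like backend (partly assumed).
CLAUDE_LIKE = {
    "bash": ["Bash"],
    "edit": ["Edit", "Write", "NotebookEdit"],
    "read": ["Read"],
}


def translate_denies(disallowed, native_map):
    """Return native names to deny; warn about canonical names with no mapping."""
    native, ignored = [], []
    for name in disallowed:
        if name in native_map:
            native.extend(native_map[name])
        elif name in CANONICAL:
            ignored.append(name)      # canonical, but this backend cannot deny it
        else:
            native.append(name)       # non-canonical: verbatim passthrough
    if ignored:
        warnings.warn(
            f"disallowed_tools not enforceable on this backend: {sorted(ignored)}",
            UserWarning,
            stacklevel=2,
        )
    return native


denied = translate_denies(["edit", "mcp__server__tool"], CLAUDE_LIKE)
```

Warning instead of silently dropping the name keeps the caller from believing a deny is in force when it is not.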

Notes

  • Denying edit/read is best-effort: a model can still touch files via the shell, so also deny bash for a hard boundary.
  • 311 unit + integration tests; e2e enforcement guards for OpenCode (bash deny holds under skip-permissions) and Codex (web_search disable config accepted).

Docs: README "Restricting tools" section + docs/development/disabled_tools.md.

v0.1.11

@ScottSDWorx ScottSDWorx released this 12 Jun 08:35

Fixes

OpenCode adapter: non-interactive runs no longer silently abort (#opencode)

Two bugs that killed opencode run sessions spawned from a different directory than the launcher's:

  • auto_approve was accepted but discarded. opencode run auto-rejects permission prompts in non-interactive mode, so the first permission ask (e.g. reading a file outside the project directory) aborted the agent loop with no error event — the stream just ended. auto_approve=True now maps to --dangerously-skip-permissions, mirroring the Copilot adapter's --allow-all-tools.

  • Stale $PWD misplaced the project root. opencode resolves its project directory (and with it the permission boundary) from $PWD when set; the spawned process inherited the launcher's PWD, overriding the cwd= passed to create_subprocess_exec. The child environment now pins PWD to the resolved cwd.

Both surfaced in a real workload where parallel encode agents were spawned with per-repo working directories: every run died ~7 minutes in with exit 0 and no diagnostics.
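
The PWD fix can be sketched as below. The helper names child_env and spawn are hypothetical, and the command passed to spawn is left to the caller; the sketch only shows pinning PWD to the resolved cwd while the rest of the environment flows through.

```python
import asyncio
import os
from pathlib import Path


def child_env(cwd, base=None):
    """Copy the environment, overriding only PWD with the resolved cwd."""
    env = dict(os.environ if base is None else base)
    env["PWD"] = str(Path(cwd).resolve())
    return env


async def spawn(cmd, cwd):
    """Start a child whose PWD and actual working directory agree."""
    return await asyncio.create_subprocess_exec(
        *cmd,
        cwd=cwd,
        env=child_env(cwd),
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
```

Passing cwd= alone changes the working directory but leaves an inherited PWD untouched, which is why a tool that trusts PWD can resolve the wrong project root.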

Tests

  • New tests/unit/test_opencode_spawn.py: asserts the skip-permissions flag follows auto_approve, PWD is pinned to cwd while the rest of the environment flows through, and cwd= is still applied. 146 unit tests passing.

v0.1.10

@ScottRBK ScottRBK released this 10 May 12:09

New Features

  • Codex CLI adapter: AgentType.CODEX is now a fully supported fourth adapter alongside Claude Code, OpenCode, and Copilot CLI. Drives OpenAI's @openai/codex CLI (verified against v0.130.0) via the same AgentShell API.
    • Streaming: subprocess + NDJSON loop parses thread.started / item.completed{agent_message,command_execution} / turn.completed events into the standard StreamEvent shape.
    • Session resume: passing session_id switches to codex exec resume <UUID>; sandbox/approval flags are correctly dropped on the resume path (codex rejects them).
    • Sandboxing: non-resume calls run with --sandbox workspace-write; auto_approve=True (default) adds --dangerously-bypass-approvals-and-sandbox to match the Copilot adapter's UX.
    • Reasoning effort: effort parameter is forwarded as -c model_reasoning_effort='"<value>"'.
    • MCP delegation: add_mcp_server / remove_mcp_server / list_mcp_servers shell out to codex mcp add/remove/list --json rather than touching ~/.codex/config.toml directly. transport.type == "streamable_http" round-trips to MCPServerType.HTTP.

Semantics

  • include_thinking=True emits a one-shot UserWarning per adapter instance — Codex --json does not stream reasoning items, so the flag has no effect.
  • allowed_tools non-empty emits a one-shot UserWarning per adapter instance — Codex CLI has no per-call allowed-tools mechanism.
  • MCP HTTP headers emit a UserWarning when present, because codex mcp add only supports --bearer-token-env-var, not arbitrary headers.
  • cost = 0.0, duration = 0.0 on AgentResponse — Codex events carry no cost, and per-turn duration is not synthesized (matches Copilot adapter's behavior).
  • Remove warns rather than raises if the named server is not in Codex config (mirrors existing adapters).
  • List tolerates unknown transport types — a transport.type we don't recognize is skipped with a UserWarning naming the offender; the rest of the list returns intact.

Tests

  • Unit + integration: 246 passing (was 217). New: parse-event unit tests, warning unit tests, cancel mirror, full integration suite covering stream/tool-use/command-construction/session-resume/stderr-error/malformed-JSON-tolerance, plus MCP CLI delegation tests.
  • E2E: 3 new tests against the real codex CLI (gated on -m e2e), pinned to gpt-5.4-mini to keep token spend low.

Known gaps

  • bearer_token_env_var and HTTP headers from codex mcp list are not round-tripped through MCPServerSpec (schema gap, tracked as a follow-up).

v0.1.9

@ScottRBK ScottRBK released this 04 May 14:27
4470195

New Features

  • Unified MCP server configuration API: AgentShell now exposes add_mcp_server, remove_mcp_server, and list_mcp_servers, implemented across all three supported adapters. Callers no longer need per-agent dispatch to wire MCP servers. (#2, #4)
    • Claude Code: shells out to claude mcp add/remove --scope user, translating MCPServerSpec into -e KEY=VALUE / --header flags.
    • OpenCode: direct JSON write to ~/.config/opencode/opencode.json under the mcp key.
    • Copilot CLI: direct JSON write to ~/.copilot/mcp-config.json under the mcpServers key.
  • MCPServerSpec model — dataclass with __post_init__ validation that enforces exactly-one-of semantics between STDIO (command/args/env) and HTTP (url/headers) fields. MCPServerType StrEnum with STDIO and HTTP variants.
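
A minimal sketch of the exactly-one-of validation, assuming a name field and a type discriminator that the notes do not spell out. It uses a str-based Enum rather than StrEnum so it also runs on interpreters older than Python 3.11; the real model may differ.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class MCPServerType(str, Enum):
    STDIO = "stdio"
    HTTP = "http"


@dataclass
class MCPServerSpec:
    name: str                         # assumed field
    type: MCPServerType               # assumed discriminator
    command: Optional[str] = None
    args: List[str] = field(default_factory=list)
    env: Dict[str, str] = field(default_factory=dict)
    url: Optional[str] = None
    headers: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self):
        stdio_set = bool(self.command or self.args or self.env)
        http_set = bool(self.url or self.headers)
        if self.type is MCPServerType.STDIO:
            if not self.command:
                raise ValueError("stdio server requires command")
            if http_set:
                raise ValueError("stdio server must not set url/headers")
        else:
            if not self.url:
                raise ValueError("http server requires url")
            if stdio_set:
                raise ValueError("http server must not set command/args/env")


spec = MCPServerSpec("docs", MCPServerType.HTTP, url="https://example.com/mcp")
```

Validating in __post_init__ means an invalid combination fails at construction time, before any adapter tries to write it to a config file.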

Semantics

  • Add is idempotent — existing entry with the same name is overwritten (Claude Code does a pre-remove + add; file-based adapters overwrite the JSON key).
  • Remove warns rather than raises if the named server is not found (UserWarning).
  • List tolerates malformed entries — a single bad entry in the user-editable config (missing command, missing url, non-object value) no longer aborts the whole listing. The bad entry is skipped with a UserWarning naming the offender, and the rest of the list returns intact.
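
The tolerant-listing behaviour could be sketched like this. It is an illustration, not the adapters' code: the function name list_servers, the tuple return shape, and the assumption that each entry is an object carrying either command or url are all hypothetical. The mcpServers key is the one named for Copilot CLI above.

```python
import json
import warnings
from pathlib import Path


def list_servers(config_path, key="mcpServers"):
    """Return (name, kind, target) tuples, skipping malformed entries with a warning."""
    path = Path(config_path)
    if not path.exists():
        return []                     # missing config: nothing configured
    raw = json.loads(path.read_text(encoding="utf-8"))
    entries = raw.get(key) if isinstance(raw, dict) else None
    if not isinstance(entries, dict):
        return []
    servers = []
    for name, entry in entries.items():
        if not isinstance(entry, dict):
            warnings.warn(f"skipping MCP server {name!r}: not an object", UserWarning)
        elif "command" in entry:
            servers.append((name, "stdio", entry["command"]))
        elif "url" in entry:
            servers.append((name, "http", entry["url"]))
        else:
            warnings.warn(f"skipping MCP server {name!r}: no command or url", UserWarning)
    return servers
```

Each bad entry produces one warning naming the offender, and the remaining entries are still returned.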

Notes

  • All adapters write user scope to align with Copilot CLI, which only supports user scope.
  • list_mcp_servers for Claude Code raises NotImplementedError — tracked in #3.

Tests

  • 198 passing (was 192). 6 new unit tests for MCPServerSpec validation, 4 for AgentShell passthroughs, plus integration coverage per adapter (mocked subprocess for Claude Code, real file I/O with HOME monkeypatched to tmp_path for OpenCode/Copilot).

v0.1.8

@ScottRBK ScottRBK released this 20 Apr 23:14
8ea00c1

New Features

  • GitHub Copilot CLI adapter: AgentType.COPILOT_CLI now joins Claude Code and OpenCode as a supported agent. Maps Copilot's NDJSON event stream (reasoning_delta, message_delta, result, tool requests) onto the shared StreamEvent model. Supports session resume via --resume, optional reasoning summaries via --enable-reasoning-summaries, and model/effort flags. (#1)

Enhancements

  • AgentResponse.duration — the duration field (from the underlying result event) is now surfaced on AgentResponse, not just StreamEvent. Populated for Claude Code and Copilot; OpenCode currently reports 0.0 pending upstream duration data.

Build

  • Switched to hatch-vcs for dynamic versioning — the package version is now derived from the git tag, no more manual pyproject.toml bumps per release.

Notes

  • Copilot CLI doesn't expose pricing data, so AgentResponse.cost will always be 0.0 for that adapter.

v0.1.7

@ScottRBK ScottRBK released this 09 Apr 14:16

Bug Fixes

  • Fixed orphaned child processes on Ctrl+C: asyncio.run() converts SIGINT into CancelledError, not KeyboardInterrupt. The except KeyboardInterrupt handlers in AgentShell.execute() and stream() were effectively dead code, meaning cancel() was never called and child process groups (isolated by os.setsid) survived as orphans pinning CPU indefinitely.

Fix

  • shell.py: Exception handlers now catch both KeyboardInterrupt and asyncio.CancelledError
  • New process_cleanup.py module: module-level process group registry with atexit handler as a safety net — kills any registered child process groups during interpreter shutdown
  • Both adapters register child PIDs on subprocess creation and unregister on normal completion or explicit cancel()

Other Changes

  • Added agent skills for invoking agent-shell from Claude Code / OpenCode