Releases · ScottRBK/agent-shell

Release list

v0.1.16 Latest

ScottRBK released this 05 Jul 13:58

v0.1.16

9e9c553

Highlights

feat(claude-code): implement list_mcp_servers — reads configured MCP servers from ~/.claude.json and reconstructs MCPServerSpecs for both stdio and http/sse transports. Tolerant of a missing config file (returns []) and malformed entries (warns and skips rather than raising). Previously raised NotImplementedError.
fix(claude-code): correct add_mcp_server argument ordering — place --scope/--transport after the variadic --env/--header values so the server name can no longer be consumed as an extra option value.
docs: expand development notes and README coverage of MCP server support.
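A minimal sketch of why the argument order matters for `claude mcp add`. It assumes a simplified stdio-only spec; StdioSpec and build_claude_mcp_add_argv are hypothetical names, and the `--` separator before the server command is an assumption rather than something these notes state. Because --env takes a variable number of values, the parser needs another option to know where the list ends. Emitting --scope/--transport after the last value gives it one before the positional name.

```python
from dataclasses import dataclass, field


@dataclass
class StdioSpec:
    """Hypothetical stand-in for a stdio MCPServerSpec (illustration only)."""
    name: str
    command: str
    args: list[str] = field(default_factory=list)
    env: dict[str, str] = field(default_factory=dict)


def build_claude_mcp_add_argv(spec: StdioSpec, scope: str = "user") -> list[str]:
    """Sketch of the corrected ordering: variadic values first, then the
    options that terminate them, then the positional server name."""
    argv = ["claude", "mcp", "add"]
    for key, value in spec.env.items():
        argv += ["-e", f"{key}={value}"]               # variadic --env values
    argv += ["--scope", scope, "--transport", "stdio"]  # closes the variadic list
    argv.append(spec.name)                             # name can no longer be eaten by -e
    argv += ["--", spec.command, *spec.args]           # assumption: '--' before the command
    return argv


spec = StdioSpec("fs", "npx", ["-y", "server"], {"A": "1", "B": "2"})
print(build_claude_mcp_add_argv(spec))
```

With the old order (scope and transport first, env values last), the name `fs` would directly follow `B=2` and could be parsed as a third env value. --header values for HTTP servers follow the same rule.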

Full changelog: v0.1.15...v0.1.16

v0.1.15

ScottRBK released this 28 Jun 21:05

v0.1.15

4608fc3

Health checks for agent-type/model combinations

New: AgentShell.health_check(cwd, model=None, timeout=60.0) -> HealthCheckResult. Probes an agent+model combo with a trivial prompt and reports healthy: bool plus exception: str | None, which carries the failure reason. Catches bad model names, bad credentials, billing/quota, and transport failures.
Implemented once in a shared run_health_probe helper; added to the AgentAdapter Protocol with every adapter delegating to it. Health is determined from the normalized StreamEvent contract (a result event with content "ok" and no error), since exit codes are unreliable (opencode exits 0 on failure) and stderr placement is inconsistent (copilot/pi report bad models only on stderr).
Hardening: the Codex adapter now surfaces turn.failed reasons (bad model, usage limit) instead of the opaque "Reading additional input from stdin..." message.
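The health rule above can be sketched as a pure function over the normalized event stream. StreamEvent and HealthCheckResult here are trimmed-down hypothetical stand-ins, judge_health is an illustrative name, and the case-insensitive match on "ok" is an assumption. The real run_health_probe also has to launch the agent and enforce the timeout, which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class StreamEvent:
    """Hypothetical, trimmed-down stand-in for the normalized event."""
    type: str
    content: str = ""
    error: Optional[str] = None


@dataclass
class HealthCheckResult:
    healthy: bool
    exception: Optional[str] = None


def judge_health(events: Iterable[StreamEvent]) -> HealthCheckResult:
    """Healthy only if a result event says "ok" and no error was seen.
    Exit codes and stderr are deliberately ignored."""
    last_error: Optional[str] = None
    for event in events:
        if event.error:
            last_error = event.error
        if event.type == "result":
            if last_error is None and event.content.strip().lower() == "ok":
                return HealthCheckResult(healthy=True)
            return HealthCheckResult(False, last_error or f"unexpected reply: {event.content!r}")
    return HealthCheckResult(False, last_error or "no result event received")
```

Deciding from the event contract alone is what makes the same check work for an adapter that exits 0 on failure.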

Validated live across all five adapters (Claude Code, OpenCode, Copilot CLI, Codex, Pi) for both healthy and bad-model cases.

v0.1.14

ScottRBK released this 28 Jun 14:05

v0.1.14

e4ae276

Highlights

fix: harden adapter stream transport (UTF-8 split + concurrent stderr) (18d6113)
Resolves a UnicodeDecodeError when a multibyte UTF-8 character is split across a 64KB stream read boundary. All 5 adapters now use an incremental UTF-8 decoder (codecs.getincrementaldecoder("utf-8")("replace")) and drain stderr concurrently with stdout to avoid pipe-buffer deadlock. Covered by tests/unit/test_adapter_transport.py.
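A minimal sketch of both transport techniques using only the standard library. The function names are illustrative, not the adapters' own. The incremental decoder buffers a partial multibyte sequence until the next chunk arrives, and asyncio.gather reads both pipes at once so a child that writes heavily to stderr cannot block on a full pipe while the parent waits on stdout.

```python
import asyncio
import codecs
import sys


def decode_chunks(chunks: list[bytes]) -> str:
    """Decode byte chunks whose boundaries may split a multibyte character."""
    decoder = codecs.getincrementaldecoder("utf-8")("replace")
    parts = [decoder.decode(chunk) for chunk in chunks]
    parts.append(decoder.decode(b"", final=True))  # flush any incomplete tail
    return "".join(parts)


async def _drain(stream: asyncio.StreamReader, chunk_size: int = 64 * 1024) -> str:
    """Read one pipe to EOF in 64KB chunks, decoding incrementally."""
    decoder = codecs.getincrementaldecoder("utf-8")("replace")
    parts = []
    while chunk := await stream.read(chunk_size):
        parts.append(decoder.decode(chunk))
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


async def run_and_capture(*argv: str) -> tuple[str, str]:
    """Drain stdout and stderr concurrently so neither pipe buffer fills up."""
    proc = await asyncio.create_subprocess_exec(
        *argv, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE
    )
    out, err = await asyncio.gather(_drain(proc.stdout), _drain(proc.stderr))
    await proc.wait()
    return out, err


# "é" is two bytes (0xC3 0xA9); split them across two reads.
raw = "café".encode("utf-8")
print(decode_chunks([raw[:4], raw[4:]]))
```

Calling bytes.decode on each chunk separately would fail on the first chunk, which ends mid-character.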

What's Changed

feat: add Pi coding agent adapter (fe3b657)
docs: added pi agent to readme (e4ae276)

Full Changelog: v0.1.13...v0.1.14

v0.1.13 — output_tokens cost measure

ScottRBK released this 26 Jun 19:48

v0.1.13

ee1fee9

output_tokens

Adds output_tokens to AgentResponse and StreamEvent: the billed output-token count, reported consistently across all four adapters (Claude Code, OpenCode, Copilot CLI, Codex).

Semantics

output_tokens is a cost measure: it includes reasoning tokens, which are billed at the output rate.
Claude / Codex / Copilot report reasoning-inclusive output natively.
OpenCode subtracts reasoning out of tokens.output (reporting it in a sibling tokens.reasoning), so its adapter adds it back for consistency.
OpenCode / Copilot accumulate per-step / per-message tokens within a single stream() run; Claude / Codex read the single authoritative result.
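For OpenCode, the two rules above combine into "add reasoning back, then sum across steps". A minimal sketch, assuming each step yields a mapping shaped like OpenCode's tokens object with output and reasoning keys; the function name is hypothetical.

```python
from typing import Iterable, Mapping


def opencode_output_tokens(step_tokens: Iterable[Mapping[str, int]]) -> int:
    """Billed output tokens for one run (hypothetical payload shape).

    OpenCode reports reasoning separately from `output`, so it is added
    back per step, and the per-step totals are summed across the run.
    """
    return sum(t.get("output", 0) + t.get("reasoning", 0) for t in step_tokens)


# Two steps: 120 output + 30 reasoning, then 50 output with no reasoning.
print(opencode_output_tokens([{"output": 120, "reasoning": 30}, {"output": 50}]))
```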

Docs

New design doc: docs/development/total_token_count.md
Notes added to README.md and AGENTS.md

Full unit / integration / e2e coverage.

v0.1.12 — disallowed_tools deny-list

ScottRBK released this 16 Jun 21:28

v0.1.12

76e26d6

`disallowed_tools` — a deny-list across every adapter

New disallowed_tools parameter on execute() / stream(). Pass a canonical vocabulary — {bash, edit, read, web_search, web_fetch} — and Agent Shell translates it to each CLI's native tool names. Deny takes precedence over auto-approve wherever the backend supports it.

Highlights

Canonical → native translation, with verbatim passthrough for any non-canonical name (e.g. mcp__server__tool, Write, Copilot's view).
Fail-loud, never fail-open. Where a backend can't enforce a deny, the adapter emits a UserWarning listing the ignored tools instead of silently dropping it.
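A minimal sketch of the translation step: canonical names map to native ones, non-canonical names pass through verbatim, and canonical names a backend cannot enforce trigger a UserWarning instead of disappearing. Only the edit fan-out (Edit, Write, NotebookEdit) comes from these notes; the other native names in CLAUDE_NATIVE and the translate_deny function are assumptions for illustration.

```python
import warnings

CANONICAL = {"bash", "edit", "read", "web_search", "web_fetch"}

# Hypothetical table. Only "edit" is taken from the release notes;
# the remaining native names are assumptions.
CLAUDE_NATIVE = {
    "bash": ["Bash"],
    "edit": ["Edit", "Write", "NotebookEdit"],
    "read": ["Read"],
    "web_search": ["WebSearch"],
    "web_fetch": ["WebFetch"],
}


def translate_deny(denied: list[str], native: dict[str, list[str]], backend: str) -> list[str]:
    """Canonical -> native, verbatim passthrough otherwise, warn on unenforceable."""
    result: list[str] = []
    unenforced: list[str] = []
    for name in denied:
        if name not in CANONICAL:
            result.append(name)           # e.g. mcp__server__tool, passed as-is
        elif name in native:
            result.extend(native[name])   # one canonical name may fan out
        else:
            unenforced.append(name)       # backend has no way to deny this
    if unenforced:
        warnings.warn(f"{backend} cannot deny {unenforced}; ignoring", UserWarning, stacklevel=2)
    return result


print(translate_deny(["edit", "mcp__fs__write"], CLAUDE_NATIVE, "claude-code"))
```

Warning rather than silently dropping keeps the failure mode loud: the caller learns that a deny it asked for is not in force.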

Per-backend coverage

Claude Code — all five canonical names (edit fans out to Edit,Write,NotebookEdit).
OpenCode — all five via process-scoped OPENCODE_PERMISSION, merged over any inherited policy (our deny wins); holds even under --dangerously-skip-permissions. An inherited bare-string "deny" is promoted to the {"*":"deny"} form OpenCode actually enforces, even with no deny-list passed. $PWD is pinned to cwd.
Copilot CLI — bash/edit canonically; other tools via verbatim names.
Codex — only web_search is deniable; every other deny warns. Also warns (fail-loud) when web_search is denied under model_reasoning_effort="minimal", where Codex ignores it (openai/codex#5002).
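The OpenCode merge rule can be sketched as follows, assuming the inherited OPENCODE_PERMISSION value is a JSON object keyed by OpenCode tool name and that the bare-string promotion applies per tool. merge_opencode_permission is a hypothetical name, and the native tool names in the example are assumptions. Inherited bare "deny" values are promoted first (even with an empty deny-list), then our deny entries are written last so they override whatever was inherited.

```python
import json
from typing import Optional


def merge_opencode_permission(inherited_json: Optional[str], deny: list[str]) -> dict:
    """Overlay a deny-list on an inherited policy (sketch; shape assumed)."""
    policy = json.loads(inherited_json) if inherited_json else {}
    # Promote bare-string "deny" to the {"*": "deny"} form, even with no deny-list.
    for tool, rule in list(policy.items()):
        if rule == "deny":
            policy[tool] = {"*": "deny"}
    # Our deny is applied last, so it wins over any inherited rule.
    for tool in deny:
        policy[tool] = {"*": "deny"}
    return policy


print(merge_opencode_permission('{"edit": "deny", "webfetch": "allow"}', ["bash", "webfetch"]))
```

The merged dict would then be serialized with json.dumps into the child process's OPENCODE_PERMISSION variable, leaving the parent's environment untouched.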

Notes

Denying edit/read is best-effort: a model can still touch files via the shell, so also deny bash for a hard boundary.
311 unit + integration tests; e2e enforcement guards for OpenCode (bash deny holds under skip-permissions) and Codex (web_search disable config accepted).

Docs: README "Restricting tools" section + docs/development/disabled_tools.md.

v0.1.11

ScottSDWorx released this 12 Jun 08:35

v0.1.11

985323c

Fixes

OpenCode adapter: non-interactive runs no longer silently abort (#opencode)

Two bugs that killed opencode run sessions spawned from a different directory than the launcher's:

auto_approve was accepted but discarded. opencode run auto-rejects permission prompts in non-interactive mode, so the first permission ask (e.g. reading a file outside the project directory) aborted the agent loop with no error event — the stream just ended. auto_approve=True now maps to --dangerously-skip-permissions, mirroring the Copilot adapter's --allow-all-tools.
Stale $PWD misplaced the project root. opencode resolves its project directory (and with it the permission boundary) from $PWD when set; the spawned process inherited the launcher's PWD, overriding the cwd= passed to create_subprocess_exec. The child environment now pins PWD to the resolved cwd.
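Both fixes come down to how the child process is built. A minimal sketch, assuming a hypothetical build_opencode_spawn helper and that the prompt is passed as the final positional argument to `opencode run`: the skip-permissions flag follows auto_approve, and PWD is overwritten with the resolved cwd while every other variable is inherited unchanged.

```python
import os
from typing import Mapping


def build_opencode_spawn(prompt: str, cwd: str, auto_approve: bool,
                         base_env: Mapping[str, str]) -> tuple[list[str], dict[str, str]]:
    """Return (argv, env) for a non-interactive opencode run (sketch)."""
    argv = ["opencode", "run"]
    if auto_approve:
        # Without this, non-interactive mode auto-rejects the first permission ask.
        argv.append("--dangerously-skip-permissions")
    argv.append(prompt)                    # assumption: prompt is the last positional
    env = dict(base_env)                   # the rest of the environment flows through
    env["PWD"] = os.path.realpath(cwd)     # replace the launcher's stale PWD
    return argv, env


argv, env = build_opencode_spawn("hi", "/", True, {"PWD": "/elsewhere", "HOME": "/home/x"})
print(argv, env["PWD"])
```

The caller would still pass cwd=cwd and env=env to asyncio.create_subprocess_exec; pinning PWD matters because opencode prefers $PWD over the process's actual working directory.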

Both surfaced in a real workload where parallel encode agents were spawned with per-repo working directories: every run died ~7 minutes in with exit 0 and no diagnostics.

Tests

New tests/unit/test_opencode_spawn.py: asserts the skip-permissions flag follows auto_approve, PWD is pinned to cwd while the rest of the environment flows through, and cwd= is still applied. 146 unit tests passing.

v0.1.10

ScottRBK released this 10 May 12:09

v0.1.10

52bf034

New Features

Codex CLI adapter — AgentType.CODEX is now a fully-supported fourth adapter alongside Claude Code, OpenCode, and Copilot CLI. Drives OpenAI's @openai/codex CLI (verified against v0.130.0) via the same AgentShell API.
- Streaming: subprocess + NDJSON loop parses thread.started / item.completed{agent_message,command_execution} / turn.completed events into the standard StreamEvent shape.
- Session resume: passing session_id switches to codex exec resume <UUID>; sandbox/approval flags are correctly dropped on the resume path (codex rejects them).
- Sandboxing: non-resume calls run with --sandbox workspace-write; auto_approve=True (default) adds --dangerously-bypass-approvals-and-sandbox to match the Copilot adapter's UX.
- Reasoning effort: effort parameter is forwarded as -c model_reasoning_effort='"<value>"'.
- MCP delegation: add_mcp_server / remove_mcp_server / list_mcp_servers shell out to codex mcp add/remove/list --json rather than touching ~/.codex/config.toml directly. transport.type == "streamable_http" round-trips to MCPServerType.HTTP.
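The command-construction rules in the list above can be sketched as one function. The flags themselves come from these notes, but build_codex_argv is a hypothetical name and the exact placement of --json, -c, and the prompt (especially on the resume path) is an assumption. The key behaviour is that the resume branch never emits sandbox or approval flags.

```python
from typing import Optional


def build_codex_argv(prompt: str, session_id: Optional[str] = None,
                     auto_approve: bool = True, effort: Optional[str] = None) -> list[str]:
    """Sketch of Codex command construction (flag order is assumed)."""
    if session_id:
        # Resume path: codex rejects sandbox/approval flags here, so none are added.
        argv = ["codex", "exec", "resume", session_id, "--json"]
    else:
        argv = ["codex", "exec", "--json", "--sandbox", "workspace-write"]
        if auto_approve:
            argv.append("--dangerously-bypass-approvals-and-sandbox")
    if effort:
        # In an argv list no shell quoting is needed; the TOML value is a quoted string.
        argv += ["-c", f'model_reasoning_effort="{effort}"']
    argv.append(prompt)
    return argv


print(build_codex_argv("fix the failing test", effort="low"))
print(build_codex_argv("continue", session_id="1234"))
```

The single quotes in -c model_reasoning_effort='"<value>"' are shell quoting; when the adapter passes a list to the subprocess API, the element is just model_reasoning_effort="<value>".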

Semantics

include_thinking=True emits a one-shot UserWarning per adapter instance — Codex --json does not stream reasoning items, so the flag has no effect.
A non-empty allowed_tools emits a one-shot UserWarning per adapter instance, since Codex CLI has no per-call allowed-tools mechanism.
MCP HTTP headers emit a UserWarning when present, since codex mcp add only supports --bearer-token-env-var, not arbitrary headers.
cost = 0.0, duration = 0.0 on AgentResponse — Codex events carry no cost, and per-turn duration is not synthesized (matches Copilot adapter's behavior).
Remove warns rather than raises if the named server is not in Codex config (mirrors existing adapters).
List tolerates unknown transport types — a transport.type we don't recognize is skipped with a UserWarning naming the offender; the rest of the list returns intact.
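A minimal sketch of the transport mapping for listed servers. The entry shape ({"name": ..., "transport": {"type": ...}}) is an assumption about `codex mcp list --json` output, and parse_codex_mcp_list and the MCPServerType values are hypothetical stand-ins. Known transport types map to the shared enum, with streamable_http becoming HTTP; anything else is skipped with a warning that names the offending server, and the rest of the list survives.

```python
import warnings
from enum import Enum


class MCPServerType(str, Enum):
    """Stand-in for the StrEnum; values are assumptions."""
    STDIO = "stdio"
    HTTP = "http"


TRANSPORT_MAP = {"stdio": MCPServerType.STDIO, "streamable_http": MCPServerType.HTTP}


def parse_codex_mcp_list(entries: list[dict]) -> list[tuple[str, MCPServerType]]:
    """Map listed servers to (name, type); skip unknown transports with a warning."""
    servers = []
    for entry in entries:
        name = entry.get("name", "<unnamed>")
        kind = (entry.get("transport") or {}).get("type")
        if kind not in TRANSPORT_MAP:
            warnings.warn(f"skipping MCP server {name!r}: unknown transport type {kind!r}", UserWarning)
            continue
        servers.append((name, TRANSPORT_MAP[kind]))
    return servers
```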

Tests

Unit + integration: 246 passing (was 217). New: parse-event unit tests, warning unit tests, cancel mirror, full integration suite covering stream/tool-use/command-construction/session-resume/stderr-error/malformed-JSON-tolerance, plus MCP CLI delegation tests.
E2E: 3 new tests against the real codex CLI (gated on -m e2e), pinned to gpt-5.4-mini to keep token spend low.

Known gaps

bearer_token_env_var and HTTP headers from codex mcp list are not round-tripped through MCPServerSpec (schema gap, tracked as a follow-up).

v0.1.9

ScottRBK released this 04 May 14:27

v0.1.9

4470195

New Features

Unified MCP server configuration API — AgentShell now exposes add_mcp_server, remove_mcp_server, and list_mcp_servers, implemented across all three supported adapters. Callers no longer need per-agent dispatch to wire MCP servers. (#2, #4)
- Claude Code: shells out to claude mcp add/remove --scope user, translating MCPServerSpec into -e KEY=VALUE / --header flags.
- OpenCode: direct JSON write to ~/.config/opencode/opencode.json under the mcp key.
- Copilot CLI: direct JSON write to ~/.copilot/mcp-config.json under the mcpServers key.
MCPServerSpec model — dataclass with __post_init__ validation that enforces exactly-one-of semantics between STDIO (command/args/env) and HTTP (url/headers) fields. MCPServerType StrEnum with STDIO and HTTP variants.
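A minimal sketch of the exactly-one-of validation in a dataclass __post_init__. The field names mirror the ones listed above (command/args/env for STDIO, url/headers for HTTP), but the exact layout, the enum values, and the error messages are assumptions. A str-mixin Enum stands in for StrEnum so the sketch also runs before Python 3.11.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class MCPServerType(str, Enum):
    """Stand-in for the StrEnum; values are assumptions."""
    STDIO = "stdio"
    HTTP = "http"


@dataclass
class MCPServerSpec:
    """Hypothetical field layout; only the exactly-one-of rule is from the notes."""
    name: str
    type: MCPServerType
    command: Optional[str] = None
    args: list[str] = field(default_factory=list)
    env: dict[str, str] = field(default_factory=dict)
    url: Optional[str] = None
    headers: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        self.type = MCPServerType(self.type)  # also accepts plain "stdio"/"http"
        stdio_set = bool(self.command or self.args or self.env)
        http_set = bool(self.url or self.headers)
        if self.type is MCPServerType.STDIO:
            if not self.command or http_set:
                raise ValueError("STDIO servers need a command and must not set url/headers")
        elif not self.url or stdio_set:
            raise ValueError("HTTP servers need a url and must not set command/args/env")
```

Validating at construction time means every adapter can trust that a spec is unambiguously one transport or the other.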

Semantics

Add is idempotent — existing entry with the same name is overwritten (Claude Code does a pre-remove + add; file-based adapters overwrite the JSON key).
Remove warns rather than raises if the named server is not found (UserWarning).
List tolerates malformed entries — a single bad entry in the user-editable config (missing command, missing url, non-object value) no longer aborts the whole listing. The bad entry is skipped with a UserWarning naming the offender, and the rest of the list returns intact.
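A minimal sketch of tolerant listing for a file-based config. It assumes a JSON object with servers under a key (defaulting here to Copilot's mcpServers), returns plain dicts instead of MCPServerSpec objects, and uses a hypothetical function name. It also returns [] for a missing file, matching the later Claude Code behaviour. Each bad entry is skipped with a warning that names it, so one typo in a user-edited file cannot hide every other server.

```python
import json
import warnings
from pathlib import Path


def list_servers_tolerant(config_path: Path, key: str = "mcpServers") -> list[dict]:
    """Read one config file and keep only well-formed entries (sketch)."""
    if not config_path.exists():
        return []
    servers = json.loads(config_path.read_text(encoding="utf-8")).get(key, {})
    good = []
    for name, entry in servers.items():
        if not isinstance(entry, dict):
            warnings.warn(f"skipping MCP server {name!r}: entry is not an object", UserWarning)
        elif "command" in entry:
            good.append({**entry, "name": name, "type": "stdio"})
        elif "url" in entry:
            good.append({**entry, "name": name, "type": "http"})
        else:
            warnings.warn(f"skipping MCP server {name!r}: no command or url", UserWarning)
    return good
```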

Notes

All adapters write user scope to align with Copilot CLI, which only supports user scope.
list_mcp_servers for Claude Code raises NotImplementedError — tracked in #3.

Tests

198 passing (was 192). 6 new unit tests for `MCPServerSpec` validation, 4 for `AgentShell` passthroughs, plus integration coverage per adapter (mocked subprocess for Claude Code, real file I/O against `monkeypatch HOME=tmp_path` for OpenCode/Copilot).

v0.1.8

ScottRBK released this 20 Apr 23:14

v0.1.8

8ea00c1

New Features

GitHub Copilot CLI adapter — AgentType.COPILOT_CLI now joins Claude Code and OpenCode as a supported agent. Maps Copilot's NDJSON event stream (reasoning_delta, message_delta, result, tool requests) onto the shared StreamEvent model. Supports session resume via --resume, optional reasoning summaries via --enable-reasoning-summaries, and model/effort flags. (#1)

Enhancements

AgentResponse.duration — the duration field (from the underlying result event) is now surfaced on AgentResponse, not just StreamEvent. Populated for Claude Code and Copilot; OpenCode currently reports 0.0 pending upstream duration data.

Build

Switched to hatch-vcs for dynamic versioning — the package version is now derived from the git tag, no more manual pyproject.toml bumps per release.

Notes

Copilot CLI doesn't expose pricing data, so AgentResponse.cost will always be 0.0 for that adapter.

v0.1.7

ScottRBK released this 09 Apr 14:16

v0.1.7

482ce28

Bug Fixes

Fixed orphaned child processes on Ctrl+C — asyncio.run() converts SIGINT into CancelledError, not KeyboardInterrupt. The except KeyboardInterrupt handlers in AgentShell.execute() and stream() were effectively dead code, meaning cancel() was never called and child process groups (isolated by os.setsid) survived as orphans pinning CPU indefinitely.

Fix

shell.py: Exception handlers now catch both KeyboardInterrupt and asyncio.CancelledError
New process_cleanup.py module: module-level process group registry with atexit handler as a safety net — kills any registered child process groups during interpreter shutdown
Both adapters register child PIDs on subprocess creation and unregister on normal completion or explicit cancel()

Other Changes

Added agent skills for invoking agent-shell from Claude Code / OpenCode
