Releases: ScottRBK/agent-shell
v0.1.16
Highlights
- feat(claude-code): implement `list_mcp_servers`. It reads configured MCP servers from `~/.claude.json` and reconstructs `MCPServerSpec`s for both `stdio` and `http`/`sse` transports. Tolerant of a missing config file (returns `[]`) and malformed entries (warns and skips rather than raising). Previously raised `NotImplementedError`.
- fix(claude-code): correct `add_mcp_server` argument ordering. `--scope`/`--transport` are now placed after the variadic `--env`/`--header` values so the server name can no longer be consumed as an extra option value.
- docs: expand development notes and README coverage of MCP server support.
Full changelog: v0.1.15...v0.1.16
v0.1.15
Health checks for agent-type/model combinations
- New: `AgentShell.health_check(cwd, model=None, timeout=60.0) -> HealthCheckResult`. Probes an agent+model combo with a trivial prompt and reports `healthy: bool` plus an `exception: str | None` reason. Catches bad model names, bad credentials, billing/quota, and transport failures.
- Implemented once in a shared `run_health_probe` helper; added to the `AgentAdapter` Protocol with every adapter delegating to it. Health is determined from the normalized `StreamEvent` contract (a `result` event with content `ok` and no `error`), since exit codes are unreliable (opencode exits 0 on failure) and stderr placement is inconsistent (copilot/pi report bad models only on stderr).
- Hardening: the Codex adapter now surfaces `turn.failed` reasons (bad model, usage limit) instead of the opaque `Reading additional input from stdin...` message.
Validated live across all five adapters (Claude Code, OpenCode, Copilot CLI, Codex, Pi) for both healthy and bad-model cases.
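The event-based health rule described above can be sketched roughly as follows. This is a minimal illustration, not the project's `run_health_probe`: the `Event` class is a hypothetical stand-in for `StreamEvent`, and the exact comparison against `ok` is an assumption about how the probe reply is checked.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Event:
    """Hypothetical stand-in for the normalized StreamEvent."""
    type: str
    content: Optional[str] = None
    error: Optional[str] = None


@dataclass
class HealthCheckResult:
    healthy: bool
    exception: Optional[str] = None


def classify(events: Iterable[Event]) -> HealthCheckResult:
    """Decide health from events alone, ignoring exit codes and stderr."""
    first_error = None
    for event in events:
        if event.error and first_error is None:
            first_error = event.error
        if event.type == "result":
            content = (event.content or "").strip().lower()
            if event.error is None and content == "ok":
                return HealthCheckResult(healthy=True)
            reason = event.error or first_error
            return HealthCheckResult(False, reason or f"unexpected result content: {event.content!r}")
    # No result event at all: treat as unhealthy and report what we saw.
    return HealthCheckResult(False, first_error or "stream ended without a result event")
```

Keying the decision on the `result` event means a backend that exits 0 on failure is still reported as unhealthy.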
v0.1.14
Highlights
- fix: harden adapter stream transport (UTF-8 split + concurrent stderr) (18d6113)
  Resolves a `UnicodeDecodeError` when a multibyte UTF-8 character is split across a 64KB stream read boundary. All 5 adapters now use an incremental UTF-8 decoder (`codecs.getincrementaldecoder("utf-8")("replace")`) and drain stderr concurrently with stdout to avoid pipe-buffer deadlock. Covered by `tests/unit/test_adapter_transport.py`.
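To illustrate the split-character problem, here is a small sketch of incremental decoding with the standard-library `codecs` module. The `decode_chunks` helper and the sample string are illustrative and not taken from the adapters.

```python
import codecs


def decode_chunks(chunks):
    """Decode byte chunks to text without breaking multibyte characters."""
    decoder = codecs.getincrementaldecoder("utf-8")("replace")
    parts = []
    for chunk in chunks:
        # An incomplete trailing sequence is buffered until the next call.
        parts.append(decoder.decode(chunk))
    # final=True flushes the decoder; a dangling partial sequence becomes U+FFFD.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


data = "naïve café ✓".encode("utf-8")
split = data.index("✓".encode("utf-8")) + 1  # cut inside the 3-byte "✓"
chunks = [data[:split], data[split:]]
text = decode_chunks(chunks)
```

Calling `chunks[0].decode("utf-8")` directly would raise `UnicodeDecodeError`, which is the failure mode the release note describes.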
What's Changed
Full Changelog: v0.1.13...v0.1.14
v0.1.13 — output_tokens cost measure
output_tokens
Adds output_tokens to AgentResponse and StreamEvent: the billed
output-token count, reported consistently across all four adapters
(Claude Code, OpenCode, Copilot CLI, Codex).
Semantics
- `output_tokens` is a cost measure: it includes reasoning tokens, which are billed at the output rate.
- Claude / Codex / Copilot report reasoning-inclusive output natively.
- OpenCode subtracts reasoning out of `tokens.output` (reporting it in a sibling `tokens.reasoning`), so its adapter adds it back for consistency.
- OpenCode / Copilot accumulate per-step / per-message tokens within a single `stream()` run; Claude / Codex read the single authoritative result.
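The OpenCode add-back and per-step accumulation can be pictured with a short sketch. The `output` and `reasoning` keys follow the `tokens.output` / `tokens.reasoning` names above; the function names and the sample numbers are hypothetical.

```python
def billed_output_tokens(tokens: dict) -> int:
    """Add reasoning back onto output so the figure is reasoning-inclusive."""
    return int(tokens.get("output", 0)) + int(tokens.get("reasoning", 0))


def accumulate(steps) -> int:
    """Sum the billed output across every step of one stream() run."""
    return sum(billed_output_tokens(step) for step in steps)


# Two illustrative steps: 120 + 40 reasoning, then 30 + 0 reasoning.
steps = [{"output": 120, "reasoning": 40}, {"output": 30, "reasoning": 0}]
total = accumulate(steps)  # 190
```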
Docs
- New design doc: `docs/development/total_token_count.md`
- Notes added to `README.md` and `AGENTS.md`
Full unit / integration / e2e coverage.
v0.1.12 — disallowed_tools deny-list
disallowed_tools — a deny-list across every adapter
New disallowed_tools parameter on execute() / stream(). Pass a canonical vocabulary — {bash, edit, read, web_search, web_fetch} — and Agent Shell translates it to each CLI's native tool names. Deny takes precedence over auto-approve wherever the backend supports it.
Highlights
- Canonical → native translation, with verbatim passthrough for any non-canonical name (e.g. `mcp__server__tool`, `Write`, Copilot's `view`).
- Fail-loud, never fail-open. Where a backend can't enforce a deny, the adapter emits a `UserWarning` listing the ignored tools instead of silently dropping it.
Per-backend coverage
- Claude Code: all five canonical names (`edit` fans out to `Edit`, `Write`, `NotebookEdit`).
- OpenCode: all five via process-scoped `OPENCODE_PERMISSION`, merged over any inherited policy (our deny wins); holds even under `--dangerously-skip-permissions`. An inherited bare-string `"deny"` is promoted to the `{"*":"deny"}` form OpenCode actually enforces, even with no deny-list passed. `$PWD` is pinned to `cwd`.
- Copilot CLI: `bash`/`edit` canonically; other tools via verbatim names.
- Codex: only `web_search` is deniable; every other deny warns. Also warns (fail-loud) when `web_search` is denied under `model_reasoning_effort="minimal"`, where Codex ignores it (openai/codex#5002).
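A rough sketch of canonical-to-native translation with verbatim passthrough and a fail-loud warning is shown below. Only the `edit` fan-out comes from these notes; the other entries in `CLAUDE_CODE_NAMES`, the `translate_denies` helper, and its `enforceable` parameter are illustrative assumptions rather than the library's actual tables or API.

```python
import warnings

# Only the `edit` fan-out is stated in the release notes;
# the remaining entries are illustrative assumptions.
CLAUDE_CODE_NAMES = {
    "bash": ["Bash"],
    "edit": ["Edit", "Write", "NotebookEdit"],
    "read": ["Read"],
    "web_search": ["WebSearch"],
    "web_fetch": ["WebFetch"],
}


def translate_denies(disallowed, native_names, enforceable=None):
    """Map canonical names to native ones; warn about denies that cannot hold."""
    native, ignored = [], []
    for tool in disallowed:
        if enforceable is not None and tool not in enforceable:
            ignored.append(tool)  # backend cannot enforce this deny
            continue
        # Unknown names pass through verbatim (e.g. mcp__server__tool).
        native.extend(native_names.get(tool, [tool]))
    if ignored:
        warnings.warn(
            f"cannot enforce deny for: {', '.join(ignored)}",
            UserWarning,
            stacklevel=2,
        )
    return native
```

For a Codex-like backend, passing `enforceable={"web_search"}` makes every other deny surface as a warning instead of disappearing silently.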
Notes
- Denying `edit`/`read` is best-effort: a model can still touch files via the shell, so also deny `bash` for a hard boundary.
- 311 unit + integration tests; e2e enforcement guards for OpenCode (bash deny holds under skip-permissions) and Codex (web_search disable config accepted).
Docs: README "Restricting tools" section + docs/development/disabled_tools.md.
v0.1.11
Fixes
OpenCode adapter: non-interactive runs no longer silently abort (#opencode)
Two bugs that killed opencode run sessions spawned from a different directory than the launcher's:
- `auto_approve` was accepted but discarded. `opencode run` auto-rejects permission prompts in non-interactive mode, so the first permission ask (e.g. reading a file outside the project directory) aborted the agent loop with no error event; the stream just ended. `auto_approve=True` now maps to `--dangerously-skip-permissions`, mirroring the Copilot adapter's `--allow-all-tools`.
- Stale `$PWD` misplaced the project root. opencode resolves its project directory (and with it the permission boundary) from `$PWD` when set; the spawned process inherited the launcher's `PWD`, overriding the `cwd=` passed to `create_subprocess_exec`. The child environment now pins `PWD` to the resolved cwd.
Both surfaced in a real workload where parallel encode agents were spawned with per-repo working directories: every run died ~7 minutes in with exit 0 and no diagnostics.
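The `PWD` pinning can be sketched with `asyncio.create_subprocess_exec` as below. This is a minimal illustration under stated assumptions: `spawn_with_pinned_pwd` and `child_pwd` are hypothetical helpers, and the child here is a Python one-liner rather than `opencode`.

```python
import asyncio
import os
import sys
from pathlib import Path


async def spawn_with_pinned_pwd(cwd, *argv):
    """Run argv in cwd with PWD overridden to match the resolved cwd."""
    resolved = Path(cwd).resolve()
    # Inherit the full environment, then override only PWD.
    env = {**os.environ, "PWD": str(resolved)}
    proc = await asyncio.create_subprocess_exec(
        *argv,
        cwd=str(resolved),
        env=env,
        stdout=asyncio.subprocess.PIPE,
    )
    out, _ = await proc.communicate()
    return out.decode().strip()


def child_pwd(cwd):
    """Return the PWD value the child process actually sees."""
    code = "import os; print(os.environ['PWD'])"
    return asyncio.run(spawn_with_pinned_pwd(cwd, sys.executable, "-c", code))
```

Without the override, `cwd=` changes the child's working directory but leaves the inherited `PWD` variable pointing at the launcher's directory.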
Tests
- New `tests/unit/test_opencode_spawn.py`: asserts the skip-permissions flag follows `auto_approve`, `PWD` is pinned to cwd while the rest of the environment flows through, and `cwd=` is still applied. 146 unit tests passing.
v0.1.10
New Features
- Codex CLI adapter: `AgentType.CODEX` is now a fully-supported fourth adapter alongside Claude Code, OpenCode, and Copilot CLI. Drives OpenAI's `@openai/codex` CLI (verified against v0.130.0) via the same `AgentShell` API.
  - Streaming: subprocess + NDJSON loop parses `thread.started` / `item.completed{agent_message,command_execution}` / `turn.completed` events into the standard `StreamEvent` shape.
  - Session resume: passing `session_id` switches to `codex exec resume <UUID>`; sandbox/approval flags are correctly dropped on the resume path (codex rejects them).
  - Sandboxing: non-resume calls run with `--sandbox workspace-write`; `auto_approve=True` (default) adds `--dangerously-bypass-approvals-and-sandbox` to match the Copilot adapter's UX.
  - Reasoning effort: `effort` parameter is forwarded as `-c model_reasoning_effort='"<value>"'`.
  - MCP delegation: `add_mcp_server` / `remove_mcp_server` / `list_mcp_servers` shell out to `codex mcp add/remove/list --json` rather than touching `~/.codex/config.toml` directly. `transport.type == "streamable_http"` round-trips to `MCPServerType.HTTP`.
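The resume/sandbox rules above might translate into command construction along these lines. This is a sketch only: `build_codex_argv` is a hypothetical helper, it uses just the flags named in these notes, and the flag order and prompt placement are assumptions rather than the adapter's actual layout.

```python
def build_codex_argv(prompt, session_id=None, auto_approve=True, effort=None):
    """Assemble an illustrative `codex exec` argv (ordering is assumed)."""
    argv = ["codex", "exec"]
    if session_id is not None:
        # Resume path: sandbox/approval flags are dropped because codex rejects them.
        argv += ["resume", session_id]
    else:
        argv += ["--sandbox", "workspace-write"]
        if auto_approve:
            argv.append("--dangerously-bypass-approvals-and-sandbox")
    argv.append("--json")
    if effort is not None:
        # No shell is involved, so the inner quotes are passed literally.
        argv += ["-c", f'model_reasoning_effort="{effort}"']
    argv.append(prompt)  # assumed: prompt as the final positional argument
    return argv
```

Keeping the resume branch free of sandbox flags in one place makes it harder for a later option to reintroduce them on that path.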
Semantics
- `include_thinking=True` emits a one-shot `UserWarning` per adapter instance: Codex `--json` does not stream reasoning items, so the flag has no effect.
- `allowed_tools` non-empty emits a one-shot `UserWarning` per adapter instance: Codex CLI has no per-call allowed-tools mechanism.
- MCP HTTP `headers` emits a `UserWarning` when present: `codex mcp add` only supports `--bearer-token-env-var`, not arbitrary headers.
- `cost = 0.0`, `duration = 0.0` on `AgentResponse`: Codex events carry no cost, and per-turn duration is not synthesized (matches Copilot adapter's behavior).
- Remove warns rather than raises if the named server is not in Codex config (mirrors existing adapters).
- List tolerates unknown transport types: a `transport.type` we don't recognize is skipped with a `UserWarning` naming the offender; the rest of the list returns intact.
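One way to get a one-shot warning per adapter instance is to track emitted keys on the instance itself. The `OneShotWarnings` class below is a hypothetical sketch of that idea, not the adapter's code.

```python
import warnings


class OneShotWarnings:
    """Emit each keyed warning at most once per instance."""

    def __init__(self):
        self._warned = set()

    def warn_once(self, key, message):
        if key in self._warned:
            return  # already reported by this instance
        self._warned.add(key)
        warnings.warn(message, UserWarning, stacklevel=2)
```

Python's default warning filter also deduplicates by call site, but that state is process-wide; tracking keys per instance keeps the behaviour independent of the active filter.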
Tests
- Unit + integration: 246 passing (was 217). New: parse-event unit tests, warning unit tests, cancel mirror, full integration suite covering stream/tool-use/command-construction/session-resume/stderr-error/malformed-JSON-tolerance, plus MCP CLI delegation tests.
- E2E: 3 new tests against the real `codex` CLI (gated on `-m e2e`), pinned to `gpt-5.4-mini` to keep token spend low.
Known gaps
- `bearer_token_env_var` and HTTP headers from `codex mcp list` are not round-tripped through `MCPServerSpec` (schema gap, tracked as a follow-up).
v0.1.9
New Features
- Unified MCP server configuration API: `AgentShell` now exposes `add_mcp_server`, `remove_mcp_server`, and `list_mcp_servers`, implemented across all three supported adapters. Callers no longer need per-agent dispatch to wire MCP servers. (#2, #4)
  - Claude Code: shells out to `claude mcp add/remove --scope user`, translating `MCPServerSpec` into `-e KEY=VALUE` / `--header` flags.
  - OpenCode: direct JSON write to `~/.config/opencode/opencode.json` under the `mcp` key.
  - Copilot CLI: direct JSON write to `~/.copilot/mcp-config.json` under the `mcpServers` key.
- `MCPServerSpec` model: dataclass with `__post_init__` validation that enforces exactly-one-of semantics between STDIO (`command`/`args`/`env`) and HTTP (`url`/`headers`) fields. `MCPServerType` StrEnum with `STDIO` and `HTTP` variants.
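The exactly-one-of validation could look roughly like the sketch below. The `name` and `type` fields, the enum values, the defaults, and the error messages are assumptions for illustration; `(str, Enum)` stands in for `StrEnum` so the sketch also runs on Python versions before 3.11.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class MCPServerType(str, Enum):  # stand-in for StrEnum; values are assumed
    STDIO = "stdio"
    HTTP = "http"


@dataclass
class MCPServerSpec:
    name: str
    type: MCPServerType
    command: Optional[str] = None
    args: list = field(default_factory=list)
    env: dict = field(default_factory=dict)
    url: Optional[str] = None
    headers: dict = field(default_factory=dict)

    def __post_init__(self):
        self.type = MCPServerType(self.type)  # accept plain strings too
        has_stdio = bool(self.command or self.args or self.env)
        has_http = bool(self.url or self.headers)
        if self.type is MCPServerType.STDIO:
            if not self.command:
                raise ValueError("STDIO servers require a command")
            if has_http:
                raise ValueError("STDIO servers must not set url/headers")
        else:
            if not self.url:
                raise ValueError("HTTP servers require a url")
            if has_stdio:
                raise ValueError("HTTP servers must not set command/args/env")
```

Validating in `__post_init__` means an invalid spec fails at construction time, before any adapter tries to write it to a CLI or config file.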
Semantics
- Add is idempotent — existing entry with the same name is overwritten (Claude Code does a pre-remove + add; file-based adapters overwrite the JSON key).
- Remove warns rather than raises if the named server is not found (`UserWarning`).
- List tolerates malformed entries: a single bad entry in the user-editable config (missing `command`, missing `url`, non-object value) no longer aborts the whole listing. The bad entry is skipped with a `UserWarning` naming the offender, and the rest of the list returns intact.
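The skip-and-warn listing behaviour can be sketched for a file-based config as follows. The `list_servers` helper, its tuple return shape, and the rule that `url` marks an HTTP entry are illustrative assumptions; only the `mcpServers` key name comes from these notes.

```python
import json
import warnings
from pathlib import Path


def list_servers(config_path, key="mcpServers"):
    """Return (name, kind, entry) tuples, skipping malformed entries."""
    path = Path(config_path)
    if not path.exists():
        return []
    raw = json.loads(path.read_text(encoding="utf-8")).get(key, {})
    servers = []
    for name, entry in raw.items():
        if not isinstance(entry, dict):
            reason = "entry is not an object"
        elif "url" in entry:
            servers.append((name, "http", entry))
            continue
        elif "command" in entry:
            servers.append((name, "stdio", entry))
            continue
        else:
            reason = "missing command or url"
        # Name the offender, then keep going with the remaining entries.
        warnings.warn(f"skipping MCP server {name!r}: {reason}", UserWarning, stacklevel=2)
    return servers
```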
Notes
- All adapters write user scope to align with Copilot CLI, which only supports user scope.
- `list_mcp_servers` for Claude Code raises `NotImplementedError` (tracked in #3).
Tests
- 198 passing (was 192). 6 new unit tests for `MCPServerSpec` validation, 4 for `AgentShell` passthroughs, plus integration coverage per adapter (mocked subprocess for Claude Code, real file I/O against `monkeypatch HOME=tmp_path` for OpenCode/Copilot).
v0.1.8
New Features
- GitHub Copilot CLI adapter: `AgentType.COPILOT_CLI` now joins Claude Code and OpenCode as a supported agent. Maps Copilot's NDJSON event stream (`reasoning_delta`, `message_delta`, `result`, tool requests) onto the shared `StreamEvent` model. Supports session resume via `--resume`, optional reasoning summaries via `--enable-reasoning-summaries`, and model/effort flags. (#1)
Enhancements
- `AgentResponse.duration`: the `duration` field (from the underlying `result` event) is now surfaced on `AgentResponse`, not just `StreamEvent`. Populated for Claude Code and Copilot; OpenCode currently reports `0.0` pending upstream duration data.
Build
- Switched to `hatch-vcs` for dynamic versioning: the package version is now derived from the git tag, so manual `pyproject.toml` bumps per release are no longer needed.
Notes
- Copilot CLI doesn't expose pricing data, so `AgentResponse.cost` will always be `0.0` for that adapter.
v0.1.7
Bug Fixes
- Fixed orphaned child processes on Ctrl+C: `asyncio.run()` converts SIGINT into `CancelledError`, not `KeyboardInterrupt`. The `except KeyboardInterrupt` handlers in `AgentShell.execute()` and `stream()` were effectively dead code, meaning `cancel()` was never called and child process groups (isolated by `os.setsid`) survived as orphans pinning CPU indefinitely.
Fix
- `shell.py`: Exception handlers now catch both `KeyboardInterrupt` and `asyncio.CancelledError`
- New `process_cleanup.py` module: module-level process group registry with `atexit` handler as a safety net; it kills any registered child process groups during interpreter shutdown
- Both adapters register child PIDs on subprocess creation and unregister on normal completion or explicit `cancel()`
Other Changes
- Added agent skills for invoking agent-shell from Claude Code / OpenCode