shell-agent-v2

macOS local-first chat & agent tool with interactive data analysis.

Successor to shell-agent v0.7.x, redesigned with session-scoped analysis, an Idle/Busy agent execution model, and a hybrid LLM backend (Local + Vertex AI).

Features

Interactive data analysis — dialogue-driven exploration with embedded DuckDB. Every analysis tool (load_data, query_sql, describe_data, save_query, analyze_data, etc.) is exposed to the LLM every round so the model can plan multi-step workflows up front instead of discovering tools round-by-round. See agent-tool-visibility.md. Set tools.hide_analysis_tools_until_data_loaded: true in config.json to restore the pre-v0.1.21 hide-until-load behaviour (opt-in for weaker local backends).
Session-scoped analysis — each session owns its own database, no cross-session state leakage
Agent execution model — Idle/Busy states with UI lockout during processing
Hybrid LLM backend — Local LLM (LM Studio) and Vertex AI (Gemini), switchable at runtime via /model
Multi-profile LLM (v0.12.0) — define multiple named (Local, VertexAI) profiles for billing attribution / GCP project isolation. Each session references one profile (persisted in session.json). Edit profiles in Settings → LLM Profiles (live-apply on blur, no Save button); switch a session via the status-bar pill popover or /profile <name> chat command. See ADR-0016.
Per-backend context budgets — ContextBudget configured separately for Local and Vertex inside each profile (Settings → LLM Profiles → pick profile → expand the Local / Vertex AI sections).
Memory model (v0.2.0 rewrite) — four facilities work together. Records (immutable conversation history) live in chat.json. Session Memory auto-extracts fact / context per session. Findings are session-scoped data-analysis discoveries surfaced in a dedicated chat-pane panel. Global Memory holds preference / decision across sessions. Auto-extraction routes by category; "Pin to Global Memory" is the explicit user action that promotes a Session Memory entry or a Finding into the cross-session pool. Context-budget enforcement is non-destructive (internal/contextbuild summary cache). See memory-model.md.
Container sandbox (opt-in) — eight sandbox-* tools that execute shell or Python in a per-session podman/docker container with /work mounted from the session's data dir, MITL-gated, network-off by default. Includes sandbox_load_into_analysis (CSV/JSON in /work → DuckDB) and sandbox_export_sql (SQL query → CSV in /work) so query results flow between analysis and Python without round-tripping through chat. See sandbox-execution.md for the macOS setup guide.
Findings panel — chat-pane disclosure with severity filter, free-text search, bulk delete, real-time refresh, and a Pin-to-Global-Memory star button per row.
Shell script Tool Calling — register scripts as tools with MITL approval for write/execute. Per-tool @timeout: N header (seconds) overrides the 30-second default for legitimately long-running tools — see agent-tool-visibility.md and tool-execution-timeout.md. Scripts can write to $SHELL_AGENT_WORK_DIR (the same physical directory the sandbox bind-mounts at /work); use the built-in register_object tool to surface the artefact in chat as object:<ID> — see work-dir-shell-bridge.md.
MITL approval, end-to-end — every tool source (analysis / shell / sandbox / MCP) routes through one gate. Destructive analysis tools (load_data, reset_analysis, promote_finding) and SQL/analyze prompts are MITL-by-default; metadata reads (describe_data, list_tables, etc.) are not. Override per-tool from Settings → Tools — the toggle reflects the actual dispatcher default. See security-hardening-2.md.
Bundled scripts — file_info, preview_file, list_files, weather, get_location, write_note. Auto-installed on first launch via go:embed; user customizations are preserved. Optional shell-tool examples wrapping companion CLIs (not auto-installed — copy from examples/shell_tools/ into your tool dir): web-search (gem-search), generate-image (gem-image), search-kb-gem (gem-rag — Vertex AI Gemini RAG), search-kb-lite (lite-rag — local LLM RAG). KB examples require the corresponding CLI installed and a pre-indexed corpus.
Tool-call timeline — every tool start/end appears inline in the chat as a transient pill, in addition to the existing status-bar indicator. The pill is restored on session reload as a compact tool-name + status (success / error) bubble; live argument and result text remain ephemeral. See tool-event-restore.md.
Background task visibility — when the agent kicks off post-response work (title generation, memory extraction), a small badge appears in the input-status-bar naming what's running. The input field stays disabled until those tasks finish, so the next user message can't race them and lose extracted facts. See background-task-indicator.md.
MCP support — via mcp-guardian stdio proxy
Multimodal — image input via drag & drop, paste, or file picker
Per-session Data panel — collapsible disclosure at the top of the chat pane showing the current session's objects (images / reports / blobs as cards with thumbnails), DuckDB tables (click for a 20-row preview), and sandbox /work files. Click an image for the lightbox, a report for the markdown viewer, or a CSV / text blob for an in-app preview — CSV / TSV render as an HTML table, other text MIMEs (JSON, plain text, HTML, etc.) drop to a fixed-width pre. Bulk-select and delete with separate Yes / No confirmation.
Bulk select / delete — Findings, Global Memory, and Session Memory entries can be checked individually or all-at-once, with two-click confirm.
Private sessions (v0.3.0) — + New Private Chat in the sidebar bottom-nav opts a session out of cross-session Global Memory promotion. preference / decision facts are dropped at the extraction layer; Pin to Global Memory is hidden in the UI and rejected server-side. A 🔒 indicator appears on the sidebar row and as a chat-pane banner. The privacy flag is fixed at session creation and persisted in chat.json (omitempty keeps legacy sessions loading as non-private). See privacy-controls.md.
Log privacy controls (v0.3.0) — app.log defaults to info level so prompt / response / tool-argument bodies stop leaking to disk. Settings → Privacy → Log verbosity flips to debug for diagnosis. Audit log entries (session created/loaded/exported/imported/deleted) are content-free.
Session import / export (v0.4.0) — package a complete session (chat, session memory, findings, summaries, sandbox work/, analysis DuckDB, and every objstore object the session owns) into a single .shellagent ZIP bundle and re-import it on the same or a different machine. Per-row Export icon in the sidebar, Import Chat button in the bottom-nav, /export and /import slash commands. Privacy flag preserved across the round-trip; object IDs are always regenerated on import with bounded reference rewriting in chat.json and summaries.json. See session-import-export.md.
In-place tool progress (v0.4.1) — long-running tools (currently analyze_data) update a single chat-pane bubble in place via the tool_progress activity event, rather than spawning a fresh "running" pill per progress tick. The bubble matches by tool_call_id, so future parallel-tool work won't cross-contaminate. See tool-progress-events.md.
Session delete safeguards (v0.4.2) — the row's ✕ button arms a 6-second Confirm state (red-emphasis text matching the existing bulk-delete pattern) before the destructive call fires; while the delete is in flight the row greys with a ↻ Deleting… spinner. The agent state machine holds Busy for the duration so concurrent Send / Load / Export / Import return ErrBusy instead of racing the half-deleted session directory. See session-delete-ux.md.
Sandbox UID mapping fix (v0.4.3) — on podman the container is now started with --userns=keep-id:uid=1000,gid=1000 + --user 1000:1000 instead of --user $(id -u). Large host UIDs (e.g., the 200M+ values produced by Active-Directory / LDAP-mapped corporate macOS accounts) used to fall outside the rootless subuid range and crash crun with setresuid: Invalid argument at container start; the keep-id remap pulls them into a small in-namespace UID while preserving /work file ownership. Docker path is unchanged. See sandbox-uid-mapping.md.
analyze_data row cap fix (v0.4.4) — the sliding-window summarizer used to inherit the interactive 10,000-row chat-output cap and refuse to start on tables larger than that, which defeated the feature in the only regime where it was interesting. A new Engine.QuerySQLForAnalyze path (backed by MaxAnalyzeRows = 1_000_000) keeps the chat-output cap intact for query_sql / query_preview / quick_summary while letting analyze_data walk tables large enough for the sliding window to matter. The new error message (only reachable past the much higher cap) suggests pre-aggregation rather than LIMIT, since LIMIT would silently truncate the analysis to the first N rows. See analyze_data-row-cap.md.
Session rename persistence fix (v0.4.5) — bindings.RenameSession used to call memory.RenameSession directly, which read chat.json from disk, mutated the title, and wrote it back. The agent's in-memory a.session.Title was untouched, so any subsequent a.session.Save() (after a Send / tool / generateTitleIfNeeded) silently overwrote the rename with the stale title and on next launch the user saw the original name. The fix routes rename through a new agent.RenameSession method that updates the in-memory title under a.mu before the disk save, mirroring the v0.4.0+ pattern where every per-session-state operation goes through the agent layer (parallels Export / Import / Delete).
Markdown attachments (v0.5.0) — drag-drop / paste .md / .txt files into the chat input alongside images. Each attachment is stored as TypeMarkdown in objstore with auto-computed Lines and Tokens metadata. Three new tools — analyze_text (sliding-window summarisation), grep_text (RE2 regex search with context), get_text (verbatim line-range read) — operate on markdown attachments AND existing create_report outputs, enabling "report on report" follow-up analysis. The LLM discovers attachments via list_objects (now showing Lines / Tokens columns for text-bearing types) and sees a Document (object ID: …): anchor at the top of any user message that carried an attachment — symmetric with the existing Image (object ID: …): convention but text-only. System prompt teaches the LLM the provenance distinction (TypeReport = agent-generated, TypeMarkdown = user-attached) so citations in follow-up reports calibrate appropriately. PDF / DOCX deferred to v0.6 (external converter contract is its own design problem). See markdown-attachments.md.
System Rules (v0.7.0) — user-authored standing instructions injected near the top of the system prompt at every turn. The AGENTS.md / CLAUDE.md analogue for shell-agent-v2: write durable rules ("always respond in Japanese", "default to creating reports for long answers", "never propose rm -rf without confirmation") once, and the agent follows them across every session. Edit from Settings → System Rules (textarea with live char / token counter + context-budget advisory) or directly in any text editor at ~/Library/Application Support/shell-agent-v2/system_rules.md. Hot-reloaded on Save — no restart needed. Separate from the four memory facilities; System Rules is configuration, not learned state. See system-rules.md.
Filtered analysis via save_query (v0.8.0) — analyze_data previously ran sliding-window analysis over the whole table only. New save_query tool materialises a SELECT result as a fresh derived base table; pass that table's name to analyze_data to deep-analyse just the filtered subset (last 24 h, errors only, one customer's events, …). Derived tables appear in list_tables alongside loaded ones, travel inside analysis.duckdb for export/import, and are wiped by reset_analysis like everything else. No engine schema changes, no bundle-format version bump. See saved-query-tables.md.
Temporal context — enriched date/time injection + resolve_date system tool

Installation

cd app
make build
# Output: dist/shell-agent-v2.app

Configuration

Settings are stored at ~/Library/Application Support/shell-agent-v2/config.json.
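
As a rough illustration of how such a file might be read, the Go sketch below decodes two keys named in this README (tools.hide_analysis_tools_until_data_loaded and agent.max_tool_rounds) and applies the documented default of 10 tool rounds. The Config type, parseConfig, and configPath are hypothetical names for this example; the real file contains more keys and the application's own loader may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// Config models only the fields discussed here; the real file has more.
// Type and field names are hypothetical; the JSON keys come from the README.
type Config struct {
	Tools struct {
		HideAnalysisToolsUntilDataLoaded bool `json:"hide_analysis_tools_until_data_loaded"`
	} `json:"tools"`
	Agent struct {
		MaxToolRounds int `json:"max_tool_rounds"`
	} `json:"agent"`
}

// parseConfig decodes raw JSON and keeps the documented default of
// 10 tool rounds when the key is absent.
func parseConfig(raw []byte) (Config, error) {
	var c Config
	c.Agent.MaxToolRounds = 10
	if err := json.Unmarshal(raw, &c); err != nil {
		return Config{}, err
	}
	return c, nil
}

// configPath returns the documented location of config.json on macOS.
func configPath() (string, error) {
	dir, err := os.UserConfigDir() // ~/Library/Application Support on macOS
	if err != nil {
		return "", err
	}
	return filepath.Join(dir, "shell-agent-v2", "config.json"), nil
}

func main() {
	raw := []byte(`{"tools": {"hide_analysis_tools_until_data_loaded": true}}`)
	c, err := parseConfig(raw)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(c.Tools.HideAnalysisToolsUntilDataLoaded, c.Agent.MaxToolRounds) // true 10
	if p, err := configPath(); err == nil {
		fmt.Println(p)
	}
}
```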

LLM Backend

# In chat:
/model           # Show current engine
/model local     # Switch to local LLM (within the current profile)
/model vertex    # Switch to Vertex AI (within the current profile)
/profile         # List profiles, mark the current with ●
/profile <name>  # Switch this session's profile binding

Or use the status-bar pill ([Profile / Local|Vertex]) — clicking opens the Session Control Popover for one-click profile / backend switching.

Vertex AI Setup

gcloud auth application-default login
# Requires roles/aiplatform.user

Settings reference

Knobs exposed in the Settings dialog. Per-backend values override the legacy top-level fallbacks in config.json.

Agent loop

| Setting | JSON path | Default | Notes |
| --- | --- | --- | --- |
| Max tool rounds per message | `agent.max_tool_rounds` | 10 | Hard cap on tool-call rounds for one user message. The loop-detection ring buffer (Feature 1, v0.1.16) catches stuck same-error stretches early, so raising this is reasonably safe when a long, legitimate analysis needs more rounds. |
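
To make the interaction between the round cap and loop detection concrete, here is a minimal Go sketch. It assumes the detector trips when a fixed window of consecutive rounds all report the same error text. The window size, the loopDetector type, and runRounds are hypothetical; this README does not describe the actual ring-buffer layout or matching rule.

```go
package main

import "fmt"

// loopDetector keeps the outcome of the most recent tool rounds in a
// fixed-size ring buffer. The window size is an assumption for illustration.
type loopDetector struct {
	ring  []string // error text per round; "" means the round succeeded
	next  int      // index of the slot to overwrite next
	count int      // rounds recorded so far, capped at len(ring)
}

func newLoopDetector(window int) *loopDetector {
	return &loopDetector{ring: make([]string, window)}
}

// record stores one round's error text, overwriting the oldest entry.
func (d *loopDetector) record(errText string) {
	d.ring[d.next] = errText
	d.next = (d.next + 1) % len(d.ring)
	if d.count < len(d.ring) {
		d.count++
	}
}

// stuck reports whether the window is full and every entry is the same
// non-empty error, i.e. the agent keeps hitting an identical failure.
func (d *loopDetector) stuck() bool {
	if d.count < len(d.ring) || d.ring[0] == "" {
		return false
	}
	for _, e := range d.ring[1:] {
		if e != d.ring[0] {
			return false
		}
	}
	return true
}

// runRounds calls step until it reports completion, the detector trips, or
// maxRounds is reached, and returns the number of rounds executed.
func runRounds(maxRounds, window int, step func(round int) (done bool, errText string)) int {
	d := newLoopDetector(window)
	for round := 1; round <= maxRounds; round++ {
		done, errText := step(round)
		d.record(errText)
		if done || d.stuck() {
			return round
		}
	}
	return maxRounds
}

func main() {
	alwaysFails := func(int) (bool, string) { return false, "table not found" }
	fmt.Println(runRounds(10, 3, alwaysFails)) // 3
}
```

With a detector like this, a run that repeats one failure stops after the window fills, so a higher max_tool_rounds mainly benefits runs that are making progress.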

Per-backend context budget (Local / Vertex AI)

| Setting | JSON path | Default (Local) | Default (Vertex) | Notes |
| --- | --- | --- | --- | --- |
| Hot Token Limit | `llm.{local,vertex_ai}.hot_token_limit` | 4096 | 65536 | Compaction trigger. When the total token count of the Hot tier exceeds this, the oldest Hot records are summarised into Warm. |
| Max Context Tokens | `llm.{local,vertex_ai}.context_budget.max_context_tokens` | 16384 | 524288 | Total token budget sent to the model per call. 0 = unlimited. |
| Max Warm Summary Tokens | `llm.{local,vertex_ai}.context_budget.max_warm_tokens` | 1024 | 16384 | Cap for the warm-summary block. Older summaries are dropped past this. |
| Max Tool-Result Tokens | `llm.{local,vertex_ai}.context_budget.max_tool_result_tokens` | 2048 | 32768 | Per-tool-result truncation before insertion into the LLM message list. |
| Output Reserve | `llm.{local,vertex_ai}.context_budget.output_reserve` | 4096 | 4096 | Tokens reserved for the model's reply. Subtracted from `max_context_tokens` before context packing, so the request stays under the model's window. |
| Per-request timeout (s) | `llm.{local,vertex_ai}.request_timeout_seconds` | 300 | 180 | Per-attempt cap inside the retry layer. |
| Retry max attempts | `llm.{local,vertex_ai}.retry_max_attempts` | 3 | 3 | Total LLM call attempts including the first (1 = no retries). Settings → Local LLM / Vertex AI surface this. |
| Retry backoff base (s) | `llm.{local,vertex_ai}.retry_backoff_base_seconds` | 5 | 5 | Initial backoff between retries. Doubles on each subsequent retry, capped at the max below. Config-only; not in Settings UI. |
| Retry backoff max (s) | `llm.{local,vertex_ai}.retry_backoff_max_seconds` | 120 | 120 | Cap on the per-retry wait. Config-only. |
| Retry jitter (s) | `llm.{local,vertex_ai}.retry_jitter_seconds` | 1 | 1 | Uniform `±jitter` randomisation around each backoff. Config-only. |

Sandbox (`sandbox.*`)

| Setting | JSON path | Default | Notes |
| --- | --- | --- | --- |
| Enabled | `sandbox.enabled` | false | Master toggle. When off, the eight `sandbox-*` tools are not registered. |
| Engine | `sandbox.engine` | `auto` | `auto` picks `podman` then `docker` from PATH. |
| Image | `sandbox.image` | (empty until you Build) | Active container image. Locally-built images (`shell-agent-v2-sandbox:<sha>`) and `@sha256:`-pinned references are treated as safe; mutable upstream tags (e.g. `python:3.12-slim`) trigger an advisory banner in the Settings → Sandbox tab. |
| Max output bytes | `sandbox.max_output_bytes` | `8388608` (8 MiB) | Per-`exec` cap on each of stdout / stderr. Excess is dropped with a `[output truncated at N bytes]` marker — defends against an LLM-issued `cat /dev/zero` etc. OOMing the app. Config-only; no UI surface. |
| Network | `sandbox.network` | false | Egress; default off. |
| CPU limit | `sandbox.cpu_limit` | `2` | Passed to `--cpus`. |
| Memory limit | `sandbox.memory_limit` | `1g` | Passed to `--memory`. |
| Per-call timeout (s) | `sandbox.timeout_seconds` | 60 | Per-`exec` cap. |
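
The max_output_bytes behaviour can be pictured as a writer that keeps the first N bytes of a stream and discards the rest. The Go sketch below is one way to do that under stated assumptions: the cappedBuffer type is hypothetical, and the marker's placement on its own trailing line is a guess, since the table only gives the marker text.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// cappedBuffer collects at most limit bytes and discards the rest,
// remembering that truncation happened.
type cappedBuffer struct {
	buf       bytes.Buffer
	limit     int
	truncated bool
}

// Write implements io.Writer. It always reports the full length as written so
// the producing side is not interrupted by a short-write error.
func (c *cappedBuffer) Write(p []byte) (int, error) {
	room := c.limit - c.buf.Len()
	if room < len(p) {
		c.truncated = true
		if room > 0 {
			c.buf.Write(p[:room])
		}
		return len(p), nil
	}
	c.buf.Write(p)
	return len(p), nil
}

// String returns the captured output, plus the marker when bytes were dropped.
func (c *cappedBuffer) String() string {
	if !c.truncated {
		return c.buf.String()
	}
	return c.buf.String() + fmt.Sprintf("\n[output truncated at %d bytes]", c.limit)
}

func main() {
	out := &cappedBuffer{limit: 16}
	// Stand-in for a container's stdout stream.
	io.Copy(out, strings.NewReader(strings.Repeat("0", 1000)))
	fmt.Println(out.String())
}
```

Because memory use is bounded by the limit rather than by what the command emits, an unbounded producer cannot grow the buffer; a separate instance would be needed for each of stdout and stderr.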

Cross-session memory trust

shell-agent-v2 auto-extracts important facts from each conversation. Cross-session entries (Global Memory) are re-injected into every future session's system prompt as authoritative context — which means anything that ever appears in an assistant turn (a quoted CSV cell, an MCP response, OCR'd image text, a fetched web page) can structurally end up steering future sessions. Each entry carries a provenance tag:

user-stated — came from a user turn, a manual pin, or an explicit "Pin to Global Memory" promotion. Treated as authoritative.
derived — extracted from an assistant turn, or a finding the LLM promoted via promote_finding. Lower trust because the content traces back through the LLM and may carry attacker-influenced bytes.
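
The two-tag rule above amounts to a small mapping from where an entry came from to how much it is trusted. The Go sketch below encodes that mapping; the Provenance and Source types and their constant names are hypothetical, and only the tag strings user-stated and derived and the source categories come from the list above.

```go
package main

import "fmt"

// Provenance is the trust tag attached to a memory entry.
type Provenance string

const (
	UserStated Provenance = "user-stated"
	Derived    Provenance = "derived"
)

// Source enumerates where an entry came from (names are illustrative).
type Source int

const (
	UserTurn          Source = iota // text typed by the user
	ManualPin                       // entry pinned by hand
	PinToGlobalMemory               // explicit "Pin to Global Memory" promotion
	AssistantTurn                   // extracted from an assistant reply
	PromoteFinding                  // finding promoted by the LLM via promote_finding
)

// provenanceFor applies the rule from the list above. Unknown sources fall
// back to the lower-trust tag, which is an assumption made for safety.
func provenanceFor(s Source) Provenance {
	switch s {
	case UserTurn, ManualPin, PinToGlobalMemory:
		return UserStated
	default:
		return Derived
	}
}

func main() {
	fmt.Println(provenanceFor(UserTurn))       // user-stated
	fmt.Println(provenanceFor(PromoteFinding)) // derived
}
```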

The sidebar (Global / Session Memory) and the chat-pane Findings panel show the badge inline. If a fact starts driving weird behaviour (the recoverable case being the THINK leak that prompted this hardening), open the relevant list, select the offending entries, and bulk-delete them. See docs/en/history/memory-injection-hardening.md for the full threat model and docs/en/memory-model.md for the v0.2.0 4-facility design.

Requirements

macOS 10.15+
LM Studio (for local backend) — Apple Silicon M1/M2 Pro+ recommended
GCP project with billing enabled (for Vertex AI backend)

Building

cd app
make build      # Build .app bundle
make dev        # Development with hot reload
make test       # Run tests

Documentation

Users — this README + CHANGELOG.md.
New contributors — CONTRIBUTING.md (Japanese: CONTRIBUTING.ja.md).
Maintainers / deep dives — docs/en/INDEX.md is the single entry point listing the evergreen reference documents (architecture, memory model, data analysis, privacy controls) and the chronological ADR catalogue. Japanese mirror: docs/ja/INDEX.ja.md.

Older pre-v0.2.0 audit-trail material lives under docs/en/history/ (Japanese: docs/ja/history/). Most of it has been superseded by the current reference docs — consult only when you need the historical "why."

License

MIT
