macOS local-first chat & agent tool with interactive data analysis.
Successor to shell-agent v0.7.x, redesigned with session-scoped analysis, an Idle/Busy agent execution model, and a hybrid LLM backend (Local + Vertex AI).
- Interactive data analysis — dialogue-driven exploration with embedded DuckDB. Every analysis tool (`load_data`, `query_sql`, `describe_data`, `save_query`, `analyze_data`, etc.) is exposed to the LLM every round so the model can plan multi-step workflows up front instead of discovering tools round-by-round. See agent-tool-visibility.md. Set `tools.hide_analysis_tools_until_data_loaded: true` in `config.json` to restore the pre-v0.1.21 hide-until-load behaviour (opt-in for weaker local backends).
- Session-scoped analysis — each session owns its own database, no cross-session state leakage
- Agent execution model — Idle/Busy states with UI lockout during processing
- Hybrid LLM backend — Local LLM (LM Studio) and Vertex AI (Gemini), switchable at runtime via `/model`
- Multi-profile LLM (v0.12.0) — define multiple named `(Local, VertexAI)` profiles for billing attribution / GCP project isolation. Each session references one profile (persisted in `session.json`). Edit profiles in Settings → LLM Profiles (live-apply on blur, no Save button); switch a session via the status-bar pill popover or `/profile <name>` chat command. See ADR-0016.
- Per-backend context budgets — `ContextBudget` configured separately for Local and Vertex inside each profile (Settings → LLM Profiles → pick profile → expand the Local / Vertex AI sections).
- Memory model (v0.2.0 rewrite) — four facilities work together. Records (immutable conversation history) live in `chat.json`. Session Memory auto-extracts `fact` / `context` per session. Findings are session-scoped data-analysis discoveries surfaced in a dedicated chat-pane panel. Global Memory holds `preference` / `decision` across sessions. Auto-extraction routes by category; "Pin to Global Memory" is the explicit user action that promotes a Session Memory entry or a Finding into the cross-session pool. Context-budget enforcement is non-destructive (`internal/contextbuild` summary cache). See memory-model.md.
- Container sandbox (opt-in) — eight `sandbox-*` tools that execute shell or Python in a per-session `podman` / `docker` container with `/work` mounted from the session's data dir, MITL-gated, network-off by default. Includes `sandbox_load_into_analysis` (CSV/JSON in `/work` → DuckDB) and `sandbox_export_sql` (SQL query → CSV in `/work`) so query results flow between analysis and Python without round-tripping through chat. See sandbox-execution.md for the macOS setup guide.
- Findings panel — chat-pane disclosure with severity filter, free-text search, bulk delete, real-time refresh, and a Pin-to-Global-Memory star button per row.
- Shell script Tool Calling — register scripts as tools with MITL approval for write/execute. Per-tool `@timeout: N` header (seconds) overrides the 30-second default for legitimately long-running tools — see agent-tool-visibility.md and tool-execution-timeout.md. Scripts can write to `$SHELL_AGENT_WORK_DIR` (the same physical directory the sandbox bind-mounts at `/work`); use the built-in `register_object` tool to surface the artefact in chat as `object:<ID>` — see work-dir-shell-bridge.md.
- MITL approval, end-to-end — every tool source (analysis / shell / sandbox / MCP) routes through one gate. Destructive analysis tools (`load_data`, `reset_analysis`, `promote_finding`) and SQL/analyze prompts are MITL-by-default; metadata reads (`describe_data`, `list_tables`, etc.) are not. Override per-tool from Settings → Tools — the toggle reflects the actual dispatcher default. See security-hardening-2.md.
- Bundled scripts — `file_info`, `preview_file`, `list_files`, `weather`, `get_location`, `write_note`. Auto-installed on first launch via `go:embed`; user customizations are preserved. Optional shell-tool examples wrapping companion CLIs (not auto-installed — copy from `examples/shell_tools/` into your tool dir): `web-search` (gem-search), `generate-image` (gem-image), `search-kb-gem` (gem-rag — Vertex AI Gemini RAG), `search-kb-lite` (lite-rag — local LLM RAG). KB examples require the corresponding CLI installed and a pre-indexed corpus.
- Tool-call timeline — every tool start/end appears inline in the chat as a transient pill, in addition to the existing status-bar indicator. The pill is restored on session reload as a compact tool-name + status (success / error) bubble; live argument and result text remain ephemeral. See tool-event-restore.md.
- Background task visibility — when the agent kicks off post-response work (title generation, memory extraction), a small badge appears in the input-status-bar naming what's running. The input field stays disabled until those tasks finish, so the next user message can't race them and lose extracted facts. See background-task-indicator.md.
- MCP support — via mcp-guardian stdio proxy
- Multimodal — image input via drag & drop, paste, or file picker
- Per-session Data panel — collapsible disclosure at the top of the chat pane showing the current session's objects (images / reports / blobs as cards with thumbnails), DuckDB tables (click for a 20-row preview), and sandbox `/work` files. Click an image for the lightbox, a report for the markdown viewer, or a CSV / text blob for an in-app preview — CSV / TSV render as an HTML table, other text MIMEs (JSON, plain text, HTML, etc.) drop to a fixed-width `pre`. Bulk-select and delete with separate Yes / No confirmation.
- Bulk select / delete — Findings, Global Memory, and Session Memory entries can be checked individually or all-at-once, with two-click confirm.
- Private sessions (v0.3.0) — `+ New Private Chat` in the sidebar bottom-nav opts a session out of cross-session Global Memory promotion. `preference` / `decision` facts are dropped at the extraction layer; `Pin to Global Memory` is hidden in the UI and rejected server-side. A 🔒 indicator appears on the sidebar row and as a chat-pane banner. The privacy flag is fixed at session creation and persisted in `chat.json` (`omitempty` keeps legacy sessions loading as non-private). See privacy-controls.md.
- Log privacy controls (v0.3.0) — `app.log` defaults to `info` level so prompt / response / tool-argument bodies stop leaking to disk. Settings → Privacy → Log verbosity flips to `debug` for diagnosis. Audit log entries (session created/loaded/exported/imported/deleted) are content-free.
- Session import / export (v0.4.0) — package a complete session (chat, session memory, findings, summaries, sandbox `work/`, analysis DuckDB, and every objstore object the session owns) into a single `.shellagent` ZIP bundle and re-import it on the same or a different machine. Per-row Export icon in the sidebar, Import Chat button in the bottom-nav, `/export` and `/import` slash commands. Privacy flag preserved across the round-trip; object IDs are always regenerated on import with bounded reference rewriting in `chat.json` and `summaries.json`. See session-import-export.md.
- In-place tool progress (v0.4.1) — long-running tools (currently `analyze_data`) update a single chat-pane bubble in place via the `tool_progress` activity event, rather than spawning a fresh "running" pill per progress tick. The bubble matches by `tool_call_id`, so future parallel-tool work won't cross-contaminate. See tool-progress-events.md.
- Session delete safeguards (v0.4.2) — the row's ✕ button arms a 6-second `Confirm` state (red-emphasis text matching the existing bulk-delete pattern) before the destructive call fires; while the delete is in flight the row greys with a `↻ Deleting…` spinner. The agent state machine holds Busy for the duration so concurrent Send / Load / Export / Import return `ErrBusy` instead of racing the half-deleted session directory. See session-delete-ux.md.
- Sandbox UID mapping fix (v0.4.3) — on `podman` the container is now started with `--userns=keep-id:uid=1000,gid=1000` + `--user 1000:1000` instead of `--user $(id -u)`. Large host UIDs (e.g., the 200M+ values produced by Active-Directory / LDAP-mapped corporate macOS accounts) used to fall outside the rootless subuid range and crash `crun` with `setresuid: Invalid argument` at container start; the keep-id remap pulls them into a small in-namespace UID while preserving `/work` file ownership. Docker path is unchanged. See sandbox-uid-mapping.md.
- `analyze_data` row cap fix (v0.4.4) — the sliding-window summarizer used to inherit the interactive 10,000-row chat-output cap and refused to start on tables larger than that, defeating the feature in the only regime where it was interesting. A new `Engine.QuerySQLForAnalyze` path (backed by `MaxAnalyzeRows = 1_000_000`) keeps the chat-output cap intact for `query_sql` / `query_preview` / `quick_summary` while letting `analyze_data` walk through tables that are large enough for the sliding window to actually matter. The new error message (only reachable past the much higher cap) suggests pre-aggregation rather than `LIMIT`, since `LIMIT` would silently truncate the analysis to the first N rows and defeat the feature. See analyze_data-row-cap.md.
- Session rename persistence fix (v0.4.5) — `bindings.RenameSession` used to call `memory.RenameSession` directly, which read `chat.json` from disk, mutated the title, and wrote it back. The agent's in-memory `a.session.Title` was untouched, so any subsequent `a.session.Save()` (after a Send / tool / `generateTitleIfNeeded`) silently overwrote the rename with the stale title and on next launch the user saw the original name. The fix routes rename through a new `agent.RenameSession` method that updates the in-memory title under `a.mu` before the disk save, mirroring the v0.4.0+ pattern where every per-session-state operation goes through the agent layer (parallels Export / Import / Delete).
- Markdown attachments (v0.5.0) — drag-drop / paste `.md` / `.txt` files into the chat input alongside images. Each attachment is stored as `TypeMarkdown` in objstore with auto-computed `Lines` and `Tokens` metadata. Three new tools — `analyze_text` (sliding-window summarisation), `grep_text` (RE2 regex search with context), `get_text` (verbatim line-range read) — operate on markdown attachments AND existing `create_report` outputs, enabling "report on report" follow-up analysis. The LLM discovers attachments via `list_objects` (now showing Lines / Tokens columns for text-bearing types) and sees a `Document (object ID: …):` anchor at the top of any user message that carried an attachment — symmetric with the existing `Image (object ID: …):` convention but text-only. System prompt teaches the LLM the provenance distinction (TypeReport = agent-generated, TypeMarkdown = user-attached) so citations in follow-up reports calibrate appropriately. PDF / DOCX deferred to v0.6 (external converter contract is its own design problem). See markdown-attachments.md.
- System Rules (v0.7.0) — user-authored standing instructions injected near the top of the system prompt at every turn. The `AGENTS.md` / `CLAUDE.md` analogue for shell-agent-v2: write durable rules ("always respond in Japanese", "default to creating reports for long answers", "never propose `rm -rf` without confirmation") once, and the agent follows them across every session. Edit from Settings → System Rules (textarea with live char / token counter + context-budget advisory) or directly in any text editor at `~/Library/Application Support/shell-agent-v2/system_rules.md`. Hot-reloaded on Save — no restart needed. Separate from the four memory facilities; System Rules is configuration, not learned state. See system-rules.md.
- Filtered analysis via `save_query` (v0.8.0) — `analyze_data` previously ran sliding-window analysis over the whole table only. New `save_query` tool materialises a `SELECT` result as a fresh derived base table; pass that table's name to `analyze_data` to deep-analyse just the filtered subset (last 24 h, errors only, one customer's events, …). Derived tables appear in `list_tables` alongside loaded ones, travel inside `analysis.duckdb` for export/import, and are wiped by `reset_analysis` like everything else. No engine schema changes, no bundle-format version bump. See saved-query-tables.md.
- Temporal context — enriched date/time injection + `resolve_date` system tool
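
The session rename persistence fix (v0.4.5) in the list above comes down to one rule: change the in-memory title under the agent's lock first, then save. The Go sketch below illustrates that ordering only. `Session`, `Agent`, and `LoadSession` are simplified, hypothetical stand-ins, not the project's actual types, and the JSON layout is assumed.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// Session is a hypothetical stand-in for the state persisted in chat.json.
type Session struct {
	Title string `json:"title"`
	path  string // location of chat.json; unexported, so not serialised
}

// Save writes the in-memory session to disk.
func (s *Session) Save() error {
	data, err := json.Marshal(s)
	if err != nil {
		return err
	}
	return os.WriteFile(s.path, data, 0o600)
}

// LoadSession reads a session back, as the next app launch would.
func LoadSession(path string) (*Session, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	s := &Session{path: path}
	if err := json.Unmarshal(data, s); err != nil {
		return nil, err
	}
	return s, nil
}

// Agent owns the in-memory session; mu guards it.
type Agent struct {
	mu      sync.Mutex
	session *Session
}

// RenameSession updates the in-memory title under the lock and then
// persists it, so a later Save cannot write back a stale title.
func (a *Agent) RenameSession(title string) error {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.session.Title = title
	return a.session.Save()
}

func main() {
	dir, err := os.MkdirTemp("", "rename-sketch")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "chat.json")
	a := &Agent{session: &Session{Title: "Untitled", path: path}}
	if err := a.RenameSession("Q3 sales analysis"); err != nil {
		panic(err)
	}
	// A later save (after a Send or a tool call) keeps the new title.
	if err := a.session.Save(); err != nil {
		panic(err)
	}
	reloaded, err := LoadSession(path)
	if err != nil {
		panic(err)
	}
	fmt.Println(reloaded.Title)
}
```

Because the agent's copy is the one every later `Save` serialises, renaming through the agent layer removes the stale-overwrite path by construction.
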
cd app
make build
# Output: dist/shell-agent-v2.app

Settings stored at `~/Library/Application Support/shell-agent-v2/config.json`.
# In chat:
/model # Show current engine
/model local # Switch to local LLM (within the current profile)
/model vertex # Switch to Vertex AI (within the current profile)
/profile # List profiles, mark the current with ●
/profile <name> # Switch this session's profile binding

Or use the status-bar pill ([Profile / Local|Vertex]) — clicking opens the Session Control Popover for one-click profile / backend switching.
gcloud auth application-default login
# Requires roles/aiplatform.user

Knobs exposed in the Settings dialog. Per-backend values override the legacy top-level fallbacks in `config.json`.

| Setting | JSON path | Default | Notes |
|---|---|---|---|
| Max tool rounds per message | `agent.max_tool_rounds` | 10 | Hard cap on tool-call rounds for one user message. The loop-detection ring buffer (Feature 1, v0.1.16) catches stuck same-error stretches early, so raising this is reasonably safe when a long, legitimate analysis needs more rounds. |
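
To show how a round cap and a loop-detection ring buffer can work together, here is a minimal Go sketch. The ring size of 3, the "every recent round failed with the identical message" criterion, and all names (`errorRing`, `runRounds`) are assumptions made for illustration; the project's actual detector may use different rules.

```go
package main

import "fmt"

// errorRing is a fixed-size ring buffer of recent tool-round outcomes.
// The empty string records a successful round.
type errorRing struct {
	buf   []string
	next  int
	count int
}

func newErrorRing(size int) *errorRing {
	return &errorRing{buf: make([]string, size)}
}

// Record stores one outcome, overwriting the oldest entry when full.
func (r *errorRing) Record(errMsg string) {
	r.buf[r.next] = errMsg
	r.next = (r.next + 1) % len(r.buf)
	if r.count < len(r.buf) {
		r.count++
	}
}

// Stuck reports whether the buffer is full and every entry holds the
// same non-empty error message.
func (r *errorRing) Stuck() bool {
	if r.count < len(r.buf) || r.buf[0] == "" {
		return false
	}
	for _, e := range r.buf[1:] {
		if e != r.buf[0] {
			return false
		}
	}
	return true
}

// runRounds calls step for at most maxToolRounds rounds. step returns an
// error message, or "" on success. The loop stops early once the ring
// reports a stuck stretch; the result is the rounds used and whether the
// stop was caused by loop detection.
func runRounds(maxToolRounds, ringSize int, step func(round int) string) (int, bool) {
	ring := newErrorRing(ringSize)
	for round := 1; round <= maxToolRounds; round++ {
		ring.Record(step(round))
		if ring.Stuck() {
			return round, true
		}
	}
	return maxToolRounds, false
}

func main() {
	rounds, stuck := runRounds(10, 3, func(int) string {
		return "table not found: sales"
	})
	fmt.Println(rounds, stuck) // prints "3 true"
}
```

With a detector like this in place, the hard cap mainly bounds long but productive runs, which is why raising it carries little risk of burning rounds on a repeated failure.
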

| Setting | JSON path | Default (Local) | Default (Vertex) | Notes |
|---|---|---|---|---|
| Hot Token Limit | `llm.{local,vertex_ai}.hot_token_limit` | 4096 | 65536 | Compaction trigger. When the total token count of the Hot tier exceeds this, the oldest Hot records are summarised into Warm. |
| Max Context Tokens | `llm.{local,vertex_ai}.context_budget.max_context_tokens` | 16384 | 524288 | Total token budget sent to the model per call. 0 = unlimited. |
| Max Warm Summary Tokens | `llm.{local,vertex_ai}.context_budget.max_warm_tokens` | 1024 | 16384 | Cap for the warm-summary block. Older summaries are dropped past this. |
| Max Tool-Result Tokens | `llm.{local,vertex_ai}.context_budget.max_tool_result_tokens` | 2048 | 32768 | Per-tool-result truncation before insertion into the LLM message list. |
| Output Reserve | `llm.{local,vertex_ai}.context_budget.output_reserve` | 4096 | 4096 | Tokens reserved for the model's reply. Subtracted from `max_context_tokens` before context packing, so the request stays under the model's window. |
| Per-request timeout (s) | `llm.{local,vertex_ai}.request_timeout_seconds` | 300 | 180 | Per-attempt cap inside the retry layer. |
| Retry max attempts | `llm.{local,vertex_ai}.retry_max_attempts` | 3 | 3 | Total LLM call attempts including the first (1 = no retries). Settings → Local LLM / Vertex AI surface this. |
| Retry backoff base (s) | `llm.{local,vertex_ai}.retry_backoff_base_seconds` | 5 | 5 | Initial backoff between retries. Doubles on each subsequent retry, capped at the max below. Config-only — not in Settings UI. |
| Retry backoff max (s) | `llm.{local,vertex_ai}.retry_backoff_max_seconds` | 120 | 120 | Cap on the per-retry wait. Config-only. |
| Retry jitter (s) | `llm.{local,vertex_ai}.retry_jitter_seconds` | 1 | 1 | Uniform ±jitter randomisation around each backoff. Config-only. |
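
The four retry settings in the table combine into a capped exponential backoff with uniform jitter. The Go sketch below shows one way to compute the wait before each retry from the documented defaults. `RetryPolicy`, `DefaultPolicy`, and `Backoff` are hypothetical names, and applying the jitter after the cap is an assumption; the table does not state the order.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// RetryPolicy mirrors the retry settings from the table (names hypothetical).
type RetryPolicy struct {
	MaxAttempts int           // total attempts including the first
	BackoffBase time.Duration // wait before the first retry
	BackoffMax  time.Duration // cap on the per-retry wait
	Jitter      time.Duration // half-width of the uniform jitter
}

// DefaultPolicy returns the documented defaults: 3 attempts, 5 s base,
// 120 s cap, 1 s jitter.
func DefaultPolicy() RetryPolicy {
	return RetryPolicy{
		MaxAttempts: 3,
		BackoffBase: 5 * time.Second,
		BackoffMax:  120 * time.Second,
		Jitter:      1 * time.Second,
	}
}

// Backoff returns the wait before retry number retry (1 = first retry).
// The base doubles on each subsequent retry and is capped at BackoffMax.
// rnd must return a value in [0, 1); it is mapped to an offset in
// [-Jitter, +Jitter). The result never goes below zero.
func (p RetryPolicy) Backoff(retry int, rnd func() float64) time.Duration {
	d := p.BackoffBase
	for i := 1; i < retry && d < p.BackoffMax; i++ {
		d *= 2
	}
	if d > p.BackoffMax {
		d = p.BackoffMax
	}
	offset := time.Duration((2*rnd() - 1) * float64(p.Jitter))
	if d+offset < 0 {
		return 0
	}
	return d + offset
}

func main() {
	p := DefaultPolicy()
	// With 3 total attempts there are at most 2 retries.
	for retry := 1; retry < p.MaxAttempts; retry++ {
		fmt.Printf("retry %d: wait about %v\n", retry, p.Backoff(retry, rand.Float64))
	}
}
```

Under these defaults the un-jittered waits are 5 s and 10 s. The 120 s cap only comes into play when `retry_max_attempts` is raised to 7 or more, since the sixth retry is the first whose doubled value (160 s) exceeds it.
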

| Setting | JSON path | Default | Notes |
|---|---|---|---|
| Enabled | `sandbox.enabled` | `false` | Master toggle. When off, the eight `sandbox-*` tools are not registered. |
| Engine | `sandbox.engine` | `auto` | `auto` picks `podman` then `docker` from `PATH`. |
| Image | `sandbox.image` | (empty until you Build) | Active container image. Locally-built images (`shell-agent-v2-sandbox:<sha>`) and `@sha256:`-pinned references are treated as safe; mutable upstream tags (e.g. `python:3.12-slim`) trigger an advisory banner in the Settings → Sandbox tab. |
| Max output bytes | `sandbox.max_output_bytes` | 8388608 (8 MiB) | Per-exec cap on each of stdout / stderr. Excess is dropped with a `[output truncated at N bytes]` marker — defends against an LLM-issued `cat /dev/zero` etc. OOMing the app. Config-only; no UI surface. |
| Network | `sandbox.network` | `false` | Egress; default off. |
| CPU limit | `sandbox.cpu_limit` | 2 | Passed to `--cpus`. |
| Memory limit | `sandbox.memory_limit` | `1g` | Passed to `--memory`. |
| Per-call timeout (s) | `sandbox.timeout_seconds` | 60 | Per-exec cap. |

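
The `sandbox.max_output_bytes` behaviour can be pictured as a writer that keeps the first N bytes of a stream and discards the rest. The Go sketch below is one possible shape, with one buffer per stream (stdout and stderr each get their own). `cappedBuffer` is a hypothetical name, and appending the marker on a new line at the end is an assumption about placement.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// cappedBuffer is an io.Writer that keeps at most limit bytes and drops
// everything after that, remembering that truncation happened.
type cappedBuffer struct {
	buf       bytes.Buffer
	limit     int
	truncated bool
}

// Write stores as much of p as still fits. It always reports len(p) so
// the producing process is not failed or blocked by the cap.
func (c *cappedBuffer) Write(p []byte) (int, error) {
	room := c.limit - c.buf.Len()
	if room < len(p) {
		c.truncated = true
	}
	if room > 0 {
		if room > len(p) {
			room = len(p)
		}
		c.buf.Write(p[:room])
	}
	return len(p), nil
}

// String returns the kept bytes, followed by the marker if any output
// was dropped.
func (c *cappedBuffer) String() string {
	if !c.truncated {
		return c.buf.String()
	}
	return c.buf.String() + fmt.Sprintf("\n[output truncated at %d bytes]", c.limit)
}

func main() {
	stdout := &cappedBuffer{limit: 16}
	// Simulate a runaway producer such as `cat /dev/zero`.
	io.Copy(stdout, strings.NewReader(strings.Repeat("x", 1000)))
	fmt.Println(stdout.String())
}
```

Memory use stays bounded by the limit no matter how much the container prints, which is the property the cap exists to provide.
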
shell-agent-v2 auto-extracts important facts from each conversation. Cross-session entries (Global Memory) are re-injected into every future session's system prompt as authoritative context — which means anything that ever appears in an assistant turn (a quoted CSV cell, an MCP response, OCR'd image text, a fetched web page) can structurally end up steering future sessions. Each entry carries a provenance tag:
- user-stated — came from a user turn, a manual pin, or an explicit "Pin to Global Memory" promotion. Treated as authoritative.
- derived — extracted from an assistant turn, or a finding the LLM promoted via `promote_finding`. Lower trust because the content traces back through the LLM and may carry attacker-influenced bytes.
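
The two provenance tags above amount to a small classification rule over where an entry came from. The Go sketch below spells that rule out. The `Source` enumeration and `Classify` are hypothetical names, and treating unknown sources as `derived` is an assumption chosen so that the sketch fails closed.

```go
package main

import "fmt"

// Provenance is the trust tag carried by each memory entry.
type Provenance string

const (
	UserStated Provenance = "user-stated"
	Derived    Provenance = "derived"
)

// Source identifies where an entry originated (hypothetical enumeration).
type Source int

const (
	SourceUserTurn       Source = iota // text the user typed
	SourceManualPin                    // entry pinned by hand
	SourcePinToGlobal                  // explicit "Pin to Global Memory"
	SourceAssistantTurn                // extracted from an assistant turn
	SourcePromoteFinding               // LLM called promote_finding
)

// Classify maps a source to its provenance tag. Anything that is not an
// explicit user action is tagged derived, including unknown sources.
func Classify(s Source) Provenance {
	switch s {
	case SourceUserTurn, SourceManualPin, SourcePinToGlobal:
		return UserStated
	default:
		return Derived
	}
}

func main() {
	fmt.Println(Classify(SourcePinToGlobal))    // prints "user-stated"
	fmt.Println(Classify(SourcePromoteFinding)) // prints "derived"
}
```
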
The sidebar (Global / Session Memory) and the chat-pane Findings panel show the badge inline. If a fact starts driving weird behaviour (the recoverable case being the THINK leak that prompted this hardening), open the relevant list, select the offending entries, and bulk-delete them. See docs/en/history/memory-injection-hardening.md for the full threat model and docs/en/memory-model.md for the v0.2.0 4-facility design.
- macOS 10.15+
- LM Studio (for local backend) — Apple Silicon M1/M2 Pro+ recommended
- GCP project with billing enabled (for Vertex AI backend)
cd app
make build # Build .app bundle
make dev # Development with hot reload
make test # Run tests

- Users — this README + CHANGELOG.md.
- New contributors — CONTRIBUTING.md (Japanese: CONTRIBUTING.ja.md).
- Maintainers / deep dives — docs/en/INDEX.md is the single entry point listing the evergreen reference documents (architecture, memory model, data analysis, privacy controls) and the chronological ADR catalogue. Japanese mirror: docs/ja/INDEX.ja.md.
Older pre-v0.2.0 audit-trail material lives under
docs/en/history/ (Japanese:
docs/ja/history/). Most of it has been
superseded by the current reference docs — consult only when you
need the historical "why."
MIT