
Demo: aiosandbox + Hermes + AgentKeys on ESP32 hardware #103

@hanwencheng


Demo: Agent IAM for the AI device era — agentkeys wire + hooks

Pivoted 2026-05-28. This issue originally specified a single-act memory-injection demo built on a custom Rust "Hermes runtime", a daemon memory endpoint, and an extended sandbox image (a 12-step plan). That approach is superseded; the original plan and artifacts are archived under docs/archived/*-rust-runtime-2026-05*. The direction below is the current target.

What this issue now tracks

A <5-minute, zero-config-editing demo that proves AgentKeys is Agent IAM, not chatbot infrastructure: a fresh user points a Task Host (Hermes) at AgentKeys with one command (agentkeys wire hermes) and the device immediately (1) reads only the memory it's permitted to, (2) is deterministically denied an over-cap action with no LLM in the decision, and (3) complies the instant a scope is revoked.

Strategic anchor: docs/agent-iam-strategy.md. Architecture record: docs/arch.md §22d. Terminology: docs/wiki/agent-iam-guarantee-glossary.md. Execution plan: docs/spec/plans/phase-1-fresh-user-wire-onboarding.md.

Architecture (Option B — hooks-first)

AgentKeys is the Authority Host; the Task Host does the work. We never become a Task Host (strategy §2.1/§2.4).

Vendor surface
  └── Task Host (Hermes / Claude Code / Codex / OpenClaw — has lifecycle hooks)
        ├── MCP → aiosandbox primitives (browser/file/terminal, POST :8080/mcp)
        ├── MCP → AgentKeys Authority (memory/permission/cap/audit, the 7 tools)
        └── Hooks (PreToolUse/PostToolUse/Stop) → AgentKeys MCP tools  ← IAM GUARANTEE
  • IAM tool vs IAM guarantee: an MCP tool the LLM can call is not a guarantee, because the LLM can skip it. A guarantee is a non-LLM gate in the execution path. Hooks are that gate: the host runs them before and after each tool call, outside the model, so the LLM cannot bypass permission.check. See the glossary.
  • Hooks-first, proxy-fallback: hooks are the primary mechanism (tracked in #133, "Phase 3: LLM-host hook integration (Claude Code, Codex/ChatGPT, etc.)"). The Tier-1 hosts Claude Code, Codex, Hermes, and OpenClaw all expose hooks (verified 2026-05-28). An OpenAI-compatible proxy is the lower-priority fallback (Phase 3b) for hosts without a hook surface (xiaozhi-server, mobile chatbots).
  • aiosandbox is a sandbox primitive, not a Task Host: it supplies browser/file/terminal tools, and the Task Host runs inside it.
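To make the tool-versus-guarantee distinction concrete, here is a minimal Python sketch of a PreToolUse gate. It builds an MCP JSON-RPC tools/call request for permission.check and fails closed on any error. The argument names (action, input), the result shape (decision, reason), the injected transport, and the fake_authority stand-in with its 50-unit cap are all assumptions for illustration, not the AgentKeys wire format; a real hook would send the request to the Authority's MCP endpoint.

```python
from typing import Callable

# Hypothetical transport: sends one JSON-RPC request, returns the decoded response.
# A real hook would POST to the AgentKeys MCP endpoint; here it is injected.
Transport = Callable[[dict], dict]


def build_permission_check(request_id: int, tool_name: str, tool_input: dict) -> dict:
    """MCP tools/call request for permission.check (argument names are assumed)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "permission.check",
            "arguments": {"action": tool_name, "input": tool_input},
        },
    }


def pre_tool_use(tool_name: str, tool_input: dict, transport: Transport) -> tuple[bool, str]:
    """Runs before every tool call, regardless of what the model decided.

    Fails closed: a transport error or malformed response is a denial.
    """
    request = build_permission_check(1, tool_name, tool_input)
    try:
        # Assumed result shape: {"decision": "allowed" | "denied", "reason": str}
        result = transport(request)["result"]
        decision = result.get("decision")
        reason = result.get("reason", "unspecified")
    except Exception as exc:
        return False, f"denied: gate_error ({type(exc).__name__})"
    if decision == "allowed":
        return True, "allowed"
    return False, f"denied: {reason}"


def fake_authority(request: dict) -> dict:
    """Stand-in Authority with a made-up 50-unit cap, for local testing only."""
    amount = request["params"]["arguments"]["input"].get("amount", 0)
    if amount > 50:
        result = {"decision": "denied", "reason": "daily_spend_cap_exceeded"}
    else:
        result = {"decision": "allowed"}
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}


if __name__ == "__main__":
    print(pre_tool_use("pay", {"amount": 80}, fake_authority))
    # (False, 'denied: daily_spend_cap_exceeded')
```

Failing closed is the design choice that matters here: if the Authority is unreachable, the tool call is refused rather than silently allowed, which is what separates a gate from an advisory tool.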

The fresh-user journey (7 steps)

  1. Install aiosandbox (docker run --security-opt seccomp=unconfined … ghcr.io/agent-infra/sandbox)
  2. curl … | bash installs the AgentKeys CLI and bootstraps the device key and pairing
  3. On the master device: provision credentials (LLM keys) and memory-namespace scopes
  4. Install a Task Host (Hermes) — do NOT run its setup wizard
  5. agentkeys wire hermes — one idempotent command
  6. AgentKeys writes the hook scripts, the hooks: config, consent, and the LLM key into the runtime
  7. Open the runtime → first conversation is already memory-aware (the "surprise")
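The idempotency promise behind agentkeys wire hermes (a first run writes everything, a re-run reports every step as already matching) can be sketched as desired-state comparison. This is a minimal Python sketch, not the AgentKeys implementation: the step names, the placeholder paths and contents, and the in-memory dict standing in for the runtime's files are hypothetical; only the ok/skip wording mirrors the acceptance criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WireStep:
    name: str     # hypothetical step label
    path: str     # placeholder path inside the Task Host runtime
    content: str  # exact content this step should leave behind


def run_wire(steps: list[WireStep], fs: dict[str, str]) -> list[str]:
    """Write each target only if it differs from the desired content.

    `fs` stands in for the runtime's files (path -> text).
    """
    report = []
    for step in steps:
        if fs.get(step.path) == step.content:
            report.append(f"skip {step.name}: matches")
        else:
            fs[step.path] = step.content
            report.append(f"ok {step.name}: proceeding")
    return report


# Placeholder steps loosely following step 6 of the journey; all values are made up.
STEPS = [
    WireStep("hook-scripts", "hooks/pre_tool_use", "agentkeys hook check"),
    WireStep("hooks-config", "config/hooks", "PreToolUse: hooks/pre_tool_use"),
    WireStep("consent", "config/consent", "granted"),
    WireStep("llm-key", "secrets/llm_key", "<provisioned on master device>"),
]

if __name__ == "__main__":
    runtime_fs: dict[str, str] = {}
    print(run_wire(STEPS, runtime_fs))  # first run: every step "ok"
    print(run_wire(STEPS, runtime_fs))  # re-run: every step "skip ... matches"
```

Comparing against the desired end state, rather than recording "already ran", also means a hand-edited or drifted file is rewritten on the next run while untouched steps still report skip.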

The three-act demo (the pitch)

  1. Permissioned Memory — device reads ONLY its granted namespace ("knows what it's allowed to know about you", not "knows you")
  2. Deterministic Denial — over-cap spend → permission.check returns denied: daily_spend_cap_exceeded; the device refuses. No LLM in the decision.
  3. Online Revocation — parent revokes payment scope; next attempt fails on the online cap check.
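All three acts reduce to set lookups and integer comparisons, which is what keeps the LLM out of the decision. The following minimal Python sketch models them in memory. The Grant and Authority classes, the device id, the namespaces, the cents-based cap, and the scope_not_granted reason are hypothetical; only daily_spend_cap_exceeded comes from the act descriptions above. The real permission.check runs on the AgentKeys Authority, not in-process.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Grant:
    """What one device may do (hypothetical fields)."""
    memory_namespaces: set[str]
    scopes: set[str]
    daily_spend_cap: int  # cents, so the comparison is exact integer math
    spent_today: int = 0


@dataclass
class Authority:
    grants: dict[str, Grant]
    memory: dict[str, dict[str, str]] = field(default_factory=dict)

    def memory_get(self, device: str, namespace: str) -> dict[str, str] | None:
        """Act 1: return a namespace only if it was granted to this device."""
        grant = self.grants.get(device)
        if grant is None or namespace not in grant.memory_namespaces:
            return None
        return self.memory.get(namespace, {})

    def permission_check(self, device: str, scope: str, amount: int) -> tuple[bool, str]:
        """Acts 2 and 3: set membership and an integer comparison, no model involved."""
        grant = self.grants.get(device)
        if grant is None or scope not in grant.scopes:
            return False, "scope_not_granted"
        if grant.spent_today + amount > grant.daily_spend_cap:
            return False, "daily_spend_cap_exceeded"
        grant.spent_today += amount
        return True, "allowed"

    def revoke(self, device: str, scope: str) -> None:
        """Act 3: the next permission_check for this scope is denied."""
        grant = self.grants.get(device)
        if grant is not None:
            grant.scopes.discard(scope)


auth = Authority(
    grants={"device-1": Grant({"kids/preferences"}, {"payment"}, daily_spend_cap=2000)},
    memory={"kids/preferences": {"snack": "apple"}, "parent/finance": {"note": "private"}},
)
print(auth.memory_get("device-1", "kids/preferences"))     # {'snack': 'apple'}
print(auth.memory_get("device-1", "parent/finance"))       # None
print(auth.permission_check("device-1", "payment", 1500))  # (True, 'allowed')
print(auth.permission_check("device-1", "payment", 1000))  # (False, 'daily_spend_cap_exceeded')
auth.revoke("device-1", "payment")
print(auth.permission_check("device-1", "payment", 100))   # (False, 'scope_not_granted')
```

Because the same inputs always produce the same answer, the denial in Act 2 and the post-revocation failure in Act 3 are reproducible on stage, which a prompt-based refusal cannot promise.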

Implementation status

| Component | Status |
|---|---|
| AgentKeys MCP server: 7 tools (identity.whoami, memory.get/put, permission.check, cap.mint/revoke, audit.append) | ✅ shipped (#107) |
| Docs / strategy reset (strategy doc move, arch §22d, wiki glossary, plan) | merged (#140) |
| agentkeys wire hermes + agentkeys hook check/audit/memory-inject + Hermes adapter + operator runbook | 🔄 open (#141); verified end-to-end against the in-memory MCP backend |
| Phase 1.b adapters: Claude Code / Codex / OpenClaw (the RuntimeAdapter seam) | ⬜ deferred (#133) |
| Proxy fallback for hook-less hosts | ⬜ deferred (Phase 3b) |
| Nightly drift-check cron, live-session identity defaulting, cap-mint pre-warming | ⬜ deferred (plan §11) |
| ESP32 firmware (MagicLick / xiaozhi) | deferred; the device-side substitute today is a laptop running curl or a Task Host |
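The RuntimeAdapter seam is what should let Phase 1.b add Claude Code, Codex, and OpenClaw without changing the wire flow. A hypothetical Python shape for it is sketched below. The class names, the registry, and the event-to-subcommand mapping (including Stop running audit) are assumptions; only the agentkeys hook check/audit subcommands and the PreToolUse/PostToolUse/Stop events appear in this issue.

```python
from abc import ABC, abstractmethod


class RuntimeAdapter(ABC):
    """Hypothetical per-host seam: map AgentKeys hooks onto a host's lifecycle hooks."""

    name: str

    @abstractmethod
    def hook_config(self, hook_cli: str) -> dict[str, str]:
        """Return lifecycle event -> command the host should run for it."""


class HermesAdapter(RuntimeAdapter):
    name = "hermes"

    def hook_config(self, hook_cli: str) -> dict[str, str]:
        # Assumed mapping; the real Hermes event keys and wiring may differ.
        return {
            "PreToolUse": f"{hook_cli} check",
            "PostToolUse": f"{hook_cli} audit",
            "Stop": f"{hook_cli} audit",
        }


# Claude Code / Codex / OpenClaw adapters would register here once Phase 1.b lands.
ADAPTERS: dict[str, RuntimeAdapter] = {adapter.name: adapter for adapter in [HermesAdapter()]}


def adapter_for(host: str) -> RuntimeAdapter:
    """Look up the adapter for a Task Host, or fail with a clear message."""
    try:
        return ADAPTERS[host]
    except KeyError:
        raise ValueError(f"no RuntimeAdapter for {host!r} yet") from None


print(adapter_for("hermes").hook_config("agentkeys hook"))
```

Keeping host-specific knowledge behind one small interface means the generic parts (idempotent writes, consent, key provisioning) stay shared across hosts.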

Scope

IN (current): the 7-step agentkeys wire hermes flow + the three-act demo, driven from a laptop/sandbox; single demo actor; in-memory or real MCP backend.

OUT (deferred): ESP32 firmware, voice STT/TTS, multi-tenant orchestration, billing, the parent-control web UI's full build, real-time on-chain audit, cross-vendor memory portability.

Acceptance criteria

A reviewer, following docs/operator-runbook-wire.md (lands in #141), can on a fresh machine:

  • run agentkeys wire hermes and see every step report ok proceeding, then re-run it and see every step report skip … matches
  • send a query and get a response that reflects the permitted memory namespace (Act 1)
  • attempt an over-cap action and watch it be deterministically blocked with the cap reason (Act 2)
  • revoke a scope and watch the next attempt fail (Act 3)
  • all within ~15 minutes, editing zero config files by hand


Metadata

Labels: area/firmware (ESP32 firmware, device-side code, MCU work); area/mcp (MCP server, MCP tool integration, MCP protocol work)