diff --git a/.gitignore b/.gitignore index 4bbdb4e..f8d8e97 100644 --- a/.gitignore +++ b/.gitignore @@ -17,3 +17,4 @@ server/bin/ !server/internal/web/dist/ server/internal/web/dist/* !server/internal/web/dist/.gitkeep +.worktrees diff --git a/docs/ideation/2026-06-08-single-deuce-agent-ideation.md b/docs/ideation/2026-06-08-single-deuce-agent-ideation.md new file mode 100644 index 0000000..a065964 --- /dev/null +++ b/docs/ideation/2026-06-08-single-deuce-agent-ideation.md @@ -0,0 +1,122 @@ +--- +date: 2026-06-08 +topic: single-deuce-agent +focus: Collapse the 5 agent roles into one "Deuce" agent that supports skills + subagents; can the Pi harness support it? +mode: repo-grounded +--- + +# Ideation: One "Deuce" Agent With Skills + Subagents + +## Verdict + +**Yes — Pi can support a single "Deuce" modulated by skills + subagents, and the design is closer than it looks.** Under the Pi harness, an agent's `role` and `system_prompt` are fetched from the DB but **never forwarded to Pi** — Pi already runs as one generic coding agent regardless of which of the 5 rows triggered it. The 5 "agents" are largely a UI fiction at the execution layer, so the collapse removes a distinction the harness already ignores. + +> **Validated 2026-06-08** (spike against real `pi` 0.74.2; see `docs/solutions/architecture-patterns/pi-loads-agent-skills-standard-in-rpc-mode.md`). The compound-engineering plugin was exported to Pi's agent layout (`server/tmp/agent/`) and tested: +> +> - **Skills load in `--mode rpc`.** All 36 exported skills appeared via the `get_commands` RPC, each invocable as `/skill:`. `--mode rpc` is just an output format, orthogonal to skill loading. +> - **Pi implements the [Agent Skills standard](https://agentskills.io/specification)** (same as Claude Code) and auto-discovers from `~/.pi/agent/skills/` — the sibling of where Deuce already provisions `ask-user.ts`. So provisioning is the existing `InstallPiExtension` mechanism generalized to a tree. +> - **Deterministic invocation exists:** sending a `prompt` of `/skill:ce-commit …` expands the skill before processing — no reliance on the model choosing to `read` it. This *replaces* Idea 1's prompt-prepend hack. +> - **Correction to the constraint below:** `--system-prompt` / `--append-system-prompt` *are* real launch flags, so a per-session "Deuce" identity prompt is available at launch. There is still no *mid-session per-task* override via an RPC command — but per-task specialization rides `/skill:` expansion, which is cleaner. +> - **Still open:** live model triggering (needs an API key) and subagents (the `agents/*.md` personas run via the `pi-subagents` extension, which forks child Pi processes that bypass Deuce's `pirun.Key` supervisor + WS attribution — the Survivor 3 design question). + +The one real harness constraint (as originally framed): Pi `--mode rpc` has no per-session *system-prompt* override — but the task `Prompt` does reach Pi, which is the seam skills ride (Idea 1). *(See the validation note above — launch-time `--append-system-prompt` and `/skill:` expansion both turned out to be available, softening this constraint.)* + +## Grounding Context (Codebase) + +- **Agent model:** `agents` table (`id, name, role(free-text, never enum), color, color_muted, provider, model, description, system_prompt, deleted_at`), 5 hardcoded seed rows (Coder/Reviewer/Planner/Tester/Designer). `session_agents` join table `(session_id, agent_id, status, claude_session_id, pi_session_id)` allows multiple agents per session. Frontend `AgentRole = string` (deliberately un-enumerated). +- **Role is already decoupled from execution.** Under Pi, `role`/`system_prompt` are fetched (`messages.go`) but never forwarded. `EnqueueParams` carries only `{SessionID, AgentID, RequestedBy, AnchorMessageID, Prompt, WorkspaceID}`. The `system_prompt` is silently dropped on the Pi path — a live design-debt bug (only the legacy `claude` executor consumed `--append-system-prompt`). +- **@mention flow:** frontend `ChatView.tsx` regex-parses `@Name`, matches `session.agents` by name → UUID → `sendMessage({content, mentions:[uuid]})`. Backend `messages.go` loops mentions and enqueues to the Pi runtime. A near-miss name (`@coder`) silently no-ops. +- **Pi harness topology:** Go drives a persistent `pi --mode rpc` process inside each session's DevPod container via `devpod ssh`. Process granularity is `pirun.Key{SessionID, AgentID}` — one process per (session, agent) pair, serial queue per key. JSONL protocol; custom `ask-user.ts` extension provisioned to `~/.pi/agent/extensions/` via `pi.registerTool()` is the only extension today and the natural skills seam. +- **No "skills" concept exists** in the codebase. **Pi has no native subagent dispatch** in RPC mode; community `@tintinweb/pi-subagents` spawns child Pi instances (foreground works in RPC, background/scheduled degraded). Subagents today = the server spawning multiple Pi processes under separate agent rows. +- **Coupling points for a collapse:** DB schema (agents/session_agents/seed), `pirun.Key`, `EnqueueParams.AgentID`, the `messages.go` mention loop, frontend mention parse + per-agent message border colors, `SummaryPanel`/`AgentTaskCard`/`AgentThreadDrawer`/`AgentSettingsDialog`, WS payloads carrying `agentId`. + +### External context + +- Causal-independence is the criterion for safe role→single-agent collapse (arXiv:2601.04748): safe when roles don't feed back mid-session; 15–40% accuracy drop when downstream depends on upstream. In Deuce the human drives next steps → roles largely decoupled. +- Skill shadowing (arXiv:2605.24050): flat libraries degrade up to 21% at scale from selection failure; fix is hierarchical category→skill routing. +- Claude Code: Skills = inline file-based bounded capability bundles; Subagents (Task tool) = spawned child sessions w/ own context + restricted tools for parallel/isolated work. OpenAI/CrewAI/AutoGen lesson: start single-agent + tools, split only at real limits; over-splitting → routing failures, duplicated instructions, drift. + +### Past learnings (docs/solutions/) + +- No direct prior art on agent-role consolidation, the Pi RPC harness, or skills/subagents — treat as greenfield, strong `/ce-compound` candidate once the design lands. +- Mirror the "one identity model, one user table, no second source of truth" precedent (`embedded-ssh-proxy-for-vscode-remote.md`) when attributing Deuce + subagents. +- Re-run the route-authorization audit (`broadening-resource-visibility-requires-per-route-authorization-audit.md`) for any new "invoke skill" / "spawn subagent" / "set active agent" endpoint; `UpdateSessionAgents`/`StopAgent` are already write/live-gated. + +## Topic Axes + +- Agent identity & data model +- Skills (capability units) +- Subagents (parallel dispatch) +- Invocation & UX +- Migration & harness fit + +## Ranked Ideas + +### 1. Skill specialization rides the prompt channel — the keystone that ships on Pi today +**Description:** Since Pi RPC won't honor a system-prompt override but `EnqueueParams.Prompt` *does* reach Pi, a "skill" becomes a preparation block prepended to the task prompt (and/or a `pi.registerTool()` extension, the same seam `ask-user.ts` uses). "Behave like a reviewer" stops being a dropped DB column and becomes injected instruction Pi actually executes. +**Axis:** Skills +**Basis:** `direct:` — `system_prompt` is fetched (`messages.go`) but silently dropped on the Pi path; `EnqueueParams.Prompt` is the one channel that reaches Pi; `ask-user.ts` proves the `registerTool` seam works. +**Rationale:** Simultaneously pays down the live "prompt does nothing" debt and is the load-bearing primitive every skill depends on. Without it, "skills" have nowhere to attach. +**Downsides:** Prompt-prepend is weaker than a true system prompt (model can drift); registerTool path needs per-container extension provisioning. +**Confidence:** 90% **Complexity:** Low–Medium **Status:** Unexplored + +### 2. Strangler collapse: 5 rows → skill-presets → one runtime, then delete +**Description:** Don't big-bang the schema. Keep the 5 `agents` rows as thin UI presets that all resolve to one `Key{SessionID}` runtime "Deuce"; convert each role's prompt into a skill; let `@Coder` desugar to "Deuce + coder skill." Collapse the rows later once telemetry shows nobody depends on the distinction. +**Axis:** Agent identity & data model (+ migration) +**Basis:** `reasoned:` + `direct:` — grounding lists heavy coupling (DB schema, `pirun.Key`, `EnqueueParams.AgentID`, mention loop, WS `agentId` payloads, sidebar/colors); a strangler keeps every `agentId` contract intact while redefining what it resolves to, so historical sessions still render and rollback stays cheap. +**Rationale:** Turns a destructive migration into an additive, reversible one; the 5 proven role prompts become the seed skill library instead of being discarded. +**Downsides:** Temporary two-model period (rows that lie about being agents); telemetry needed to know when to cut. +**Confidence:** 85% **Complexity:** Medium **Status:** Unexplored + +### 3. One Pi process per session; subagents as the parallelism escape hatch +**Description:** Drop `AgentID` from `pirun.Key` → one persistent `pi --mode rpc` per session. The concurrency you get today from 5 keyed processes relocates downward to subagents spawned on demand (foreground `@tintinweb/pi-subagents`, or the server spawning child Pi processes — which is literally what the 5-agent model already does). "Deuce" owns ordering/continuity of the channel; subagents are bounded parallel workers that report back as Deuce. +**Axis:** Subagents (parallel dispatch) +**Basis:** `direct:` + `external:` — today's concurrency comes from per-(session,agent) processes with serial per-key queues; Pi has no native subagent dispatch but the multi-process machinery already exists; Unix fork/exec (identity in parent, specialization in child) and Anthropic's "subagents are a performance tier, not a role taxonomy." +**Rationale:** The hardest part of subagents (lifecycle, isolation, message routing) is already built by the 5-agent model — collapsing roles frees that machinery to serve as the real parallelism tier. +**Downsides:** A single per-session process serializes one user's turns unless subagents are wired in; background/scheduled Pi subagents are documented as degraded in RPC. +**Confidence:** 75% **Complexity:** Medium–High **Status:** Unexplored + +### 4. Skills as git-versioned files compiled from docs/solutions/ — the compounding flywheel +**Description:** A skill is a directory (`SKILL.md` + optional `tool.ts`) discovered from the DevPod filesystem where Pi already loads `~/.pi/agent/extensions/`. A "skill-compiler" reads existing `docs/solutions/` learnings (already carrying `module`/`tags`/`problem_type` frontmatter) and emits skills — so every documented fix permanently upgrades the one Deuce agent. Match the Claude Code skill manifest shape for ecosystem interop. +**Axis:** Skills (capability units) +**Basis:** `external:` + `direct:` — Claude Code models skills as file-based bounded bundles; Pi loads extensions from the filesystem; the repo already ships structured `docs/solutions/` plus recently-committed compound-engineering plans. +**Rationale:** Distribution/versioning/review come free from git; an open-source project gets a community skill library with zero new infra; solve-once → document → auto-compile → never re-learn. +**Downsides:** Skill-compiler is real work; manifest-compat is a moving target. +**Confidence:** 70% **Complexity:** Medium–High **Status:** Unexplored + +### 5. Hierarchical skill routing + auto-selection from day one (avoid the 21% shadowing cliff) +**Description:** Make skill selection automatic and invisible (Deuce routes by message content — the user never re-picks a "role"), and structure skills as category→skill from the start (mirroring the `docs/solutions/` category folders). Optionally enforce conjunctive preconditions per skill so only 1–2 fire per turn. +**Axis:** Skills / Invocation & UX +**Basis:** `external:` — flat skill libraries degrade up to 21% from selection failure at scale; the documented fix is hierarchical routing (arXiv:2605.24050). +**Rationale:** The single-agent move invites skill proliferation; designing the two-tier router before the library grows avoids a painful re-shard and an accuracy regression that would make Deuce feel dumber than the 5-role version. +**Downsides:** Auto-selection hurts discoverability ("what can Deuce do?"); a routing hop adds latency/cost. +**Confidence:** 70% **Complexity:** Medium **Status:** Unexplored + +### 6. Keep the menu, drop the entities — roles become slash-presets; provenance badges replace role colors +**Description:** Users may have liked five buttons, not five beings. Reframe roles as invocation presets (`/review`, `/plan`) over one Deuce. Replace the per-agent message border colors (meaningless when one agent speaks) with a provenance chip — "via review skill" / "subagent: tester" — so the visual signal tracks behavior invoked, not identity picked. Also fixes the silent near-miss `@mention` no-op. +**Axis:** Invocation & UX +**Basis:** `external:` + `direct:` — UX research (arXiv:2601.18275) found users liked named personas for engagement but suffered attribution confusion; role colors and `@Name→UUID` mention-matching are concrete coupling points today. +**Rationale:** Ship the collapse while preserving the discoverable-specialization users actually valued. +**Downsides:** Loses the multi-participant "team in a channel" feel that STRATEGY.md bets on — needs a deliberate product call. +**Confidence:** 70% **Complexity:** Medium **Status:** Unexplored + +### 7. Causal-independence as the explicit "is this overkill?" boundary +**Description:** The principled answer to the "probably overkill" intuition: collapsing roles is safe exactly when they don't feed each other's output mid-session — and risky (15–40% accuracy drop) when downstream depends on upstream (Planner→Coder→Reviewer chains). Encode it as a per-skill property ("produces feedback consumed mid-session: yes/no") that doubles as the runtime rule for when Deuce runs inline vs. spawns an isolated subagent. +**Axis:** Migration & harness fit / Subagents +**Basis:** `external:` — arXiv:2601.04748 causal-independence criterion. In Deuce the human drives the next step, so roles are largely decoupled → collapse is mostly safe. +**Rationale:** Replaces taste ("feels like overkill") with a measurable boundary; the same formalism that licenses the migration becomes the dispatch policy for skills-vs-subagents. +**Downsides:** Classifying feedback-dependence per skill is fuzzy; over-applying it re-introduces complexity you're trying to shed. +**Confidence:** 65% **Complexity:** Medium **Status:** Unexplored + +## Rejection Summary + +| # | Idea | Reason Rejected | +|---|------|-----------------| +| 1 | "Deuce always listening" / no @mentions / ambient presence-trigger | Interesting but a separate product bet (presence model); better as a brainstorm variant than bundled with the collapse | +| 2 | Delete `agents` table outright / Zero-Agent / implicit singleton | Right endpoint, but Idea 2 (strangler) reaches it more safely; merged | +| 3 | 1000 Deuces / identity-per-spawn ephemeral children | Captured by Idea 3's subagent model; standalone it's speculative UI churn | +| 4 | Stem-cell potency, pharmacokinetic half-life, improv "yes-and", orchestra conductor, pin-tumbler, Swiss-army, ED triage | Framing devices — substance folded into Ideas 1/3/5; analogies alone aren't actionable moves | +| 5 | Auto-derive behavior from repo context (kill system_prompt) | Overlaps Ideas 1/4; "environment is the prompt" is downside-mitigation, not its own move | +| 6 | skill.json = Claude Code manifest subset | Folded into Idea 4 as the interop sub-point | + +All five axes have survivors; no deliberate coverage gaps. diff --git a/docs/plans/2026-06-08-002-feat-hide-agent-chat-messages-plan.md b/docs/plans/2026-06-08-002-feat-hide-agent-chat-messages-plan.md new file mode 100644 index 0000000..e86759e --- /dev/null +++ b/docs/plans/2026-06-08-002-feat-hide-agent-chat-messages-plan.md @@ -0,0 +1,131 @@ +--- +title: "feat: Hide agent replies from main chat, surface them only in super thread cards" +type: feat +status: completed +date: 2026-06-08 +depth: lightweight +--- + +# feat: Hide agent replies from main chat, surface them only in super thread cards + +## Summary + +Agent task replies currently render **twice**: as a `MessageBubble` in the main chat list *and* as the reply text in the inline super-thread card (`AgentTaskCard`). Both come from the same `reply` string emitted at task finalization. This plan hides the chat-bubble copy so agent replies live only on the thread surfaces (the inline card and the agent thread drawer), while leaving everything else in chat — human messages and system/operational notices — untouched. + +This is a **display-only** change. Agent messages remain persisted (they feed the agent's own conversation-history context) and keep flowing over WebSocket; they are simply filtered out of the rendered chat list. + +--- + +## Problem Frame + +When a user @mentions an agent, the agent's final reply is surfaced in two places at once: + +1. **Inline super-thread card** — `task.reply` shown in [`AgentTaskCard`](src/components/super-threads/AgentTaskCard.tsx#L100-L125) under the triggering message, and in the drawer `Turn` ([AgentThreadDrawer.tsx:108-138](src/components/super-threads/AgentThreadDrawer.tsx#L108-L138)). +2. **A full chat `MessageBubble`** — posted as a normal `authorType: "agent"` chat message and rendered in [`ChatView`](src/components/chat/ChatView.tsx#L422-L467). + +Both originate from the *same* `reply` string in [`runtime.go:406-419`](server/internal/agent/runtime.go#L406-L419): the `task_completed` event carries it into `task.reply`, and `replyPoster` posts it as the chat message. The duplication clutters the chat and undercuts the super-thread surface as the place to follow agent work. + +**Desired behavior:** agent replies do not appear in the chat outside of threads. The last reply is shown in the super-thread card (which it already is). Human messages and system notices stay in chat. + +--- + +## Scope Boundaries + +**In scope** +- Filter real agent replies out of the rendered main chat list in `ChatView`. +- Keep human messages and system/operational notices (e.g. "Workspace is still starting", "Agent queue is full") visible. +- Preserve inline `AgentTaskCard` rendering, message-header grouping, empty-state, and auto-scroll behavior. + +**Out of scope / non-goals** +- No change to message persistence, the WebSocket broadcast, or the backend reply-posting path. Agent messages must remain in the DB and the store because `buildChatHistory` ([messages.go](server/internal/handler/messages.go)) uses them for agent context continuity. +- No change to `AgentTaskCard` / `AgentThreadDrawer` reply rendering — they already show `task.reply`. + +### Deferred to Follow-Up Work +- **Unread-count behavior for hidden agent replies.** `addMessage` ([session-store.ts:202-225](src/stores/session-store.ts#L202-L225)) still bumps the session unread count when a (now hidden) agent reply arrives. This is arguably correct — the card *is* new content — but if the team wants unread to track only visible chat, that's a separate, intentional change. + +--- + +## Key Technical Decisions + +**KTD1 — Frontend display filter, not a backend/store change.** +The agent reply and the card's `task.reply` are the *same string* posted together at finalization, so hiding the chat copy loses no information. Backend or store-level removal would break `buildChatHistory` context continuity and the empty-reply fallback ("(The agent finished without a text response.)"). A render-time filter in `ChatView` is the minimal, reversible change. *(see [runtime.go:406-419](server/internal/agent/runtime.go#L406-L419))* + +**KTD2 — Distinguish agent replies from system notices by author identity, not just `authorType`.** +`postSystemMessage` ([messages.go:297-313](server/internal/handler/messages.go#L297-L313)) posts operational notices with `authorType: "agent"` but a **nil author ID** that matches no session agent. Per the product decision, those notices stay visible. The filter therefore hides a message only when it is `authorType === "agent"` **and** its `authorId` matches a real agent in `session.agents`. Nil-author system notices fall through and remain in chat. This reuses the same author-resolution that `ChatView` already performs. + +**KTD3 — Extract the predicate as a pure, tested function.** +Per the repo convention ([repo.test.ts](src/lib/repo.test.ts) — Vitest-style specs that type-check today and run once a runner is wired), the visibility decision lives in a small pure module with a colocated spec, rather than as an inline `.filter` arrow that can't be tested in isolation. + +--- + +## Implementation Units + +### U1. Pure chat-message visibility predicate + +**Goal:** A pure, testable function that decides whether a message renders in the main chat list — hiding real agent replies while keeping human messages and system notices. + +**Requirements:** Core behavior of the feature; backs KTD2 and KTD3. + +**Dependencies:** none. + +**Files:** +- `src/components/chat/message-visibility.ts` (new) +- `src/components/chat/message-visibility.test.ts` (new) + +**Approach:** +- Export a predicate, e.g. `isVisibleInChat(message, agentIds: Set): boolean`, plus a thin `visibleChatMessages(messages, agentIds)` helper that filters while preserving order. +- A message is hidden iff `message.authorType === "agent" && agentIds.has(message.authorId)`. Everything else (human messages, and `agent`-typed messages whose author is not a session agent — i.e. nil-author system notices) is visible. +- `agentIds` is derived by the caller from `session.agents` (a `Set` of agent IDs). Keep the module free of store/session imports so it stays pure. + +**Patterns to follow:** Pure selector style of [`src/stores/agent-runs.ts`](src/stores/agent-runs.ts#L198-L227); spec style and the "no runner yet" note from [`src/lib/repo.test.ts`](src/lib/repo.test.ts). + +**Test scenarios:** +- Human message (`authorType: "human"`) → visible, regardless of `agentIds` contents. +- Agent reply whose `authorId` is in `agentIds` → hidden. +- System notice (`authorType: "agent"`, `authorId` = nil UUID, not in `agentIds`) → visible. +- Agent-typed message whose `authorId` is not in `agentIds` (defensive: agent removed from session) → visible. +- Empty `agentIds` set → no agent messages are hidden (all visible). +- `visibleChatMessages` on a mixed array → returns only visible messages in original order, drops hidden ones, does not mutate the input. + +**Verification:** Spec cases above pass under `vitest` when a runner is present; `npx tsc --noEmit` clean today. + +--- + +### U2. Apply the filter in ChatView + +**Goal:** Render the chat list from the filtered message set so agent replies no longer appear as bubbles, while inline task cards, header grouping, empty-state, and scrolling stay correct. + +**Requirements:** Delivers the user-visible outcome. + +**Dependencies:** U1. + +**Files:** +- `src/components/chat/ChatView.tsx` + +**Approach:** +- Build `agentIds` once from `session.agents` and derive `visibleMessages = visibleChatMessages(sessionMessages, agentIds)`. +- Drive the message `.map` and the `prevMsg`/`showHeader` grouping ([ChatView.tsx:422-467](src/components/chat/ChatView.tsx#L422-L467)) off `visibleMessages`, so author-change/time-gap headers are computed against the *visible* sequence (no phantom gaps from removed agent bubbles). +- **Keep `cardsByAnchor` and inline `AgentTaskCard` rendering unchanged** — cards are keyed on the *human* anchor message (`anchorMessageId`), which is never filtered, so cards remain anchored under the @mention that spawned them. +- Empty-state check ([ChatView.tsx:400](src/components/chat/ChatView.tsx#L400)) uses `visibleMessages.length`. +- **Auto-scroll:** keep the scroll effect ([ChatView.tsx:335-339](src/components/chat/ChatView.tsx#L335-L339)) triggered on the *raw* `sessionMessages.length` (not `visibleMessages.length`) so an arriving agent reply still scrolls the freshly-updated card into view even though the bubble is hidden. + +**Patterns to follow:** existing `sessionMessages` derivation and the author-lookup already in [ChatView.tsx:431-436](src/components/chat/ChatView.tsx#L431-L436). + +**Test expectation:** none at the unit level — `ChatView` has no component-test harness. Covered by U1's pure-function tests plus the manual verification below. + +**Verification (manual / visual):** +- @mention an agent → the agent's reply no longer appears as a chat bubble; the inline super-thread card under the @mention shows the reply text in its done state. +- A system notice path (e.g. send an agent request while the workspace is starting) → the "Workspace is still starting" notice **still appears** in chat. +- Consecutive human messages still collapse headers correctly (no stray avatars/timestamps from removed agent bubbles). +- A session whose only agent output is hidden still renders its human messages and cards; empty-state shows only when there are genuinely no visible messages. +- A drawer-initiated (anchorless) task still shows its reply in the agent thread drawer `Turn` — this is the intended "inside the thread" surface for tasks with no inline card. +- `npm run lint` and `npx tsc --noEmit` clean. + +--- + +## Verification Strategy + +1. `npx tsc --noEmit` — type safety across the new module and `ChatView` edits. +2. `npm run lint`. +3. U1 spec cases (run under vitest if/when the runner is wired; type-checked now). +4. Manual walkthrough of the U2 verification scenarios against a running dev server (`npm run dev` + backend `make dev`). diff --git a/docs/solutions/architecture-patterns/pi-loads-agent-skills-standard-in-rpc-mode.md b/docs/solutions/architecture-patterns/pi-loads-agent-skills-standard-in-rpc-mode.md new file mode 100644 index 0000000..896a1ce --- /dev/null +++ b/docs/solutions/architecture-patterns/pi-loads-agent-skills-standard-in-rpc-mode.md @@ -0,0 +1,79 @@ +--- +title: Pi loads Agent-Skills-standard skills in --mode rpc; provision them to ~/.pi/agent/skills and invoke via /skill: prompt expansion +date: 2026-06-08 +category: architecture-patterns +module: agent/pirun +problem_type: architecture_pattern +component: agent-harness +severity: medium +applies_when: + - "You want one generic Pi agent whose behavior is modulated by skills/subagents instead of multiple role-specific agent rows" + - "You are deciding whether the compound-engineering plugin (or any Claude Code / Agent Skills bundle) can run on the Pi harness Deuce drives in --mode rpc" + - "You need to know how to provision and invoke skills inside a per-session DevPod container over the existing devpod ssh channel" +related_components: + - agent-harness + - devpod + - workspace-provisioning +tags: + - pi + - pi-rpc + - agent-skills + - skills + - subagents + - compound-engineering + - claude-code-compat + - append-system-prompt +--- + +# Pi loads Agent-Skills-standard skills in `--mode rpc` + +## Context + +Deuce models AI helpers as five role rows in an `agents` table (Coder/Reviewer/Planner/Tester/Designer), but under the Pi harness `role` and `system_prompt` are fetched and never forwarded — Pi runs as one generic coding agent regardless. The open question was whether collapsing to a single "Deuce" agent **modulated by skills and subagents** is viable on the Pi harness Deuce already drives (`pi --mode rpc` launched in each session's DevPod container — see [devpod_launcher.go](../../../server/internal/agent/pirun/devpod_launcher.go)). + +The compound-engineering plugin was exported to Pi's agent-config layout (in `server/tmp/agent/`: `skills//SKILL.md`, `agents/.md`, `AGENTS.md`, `install-manifest.json`). This documents what a spike against the real `pi` 0.74.2 binary established about whether that bundle works in RPC mode. + +## Findings (verified) + +1. **Pi implements the [Agent Skills standard](https://agentskills.io/specification)** — the same standard Claude Code uses. A skill is a directory with a `SKILL.md` (YAML frontmatter `name` + `description`, then markdown instructions). Bundles authored for Claude Code drop in unmodified. + +2. **Skills auto-discover from `~/.pi/agent/skills/`** (Pi's documented global location), which is the exact sibling of `~/.pi/agent/extensions/` where Deuce already provisions `ask-user.ts` via `InstallPiExtension` ([manager.go](../../../server/internal/workspace/manager.go)). Pi can also be pointed at `~/.claude/skills` via the `skills` settings array, or given explicit `--skill ` (additive even with `--no-skills`). No launcher-flag change is required to pick up the global dir. + +3. **`--mode rpc` is an output format, orthogonal to skill loading.** Skills load identically in text and rpc modes. Empirically: driving `pi --mode rpc --skill /skills` and sending `{"type":"get_commands"}` returned all 36 exported skills as `skill:` commands (`source: "skill"`), each invocable as `/skill:`. This needs no API key — loading happens at startup before any model call. + +4. **Two invocation paths, one deterministic:** + - *Model-driven:* at startup Pi injects each skill's name+description into the system prompt (progressive disclosure per the spec); on a matching task the model uses `read` to load the full `SKILL.md`. Models "don't always do this." + - *Deterministic (preferred for Deuce):* send `{"type":"prompt","message":"/skill:ce-commit ..."}` — **skill commands are expanded before the prompt is processed** (Pi rpc.md). Arguments after the command are appended as `User: `. This does not depend on the model choosing to bite. + - `get_commands` (not `get_state`) enumerates available skills/prompts/extension-commands — use it to populate UI or validate `@mention`→skill mapping. `get_state` does **not** list skills. + +5. **`--system-prompt` / `--append-system-prompt` are real launch flags.** A single "Deuce" identity prompt can be set at process launch. (There is still no *mid-session per-task* system-prompt override via an RPC command — per-task specialization rides `/skill:` expansion instead, which is cleaner than prepending prompt text.) + +## Not yet verified + +- **Live model triggering** of the model-driven path needs an API key (absent in the spike env). The deterministic `/skill:` path sidesteps this. +- **Subagents.** The export's `agents/*.md` personas are consumed by the **`pi-subagents`** npm extension (provides the `subagent` tool), *not* Pi core — `AGENTS.md` lists it as required (`pi install npm:pi-subagents`), with `pi-ask-user` recommended (the published equivalent of Deuce's hand-rolled `ask-user.ts`). A `pi-subagents`-spawned subagent forks a child Pi instance **inside the container**, bypassing Deuce's `pirun.Key{SessionID, AgentID}` supervisor, serial queue, and WS `agentId` attribution. Whether subagents run under Pi (cheap, invisible to Deuce) or are orchestrated by Deuce (attributable, more work) is an open design decision. + +## Integration path for Deuce + +1. **Provision skills:** generalize `InstallPiExtension` ([manager.go](../../../server/internal/workspace/manager.go)) to push the `skills/` tree (and later `agents/`) into `~/.pi/agent/skills/` — same `mkdir -p` + base64-over-`devpod ssh` mechanism, a tar instead of a single file. Wire it into `provisionAgentTools` ([workspace.go](../../../server/internal/handler/workspace.go)) beside the ask-user install. No launcher change (auto-discovery). +2. **Invoke:** map `@Coder` → a `prompt` command carrying `/skill:coder`; enumerate with `get_commands` to validate. +3. **Identity:** add `--append-system-prompt` to the launch command ([devpod_launcher.go](../../../server/internal/agent/pirun/devpod_launcher.go)) for the single Deuce persona. + +## Reproduce + +`pi` 0.74.2; bundle at `server/tmp/agent`; harness `server/tmp/spike-pi-skills.sh`: + +```bash +printf '%s\n' '{"type":"get_commands"}' \ + | pi --mode rpc --skill server/tmp/agent/skills --offline --no-session \ + | grep -o '"source":"skill"' | wc -l # => 36 +``` + +Pi's own docs are vendored with the npm package at +`@earendil-works/pi-coding-agent/docs/{skills,rpc,extensions}.md` — authoritative reference for discovery locations and the RPC command surface. + +## Related + +- [[broadening-resource-visibility-requires-per-route-authorization-audit]] — any new "invoke skill" / "spawn subagent" route must carry an explicit auth gate. +- [[embedded-ssh-proxy-for-vscode-remote]] — same host→container provisioning topology; "one identity model, one user table" precedent for attributing Deuce + subagents. +- Ideation: `docs/ideation/2026-06-08-single-deuce-agent-ideation.md`.